AgentForge: Terminal AI Coding-Agent Harness

Overview

AgentForge is an open-source terminal AI coding-agent harness built in Python. It is designed as a learning lab for how modern coding agents are assembled: the agent loop, typed tool registry, model provider abstraction, MCP integrations, safety approvals, skills, subagents, persistence, context management, and terminal UI all live in one inspectable codebase.

The project is published as agentforge-harness on PyPI and supports OpenRouter, OpenAI, Anthropic, and OpenAI-compatible providers.

Why I Built This

I built AgentForge to understand the real engineering surface behind coding agents instead of treating them as chatbot wrappers. The goal is to make the harness internals visible and extensible enough that someone can study, modify, and test the pieces that make agents reliable.

The project focuses on:

how an agent alternates between model calls, tool calls, observations, and final responses
how schema-first tools are registered, validated, approved, executed, and reported
how MCP servers expose external tools to the model
how safety belongs in the harness, not only in prompt text
how session snapshots, checkpoints, and event logs make long agent runs recoverable

Key Features

Hybrid agent loop that streams responses, collects tool calls, executes them, and continues until the model is done
Schema-first tool registry with built-in file, search, shell, memory, todo, and web tools
MCP integration with server-scoped tool naming to avoid collisions
Approval policies for mutating tools and shell operations
Prompt-injection boundary handling that wraps tool observations as untrusted data
Secret redaction and output hygiene before tool results enter model-visible context
Subagents for bounded specialist investigation
Progressive skill loading so task-specific guidance is loaded only when useful
Session snapshots, checkpoints, event logs, resume, restore, and export commands
Rich terminal UI for interactive work

Technical Architecture

AgentForge is organized around a long-lived session object. The session coordinates model calls, tool execution, approvals, context updates, persistence, and recovery surfaces so every agent run can be inspected instead of disappearing into a chat transcript.

Technical architecture

Core components, runtime flow, and durability boundaries.

This structure keeps the important harness decisions close together while still separating the runtime loop from durability, inspection, and recovery surfaces.

Engineering Decisions

Tool Observations

Tool output quality directly affects model recovery. AgentForge treats tool results as structured observations with summaries, artifacts, recovery hints, and safety metadata rather than dumping raw strings into the context whenever possible.

Safety Boundaries

AgentForge separates several safety layers:

approval checks for mutating operations
secret redaction for model-visible and UI-visible output
output hygiene for terminal control characters
prompt-injection boundaries around untrusted tool data

None of these layers replaces sandboxing, but together they make agent behavior easier to reason about.

Persistence

The harness stores session snapshots, event logs, and checkpoints so long agent runs can be inspected, resumed, and eventually replayed. This makes the project useful as a debugging and research artifact, not only as an interactive CLI.

Current Status

The project currently supports the core agent loop, provider abstraction, streaming responses, built-in tools, MCP tools, skills, subagents, persistence, checkpoints, plan/build modes, config hot reload, and package metadata for the agentforge CLI.

Planned work includes deterministic replay, browser-assisted local QA, local evals, richer cost tracking, and read-only swarm orchestration.

What This Shows

AgentForge demonstrates my ability to build AI infrastructure where the hard parts are not just model calls, but the surrounding system: tools, context, state, safety, user experience, and recoverability.

AgentForge

Timeline

Role

Team

Status

Technology Stack

Key Challenges

Key Learnings