Back to Projects
AgentForge
In-progressPythonRichPydantic+5 more

AgentForge

Open-source terminal AI coding-agent harness for studying agent loops, schema-first tools, MCP, skills, subagents, persistence, checkpoints, and safety boundaries

Timeline

2026

Role

Creator / Maintainer

Team

Solo

Status
In-progress

Technology Stack

Python
Rich
Pydantic
MCP
OpenAI
Anthropic
OpenRouter
Agent Tools

Key Challenges

  • Agent Loop Design
  • Schema-first Tooling
  • MCP Integration
  • Approval and Safety Boundaries
  • Session Persistence
  • Context Compaction

Key Learnings

  • AI Harness Engineering
  • Tool Observation Design
  • Prompt-injection Boundaries
  • Terminal UX
  • Checkpoint and Replay Foundations
  • Progressive Skill Loading

AgentForge: Terminal AI Coding-Agent Harness

Overview

AgentForge is an open-source terminal AI coding-agent harness built in Python. It is designed as a learning lab for how modern coding agents are assembled: the agent loop, typed tool registry, model provider abstraction, MCP integrations, safety approvals, skills, subagents, persistence, context management, and terminal UI all live in one inspectable codebase.

The project is published as agentforge-harness on PyPI and supports OpenRouter, OpenAI, Anthropic, and OpenAI-compatible providers.

Why I Built This

I built AgentForge to understand the real engineering surface behind coding agents instead of treating them as chatbot wrappers. The goal is to make the harness internals visible and extensible enough that someone can study, modify, and test the pieces that make agents reliable.

The project focuses on:

  • how an agent alternates between model calls, tool calls, observations, and final responses
  • how schema-first tools are registered, validated, approved, executed, and reported
  • how MCP servers expose external tools to the model
  • how safety belongs in the harness, not only in prompt text
  • how session snapshots, checkpoints, and event logs make long agent runs recoverable

Key Features

  • Hybrid agent loop that streams responses, collects tool calls, executes them, and continues until the model is done
  • Schema-first tool registry with built-in file, search, shell, memory, todo, and web tools
  • MCP integration with server-scoped tool naming to avoid collisions
  • Approval policies for mutating tools and shell operations
  • Prompt-injection boundary handling that wraps tool observations as untrusted data
  • Secret redaction and output hygiene before tool results enter model-visible context
  • Subagents for bounded specialist investigation
  • Progressive skill loading so task-specific guidance is loaded only when useful
  • Session snapshots, checkpoints, event logs, resume, restore, and export commands
  • Rich terminal UI for interactive work

Technical Architecture

AgentForge is organized around a long-lived session object. The session coordinates model calls, tool execution, approvals, context updates, persistence, and recovery surfaces so every agent run can be inspected instead of disappearing into a chat transcript.

Technical architecture

Core components, runtime flow, and durability boundaries.

This structure keeps the important harness decisions close together while still separating the runtime loop from durability, inspection, and recovery surfaces.

Engineering Decisions

Tool Observations

Tool output quality directly affects model recovery. AgentForge treats tool results as structured observations with summaries, artifacts, recovery hints, and safety metadata rather than dumping raw strings into the context whenever possible.

Safety Boundaries

AgentForge separates several safety layers:

  • approval checks for mutating operations
  • secret redaction for model-visible and UI-visible output
  • output hygiene for terminal control characters
  • prompt-injection boundaries around untrusted tool data

None of these layers replaces sandboxing, but together they make agent behavior easier to reason about.

Persistence

The harness stores session snapshots, event logs, and checkpoints so long agent runs can be inspected, resumed, and eventually replayed. This makes the project useful as a debugging and research artifact, not only as an interactive CLI.

Current Status

The project currently supports the core agent loop, provider abstraction, streaming responses, built-in tools, MCP tools, skills, subagents, persistence, checkpoints, plan/build modes, config hot reload, and package metadata for the agentforge CLI.

Planned work includes deterministic replay, browser-assisted local QA, local evals, richer cost tracking, and read-only swarm orchestration.

What This Shows

AgentForge demonstrates my ability to build AI infrastructure where the hard parts are not just model calls, but the surrounding system: tools, context, state, safety, user experience, and recoverability.

A man who is master of patience is master of everything else.

~ George Savile

Made with ❤️ by Mohit Goyal
© 2026. All rights reserved.