
Carpenter AI

Measure twice, cut once.

A pure-Python AI agent platform where every action is reviewable code. Agents observe freely, but can only act through audited Python — inspected before it runs, not after.



Measure Twice, Cut Once

Most agent frameworks sandbox execution — constraining what code can do once it runs. Carpenter inverts this: the primary defense is at submission time. The agent generates Python, a multi-stage review pipeline inspects it, and only approved code executes. For high-stakes actions, the pipeline extends to “measure N times, cut once” — multiple independent reviewers, a judge, and separation-of-powers verification.

This means you can give an agent broad observational freedom while maintaining a hard, auditable boundary between intent and action.
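As a rough illustration of the "measure N times, cut once" idea, the sketch below runs several independent reviewers over the same submitted code and hands their verdicts to a judge. Everything in it is hypothetical: the `Verdict` type, the `Reviewer` signature, the unanimity rule, and the toy substring-matching reviewers are assumptions made for this example, not Carpenter's actual API. Real reviewers would be separate AI models rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reason: str


# Hypothetical reviewer signature: source code in, verdict out.
Reviewer = Callable[[str], Verdict]


def unanimous_judge(verdicts: Sequence[Verdict]) -> Verdict:
    """Assumed judging rule: approve only if every reviewer approved."""
    rejections = [v.reason for v in verdicts if not v.approved]
    if rejections:
        return Verdict(False, "; ".join(rejections))
    return Verdict(True, f"{len(verdicts)} reviewers approved")


def measure_n_times(
    code: str,
    reviewers: Sequence[Reviewer],
    judge: Callable[[Sequence[Verdict]], Verdict] = unanimous_judge,
) -> Verdict:
    """Collect one verdict per reviewer, then let the judge decide."""
    if not reviewers:
        # Fail closed: code that nobody reviewed is never approved.
        return Verdict(False, "no reviewers configured")
    verdicts = [review(code) for review in reviewers]
    return judge(verdicts)


# Toy reviewers standing in for independent AI reviewers.
def no_shell(code: str) -> Verdict:
    ok = "os.system" not in code
    return Verdict(ok, "ok" if ok else "shell call found")


def no_network(code: str) -> Verdict:
    ok = "socket" not in code
    return Verdict(ok, "ok" if ok else "network access found")


decision = measure_n_times("print('hello')", [no_shell, no_network])
print(decision.approved)  # True
```

The judge is passed in as a parameter so that a stricter or more lenient rule can replace unanimity without touching the reviewers.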


Three Pillars

Observe Freely

Agents have unrestricted read access — files, state, arc trees, knowledge base, skills. No action is needed to look around. This gives the agent full situational awareness without any security risk.

Act Carefully

Every side effect — file writes, API calls, git operations, state mutations — goes through submit_code. The code is hashed, parsed, sanitized, and reviewed by a separate AI before execution.
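The hash, parse, sanitize, review sequence can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Carpenter's implementation: the signature and return value of `submit_code` below are invented for the example, the reviewer is a plain callable standing in for a separate AI, sanitization is reduced to the string-literal stripping that the Security Model describes, and `exec` stands in for whatever executor the platform really uses. It requires Python 3.9+ for `ast.unparse`.

```python
import ast
import hashlib
from typing import Callable


class _StripStrings(ast.NodeTransformer):
    """Replace every string literal with an empty string."""

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value=""), node)
        return node


def submit_code(source: str, reviewer: Callable[[str], bool]) -> dict:
    """Hypothetical gate: hash, parse, sanitize, review, then maybe run."""
    # 1. Hash the exact submitted text so the audit record is unambiguous.
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    # 2. Parse; syntactically invalid code raises SyntaxError here.
    tree = ast.parse(source)
    # 3. Sanitize: the reviewer sees structure, not string contents.
    sanitized = ast.unparse(_StripStrings().visit(tree))
    # 4. Review the sanitized form.
    approved = reviewer(sanitized)
    # 5. Only approved code runs (exec is a stand-in for a real executor).
    if approved:
        exec(compile(source, "<submitted>", "exec"), {})
    return {"sha256": digest, "sanitized": sanitized, "approved": approved}


# Toy reviewer (assumption): reject anything that calls os.system.
def toy_reviewer(sanitized_source: str) -> bool:
    return "os.system" not in sanitized_source


result = submit_code("greeting = 'hello'\nprint(greeting.upper())", toy_reviewer)
print(result["approved"], result["sanitized"].splitlines()[0])
```

In this sketch the reviewer never sees the text inside string literals, while the executor runs the original, unmodified source identified by the recorded hash.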

Learn Continuously

A compression chain turns raw activity into durable knowledge: daily notes, weekly patterns, monthly insights. Skills crystallize learned patterns. Conversation summaries bridge context across sessions.


Capabilities

Arcs: The Work Tree — One abstraction for tasks, projects, cron jobs, and sub-steps. A recursive tree with a state machine, escalation policies, and iterative planning.

Security Model — Six-stage code review pipeline with sanitization that strips string literals before the reviewer sees them. Network egress denied by default. Credentials never leave the platform process.

Trust & Taint — Arc-level taint zones, a two-LLM firewall for untrusted data, separation-of-powers verification, and encryption at rest for tainted output.

Skills & Memory — Three-stage progressive disclosure for skills. Reflective self-improvement via cadenced cron. Full-text search across conversation history.

Multi-Provider AI — YAML model registry with cost tiers, role-based routing, per-step minimum tier enforcement, circuit breakers, and model escalation.

Platform Extensibility — Core logic separated from platform-specific code via dependency injection. Linux, Android, Windows, macOS — each a thin package that registers executors, sandboxes, and tools.
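To make the Arcs description more concrete, here is a small sketch of a recursive work tree whose nodes each carry a state machine. The state names, the transition table, and the rule that a parent completes only after all of its children are assumptions made for illustration. The `Arc` class below is hypothetical and does not reflect Carpenter's real data model, which also covers escalation policies and iterative planning.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical states and transitions; the real state machine may differ.
TRANSITIONS = {
    "pending": {"active", "cancelled"},
    "active": {"completed", "failed", "cancelled"},
    "failed": {"active"},  # allow a retry
    "completed": set(),
    "cancelled": set(),
}


@dataclass
class Arc:
    name: str
    status: str = "pending"
    children: List["Arc"] = field(default_factory=list)

    def add(self, child: "Arc") -> "Arc":
        """Attach a sub-arc and return it for chaining."""
        self.children.append(child)
        return child

    def transition(self, new_status: str) -> None:
        """Move to new_status if the transition table allows it."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.name}: {self.status} -> {new_status} not allowed"
            )
        # Assumed rule: a parent completes only after all children have.
        if new_status == "completed" and any(
            child.status != "completed" for child in self.children
        ):
            raise ValueError(f"{self.name}: children still open")
        self.status = new_status

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Arc"]]:
        """Yield (depth, arc) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


project = Arc("release")
step = project.add(Arc("run tests"))
project.transition("active")
step.transition("active")
step.transition("completed")
project.transition("completed")

for depth, arc in project.walk():
    print("  " * depth + f"{arc.name}: {arc.status}")
```

Because a task, a project, and a sub-step are all the same node type here, the same `transition` and `walk` logic applies at every level of the tree.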