The Problem
An autonomous agent that fetches web content, processes webhooks, or calls external APIs will inevitably encounter attacker-controlled data. The question isn’t whether, but how that data flows through the system without contaminating trusted operations.
Taint Levels
Every arc carries a taint_level:
| Level | Meaning | Restrictions |
|---|---|---|
| clean | Default. No exposure to untrusted data. | Cannot access untrusted data tools. |
| tainted | Has been exposed to untrusted data. | Restricted from modifying trusted state. |
| review | Bridge zone for evaluating tainted output. | Read tools + submit verdict only. |
Taint does not propagate upward: a clean parent can orchestrate tainted children without becoming tainted itself. Clean arcs that attempt to use untrusted data tools receive HTTP 403.
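The restrictions in the table can be read as a per-call gate on tool access. The minimal sketch below assumes hypothetical tool categories (`ToolKind`) and a hypothetical `check_tool_access` function. Only the three levels, their restrictions, and the 403 returned to clean arcs come from the text above.

```python
from enum import Enum


class TaintLevel(Enum):
    CLEAN = "clean"
    TAINTED = "tainted"
    REVIEW = "review"


class ToolKind(Enum):
    # Hypothetical tool categories, assumed for this sketch.
    UNTRUSTED_DATA = "untrusted_data"  # e.g. act/web
    TRUSTED_WRITE = "trusted_write"    # modifies trusted state
    READ = "read"
    SUBMIT_VERDICT = "submit_verdict"


def check_tool_access(level: TaintLevel, tool: ToolKind) -> int:
    """Return an HTTP-style status code: 200 if allowed, 403 if denied."""
    if level is TaintLevel.CLEAN:
        # Clean arcs may never touch untrusted data tools.
        return 403 if tool is ToolKind.UNTRUSTED_DATA else 200
    if level is TaintLevel.TAINTED:
        # Tainted arcs may not modify trusted state.
        return 403 if tool is ToolKind.TRUSTED_WRITE else 200
    # Review arcs: read tools and verdict submission only.
    return 200 if tool in (ToolKind.READ, ToolKind.SUBMIT_VERDICT) else 403
```

Because the gate looks only at the calling arc’s own `taint_level`, a clean parent keeps its permissions no matter what its tainted children do.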
Taint Isolation at Submit
When submit_code executes code that imports untrusted tools (e.g., act/web), the raw output, which may contain attacker-controlled content, is never returned to the chat agent’s context. Instead, the agent receives structured metadata: status, output_key, output_bytes, and exit_code. The actual output is stored in arc state, where only review arcs can retrieve it.
This is enforced fail-closed: if the taint check itself fails, the output is withheld. The invariant is absolute: no AI sees tainted data unless it is running in a designated review arc.
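The submit path can be sketched as follows, assuming the submitted code’s imports are already known and execution is abstracted as a callable. `ExecResult`, `UNTRUSTED_TOOL_MODULES`, the `taint_check` parameter, the `"withheld"` status string and the clean-path return shape are all hypothetical. The four metadata fields and the fail-closed rule come from the text.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry of tool modules treated as untrusted.
UNTRUSTED_TOOL_MODULES = {"act/web"}


@dataclass
class ExecResult:
    stdout: bytes
    exit_code: int


def imports_untrusted(imports: list[str]) -> bool:
    """Default taint check: does the submitted code import an untrusted tool?"""
    return any(name in UNTRUSTED_TOOL_MODULES for name in imports)


def submit_code(
    imports: list[str],
    run: Callable[[], ExecResult],
    arc_state: dict[str, bytes],
    taint_check: Callable[[list[str]], bool] = imports_untrusted,
) -> dict:
    result = run()
    try:
        tainted = taint_check(imports)
    except Exception:
        # Fail closed: if the check itself breaks, treat the output as tainted.
        tainted = True

    if not tainted:
        return {
            "status": "ok",
            "output": result.stdout.decode(errors="replace"),
            "exit_code": result.exit_code,
        }

    # Tainted: park the raw bytes in arc state for review arcs and
    # hand the chat agent metadata only.
    output_key = f"output/{uuid.uuid4().hex}"
    arc_state[output_key] = result.stdout
    return {
        "status": "withheld",
        "output_key": output_key,
        "output_bytes": len(result.stdout),
        "exit_code": result.exit_code,
    }
```

Note that an exception in the taint check lands on the withheld branch, so a bug in the checker can only hide output, never leak it.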
The Two-LLM Firewall
When tainted output needs to be trusted, the system creates a review arc (taint_level=review, agent_type=REVIEWER) as a sibling of the tainted arc. Individual reviewer verdicts are advisory; a separate JUDGE arc renders the authoritative verdict.
On judge approval, the target arc’s taint_level is promoted to clean. The judge’s authority is scoped to the target arc only — parent arcs are not automatically promoted.
Review arcs are enforced at creation time: every tainted arc must have at least one reviewer and a judge.
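A minimal in-memory sketch of the firewall’s rules: reviewers and a judge are created as siblings of the tainted target, only the judge’s verdict changes taint, and promotion touches the target alone. The `Arc` fields, `create_review_arcs` and `apply_verdict` are hypothetical, and running the judge at `taint_level=review` is an assumption the text does not state.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TaintLevel(Enum):
    CLEAN = "clean"
    TAINTED = "tainted"
    REVIEW = "review"


@dataclass
class Arc:
    arc_id: str
    taint_level: TaintLevel
    parent: Arc | None = None
    agent_type: str | None = None  # "REVIEWER" or "JUDGE" for review arcs
    target_id: str | None = None   # arc under review, if any


def create_review_arcs(target: Arc, n_reviewers: int = 1) -> list[Arc]:
    """Create reviewer and judge arcs as siblings of a tainted target."""
    if target.taint_level is not TaintLevel.TAINTED:
        raise ValueError("only tainted arcs are reviewed")
    if n_reviewers < 1:
        raise ValueError("at least one reviewer is required")
    reviewers = [
        Arc(f"{target.arc_id}-review-{i}", TaintLevel.REVIEW,
            target.parent, "REVIEWER", target.arc_id)
        for i in range(n_reviewers)
    ]
    # Assumption: the judge also runs at taint_level=review.
    judge = Arc(f"{target.arc_id}-judge", TaintLevel.REVIEW,
                target.parent, "JUDGE", target.arc_id)
    return reviewers + [judge]


def apply_verdict(target: Arc, source: Arc, approved: bool) -> bool:
    """Apply a verdict; return True only if it was authoritative."""
    if source.agent_type != "JUDGE" or source.target_id != target.arc_id:
        # Reviewer verdicts (or judges for other arcs) are advisory only.
        return False
    if approved:
        # Promote the target alone; its parent is deliberately untouched.
        target.taint_level = TaintLevel.CLEAN
    return True
```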
Separation of Powers
After a coding-change arc completes, the platform auto-creates verification sibling arcs:
- A correctness check — does it work?
- A quality check — is it well-structured? (for platform/tool code)
- A judge — synthesizes results into an authoritative verdict
- A documentation arc — updates docs if needed
Each verification arc carries arc_role="verifier" and a verification_target_id pointing to the implementation arc. Self-verification is blocked at creation time: the agent that wrote the code cannot be the agent that judges it.
This is “measure N times, cut once” applied to the full development cycle: not just code review, but correctness and quality verification by independent agents.
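The creation-time guard and the auto-created verifier set can be sketched like this. The `agent_id` field, `SelfVerificationError`, `create_verifier`, `auto_create_verifiers` and the round-robin assignment of agents are hypothetical; `arc_role="verifier"`, `verification_target_id`, the four check types, and the ban on self-verification come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


class SelfVerificationError(ValueError):
    """Raised when an agent would verify its own implementation."""


@dataclass
class Arc:
    arc_id: str
    agent_id: str  # hypothetical: identity of the agent running the arc
    arc_role: str = "implementation"
    verification_target_id: str | None = None
    check: str | None = None


def create_verifier(impl: Arc, check: str, agent_id: str) -> Arc:
    # Creation-time guard: the implementer can never verify its own work.
    if agent_id == impl.agent_id:
        raise SelfVerificationError(f"{agent_id} wrote {impl.arc_id}")
    return Arc(
        arc_id=f"{impl.arc_id}-{check}",
        agent_id=agent_id,
        arc_role="verifier",
        verification_target_id=impl.arc_id,
        check=check,
    )


def auto_create_verifiers(
    impl: Arc, agents: list[str], is_platform_code: bool
) -> list[Arc]:
    checks = ["correctness"]
    if is_platform_code:
        checks.append("quality")  # quality check only for platform/tool code
    checks += ["judge", "documentation"]
    # Draw verifiers from every agent except the implementer.
    pool = [a for a in agents if a != impl.agent_id]
    if not pool:
        raise SelfVerificationError("no independent agent available")
    return [
        create_verifier(impl, c, pool[i % len(pool)])
        for i, c in enumerate(checks)
    ]
```

Putting the check inside `create_verifier` rather than in the scheduler means no code path can produce a self-verifying arc, even one that bypasses `auto_create_verifiers`.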
Encryption at Rest
Untrusted output can be Fernet-encrypted at rest. Keys are generated per reviewer-target pair and stored in review_keys. Only designated reviewers (or anyone after trust promotion) can decrypt.
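A minimal sketch of per-pair keys using Fernet from the third-party `cryptography` package (`pip install cryptography`). It assumes each designated reviewer gets its own encrypted copy of the output. The `ReviewVault` class is hypothetical, and its `review_keys` dict is only an in-memory stand-in for the table of that name.

```python
from __future__ import annotations

# Requires the third-party `cryptography` package.
from cryptography.fernet import Fernet


class ReviewVault:
    """In-memory stand-in for encrypted output plus the review_keys table."""

    def __init__(self) -> None:
        # (reviewer_id, target_id) -> Fernet key, mirroring review_keys.
        self.review_keys: dict[tuple[str, str], bytes] = {}
        # (reviewer_id, target_id) -> ciphertext encrypted under that key.
        self.ciphertexts: dict[tuple[str, str], bytes] = {}
        self.promoted: set[str] = set()

    def store(self, target_id: str, output: bytes, reviewers: list[str]) -> None:
        # One fresh key per reviewer-target pair.
        for reviewer_id in reviewers:
            key = Fernet.generate_key()
            pair = (reviewer_id, target_id)
            self.review_keys[pair] = key
            self.ciphertexts[pair] = Fernet(key).encrypt(output)

    def promote(self, target_id: str) -> None:
        self.promoted.add(target_id)

    def decrypt(self, requester_id: str, target_id: str) -> bytes:
        pair = (requester_id, target_id)
        if pair not in self.review_keys:
            if target_id not in self.promoted:
                raise PermissionError(f"{requester_id} may not read {target_id}")
            # After trust promotion, any stored copy may be opened.
            pair = next((p for p in self.review_keys if p[1] == target_id), None)
            if pair is None:
                raise KeyError(target_id)
        return Fernet(self.review_keys[pair]).decrypt(self.ciphertexts[pair])
```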
Trust Audit Log
A dedicated append-only table records every boundary decision: taint assignments, access denials, review verdicts, trust promotions, and decryption grants. It is kept separate from arc history so that the trust record is always complete and tamper-evident.
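The text does not say how tamper evidence is achieved. One common technique is a hash chain, sketched below as an assumption: each entry stores the SHA-256 of its own contents plus the previous entry’s hash, so editing any past entry breaks verification. `TrustAuditLog`, `AuditEntry` and the event-type names are hypothetical. A real table would also need database-level denial of UPDATE and DELETE, which this in-memory sketch does not model.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # prev_hash of the first entry


@dataclass(frozen=True)
class AuditEntry:
    seq: int
    event_type: str
    payload: dict
    prev_hash: str
    entry_hash: str


def _digest(seq: int, event_type: str, payload: dict, prev_hash: str) -> str:
    # Canonical JSON so the same entry always hashes the same way.
    body = json.dumps(
        {"seq": seq, "event_type": event_type,
         "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode()).hexdigest()


class TrustAuditLog:
    EVENT_TYPES = {"taint_assignment", "access_denial", "review_verdict",
                   "trust_promotion", "decryption_grant"}

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def append(self, event_type: str, payload: dict) -> AuditEntry:
        if event_type not in self.EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        seq = len(self._entries)
        prev = self._entries[-1].entry_hash if self._entries else GENESIS
        entry = AuditEntry(seq, event_type, payload, prev,
                           _digest(seq, event_type, payload, prev))
        self._entries.append(entry)  # no update or delete API exists
        return entry

    def entries(self) -> tuple:
        return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = GENESIS
        for seq, e in enumerate(self._entries):
            expected = _digest(e.seq, e.event_type, e.payload, e.prev_hash)
            if e.seq != seq or e.prev_hash != prev or e.entry_hash != expected:
                return False
            prev = e.entry_hash
        return True
```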