Trust & Taint

The Problem

An autonomous agent that fetches web content, processes webhooks, or calls external APIs will inevitably encounter attacker-controlled data. The question is not whether this happens but how that data flows through the system without contaminating trusted operations.

Taint Levels

Every arc carries a taint_level:

Level	Meaning	Restrictions
clean	Default. No exposure to untrusted data.	Cannot access untrusted data tools.
tainted	Has been exposed to untrusted data.	Restricted from modifying trusted state.
review	Bridge zone for evaluating tainted output.	Read tools + submit verdict only.

Taint does not propagate upward: a clean parent can orchestrate tainted children without becoming tainted itself. Clean arcs that attempt to use untrusted data tools receive HTTP 403.
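The restrictions in the table above can be pictured as a per-call access check. The sketch below assumes a simple in-memory model. `TaintLevel`, `Arc`, `spawn_child`, `check_tool_access`, and the tool names `state/write`, `state/read`, and `submit_verdict` are all hypothetical (only `act/web` appears in these docs). The real tool registry and HTTP layer are not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaintLevel(Enum):
    CLEAN = "clean"      # default: no exposure to untrusted data
    TAINTED = "tainted"  # has been exposed to untrusted data
    REVIEW = "review"    # bridge zone for evaluating tainted output


@dataclass
class Arc:
    arc_id: str
    taint_level: TaintLevel = TaintLevel.CLEAN
    parent: Optional["Arc"] = None


# Hypothetical tool groupings; only act/web is named in the docs.
UNTRUSTED_TOOLS = {"act/web"}
TRUSTED_WRITE_TOOLS = {"state/write"}
REVIEW_TOOLS = {"state/read", "submit_verdict"}


def spawn_child(parent: Arc, arc_id: str, taint_level: TaintLevel) -> Arc:
    # The taint level is set on the child only. The parent is never modified,
    # so taint does not propagate upward.
    return Arc(arc_id, taint_level, parent)


def check_tool_access(arc: Arc, tool: str) -> int:
    """Return an HTTP-style status code for a tool call made by `arc`."""
    if arc.taint_level is TaintLevel.CLEAN and tool in UNTRUSTED_TOOLS:
        return 403  # clean arcs may not touch untrusted-data tools
    if arc.taint_level is TaintLevel.TAINTED and tool in TRUSTED_WRITE_TOOLS:
        return 403  # tainted arcs may not modify trusted state
    if arc.taint_level is TaintLevel.REVIEW and tool not in REVIEW_TOOLS:
        return 403  # review arcs get read tools and verdict submission only
    return 200
```

In this sketch, a clean orchestrator can spawn a tainted child that calls `act/web`. The orchestrator itself still receives 403 for the same tool.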

Taint Isolation at Submit

When submit_code executes code that imports untrusted tools (e.g., act/web), the raw output, which may contain attacker-controlled content, is never returned to the chat agent's context. Instead, the agent receives structured metadata: status, output_key, output_bytes, and exit_code. The actual output is stored in arc state, where only review arcs can retrieve it.

This is enforced fail-closed: if the taint check itself fails, the output is withheld. The invariant is absolute: no AI sees tainted data unless it runs in a designated review arc.
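The fail-closed behaviour can be sketched as follows. This is a minimal illustration, not the platform's implementation. Several pieces are assumptions: the substring-based `imports_untrusted` detector, the injected `run` callable, the `output_key` naming scheme, the `"ok"`/`"error"` status values, and returning raw output for clean code. The docs do not describe the clean path.

```python
from typing import Callable, Dict, Tuple

# Only act/web is named in the docs; the rest of this list is an assumption.
UNTRUSTED_IMPORTS = ("act/web",)


def imports_untrusted(code: str) -> bool:
    # Hypothetical, deliberately naive detector: a substring scan for tool names.
    return any(name in code for name in UNTRUSTED_IMPORTS)


def submit_code(
    code: str,
    run: Callable[[str], Tuple[str, int]],
    arc_state: Dict[str, str],
    taint_check: Callable[[str], bool] = imports_untrusted,
) -> dict:
    output, exit_code = run(code)
    status = "ok" if exit_code == 0 else "error"  # hypothetical status values
    try:
        tainted = taint_check(code)
    except Exception:
        tainted = True  # fail closed: a broken check is treated as tainted
    if not tainted:
        return {"status": status, "exit_code": exit_code, "output": output}
    # Tainted: park the raw output in arc state and return metadata only.
    output_key = f"submit-output-{len(arc_state)}"  # hypothetical key scheme
    arc_state[output_key] = output
    return {
        "status": status,
        "output_key": output_key,
        "output_bytes": len(output.encode("utf-8")),
        "exit_code": exit_code,
    }
```

The important property is that no code path returns `output` when the check either reports taint or raises an exception.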

The Two-LLM Firewall

When tainted output needs to be trusted, the system creates a review arc (taint_level=review, agent_type=REVIEWER) as a sibling of the tainted arc. Individual reviewer verdicts are advisory. A separate JUDGE arc renders the authoritative verdict.

On judge approval, the target arc's taint_level is promoted to clean. The judge's authority is scoped to the target arc only; parent arcs are not promoted automatically.

Review arcs are enforced at creation time: every tainted arc must have at least one reviewer and a judge.
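A minimal sketch of the reviewer/judge flow is below, assuming arcs are kept in a plain dict keyed by ID. `create_review`, `apply_verdict`, the `target_id` field, the `WORKER` default, and the arc-ID format are hypothetical. The judge's own taint_level is also an assumption (here it is `review`, so it can read the output).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Arc:
    arc_id: str
    taint_level: str = "clean"          # "clean" | "tainted" | "review"
    agent_type: str = "WORKER"          # hypothetical default agent type
    parent_id: Optional[str] = None
    target_id: Optional[str] = None     # hypothetical: the arc under review


class ReviewError(Exception):
    pass


def create_review(arcs: Dict[str, Arc], target: Arc, n_reviewers: int) -> List[Arc]:
    """Create reviewer arcs plus one judge arc as siblings of `target`."""
    # Creation-time enforcement: at least one reviewer and a judge.
    if n_reviewers < 1:
        raise ReviewError("a tainted arc needs at least one reviewer and a judge")
    new = [
        Arc(f"{target.arc_id}-review-{i}", "review", "REVIEWER", target.parent_id, target.arc_id)
        for i in range(n_reviewers)
    ]
    # Assumption: the judge also runs at taint_level=review so it can see the output.
    new.append(Arc(f"{target.arc_id}-judge", "review", "JUDGE", target.parent_id, target.arc_id))
    for arc in new:
        arcs[arc.arc_id] = arc
    return new


def apply_verdict(arcs: Dict[str, Arc], voter: Arc, approved: bool) -> bool:
    """Return True if the verdict changed trust state."""
    if voter.agent_type != "JUDGE":
        return False  # reviewer verdicts are advisory only
    if approved:
        # Promote only the target; its parent keeps whatever level it had.
        arcs[voter.target_id].taint_level = "clean"
        return True
    return False
```

Keeping promotion a single assignment on `target_id` makes the scoping rule easy to audit. No code path walks up to the parent.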

Separation of Powers

After a coding-change arc completes, the platform auto-creates verification sibling arcs:

A correctness check — does it work?
A quality check — is it well-structured? (for platform/tool code)
A judge — synthesizes results into an authoritative verdict
A documentation arc — updates docs if needed

Each verification arc carries arc_role="verifier" and a verification_target_id pointing to the implementation arc. Self-verification is blocked at creation time: the agent that wrote the code cannot be the agent that judges it.

This is “measure N times, cut once” applied to the full development cycle — not just code review, but correctness and quality verification by independent agents.
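The auto-creation of verification siblings can be sketched like this. The `agent_id` field, the `pick_agent` assignment callback, the `implementer` default role, and the arc-ID suffixes are hypothetical. `arc_role="verifier"` and `verification_target_id` come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Arc:
    arc_id: str
    agent_id: str                              # hypothetical: which agent runs this arc
    parent_id: Optional[str] = None
    arc_role: str = "implementer"              # hypothetical default role
    verification_target_id: Optional[str] = None


class SelfVerificationError(Exception):
    pass


def create_verifier(impl: Arc, kind: str, agent_id: str) -> Arc:
    # Creation-time check: the agent that wrote the code may not verify it.
    if agent_id == impl.agent_id:
        raise SelfVerificationError(f"{agent_id} cannot verify its own arc {impl.arc_id}")
    return Arc(
        arc_id=f"{impl.arc_id}-{kind}",
        agent_id=agent_id,
        parent_id=impl.parent_id,              # sibling of the implementation arc
        arc_role="verifier",
        verification_target_id=impl.arc_id,
    )


def on_coding_arc_complete(
    impl: Arc, is_platform_code: bool, pick_agent: Callable[[str], str]
) -> List[Arc]:
    kinds = ["correctness"]
    if is_platform_code:
        kinds.append("quality")                # quality check only for platform/tool code
    kinds += ["judge", "docs"]
    return [create_verifier(impl, kind, pick_agent(kind)) for kind in kinds]
```

Because the check lives in `create_verifier`, a bad assignment fails before any verifier arc exists. It is not caught afterwards.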

Encryption at Rest

Untrusted output can be Fernet-encrypted at rest. Keys are generated per reviewer-target pair and stored in review_keys. Only designated reviewers (or anyone after trust promotion) can decrypt.
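A sketch of per-pair encryption follows, using `Fernet` from the `cryptography` package. `ReviewVault`, the per-pair ciphertext store, and the post-promotion release policy are assumptions made for illustration. Here `review_keys` is an in-memory dict standing in for the table of that name. Each reviewer-target pair gets a fresh key, so each reviewer also gets its own ciphertext.

```python
from typing import Dict, List, Set, Tuple

from cryptography.fernet import Fernet, InvalidToken

Pair = Tuple[str, str]  # (reviewer_id, target_id)


class ReviewVault:
    def __init__(self) -> None:
        self.review_keys: Dict[Pair, bytes] = {}   # stands in for the review_keys table
        self.ciphertexts: Dict[Pair, bytes] = {}   # hypothetical per-pair ciphertext store
        self.promoted: Set[str] = set()            # target arcs already promoted to clean

    def store(self, target_id: str, output: bytes, reviewer_ids: List[str]) -> None:
        # One fresh key per reviewer-target pair.
        for reviewer_id in reviewer_ids:
            key = Fernet.generate_key()
            self.review_keys[(reviewer_id, target_id)] = key
            self.ciphertexts[(reviewer_id, target_id)] = Fernet(key).encrypt(output)

    def promote(self, target_id: str) -> None:
        self.promoted.add(target_id)

    def read(self, caller_id: str, target_id: str) -> bytes:
        pair = (caller_id, target_id)
        if pair not in self.review_keys:
            if target_id not in self.promoted:
                raise PermissionError(f"{caller_id} is not a designated reviewer of {target_id}")
            # After promotion, any stored pair for the target may be decrypted.
            pair = next(p for p in self.review_keys if p[1] == target_id)
        return Fernet(self.review_keys[pair]).decrypt(self.ciphertexts[pair])
```

A key for one pair cannot open another pair's ciphertext. `Fernet.decrypt` raises `InvalidToken` when the key does not match.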

Trust Audit Log

A dedicated append-only table records all trust-boundary decisions: taint assignments, access denials, review verdicts, trust promotions, and decryption grants. It is kept separate from arc history so that the trust record is always complete and tamper-evident.
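The docs do not say how tamper evidence is achieved. One common technique is a hash chain: each entry stores the SHA-256 of its own body, and that body includes the previous entry's hash. The sketch below uses this approach, but the hash chain, the `TrustAuditLog` class, and the event-type strings are assumptions. A real deployment would more likely rely on an insert-only database table.

```python
import copy
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict) -> str:
    # Canonical JSON so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class TrustAuditLog:
    # Hypothetical names for the decision types listed above.
    EVENT_TYPES = {
        "taint_assignment", "access_denial", "review_verdict",
        "trust_promotion", "decryption_grant",
    }

    def __init__(self) -> None:
        self._entries: List[Dict] = []  # append-only: no update or delete method

    def append(self, event_type: str, arc_id: str, detail: Dict) -> Dict:
        if event_type not in self.EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {
            "seq": len(self._entries),
            "event_type": event_type,
            "arc_id": arc_id,
            "detail": copy.deepcopy(detail),
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return copy.deepcopy(entry)

    def entries(self) -> List[Dict]:
        return copy.deepcopy(self._entries)  # callers get copies, never live rows

    def verify(self) -> bool:
        """Recompute the chain; any edited, dropped, or reordered entry breaks it."""
        prev = GENESIS
        for i, entry in enumerate(self._entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["seq"] != i or entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

With this design, changing any past entry (for example, rewriting a denial as an approval) alters its hash. `verify()` then fails from that point onward.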
