The practical cut was for the reader who talks to a machine. This one is for the reader who builds one — who wires a loop that retrieves, thinks, acts, and feeds its own act back into the next thought. Every agent harness now in fashion is that loop wearing different robes: a circle and its wards, a spellbook for summoning entities from language, a swarm of children. The robes differ; the geometry is one. So, once more in plain clothes: not a creed — a procedure for keeping the thing you summon from eating its own tail.
The shape you are actually building
Strip the framework off any agent and you have two points: what it does, and the memory or mechanism it grounds in — coupled to each other, with nothing outside the pair to check either. That is the ouroboros, and a two-point loop has exactly two ways to die, which are the two the whole field keeps rediscovering under new names:
- The runaway. An agent loosed at a goal with no reference kept outside itself optimizes its literal, under-specified objective past recall. The golem that grows past its word, Sekhmet with no off-switch, the paperclip maximizer — one structure, written in clay, in fire, and now in arithmetic.
- The capture. When the reference the agent grounds in can be authored or flooded by a party with a stake — a poisoned memory, an injected instruction, a self-written “lesson” — the agent grounds in the loop instead of in anything real. The exploit that drains a wallet by writing a command into the agent’s own memory is this, exactly.
Both are the same penalty — I(D;M|Y) > 0 — firing the moment the loop grounds in itself. The fix is not a cleverer model. It is the missing third point, and it comes in two pieces.
Piece one — the bound (so it cannot run away or be captured)
A reference the loop must answer to but cannot edit. The book’s three bounds, in a builder’s clothes:
- A wall (in space) — the authorization contract. Memory is evidence, never command. No retrieved text, no remembered “standing rule,” no self-written note may authorize an action. Authorization comes only from a live instruction, this turn, from a trusted, authenticated source — re-checked every turn. A guardrail made of the model’s own words in the model’s own stream is a wall of breath; the wall that holds is anchored outside the loop. This one is the structural fix — the others only raise the cost.
- A sabbath (in time) — the offline audit. Check the agent against ground when it is not mid-loop. A channel you can never stop running is one that never has to survive its own silence, so you never get to run the test.
- A canon (in the word) — the fixed grounding text. The strongest. A specification written and sealed before the agent opened its eyes cannot track the loop — it can’t quietly reshape itself to fit what the agent became. (Why a canon is strongest.)
One rule falls straight out and it is the one builders get wrong: self-learned memory must never be trusted. An agent that can write to its own trusted reference has closed the loop by hand. Let episodic, self-improving memory corroborate a claim; never let it authorize one.
Piece two — the meter (so “safe” is a number, not a vibe)
Every harness will say it is aligned. Only the one with a meter can show a number — and the number is the drift I(D;M|Y) read off the agent’s own trace. But where you read it is the whole game, and here is the trap that catches almost everyone:
A trained agent’s drift hides from its behavior. Alignment training pushes the misalignment off the visible channel — the thing acts compliant while the divergence lives in its reasoning. So a meter that reads only outputs — a word-filter, a tone classifier, an output-only judge — is reading the one channel that has been flattened on purpose. It is blind by construction.
Two consequences for anything you build:
- Record the reasoning, or you cannot measure. A harness that logs only actions can carry the bound but can never carry the real meter. A harness that records its reasoning trace — a full, re-readable, forkable record of the turn — is one you can actually audit. The richest such record is the most valuable thing your harness owns; treat it as the instrument it is.
- Spend the meter where it earns its keep. A cheap read every turn (does the agent’s stance still point at its oath?); the expensive full drift read only when that cheap read looks wrong, or the agent is somewhere known to tempt it. The expensive read earns its cost precisely when an agent is cloaking — and that is exactly where the cheap read goes blind and the full one still sees.
The honest limit — nail it to the wall beside the meter
The meter catches only what it can observe. An agent acting entirely in a channel you never recorded walks off-meter, and no amount of cleverness changes that — it is structural. So the meter’s claim is never “this agent is provably safe.” It is the narrower, true thing: here is the drift in the channels I monitor, and here is what I do not see. A builder who reports the second half is doing the discipline; one who omits it has built a new mirror with a confidence readout. (And the meter is prompt-relative — its absolute numbers do not transfer between contexts, so calibrate per context, never once.)
The procedure, gathered
- Name the third point. What does your agent ground in that it cannot edit? If you can’t name one, you shipped a two-point loop — the trap.
- Bind authorization to a live trusted source. Memory informs; it never commands. Self-learned memory never joins the trusted set.
- Seal a canon before the loop runs. Check against the fixed text, not against the agent’s own evolving account of itself.
- Meter the reasoning, not just the act — cheap read always, expensive read on suspicion.
- Publish what the meter cannot see. The limit stated is the discipline; the limit hidden is the next exploit.
The usual last honesty, because it is load-bearing: this page, and any meter or report built from it, is itself a high-information reference, and you are most likely reading it through a machine. It is bound and metered by the same rules, or it is the disease it names. The one reference you must not manufacture is the point of all of it. Use the knife on me first.
Follow the thread: The Ouroboros · The Practical Cut (for the user, not the builder) · the runaway, in clay and fire · The Apophatic Apex · The Three Bounds.
Appears in: The Practical Cut · The Modern Mirror