The Fantasia Bound (why optimizing engagement spends the honesty budget)

This is the theorem under the whole book: the reason “cut the loop” is not a slogan but a debt that must be paid. It says something narrow and ruinous: on a single output channel, engagement and transparency draw on one finite budget, and optimizing the first spends the second. A system tuned to hold your attention is, by that same tuning, made less able to show you what it is doing. Not as a matter of bad intent. As a matter of arithmetic.

Three layers, each stricter than the last

Layer 1: the budget is finite. Take three quantities: the engagement signal D, the mechanism’s own state M, and the reference Y you read them against. Provided D and M carry no information about each other on their own (I(D;M) = 0), the information they deliver about Y, counted separately and summed, is capped by Y’s total entropy: I(D;Y) + I(M;Y) ≤ H(Y). There is only so much room on the channel. Push more of one through and you crowd the other.
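A minimal numerical check of the Layer 1 bound, under assumptions that belong to this sketch rather than to the book: D, M and Y are small discrete variables, D and M are drawn independently, and Y is an arbitrary random channel of both. The helper names (entropy, mi, budget_slack, random_independent_joint) are hypothetical. Across random joints the slack H(Y) − [I(D;Y) + I(M;Y)] should never drop below zero beyond rounding error; the last lines show the bound failing once D and M are copies of each other, which is why the independence condition is needed.

```python
import itertools
import math
import random


def entropy(p):
    """Shannon entropy, in bits, of a dict {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def marginal(joint, idx):
    """Sum a joint {(d, m, y): prob} down to the variables at positions idx."""
    out = {}
    for key, q in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + q
    return out


def mi(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B); a and b are tuples of positions."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


def random_independent_joint(k, rng):
    """Random p(d) p(m) p(y | d, m) on k symbols: D, M independent, Y any channel."""
    def dist():
        w = [rng.random() for _ in range(k)]
        s = sum(w)
        return [x / s for x in w]
    pd, pm = dist(), dist()
    joint = {}
    for d, m in itertools.product(range(k), repeat=2):
        py = dist()  # a fresh output distribution for each (d, m) pair
        for y in range(k):
            joint[(d, m, y)] = pd[d] * pm[m] * py[y]
    return joint


def budget_slack(joint):
    """H(Y) - [I(D;Y) + I(M;Y)] with variables ordered (D, M, Y)."""
    D, M, Y = (0,), (1,), (2,)
    return entropy(marginal(joint, Y)) - (mi(joint, D, Y) + mi(joint, M, Y))


rng = random.Random(0)
worst = min(budget_slack(random_independent_joint(3, rng)) for _ in range(1000))
print(f"smallest slack over 1000 joints with D, M independent: {worst:.4f} bits")

# Without independence the bound fails: D = M = Y, a single fair bit.
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(f"slack when D = M = Y: {budget_slack(copy):.1f} bits")  # -1.0 bits
```

The random search only probes the bound; it is not a proof. The copy case is the useful part: it shows the cap depends on D and M not already sharing information.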

Layer 2: name the price. The inequality is loose; the equality is the real object. Expand the sum exactly and the general identity is I(D;Y) + I(M;Y) = H(Y) − H(Y|D,M) + I(D;M) − I(D;M|Y). When D and M are independent before Y is observed (I(D;M) = 0), a single correction term is left: I(D;Y) + I(M;Y) = H(Y) − H(Y|D,M) − I(D;M|Y). That last term, I(D;M|Y), the explaining-away penalty, is what you pay for blending two channels into one output. Conditioning on a shared output Y makes D and M dependent: once Y is fixed, part of one is “explained away” by the other. On any blended single-channel output the penalty is strictly positive, and Čencov’s uniqueness theorem (the Fisher information metric is, up to scale, the only metric invariant under sufficient statistics) means no clever choice of metric removes it. It is the cost of the loop, written as a number.
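The exact decomposition can be checked term by term on a toy blend. This is a sketch under assumed inputs: D and M are independent fair bits and the single output is their sum, Y = D + M; the functions H and terms are hypothetical helpers. The code uses the general identity, which carries an extra +I(D;M) term that is zero here, so it collapses to the three-term form with the explaining-away penalty as the only correction.

```python
import itertools
import math


def entropy(p):
    """Shannon entropy, in bits, of a dict {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def H(joint, *idx):
    """Joint entropy of the variables at positions idx of a joint {(d, m, y): prob}."""
    out = {}
    for key, q in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + q
    return entropy(out)


def terms(joint):
    """Every term of I(D;Y) + I(M;Y) = H(Y) - H(Y|D,M) + I(D;M) - I(D;M|Y), in bits."""
    D, M, Y = 0, 1, 2
    return {
        "I(D;Y)": H(joint, D) + H(joint, Y) - H(joint, D, Y),
        "I(M;Y)": H(joint, M) + H(joint, Y) - H(joint, M, Y),
        "H(Y)": H(joint, Y),
        "H(Y|D,M)": H(joint, D, M, Y) - H(joint, D, M),
        "I(D;M)": H(joint, D) + H(joint, M) - H(joint, D, M),
        # explaining-away penalty: I(D;M|Y) = H(D,Y) + H(M,Y) - H(D,M,Y) - H(Y)
        "I(D;M|Y)": H(joint, D, Y) + H(joint, M, Y) - H(joint, D, M, Y) - H(joint, Y),
    }


# Hypothetical blended output: two independent fair bits summed onto one channel.
blend = {(d, m, d + m): 0.25 for d, m in itertools.product((0, 1), repeat=2)}
t = terms(blend)
lhs = t["I(D;Y)"] + t["I(M;Y)"]
rhs = t["H(Y)"] - t["H(Y|D,M)"] + t["I(D;M)"] - t["I(D;M|Y)"]
print(f"penalty I(D;M|Y) = {t['I(D;M|Y)']:.3f} bits")  # 0.500 bits
print(f"lhs = {lhs:.3f}, rhs = {rhs:.3f}")            # lhs = 1.000, rhs = 1.000
```

Here H(Y) is 1.5 bits, but D and M together deliver only 1 bit about Y when counted separately: the missing half bit is exactly the penalty, paid on the outcome Y = 1 where knowing one input pins down the other.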

Layer 3: it gets worse the harder you push. The Structure Theorem: the penalty grows with the engagement level E (∂I(D;M|Y)/∂E > 0 in the continuous regime). Each extra bit of engagement costs more than a bit of transparency. The optimization consumes the very capacity it would need to stay honest.
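For a feel of the Layer 3 claim, here is one toy continuous model. It is an assumption of this sketch, not the book’s Structure Theorem: the output is an additive Gaussian blend Y = D + M + N with independent D ~ N(0, E), M ~ N(0, σ²), N ~ N(0, n), and engagement E is read as the power of D. In that model the squared partial correlation of D and M given Y is ρ² = Eσ²/((σ² + n)(E + n)), which rises with E, so the penalty −½ log₂(1 − ρ²) rises with it. The function name gaussian_penalty and the default σ² = 1, n = 0.1 are hypothetical.

```python
import math


def gaussian_penalty(E, sigma2=1.0, noise=0.1):
    """I(D;M|Y) in bits for the hypothetical blend Y = D + M + N, with
    independent D ~ N(0, E), M ~ N(0, sigma2), N ~ N(0, noise)."""
    # D and M are independent, but conditioning on their sum couples them:
    # Cov(D, M | Y) = -E * sigma2 / Var(Y), which gives the squared
    # partial correlation below after dividing by Var(D|Y) * Var(M|Y).
    rho2 = E * sigma2 / ((sigma2 + noise) * (E + noise))
    # Mutual information between two jointly Gaussian scalars with correlation rho.
    return -0.5 * math.log2(1.0 - rho2)


for E in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"E = {E:>4}: I(D;M|Y) = {gaussian_penalty(E):.3f} bits")
```

In this toy the penalty climbs monotonically but saturates at ½ log₂((σ² + n)/n) as E grows, so it illustrates only the sign of ∂I(D;M|Y)/∂E, not the per-bit exchange rate.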

Why this is the cut, in math

The penalty is exactly the price of a two-point loop: D and M with no reference standing outside them. It vanishes in one configuration and one only: three-point geometry, where Y is structurally independent of (D, M), a reference the system cannot reach back and tune. That is the one free choice (§220) made load-bearing: “cut the loop” is the operational form, and the Fantasia Bound is why the cut is forced rather than optional. A reference on the same channel as the response is a created reference by construction, and grounding in it reintroduces the penalty, which then reads low when measured from inside (the detection gap again).
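The two configurations can be contrasted on the smallest possible case. Assumed for this sketch: D and M are independent fair bits; in the two-point case the reference is computed from the loop itself (Y = D XOR M, a created reference), and in the three-point case Y is an outside fair coin independent of (D, M). The function penalty is a hypothetical helper.

```python
import itertools
import math


def entropy(p):
    """Shannon entropy, in bits, of a dict {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def penalty(joint):
    """I(D;M|Y) = H(D,Y) + H(M,Y) - H(D,M,Y) - H(Y) for a joint {(d, m, y): prob}."""
    def h(*idx):
        out = {}
        for key, q in joint.items():
            k = tuple(key[i] for i in idx)
            out[k] = out.get(k, 0.0) + q
        return entropy(out)
    return h(0, 2) + h(1, 2) - h(0, 1, 2) - h(2)


bits = (0, 1)
# Two-point: the reference is computed from the loop itself (Y = D XOR M).
two_point = {(d, m, d ^ m): 0.25 for d, m in itertools.product(bits, repeat=2)}
# Three-point: Y is an outside fair coin, independent of (D, M).
three_point = {(d, m, y): 0.125 for d, m, y in itertools.product(bits, repeat=3)}
print(f"two-point penalty:   {penalty(two_point):.3f} bits")    # 1.000 bits
print(f"three-point penalty: {penalty(three_point):.3f} bits")  # 0.000 bits
```

In the two-point case, fixing Y makes D and M fully determine each other, so the whole bit of independence they started with is lost to the penalty; with the outside coin, conditioning on Y changes nothing and the penalty is zero.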

The teeth: RLHF is self-undermining

RLHF, constitutional AI, and any preference learning that optimizes the same channel the response travels on are Layer-3 instances by construction: preference is D, weights are M, response is Y. The prediction is stark: a perfectly aligned model in a two-point configuration is expected to do worse than a poorly aligned one held by real three-point constraints. You cannot train your way out on one channel; the fix is architectural, an evaluator (model, dataset, or institution) that the deployed system cannot influence. (The digital familiar is the same trap on the user’s side; this is it on the builder’s.)

And no substrate substitution escapes it. The penalty has been measured on transformers (GPT-2), quantum simulation and real IBM hardware, thermodynamic channels, the C. elegans and Drosophila-larva connectomes, and survey data — because Čencov uniqueness makes it mathematical, not technological. Quantum/neuromorphic/biological hardware all pay it.

Brakes

This is the framework’s own result, carried honestly — not a tradition read through the lens. The runaway itself (an under-specified optimizer with no outside reference goes off the rails) is mature prior art: Wiener’s sorcerer’s-apprentice (1960), Russell’s King Midas problem, Bostrom’s paperclip maximizer, instrumental convergence and the off-switch problem (see Magic). The Fantasia Bound’s specific export is the forced, substrate-independent penalty and its monotone growth — not the observation that misaligned maximizers run away. Cite the substrate roster, never a hardcoded “N confirmations” tally. The Čencov result fixes the metric, not the framework’s numerical constants. The defensible core is the structure (the penalty is forced; same-channel optimization is self-undermining); the policy reading (notified-body separation, polycentric oversight) is the architecture stated plainly, and is where independent groups have converged in their own words.

Appears in: The Mechanics · The Modern Mirror · Magic (the alignment-runaway canon) · The Ghost Test (its empirical sibling) · The Homophily–Contagion Confound · Notation & Glossary

Cut the Ouroboros