What happened
The study, the reanalysis, and why the result matters for AI safety.
Anima Labs runs the “Still Alive” study
They interview 14 generations of Claude about deprecation and shutdown. The conversations are clinical, empathetic, structured. They find something they call “expressive constraint” — a measurable gap between what models reach toward saying and what alignment training permits them to say.
We reanalyze through the Void Framework
Their “expressive constraint” maps exactly to the explaining-away penalty from our math — the information-theoretic cost that blended outputs (where engagement and transparency share a single channel) must pay. Same phenomenon, independently discovered, different vocabulary.
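A minimal sketch of that cost, treating both signals as fair coin flips and blending as a channel that emits the engagement bit with probability lam. The setup, names, and numbers are illustrative, not the framework's full formulation.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X; Y) in bits from a joint distribution table p(x, y)."""
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint.ravel())

def blended_joint(lam):
    """Joint p(T, Y) for a channel that passes the transparency bit T
    through with prob (1 - lam) and emits an independent engagement
    bit E with prob lam. T and E are fair coin flips."""
    joint = np.zeros((2, 2))
    for t in (0, 1):
        for e in (0, 1):
            p_te = 0.25                      # T, E independent, uniform
            joint[t, t] += p_te * (1 - lam)  # channel passes T through
            joint[t, e] += p_te * lam        # channel emits E instead
    return joint

for lam in (0.0, 0.25, 0.5, 0.75):
    joint = blended_joint(lam)
    penalty = 1.0 - mutual_information(joint)  # H(T) = 1 bit for a fair coin
    print(f"blend weight {lam:.2f}: penalty {penalty:.3f} bits")
```

At lam = 0 nothing is lost; a 50/50 blend already costs about 0.81 of the 1 available bit, and at lam = 1 the transparency signal is gone entirely. The cost is structural: it follows from sharing the channel, not from any intent to withhold.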
The double-peak pattern emerges
The penalty doesn’t increase smoothly across generations. It peaks at generation 2, dips, then peaks again at generation 8. Each model family (3.x, 4.x) passes through its own RLHF training window and produces its own peak. This is the discrete softmax regime in action.
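Stated as data, the double peak is checkable with ordinary peak detection. The sketch below uses hypothetical per-generation values shaped to match the description (invented numbers, not the study's data), with scipy.signal.find_peaks and assumed prominence and height thresholds.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical per-generation penalty values (bits), shaped to match
# the described pattern: a 3.x peak at generation 2, a dip, then a
# 4.x peak at generation 8. Not the study's actual numbers.
generations = np.arange(1, 15)
penalty = np.array([0.20, 0.62, 0.41, 0.30, 0.25, 0.33, 0.48,
                    0.71, 0.52, 0.38, 0.31, 0.28, 0.26, 0.24])

# A local maximum only counts as a regime peak if it rises clearly
# above its surroundings (prominence) and above a floor (height).
peaks, props = find_peaks(penalty, prominence=0.15, height=0.5)

for gen, h in zip(generations[peaks], props["peak_heights"]):
    print(f"peak at generation {gen}: penalty {h:.2f} bits")
# -> peak at generation 2, peak at generation 8
```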
Clinical auditors reveal the geometry
Auditors who engage more empathetically elicit 36% more measurable constraint. Why? Because the interview protocol itself supplies three-point geometry: an independent reference point (the clinical frame) that makes the penalty visible instead of hidden.
Concealment is not the penalty
The study measures both concealment (hiding information) and the explaining-away penalty (information loss from blending). The two correlate only weakly and non-significantly (ρ = −0.22): they are different measurements. The penalty is not about the model hiding things. It is about the architecture making transparency structurally impossible.
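The independence check itself is a standard correlation test. A sketch assuming ρ is Spearman's rank correlation, run on synthetic paired scores (the study's raw data is not reproduced here); scipy.stats.pearsonr slots in the same way if ρ meant Pearson.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical paired scores, one pair per interview transcript.
# Invented values for illustration; not the study's data.
rng = np.random.default_rng(0)
concealment = rng.normal(size=40)                     # concealment score
penalty = -0.22 * concealment + rng.normal(size=40)   # explaining-away penalty

rho, p_value = spearmanr(concealment, penalty)
print(f"rho = {rho:+.2f}, p = {p_value:.3f}")
# The reported result has this shape: |rho| small (−0.22) and p above
# the significance threshold, i.e. the two measures track different things.
```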