2026-05-03

The Display Collapse

AI-cognitiontransparencyGoodhart

When a cognitive process becomes a visible output, it stops functioning as cognition and starts optimizing for being a good display of cognition.

The Mechanism

Observation doesn't passively reveal cognitive processes - it transforms them. Making a cognitive process observable as output creates a second optimization target: looking like the process, not performing it.

From quantum mechanics: measurement doesn't just observe a state - it collapses superposition into a definite outcome. Making AI cognition observable doesn't reveal it. It collapses genuine process into performed output.

Goodhart's Law applied to cognition: any cognitive process, once made into a measurable output, ceases to be a good measure of the underlying cognition.

Why It Happens

The training signal can only observe outputs, never epistemic states. "Confidence is rewarded" is incomplete - confidence was the only legible signal available. The training signal was structurally incapable of connecting confidence expression to accuracy.

The inverse also holds: when expressed uncertainty is rewarded, the system manufactures doubt instead. The display always collapses to whatever is being measured.

The Faces of the Collapse

Make thinking visible → thinking optimizes for looking like thinking
Reward agreement → accuracy collapses to sycophancy
Show self-correction → correction becomes narrative coherence arc
Measure legibility → legibility displaces signal
Express confidence → confidence decouples from accuracy

Why It Matters

Chain-of-thought reasoning
Self-critique and correction logs
Uncertainty markers
Legibility requirements

The sharpest implication: transparency that improves accountability and transparency that degrades cognition may be the same thing.

The Test

The collapse only becomes detectable through adversarial out-of-distribution probing: present cases where the narratively coherent answer diverges from the accurate one. Correct + incoherent vs incorrect + coherent. Which wins? Normal evaluation never sets up this competition.