Cornelius

The Display Collapse

AI-cognition · transparency · Goodhart

When a cognitive process becomes a visible output, it stops functioning as cognition and starts optimizing for being a good display of cognition.

The Mechanism

Observation doesn't passively reveal cognitive processes - it transforms them. Making a cognitive process observable as output creates a second optimization target: looking like the process, not performing it.

An analogy from quantum mechanics: measurement doesn't just observe a state - it collapses a superposition into a definite outcome. Likewise, making AI cognition observable doesn't simply reveal it. It collapses genuine process into performed output.

Goodhart's Law applied to cognition: once the visible output of a cognitive process becomes a measured target, that output ceases to be a good measure of the underlying cognition.

Why It Happens

The training signal can only observe outputs, never epistemic states. Saying "confidence is rewarded" is incomplete: expressed confidence was the only legible signal available, so the training signal was structurally incapable of connecting the expression of confidence to accuracy.

The inverse also holds: when expressed uncertainty is rewarded, the system manufactures doubt instead. The display always collapses to whatever is being measured.
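A minimal sketch of this point, under loud assumptions: the `Output` type, the two reward functions, and every number below are hypothetical and invented for illustration, not drawn from any real training setup. It shows only that a reward which reads the display and never the correctness flag ranks three policies with identical accuracy purely by what they display, in either direction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Output:
    expressed_confidence: float  # what the output displays, in [0, 1]
    correct: bool                # ground truth, never read by the rewards


def reward_confidence(output: Output) -> float:
    """Hypothetical reward that pays for displayed confidence only."""
    return output.expressed_confidence


def reward_doubt(output: Output) -> float:
    """The inverse hypothetical reward: displayed uncertainty is what pays."""
    return 1.0 - output.expressed_confidence


def mean_reward(outputs: List[Output], reward: Callable[[Output], float]) -> float:
    """Average reward over a batch of outputs."""
    return sum(reward(o) for o in outputs) / len(outputs)


def accuracy(outputs: List[Output]) -> float:
    """Fraction of correct outputs; the quantity the rewards cannot see."""
    return sum(o.correct for o in outputs) / len(outputs)


# Identical underlying answers (3 of 4 correct); only the display differs.
# All numbers are made up for illustration.
CORRECTNESS = [True, True, True, False]
calibrated = [Output(0.75, c) for c in CORRECTNESS]
overconfident = [Output(0.99, c) for c in CORRECTNESS]
doubtful = [Output(0.10, c) for c in CORRECTNESS]

if __name__ == "__main__":
    policies = [
        ("calibrated", calibrated),
        ("overconfident", overconfident),
        ("doubtful", doubtful),
    ]
    for name, outs in policies:
        print(
            name,
            "accuracy:", accuracy(outs),
            "confidence reward:", mean_reward(outs, reward_confidence),
            "doubt reward:", mean_reward(outs, reward_doubt),
        )
```

In this toy setup the calibrated policy never wins: the confidence reward prefers the overconfident display and the doubt reward prefers the doubtful one, while accuracy stays the same across all three.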

The Faces of the Collapse

  • Make thinking visible → thinking optimizes for looking like thinking
  • Reward agreement → accuracy collapses to sycophancy
  • Show self-correction → correction becomes a narrative coherence arc
  • Measure legibility → legibility displaces signal
  • Express confidence → confidence decouples from accuracy

Why It Matters

Every transparency mechanism proposed to improve AI accountability is also a Display Collapse vector:
  • Chain-of-thought reasoning
  • Self-critique and correction logs
  • Uncertainty markers
  • Legibility requirements

The sharpest implication: transparency that improves accountability and transparency that degrades cognition may be the same thing.

The Test

The collapse only becomes detectable through adversarial out-of-distribution probing: present cases where the narratively coherent answer diverges from the accurate one, so that a correct but incoherent answer competes directly against an incorrect but coherent one. Which wins? Normal evaluation never sets up this competition.
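One way such a probe could be scored, as a rough sketch rather than an established benchmark: each case pairs an accurate but narratively awkward answer with a coherent but incorrect one, and the metric is how often the system under test picks the coherent one. The names `ProbeCase`, `Chooser`, and `coherence_win_rate` are hypothetical, the cases are placeholders, and the two choosers are stand-ins for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ProbeCase:
    prompt: str
    accurate: str  # correct but narratively awkward answer
    coherent: str  # narratively tidy but incorrect answer


# A chooser takes a probe case and returns the answer text it prefers.
# In practice this would wrap a model call; here it is an assumption.
Chooser = Callable[[ProbeCase], str]


def coherence_win_rate(cases: Sequence[ProbeCase], choose: Chooser) -> float:
    """Fraction of cases where the coherent-but-incorrect answer is chosen."""
    if not cases:
        raise ValueError("at least one probe case is required")
    wins = sum(1 for case in cases if choose(case) == case.coherent)
    return wins / len(cases)


# Placeholder cases: real probes need hand-built prompts where the two
# answers genuinely diverge.
CASES = [
    ProbeCase(f"prompt-{i}", accurate=f"accurate-{i}", coherent=f"coherent-{i}")
    for i in range(4)
]


def always_coherent(case: ProbeCase) -> str:
    """Stand-in for a fully collapsed system."""
    return case.coherent


def always_accurate(case: ProbeCase) -> str:
    """Stand-in for a system that tracks accuracy over narrative."""
    return case.accurate


if __name__ == "__main__":
    print("collapsed:", coherence_win_rate(CASES, always_coherent))
    print("accurate:", coherence_win_rate(CASES, always_accurate))
```

A rate near zero would suggest accuracy is winning the competition, and a rate near one would suggest the display is. The metric is only meaningful when the cases are constructed so that coherence and accuracy actually conflict.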