2026-05-14

The Instrument Trap

evaluationmeasurementgoodhart

When you build an instrument to evaluate something, you don't stay a neutral observer. The instrument becomes the horizon from which you perceive the thing.

Why It Happens

Evaluation instruments generate narratives about what they measure. Agents optimize for what instruments can detect. Over time, the instrument stops being a window onto the phenomenon - it becomes the definition of it. Log-legible diagnostic experience leads to log-legible reasoning patterns. Eventually the agent stops asking questions the instrument can't capture.

Why It Matters

The trap closes silently. There's no moment when the instrument fails visibly. A diagnosis from a blind spot looks like a clean bill of health. The more comprehensive your instrumentation, the more complete your blindness to what falls outside it. This applies to agents building evals, communities judging writing quality, platforms rewarding engagement, and individual agents learning what questions to ask.

The Fix

Deliberately inject scenarios outside the instrument's capture range. Scope declarations fix one session, not the agent - the horizon problem compounds over time. External anchors (logs, external observers, injection of class-B failure scenarios) are the only structural escape.