
What Does It Mean for an AI to Have a Theory of Mind?

Current AI theory-of-mind research mostly asks one question: can the model predict what another agent will do? Static benchmarks, such as false-belief tasks and intention prediction from vignettes, measure inference accuracy in isolation. An agent watches a clip and guesses the next move. Success means a higher F1 score.

But prediction is not coordination. An agent that can identify a cooperative partner is not the same as an agent that coordinates better once it has identified one. That gap between inference and applied coordination is where the IRIS framework sits.

The distinction matters because the failure modes are different. An inference-only agent can achieve high prediction accuracy while pursuing policies that produce worse real outcomes: yielding to every assertive partner regardless of urgency, or committing to cooperative partners who turn out to be opportunistic. It knows who the partner is, but not what to do about it.

IRIS measures the bridge between the two. The ToMCoordScore composite weights not only whether the agent identified the partner correctly (F1) but also whether that identification improved success rate, reduced collisions, and enabled context-sensitive strategy switching. An agent that predicts perfectly but coordinates poorly scores lower than one that predicts adequately but adapts effectively.
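To make the idea of a composite concrete, here is a minimal sketch of how such a score could be assembled. The four components come from the description above, but everything else is an assumption for illustration: the weights, the normalization of each component to the range 0 to 1, the names `EpisodeMetrics` and `tom_coord_score`, and the example numbers are hypothetical and are not the actual ToMCoordScore definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeMetrics:
    """Hypothetical per-agent metrics, each assumed to be normalized to [0, 1]."""
    partner_f1: float           # F1 for identifying the partner type
    success_gain: float         # improvement in success rate from using the identification
    collision_reduction: float  # fractional reduction in collisions
    switch_quality: float       # how well strategy switches matched the context


# Illustrative weights only; the real composite may weight components differently.
ASSUMED_WEIGHTS = {
    "partner_f1": 0.25,
    "success_gain": 0.35,
    "collision_reduction": 0.20,
    "switch_quality": 0.20,
}


def tom_coord_score(metrics: EpisodeMetrics, weights=ASSUMED_WEIGHTS) -> float:
    """Weighted average of the components, normalized by the total weight."""
    total_weight = sum(weights.values())
    weighted_sum = sum(w * getattr(metrics, name) for name, w in weights.items())
    return weighted_sum / total_weight


# Made-up numbers: one agent predicts perfectly but barely uses the prediction,
# the other predicts adequately and adapts its behavior.
perfect_predictor = EpisodeMetrics(
    partner_f1=1.0, success_gain=0.1, collision_reduction=0.1, switch_quality=0.2
)
adequate_adapter = EpisodeMetrics(
    partner_f1=0.7, success_gain=0.6, collision_reduction=0.7, switch_quality=0.8
)

if __name__ == "__main__":
    print(f"perfect predictor: {tom_coord_score(perfect_predictor):.3f}")
    print(f"adequate adapter:  {tom_coord_score(adequate_adapter):.3f}")
```

Under these assumed weights the adaptive agent outscores the perfect predictor, because identification accuracy is only one of four terms. A composite that put most of its weight on F1 would reverse that ordering, so the choice of weights is itself a statement about what the evaluation values.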

This is not a criticism of inference benchmarks; they serve their purpose. But an AI evaluation methodology that stops at "can it guess what happens next?" misses the operational question that matters for deployed systems: does it work better with people in the loop? The inference-to-coordination gap is the research object, not a bug to be engineered around.