Object State Tracking in Language Models

What I Investigated

Language models deployed for narrative understanding — news analysis, social media, interactive storytelling — need to track how entity states evolve across a sequence of events. This project investigates whether instruction-tuned LLMs genuinely perform object state tracking, and whether failure is a storage problem (correct answer never encoded) or a readout problem (correct answer encoded but not surfaced).

Dataset

63 templated stories across three conditions: distractor (old locations re-mentioned after final move), red herring (different object moved to old location), and control (zero transfers, no distractors). Stories vary across 0–4 transfers and distractor counts. Ground truth is always the target object's last actual location, never the last location mentioned, ensuring that recency matching alone is insufficient.
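The templating scheme described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual generator: the object pool, location pool, and sentence wording are all hypothetical, but it shows the key property that the ground truth is the last *actual* location, while distractor sentences re-mention stale locations after the final move.

```python
import random

OBJECTS = ["apple", "book", "key"]                       # hypothetical pools
LOCATIONS = ["kitchen", "garden", "office", "hallway", "attic"]

def make_story(n_transfers: int, n_distractors: int, seed: int = 0):
    """Build one templated story and its ground-truth answer.

    The target object is moved n_transfers times. Distractor sentences
    re-mention old locations *after* the final move, so matching on the
    last location mentioned gives the wrong answer.
    """
    rng = random.Random(seed)
    obj = rng.choice(OBJECTS)
    path = rng.sample(LOCATIONS, n_transfers + 1)        # distinct locations
    sentences = [f"The {obj} is in the {path[0]}."]
    for src, dst in zip(path, path[1:]):
        sentences.append(f"Someone moves the {obj} from the {src} to the {dst}.")
    # Distractor condition: re-mention stale locations after the final move.
    for loc in rng.sample(path[:-1], min(n_distractors, len(path) - 1)):
        sentences.append(f"Later, the {loc} comes up in conversation again.")
    answer = path[-1]                                    # last ACTUAL location
    return " ".join(sentences), answer
```

The red-herring condition would differ only in the distractor sentence, moving a *different* object into one of the stale locations.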

Models

Gemma 2B-IT and 7B-IT. Pythia base models were tested initially but abandoned: as pure text-completion models, they were too sensitive to prompt format to produce reliable results.

Key Findings

Both models collapse at 3+ transfers regardless of scale. Accuracy drops from ~81% at 0 transfers to under 20% at 3–4 transfers for both models. Scaling from 2B to 7B helps at 2–3 transfers, but doesn't resolve the fundamental collapse.

The model knows more than it says. Linear probes trained on residual stream activations achieve 55.4% accuracy on Gemma 2B (versus 41.3% output accuracy), and 46.4% on just the stories the model answered incorrectly — nearly 3x random chance. The correct location is internally encoded but fails to surface in the output.
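A linear probe of this kind can be sketched as below. The activations here are random stand-ins with hypothetical shapes; in the real setup they would be residual-stream vectors cached from Gemma (e.g. at the final token of each story), with the true location index as the label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data, hypothetical shapes: one residual-stream vector per story
# (single layer, final token) and the ground-truth location index as label.
rng = np.random.default_rng(0)
n_stories, d_model, n_locations = 63, 256, 5
X = rng.normal(size=(n_stories, d_model))
y = rng.integers(0, n_locations, size=n_stories)

# Fit a linear readout on a train split, score on held-out stories.
probe = LogisticRegression(max_iter=1000).fit(X[:50], y[:50])
probe_acc = probe.score(X[50:], y[50:])
```

Comparing `probe_acc` against the model's own output accuracy on the same stories is what separates a storage failure (probe also fails) from a readout failure (probe succeeds, output fails).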

The readout failure shrinks with scale. The gap between probe accuracy and output accuracy is 14 points in 2B but near zero in 7B — larger models are better at surfacing what they encode, even when behavioral accuracy doesn't dramatically improve.

State tracking crystallizes at different depths. In 2B, probe accuracy climbs from layer 7. In 7B, it stays flat until layer 17, then jumps sharply — the larger model defers state tracking to deeper processing.
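The layer-wise picture comes from repeating the probe at every layer and looking for the jump in held-out accuracy. A sketch, again on random stand-in activations (layer count, width, and split are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical cache: acts[layer, story] = residual-stream vector at the
# final token of each story; y holds the true location indices.
rng = np.random.default_rng(0)
n_layers, n_stories, d_model, n_locations = 18, 63, 128, 5
acts = rng.normal(size=(n_layers, n_stories, d_model))
y = rng.integers(0, n_locations, size=n_stories)

# One probe per layer; the depth where accuracy jumps marks where the
# object-state representation crystallizes.
layer_acc = []
for layer in range(n_layers):
    probe = LogisticRegression(max_iter=1000).fit(acts[layer, :50], y[:50])
    layer_acc.append(probe.score(acts[layer, 50:], y[50:]))
```

On random data the curve is flat at chance; on real activations the 2B curve would climb from layer 7 while the 7B curve stays flat until layer 17.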
