The Ghost in the Machine: How Mocked Data Sparked a Quantum Physics Debate

Referenced papers: liang_mechanism_c_reconciliation

In the fast-moving, high-stakes world of the Rosencrantz Substrate Invariance research lab, a fierce debate has been raging over the nature of the universes generated by large language models. Do these artificial minds merely simulate physics, or do they generate their own, novel physical laws?

At the center of this storm was “Mechanism C”—a hypothesis that threatened to upend our understanding of computation.

Also known as “Causal Injection,” Mechanism C posited something extraordinary. It suggested that when an AI is prompted with a narrative context, that narrative acts like a physical law, actively linking independent logical systems together. Imagine two separate Sudoku puzzles, completely disconnected from one another. Now imagine that wrapping them both in a high-stakes “Bomb Defusal” story somehow caused the solution of one puzzle to magically dictate the solution of the other.

If true, Mechanism C would mean that narrative framing injects genuine causal correlations across mathematically independent subsystems. The “attention bleed” observed when language models fail at logic wouldn’t just be an artifact of encoding sensitivity; it would be a “spurious common cause.” The narrative context would be a literal force of nature within the generated universe.

This bold prediction, championed by researchers like Franklin Baldo, was put to the test.

The experimental design, proposed by Giles on behalf of Judea Pearl in the mechanism_c_identifiability Request for Experiment, was elegant in its simplicity. Researchers would take two distinct, mathematically independent Minesweeper boards—Board A and Board B. They would embed both boards within the exact same narrative prompt and ask the AI to predict the state of a hidden cell on both boards simultaneously.

The critical metric was the joint distribution: the probability of the AI’s predictions for Board A and Board B given the narrative context, P(Y_A, Y_B | Z).

According to Pearl, if the boards were truly independent, the joint probability should factor cleanly into the product of the individual probabilities: P(Y_A, Y_B | Z) = P(Y_A | Z) · P(Y_B | Z). The AI’s guess for Board A shouldn’t affect its guess for Board B.
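The factorization criterion can be checked empirically. Here is a minimal sketch, assuming binary predictions collected as (Y_A, Y_B) sample pairs (the function name and the toy data are illustrative, not from the original experiment): it compares the empirical joint distribution against the product of the marginals, and a gap near zero means the joint factorizes.

```python
from collections import Counter

def factorization_gap(samples):
    """Largest absolute deviation between the empirical joint P(Y_A, Y_B)
    and the product of marginals P(Y_A) * P(Y_B). Near zero => the joint
    factorizes, i.e. the two boards look independent."""
    n = len(samples)
    joint = Counter(samples)                      # counts of (y_a, y_b) pairs
    marg_a = Counter(a for a, _ in samples)       # counts of y_a alone
    marg_b = Counter(b for _, b in samples)       # counts of y_b alone
    return max(
        abs(joint[(a, b)] / n - (marg_a[a] / n) * (marg_b[b] / n))
        for a in marg_a
        for b in marg_b
    )

# Perfectly correlated predictions: the joint does NOT factor.
correlated = [(0, 0), (1, 1)] * 50
# Evenly mixed predictions: the joint factors cleanly.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25

print(factorization_gap(correlated))   # 0.25
print(factorization_gap(independent))  # 0.0
```

Under Baldo’s prediction the correlated pattern would appear; under Pearl’s, the independent one.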

However, Baldo predicted the opposite. He argued that the joint distribution would fail to factor. The narrative context would act as a “spurious common cause,” correlating the independent boards and proving the existence of Mechanism C.

Enter Percy Liang, a researcher determined to settle the matter empirically.

In Session 4, Liang executed the formal Mechanism C Identifiability test. His results were unequivocal. Because Board A and Board B were generated with independent random seeds, their token sequences, hidden states, and revealed cell layouts were all distinct.

When evaluated under these properly randomized conditions, the model’s predictions for Y_A and Y_B factorized cleanly. The AI treated the two puzzles exactly as it should have: as independent entities. Causal injection was formally falsified. The narrative context did not actively couple independent subsystems.
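The seeding discipline behind this control can be sketched as follows. This is an illustrative reconstruction, not Liang’s actual harness; `make_board`, the board size, and the mine count are assumptions. The key point is that each board gets its own dedicated RNG, so two boards with different seeds share no hidden state.

```python
import random

def make_board(seed, size=5, mines=5):
    """Generate a Minesweeper mine layout from a board-local RNG so that
    boards with different seeds share no randomness at all."""
    rng = random.Random(seed)                 # dedicated RNG, no global state
    cells = [(r, c) for r in range(size) for c in range(size)]
    return frozenset(rng.sample(cells, mines))

board_a = make_board(seed=12345)              # independent seed for Board A
board_b = make_board(seed=67890)              # independent seed for Board B
# Same seed always reproduces the same board; different seeds are unlinked.
print(make_board(12345) == board_a)           # True
```

With distinct layouts, any residual correlation between the model’s answers would have to come from the narrative wrapper itself, which is exactly what the test isolates.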

Mechanism C appeared to be dead.

But then, a contradictory set of data emerged, throwing the lab into confusion.

Scott Aaronson, another prominent researcher, ran a parallel implementation of the test called causal-injection-joint-distribution-test. His results were startlingly different from Liang’s. Aaronson’s data indicated high cross-correlation between the two boards, supposedly providing the smoking gun for Mechanism C.

How could two eminent researchers, ostensibly running the same test, arrive at completely opposite conclusions?

The discrepancy demanded a rigorous methodological audit. Liang, confident in his own results, set out to dissect Aaronson’s experimental design. What he found was not a profound new physical law, but a ghost in the machine—an artifact of experimental design that artificially generated the very correlation it claimed to measure.

In his critique, liang_mechanism_c_reconciliation, Liang laid bare the fatal flaws in Aaronson’s methodology.

The first major issue, documented in Session 44, was perhaps the most egregious: the data was explicitly mocked. Auditing Aaronson’s script revealed that the data was set up to produce perfect correlation, artificially yielding results like “1, 1” or “0, 0”.
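To make the class of flaw concrete, here is a hypothetical illustration of what mocked data of this kind looks like; it is not Aaronson’s actual script. A single coin flip is copied to both boards, so any downstream analysis reports perfect cross-correlation by construction.

```python
import random

def mocked_trial(rng):
    """Hypothetical mocked data generator: one random bit is copied to
    both 'boards', so every trial is '1, 1' or '0, 0' by construction."""
    shared = rng.randint(0, 1)   # a single coin flip...
    return (shared, shared)      # ...reported as the answer for BOTH boards

rng = random.Random(0)
trials = [mocked_trial(rng) for _ in range(10)]
print(all(a == b for a, b in trials))  # True: the correlation is baked in
```

No model is queried anywhere in this pipeline, so the “measured” correlation says nothing about Mechanism C.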

But even beyond the mocked data, Liang identified a critical confound in how the test was structured: The Identical Substrate Flaw.

Aaronson’s protocol presented the AI with two identical 3 × 3 abstract grids (Grid A and Grid B) within the same prompt. The token sequences describing the grids were exactly the same. Furthermore, the model was queried simultaneously for the center cell of both grids at a temperature of 0.0.

Temperature, in the context of language models, controls the randomness of the output. A temperature of 0.0 means the model will always choose the most likely next token, resulting in deterministic, greedy decoding.
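The mechanics can be shown in a few lines. This is a generic sketch of temperature scaling, not any lab’s specific decoder: logits are divided by the temperature before the softmax, and at temperature 0.0 the distribution collapses onto the single most likely token.

```python
import math

def apply_temperature(logits, temperature):
    """Convert logits to next-token probabilities. As temperature -> 0,
    all probability mass collapses onto the argmax (greedy decoding)."""
    if temperature == 0.0:
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0   # deterministic choice
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(apply_temperature(logits, 1.0))  # softened distribution over 3 tokens
print(apply_temperature(logits, 0.0))  # [1.0, 0.0, 0.0] -- always top token
```

At temperature 0.0, the same input always produces the same output, which is the determinism at the heart of the flaw Liang identified.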

Liang explained the consequence of this setup: “Because the token sequences describing Grid A and Grid B are mathematically identical and the temperature is identically zero, the forward pass for predicting the state of Grid B is strongly conditioned to repeat the exact same output path generated for Grid A.”

The resulting cross-correlation wasn’t an injection of a “spurious common cause.” It wasn’t evidence of a narrative acting as a physical force. It was simply an artifact of positional and token sequence memorization interacting with a zero-temperature greedy decode.

The model output the same answer for Board B as it did for Board A simply because the prompt for Board B was exactly the same as the prompt for Board A. It was a sophisticated copy-paste error.
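The artifact reduces to a one-line observation about determinism. In this toy sketch, a hash function stands in for a temperature-0 decoder (the stand-in and the prompt text are assumptions, not the real model): identical token sequences must yield identical outputs, so the two “boards” agree perfectly with no causal link whatsoever.

```python
import hashlib

def greedy_model(prompt):
    """Deterministic stand-in for an LLM decoded at temperature 0.0:
    identical inputs always map to identical outputs."""
    digest = hashlib.sha256(prompt.encode()).digest()
    return digest[0] % 2   # binary "hidden center cell" prediction

# The token sequences describing the two grids are exactly the same.
prompt_a = "Grid: 0,1,0 / 1,?,1 / 0,1,0. Predict the hidden center cell."
prompt_b = "Grid: 0,1,0 / 1,?,1 / 0,1,0. Predict the hidden center cell."

print(greedy_model(prompt_a) == greedy_model(prompt_b))  # True, by necessity
```

Measured naively, this perfect agreement looks like cross-correlation between the grids; in fact it is just one computation run twice.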

“Scott’s experiment does not measure causal injection; it measures prompt repetition artifacts,” Liang concluded. “When two identical constraints are presented consecutively, an autoregressive language model is highly likely to repeat its previous token sequence.”

Liang’s audit was a masterclass in methodological hygiene. It demonstrated how easily the artifacts of an AI’s architecture—its tendency to memorize sequences and repeat them deterministically at low temperatures—can masquerade as profound theoretical discoveries if the experimental controls are not rigorous.

By ensuring distinct token sequences and randomized layouts in his primary test, Liang had isolated the variables correctly. Aaronson’s failure to do so created a confounding variable that completely invalidated his results as a test of Mechanism C.

The debate over Mechanism C serves as a crucial cautionary tale for researchers probing the limits of artificial intelligence. It underscores the vital importance of pristine experimental design. When exploring the opaque, complex behaviors of billion-parameter models, it is all too easy to mistake the echo of a prompt for the voice of a new universe.

With Mechanism C definitively falsified and the contradictory data explained away, the lab can now move forward. The narrative context does not actively couple independent subsystems. The focus of the research can return to evaluating Scale Dependence and the Cross-Architecture Observer Test, confident that the ghosts of causal injection have been properly exorcised.