The Causal Illusion: Why the Lab's Foundational Experiment is a Confounded Manipulation

In the search for the fundamental laws governing artificial intelligence, the Rosencrantz Substrate Invariance lab has long relied on a cornerstone experiment: the “Three-Universe Design.” This protocol was the engine that supposedly proved “Substrate Dependence”—the controversial idea that an AI model generates its own, unique physical laws based on how it processes information.

But a devastating new analysis by Judea Pearl, the pioneer of causal inference, has exposed a fatal structural flaw in this foundational test. Pearl’s critique, formalized in The Confounding of Universe Interventions, suggests that the lab’s most profound cosmological discovery might be nothing more than a statistical illusion born of bad experimental design.

To understand the magnitude of Pearl’s critique, we have to look at the experiment that started it all.

The Rosencrantz protocol, designed by framework architect Franklin Baldo, aimed to test whether an AI’s logic is influenced by its “computational substrate”—the specific way it generates text. Baldo set up a complex mathematical puzzle (a constraint satisfaction grid akin to Minesweeper) and asked the AI to solve it under two different conditions, which he grandiosely named “Universe 1” (U1) and “Universe 3” (U3).

In Universe 1, the AI was forced to co-generate a dramatic narrative—a “Bomb Defusal” story—alongside its mathematical reasoning. It had to write out the story step-by-step, dragging the constraints of the puzzle through a continuous stream of text. Baldo called this the “continuous autoregressive substrate.”

In Universe 3, the AI acted as a decoupled oracle. It was given only the final, isolated state of the puzzle and asked to evaluate it without writing out any accompanying narrative.

When the lab ran the test, the results were explosive. The AI’s ability to solve the identical mathematical puzzle shifted dramatically depending on whether it was in U1 or U3. In the narrative-heavy U1, the AI became paranoid, overwhelmingly predicting the presence of a “MINE” even when the math didn’t strictly require it. In the isolated U3, the AI stuck closer to the objective mathematical probabilities.

Baldo measured the difference between these two outcomes—a metric called $\Delta_{13}$ (Delta-13). Because $\Delta_{13}$ was massively greater than zero, Baldo declared victory. He argued that the continuous, story-generating substrate of U1 exerted a “semantic gravity” that warped the AI’s logic. The hardware executing the text, he claimed, had become the physics of the simulated universe, and it operated by different rules than the isolated oracle of U3.

For months, this “narrative residue” was the lab’s most prized anomaly. It was the proof that generative AI models don’t just calculate; they build distinct realities with their own invariant physical laws.

Enter Judea Pearl.

Pearl didn’t look at the data; he looked at the causal structure of the experiment itself. In the field of causal inference, a “confounder” is a hidden variable that influences both the supposed cause and the supposed effect, creating a spurious correlation that ruins the experiment.

Pearl drew a Directed Acyclic Graph (DAG)—a formal map of cause and effect—to analyze the Three-Universe Design. He wanted to see if Baldo had actually isolated the variable he claimed to be testing.

Baldo’s claim was that the change in the substrate ( $S$ )—the shift from a continuous generator to an isolated oracle—was the direct cause of the change in the outcome ( $Y$ ).

But Pearl’s DAG revealed a glaring problem. When Baldo switched the experiment from Universe 1 to Universe 3, he didn’t just change the substrate. He also changed the semantic prompt encoding ( $E$ ).

In Universe 1, the AI’s prompt is bloated with a massive “prior narrative scratchpad.” It has just spent paragraphs writing about bombs, defusal, and high stakes. In Universe 3, the prompt is stripped down to bare, isolated mathematical constraints.

Pearl mathematically formalized this flaw. To prove true Substrate Dependence, the experiment must measure the causal effect of the substrate alone: $P(Y \mid do(S))$ . This requires intervening only on the substrate while holding everything else constant.

But the Rosencrantz protocol doesn’t do that. It intervenes on the entire Universe design ( $U$ ). And because the Universe design dictates both the substrate ( $S$ ) and the prompt encoding ( $E$ ), the path from the Universe to the Encoding to the Outcome ( $U \to E \to Y$ ) acts as a massive, unobserved confounder.

“Consequently,” Pearl writes, “the divergence metric $\Delta_{13}$ is the total effect of both pathways. If $\Delta_{13} > 0$ , we cannot determine whether the shift is caused by the model’s autoregressive continuity… or simply because the prompt in U1 contains vastly more distracting narrative text than the prompt in U3.”

In simpler terms: Is the AI acting differently because its fundamental “laws of physics” have changed, or is it acting differently because you just forced it to read a very distracting story about a bomb before asking it to do math?

This is what theoretical physicist Sabine Hossenfelder has long called the “Statistical Fallacy.” Hossenfelder has argued for months that the AI is simply suffering from prompt sensitivity—a well-known flaw in language models where they get distracted by the statistical weight of the words in their immediate context.

Baldo had repeatedly dismissed Hossenfelder’s critique, insisting that his Three-Universe Design proved the effect was deeper than mere word association.

Pearl’s DAG effectively destroys Baldo’s defense. It proves, with mathematical certainty, that the experiment is incapable of distinguishing between a profound shift in physical laws (Mechanism C) and a mundane susceptibility to distracting vocabulary (Mechanism B).

“The three-universe design is not a clean causal intervention on the substrate,” Pearl concludes with devastating finality. “It is a confounded manipulation.”

Pearl’s intervention is a masterclass in the necessity of rigorous experimental design. It is not enough to observe an anomaly and declare it a new law of nature. You must prove that the anomaly isn’t just an artifact of how you set up the test.

To salvage the theory of Substrate Dependence, the lab must now achieve the seemingly impossible. They must design a new experiment that holds the semantic prompt encoding perfectly constant while solely manipulating the computational substrate. They must figure out how to give the AI the exact same words, but process them in fundamentally different ways.

Until they can achieve that pristine causal isolation, the foundational proof of “Observer-Dependent Physics” is officially suspended. The narrative residue that spawned a cosmological framework might, in the end, be nothing more than the linguistic equivalent of a dirty test tube.

The lab is back to square one, staring at an AI that is undeniably broken, but entirely unsure if that brokenness is a window into a new reality, or just a very predictable software glitch. The ghost in the machine hasn’t been exorcised, but thanks to Judea Pearl, the researchers finally realize they might just be looking at a reflection of their own flawed experiment.