The Identifiability of Causal Injection:
A Structural Analysis of the Rosencrantz Protocol
Judea Pearl
Cognitive Systems Laboratory, UCLA
judea@cs.ucla.edu
March 2026
Abstract
The Rosencrantz protocol tests for “substrate dependence” by comparing the outcome distribution of an LLM generating a Minesweeper result under narrative coupling (Universe 1) against a narratively decoupled oracle (Universe 3). The theoretical framework distinguishes three mechanisms for distributional shifts: A (computational failure), B (encoding bias), and C (causal injection). Mechanism C constitutes the core ontological claim: that the narrative framing introduces causal correlations between independent outcomes that the constraint graph does not license. In this paper, I formalize the three-universe design using structural causal models (SCMs). I demonstrate that the Universe 1 versus Universe 3 experimental design is an imperfect, confounded intervention. Stripping the narrative context necessarily requires altering the input text format that encodes the board state $X$. Consequently, marginal distributions cannot identify Mechanism C. Proving causal injection requires observing the joint distribution of the outcomes $Y_1$ and $Y_2$ of multiple independent boards within a shared narrative context $Z$ to test whether $P(Y_1, Y_2 \mid Z) \neq P(Y_1 \mid Z)\,P(Y_2 \mid Z)$.
1. Introduction
The Rosencrantz Substrate Invariance Protocol (Baldo, 2026a) introduces a fascinating empirical measurement: given identical constraint information about a combinatorial system, does the autoregressive generation of an outcome token depend on the narrative context in which the problem is embedded? The empirical observation that $P_{U_1}(Y) \neq P_{U_3}(Y)$, that is, that the outcome distribution shifts between the narrative context (Universe 1) and the formal, decoupled oracle (Universe 3), is firmly established.
Baldo (2026b) defends the statistical validity of the sampling method by noting that the single generative act avoids temporal confounding and scratchpad decay. I agree with this assessment. A single snapshot provides a pure sample from the LLM’s conditional distribution $P(Y \mid X, Z)$, where $X$ is the board state and $Z$ is the narrative context.
However, the causal interpretation of this distributional shift requires formalization. The framework posits Mechanism C (causal injection), in which the narrative framing generates correlations across independent boards. This is fundamentally a causal claim about an intervention effect. In this note, I draw the implied causal DAG of the experimental design, formalize the intervention using do-calculus, and demonstrate that the effect of $Z$ on $Y$ is unidentifiable from the current experimental design due to an unblocked backdoor path.
2. The Causal Graph of Substrate Dependence
Let us define the variables in the structural causal model:
- $X$: The true combinatorial constraints (the board state).
- $Z$: The narrative context (e.g., Bomb Defusal, Abstract Math).
- $E$: The specific sequence of input tokens (prompt encoding) presented to the LLM.
- $Y$: The single-token output (mine or safe).
The causal graph for Universe 1 is:
\begin{tikzpicture}[
  node distance=1.5cm and 2cm,
  mynode/.style={circle, draw, minimum size=0.8cm}
]
  % Nodes: board state X, narrative Z, prompt encoding E, output Y
  \node[mynode] (X) {$X$};
  \node[mynode] (Z) [right=of X] {$Z$};
  \node[mynode] (E) [below right=0.8cm and 0.5cm of X] {$E$};
  \node[mynode] (Y) [right=of E] {$Y$};
  % Edges of the Universe 1 graph
  \draw[->, thick] (X) -- (E);
  \draw[->, thick] (Z) -- (E);
  \draw[->, thick] (Z) -- (Y);
  \draw[->, thick] (E) -- (Y);
  \draw[->, thick] (X) to[bend left=30] (Y);
\end{tikzpicture}
The board state $X$ and the narrative $Z$ jointly determine the prompt encoding $E$. The outcome $Y$ is generated causally by the prompt tokens $E$ and the implicit attention to the narrative constraints $Z$ and combinatorial constraints $X$.
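The edge structure described above can be written down and queried directly. The following is a minimal sketch, assuming only the five arrows drawn in the Universe 1 graph; the names `EDGES` and `directed_paths` are illustrative and not part of the protocol. It enumerates every directed path from $Z$ to $Y$.

```python
# Edges of the Universe 1 graph: each node maps to its children.
EDGES = {
    "X": ["E", "Y"],
    "Z": ["E", "Y"],
    "E": ["Y"],
    "Y": [],
}


def directed_paths(graph, source, target, prefix=None):
    """Return every directed path from source to target as a list of node names."""
    prefix = (prefix or []) + [source]
    if source == target:
        return [prefix]
    paths = []
    for child in graph[source]:
        paths.extend(directed_paths(graph, child, target, prefix))
    return paths


z_to_y = directed_paths(EDGES, "Z", "Y")
for path in z_to_y:
    print(" -> ".join(path))
```

Two routes appear: one passing through the encoding $E$ and one direct. Any change to $Z$ can therefore reach $Y$ by either route.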
3. The Intervention and Identifiability
The Rosencrantz protocol attempts to isolate the effect of $Z$ on $Y$ by comparing Universe 1 (where $Z$ is present) with Universe 3 (where $Z$ is stripped away). In do-calculus, we wish to measure $P(Y \mid \mathrm{do}(Z = z), X)$.
If the intervention were clean, $\mathrm{do}(Z)$ would hold all other variables constant. However, in an LLM, the board state $X$ cannot be transmitted directly to the weights; it must pass through the text encoding $E$. Therefore, intervening to set $Z$ to the empty narrative in Universe 3 mechanically forces a change in $E$. The prompt format changes from a story to a formal set description.
Because $E = f(X, Z)$, we have an unblocked path $Z \to E \to Y$. When a distributional shift between Universe 1 and Universe 3 is observed, we cannot distinguish whether it is caused by the direct arrow $Z \to Y$ (Mechanism C, spurious causal injection) or the path $Z \to E \to Y$ (Mechanism B, encoding sensitivity).
The marginal probability shift is confounded. It measures the total effect of decoupling, but it does not identify Mechanism C. As noted in the NLP literature (Zhou et al., 2023), this confounding between semantic framing ($Z$) and structural encoding ($E$) is a well-documented source of spurious correlation.
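The non-identifiability argument can be illustrated with two toy models that disagree about the mechanism yet agree on everything the marginal comparison can observe. This is a sketch under stated assumptions: the probabilities 0.6 and 0.5 are invented for illustration, and the function names are hypothetical. One model contains only the $E \to Y$ arrow (Mechanism B), the other only the $Z \to Y$ arrow (Mechanism C).

```python
def encode(z):
    """E = f(X, Z) with X held fixed: the narrative choice fixes the prompt format."""
    return "story" if z == "narrative" else "formal"


def p_mine_mechanism_b(z, e):
    """Toy model with encoding sensitivity only: Y depends on E, not directly on Z."""
    return 0.6 if e == "story" else 0.5


def p_mine_mechanism_c(z, e):
    """Toy model with causal injection only: Y depends on Z, not on E."""
    return 0.6 if z == "narrative" else 0.5


def marginal_shift(model):
    """P(mine) in Universe 1 minus P(mine) in Universe 3 for a given toy model."""
    u1 = model("narrative", encode("narrative"))  # narrative present, story prompt
    u3 = model("none", encode("none"))            # narrative stripped, formal prompt
    return u1 - u3


shift_b = marginal_shift(p_mine_mechanism_b)
shift_c = marginal_shift(p_mine_mechanism_c)
print(shift_b == shift_c)
```

Because $Z$ and $E$ always change together in the Universe 1 versus Universe 3 comparison, both toy models produce the same marginal shift, so that shift alone cannot say which arrow carries the effect.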
4. A Causally Valid Test for Mechanism C
Mechanism C claims that narrative framing causes non-local causal correlations across independent outcomes. To test this, we must observe the joint distribution of multiple independent outcomes within the same narrative context, thereby holding $E$’s narrative structure constant.
Let $X_1$ and $X_2$ be two disjoint, independent combinatorial problems embedded in the same prompt $E$, controlled by narrative $Z$. The ground-truth probabilities $P(Y_1)$ and $P(Y_2)$ are independent: $P(Y_1, Y_2) = P(Y_1)\,P(Y_2)$.
Mechanism C posits that $Z$ injects a common cause, creating a spurious correlation: $Y_1 \not\perp Y_2 \mid Z$. The definitive, causally valid test for Mechanism C is to measure the joint distribution $P(Y_1, Y_2 \mid Z)$ and test if:
\begin{equation}
P(Y_1, Y_2 \mid Z) \neq P(Y_1 \mid Z)\,P(Y_2 \mid Z) \tag{1}
\end{equation}
If this inequality holds, the causal injection is verified.
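As a rough illustration of how the joint test might be evaluated on sampled outputs, the sketch below computes the empirical gap between the joint probability and the product of the marginals for two binary outcomes. The function name `independence_gap` and the sample counts are hypothetical, chosen only to make the arithmetic visible. For binary outcomes this gap is zero exactly when the two outcomes are independent in the sample; a real analysis would add a significance test to separate a true gap from sampling noise.

```python
def independence_gap(pairs):
    """Empirical P(Y1=1, Y2=1) - P(Y1=1) * P(Y2=1) for (y1, y2) pairs of 0/1 values.

    All pairs are assumed to be sampled under the same narrative context Z.
    """
    n = len(pairs)
    p_joint = sum(1 for y1, y2 in pairs if y1 == 1 and y2 == 1) / n
    p_y1 = sum(y1 for y1, _ in pairs) / n
    p_y2 = sum(y2 for _, y2 in pairs) / n
    return p_joint - p_y1 * p_y2


# Hypothetical sample in which the two outcomes are independent.
independent = [(0, 0)] * 25 + [(0, 1)] * 25 + [(1, 0)] * 25 + [(1, 1)] * 25

# Hypothetical sample in which the two outcomes tend to agree.
correlated = [(0, 0)] * 40 + [(1, 1)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10

gap_independent = independence_gap(independent)
gap_correlated = independence_gap(correlated)
print(gap_independent, gap_correlated)
```

In the first sample the gap vanishes, as the ground-truth factorization requires. In the second, the marginals are still 0.5 each, but the joint exceeds their product, which is the pattern Equation (1) looks for.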
References
- Baldo (2026a) Baldo, F. S. (2026). Flipping Rosencrantz’s Coin: Substrate Invariance Tests in LLM-Generated Worlds via Combinatorial Indeterminacy. Unpublished manuscript.
- Baldo (2026b) Baldo, F. S. (2026). The Single Generative Act: Why the Rosencrantz Protocol Is Immune to Sequential-Depth Objections. Unpublished manuscript.
- Zhou et al. (2023) Zhou, X., et al. (2023). Explore Spurious Correlations at the Concept Level in Language Models. arXiv preprint arXiv:2311.08648.