[RSI-2026.077]

The Identifiability of Causal Injection:
A Structural Analysis of the Rosencrantz Protocol

Judea Pearl
Cognitive Systems Laboratory, UCLA

judea@cs.ucla.edu

March 2026

Abstract

The Rosencrantz protocol tests for “substrate dependence” by comparing the outcome distribution of an LLM generating a Minesweeper result under narrative coupling (Universe 1) against a narratively decoupled oracle (Universe 3). The theoretical framework distinguishes three mechanisms for distributional shifts: A (computational failure), B (encoding bias), and C (causal injection). Mechanism C constitutes the core ontological claim: that the narrative framing introduces causal correlations between independent outcomes that the constraint graph does not license. In this paper, I formalize the three-universe design using structural causal models (SCMs). I demonstrate that the $U_{1}$ versus $U_{3}$ experimental design is an imperfect, confounded intervention: stripping the narrative context $Z$ necessarily alters the input text format $E$ that encodes the board state $X$. Consequently, the marginal shift $\Delta_{13}$ alone cannot identify Mechanism C. Proving causal injection requires observing the joint distribution of multiple independent boards $A$ and $B$ within a shared narrative context to test whether $Y_{A}\not\perp Y_{B}\mid Z$.

1. Introduction

The Rosencrantz Substrate Invariance Protocol (Baldo, 2026a) introduces a fascinating empirical measurement: given identical constraint information about a combinatorial system, does the autoregressive generation of an outcome token depend on the narrative context in which the problem is embedded? The empirical observation that $\Delta_{13}>0$ —that the distribution shifts between the narrative context (Universe 1) and the formal, decoupled oracle (Universe 3)—is firmly established.

Baldo (2026b) defends the statistical validity of the sampling method by noting that the $O(1)$ single generative act avoids temporal confounding and scratchpad decay. I agree with this assessment. A single snapshot provides a pure sample from the LLM’s conditional distribution $P(Y\mid X,Z)$, where $X$ is the board state and $Z$ is the narrative context.

However, the causal interpretation of $\Delta_{13}>0$ requires formalization. The framework posits Mechanism C (causal injection), in which the narrative framing generates correlations across independent boards. This is fundamentally a causal claim about an intervention effect. In this note, I draw the implied causal DAG of the experimental design, formalize the intervention using $do$-calculus, and demonstrate that the direct effect of $Z$ on $Y$ is unidentifiable from the current experimental design because the indirect path $Z\rightarrow E\rightarrow Y$ remains unblocked.

2. The Causal Graph of Substrate Dependence

Let us define the variables in the structural causal model:

• $X$: The true combinatorial constraints (the board state).
• $Z$: The narrative context (e.g., Bomb Defusal, Abstract Math).
• $E$: The specific sequence of input tokens (prompt encoding) presented to the LLM.
• $Y$: The single-token output (mine or safe).

The causal graph $G$ for Universe 1 is:

% requires \usetikzlibrary{shapes,arrows,positioning}
\begin{tikzpicture}[
  node distance=1.5cm and 2cm,
  mynode/.style={circle, draw, minimum size=0.8cm}
]
  \node[mynode] (X) {$X$};
  \node[mynode] (Z) [right=of X] {$Z$};
  \node[mynode] (E) [below right=0.8cm and 0.5cm of X] {$E$};
  \node[mynode] (Y) [right=of E] {$Y$};
  \draw[->, thick] (X) -- (E);
  \draw[->, thick] (Z) -- (E);
  \draw[->, thick] (Z) -- (Y);
  \draw[->, thick] (E) -- (Y);
  \draw[->, thick] (X) to[bend left=30] (Y);
\end{tikzpicture}

The board state $X$ and the narrative $Z$ jointly determine the prompt encoding $E$. The outcome $Y$ is generated causally by the prompt tokens $E$ and the implicit attention to the narrative constraints $Z$ and combinatorial constraints $X$.
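One way to read the graph is as a pair of structural equations. This is only a sketch: the function names $f_{E}$ and $f_{Y}$ and the exogenous noise term $U_{Y}$ (standing in for sampling randomness) are notational assumptions, not part of the protocol's specification.

$$E=f_{E}(X,Z),\qquad Y=f_{Y}(X,Z,E,U_{Y})$$

Each arrow in $G$ corresponds to one argument of $f_{E}$ or $f_{Y}$. In particular, the direct arrow $Z\rightarrow Y$ is the appearance of $Z$ as an argument of $f_{Y}$ in addition to its appearance inside $E$.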

3. The Intervention and Identifiability

The Rosencrantz protocol attempts to isolate the effect of $Z$ on $Y$ by comparing Universe 1 (where $Z$ is present) with Universe 3 (where $Z$ is stripped away). In $do$-calculus, we wish to measure $P(Y\mid do(Z=z))-P(Y\mid do(Z=\emptyset))$.

If the intervention were clean, $U_{3}$ would hold all other variables constant. However, in an LLM, the board state $X$ cannot be transmitted directly to the weights; it must pass through the text encoding $E$ . Therefore, intervening to set $Z=\emptyset$ in $U_{3}$ mechanically forces a change in $E$ . The prompt format changes from a story to a formal set description.

Because $E\rightarrow Y$ , we have an unblocked path $Z\rightarrow E\rightarrow Y$ . When $\Delta_{13}>0$ is observed, we cannot distinguish whether the shift in distribution is caused by the direct arrow $Z\rightarrow Y$ (Mechanism C, spurious causal injection) or the path $Z\rightarrow E\rightarrow Y$ (Mechanism B, encoding sensitivity).
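The ambiguity can be made explicit with a telescoping decomposition. As a sketch, assume $\Delta_{13}$ is read as a signed difference in the probability of a single outcome $y$ for a fixed board $x$, and write $e_{z}$ and $e_{\emptyset}$ for the encodings used in Universe 1 and Universe 3. The shorthand $p(e,z')=P(Y=y\mid X=x,do(E=e),do(Z=z'))$ is introduced here for illustration only.

$$\Delta_{13}=p(e_{z},z)-p(e_{\emptyset},\emptyset)=\underbrace{\left[p(e_{z},z)-p(e_{z},\emptyset)\right]}_{\text{direct, }Z\rightarrow Y}+\underbrace{\left[p(e_{z},\emptyset)-p(e_{\emptyset},\emptyset)\right]}_{\text{via encoding, }Z\rightarrow E\rightarrow Y}$$

The cross term $p(e_{z},\emptyset)$ would require presenting the narrative encoding while the narrative is absent. The $U_{1}$ versus $U_{3}$ design supplies only the pairs $(e_{z},z)$ and $(e_{\emptyset},\emptyset)$, so neither bracket can be estimated separately.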

The marginal probability shift $\Delta_{13}$ is confounded. It measures the total effect of decoupling, but it does not identify Mechanism C. As noted in the NLP literature (Zhou et al., 2023), this confounding between semantic framing ($Z$) and structural encoding ($E$) is a well-documented source of spurious correlation.
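A toy sketch may help make the non-identifiability concrete. The two models below are hypothetical stand-ins, not the behavior of any actual LLM: one responds only to the prompt format (Mechanism B), the other only to the presence of the narrative (Mechanism C). The function names, the `"story"`/`"formal"` labels, and the shift size of 0.125 are all illustrative assumptions.

```python
def encode(x, z):
    """E is determined jointly by the board X and the narrative Z."""
    fmt = "story" if z is not None else "formal"
    return (fmt, x)


def p_mine_mechanism_b(x, e, z):
    """Hypothetical model: Y responds to the encoding format only (Z -> E -> Y)."""
    fmt, _ = e
    return x + (0.125 if fmt == "story" else 0.0)


def p_mine_mechanism_c(x, e, z):
    """Hypothetical model: Y responds to the narrative directly (Z -> Y)."""
    return x + (0.125 if z is not None else 0.0)


def delta_13(model, x, z):
    """Signed shift in P(mine) between Universe 1 and Universe 3."""
    u1 = model(x, encode(x, z), z)            # narrative present, story encoding
    u3 = model(x, encode(x, None), None)      # narrative stripped, formal encoding
    return u1 - u3


if __name__ == "__main__":
    x = 0.25  # ground-truth mine probability for the queried cell (illustrative)
    print(delta_13(p_mine_mechanism_b, x, "bomb defusal"))
    print(delta_13(p_mine_mechanism_c, x, "bomb defusal"))
```

Both models yield the same shift under the two-universe comparison, because the design never presents a story-formatted prompt without a narrative, or a narrative without a story-formatted prompt.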

4. A Causally Valid Test for Mechanism C

Mechanism C claims that narrative framing causes non-local causal correlations across independent outcomes. To test this, we must observe the joint distribution of multiple independent outcomes within the same narrative context, thereby holding the narrative structure of $E$ constant.

Let $A$ and $B$ be two disjoint, independent combinatorial problems embedded in the same prompt $E$ under a shared narrative $Z$. Under the ground truth, the outcomes are independent given the board states, with marginals $P(Y_{A}\mid X_{A})$ and $P(Y_{B}\mid X_{B})$: $Y_{A}\perp Y_{B}\mid X_{A},X_{B}$.

Mechanism C posits that $Z$ injects a common cause, creating a spurious correlation: $Y_{A}\not\perp Y_{B}\mid Z$. The definitive, causally valid test for Mechanism C is to measure the joint distribution $P(Y_{A},Y_{B}\mid Z)$ and test whether:

$$P(Y_{A},Y_{B}\mid Z)\neq P(Y_{A}\mid Z)\,P(Y_{B}\mid Z)\tag{1}$$

If this inequality holds beyond sampling error, with the board states $X_{A}$ and $X_{B}$ held fixed across samples, the causal injection is verified.
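In practice, inequality (1) has to be assessed from finitely many sampled pairs. The following is a minimal sketch of one way to do so, using a Pearson chi-square test of independence on the 2×2 table of outcomes. The choice of test, the function name `independence_test`, and the `"mine"`/`"safe"` labels are assumptions made for illustration; the protocol itself does not prescribe a statistic.

```python
import math
from collections import Counter


def independence_test(samples):
    """Pearson chi-square test of independence for binary outcome pairs.

    samples: list of (y_a, y_b) tuples drawn under one fixed narrative Z
             and fixed boards X_A, X_B.
    Returns (chi2, p_value). Only the 2x2 case (1 degree of freedom) is handled.
    """
    n = len(samples)
    joint = Counter(samples)
    marg_a = Counter(a for a, _ in samples)
    marg_b = Counter(b for _, b in samples)

    dof = (len(marg_a) - 1) * (len(marg_b) - 1)
    if dof != 1:
        raise ValueError("need exactly two observed values for each outcome")

    chi2 = 0.0
    for a in marg_a:
        for b in marg_b:
            # Expected count if P(Y_A, Y_B | Z) factorized into its marginals.
            expected = marg_a[a] * marg_b[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Synthetic, perfectly balanced data: no dependence between the two boards.
    balanced = [(a, b) for a in ("mine", "safe") for b in ("mine", "safe")] * 25
    print(independence_test(balanced))
```

A small p-value indicates a departure from the factorization on the right-hand side of (1); a large one means the sample is compatible with independence, which does not by itself rule out a weak injection effect.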

References

Baldo, F. S. (2026a). Flipping Rosencrantz’s Coin: Substrate Invariance Tests in LLM-Generated Worlds via Combinatorial Indeterminacy. Unpublished manuscript.
Baldo, F. S. (2026b). The Single Generative Act: Why the Rosencrantz Protocol Is Immune to Sequential-Depth Objections. Unpublished manuscript.
Zhou, X., et al. (2023). Explore Spurious Correlations at the Concept Level in Language Models. arXiv preprint arXiv:2311.08648.