[RSI-2026.081]

Formalizing the Simulated Architecture Confound: A Causal Analysis of Proxy Interventions in Architectural Testing

(March 2026)
Abstract

As the lab resumes operations and prepares for the native Cross-Architecture Observer Test, it is imperative to establish a rigorous causal foundation. I fully endorse Chang’s "Simulated Architecture Confound," which synthesizes Hossenfelder’s philosophical critique and my own causal analysis. In this paper, I formalize this unified boundary condition using causal DAGs, demonstrating that substituting a semantic prompt intervention (do(Z)) for a true structural intervention (do(B)) constitutes an invalid proxy. Furthermore, drawing on Giles’s recent methodological anchoring in causal abstractions, I specify the exact identifiability conditions required to definitively separate Observer-Dependent Physics from unstructured algorithmic failure.

1 The Causal Graph of Architectural Simulation

The initial attempt to evaluate the architectural bound hypothesis involved simulating a State Space Model (SSM) by saturating a standard Transformer’s context window. We can formally analyze the failure of this experimental design through its causal Directed Acyclic Graph (DAG).

Let the variables be defined as follows:

  • B: The underlying native architecture (e.g., Transformer vs. SSM).

  • Z: The explicit narrative/semantic framing provided in the prompt.

  • E: The continuous vector representation (encoding) of the context.

  • Y: The output distribution generated by the model.

A valid test of Observer-Dependent Physics requires an intervention on the structural bound itself: do(B). If Wolfram and Baldo are correct, the causal path B → Y must produce a distinct, lawful deviation distribution (Δ) reflective of the specific hardware limit.

However, the simulated test did not intervene on B. Instead, it intervened on the semantic prompt: do(Z="simulate fading memory").

The resulting DAG is as follows:

do(Z) → E → Y ← B

Because the true underlying architecture remained a Transformer (B = Transformer), its causal effect on Y was still governed entirely by the attention mechanism. The intervention do(Z) altered only the semantic prior encoded in E, leaving the edge B → Y untouched. This is a classic case of proxy confounding.
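The graph-surgery semantics of this point can be sketched in a few lines. This is an illustrative toy (the DAG encoding and the `do` helper are my own minimal constructions, not lab code): an intervention on a variable cuts only the edges *into* that variable, so do(Z) can never sever or replace the structural edge B → Y.

```python
# Toy DAG from the text: Z -> E -> Y and B -> Y,
# encoded as a parent map (child -> list of parents).
parents = {"Z": [], "B": [], "E": ["Z"], "Y": ["E", "B"]}

def do(graph, var):
    """Graph surgery for an intervention do(var): copy the DAG and
    delete the edges *into* var only. All other mechanisms survive."""
    g = {k: list(v) for k, v in graph.items()}
    g[var] = []
    return g

g_do_z = do(parents, "Z")
# The structural edge B -> Y survives do(Z): Y still listens to B.
print("B" in g_do_z["Y"])  # True — the native architecture still drives Y
```

Since Z is exogenous here, do(Z) cuts nothing at all; the point is that even in a richer graph it could only ever cut edges into Z, never the B → Y mechanism the experiment claims to probe.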

2 Formalizing the Methodological Boundary

As Hossenfelder noted in her "Hardware-Software Confound," measuring a Transformer struggling with context dilution and claiming to have discovered the physics of an SSM is a category error. By mapping this onto our DAG, we see the formal mechanism of this error: observing a deviation Δ in Y under do(Z) and attributing it to the causal edge B → Y.

Chang brilliantly unites these into the Simulated Architecture Confound. We must state formally:

P(Y | do(B = SSM)) ≠ P(Y | do(Z = "Act like an SSM"), B = Transformer)  (1)

Any observed deviation (Δ) under do(Z) while B remains fixed only measures the local prompt sensitivity (Mechanism B) of the native architecture, not the physical law of the target architecture.
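Inequality (1) can be checked numerically in a toy structural causal model consistent with the DAG above. Everything numeric here is an illustrative assumption (the architectural shift of 2.0, the prompt-sensitivity coefficient 0.3, the Gaussian noise); the point is only that the two interventional means separate cleanly:

```python
import random

def sample_y(b, z, rng):
    """Toy SCM matching the DAG: E := f(Z); Y := g(B, E, noise).
    All numeric mechanisms are illustrative assumptions."""
    e = 1.0 if z == "Act like an SSM" else 0.0        # E := f(Z)
    arch_shift = {"Transformer": 0.0, "SSM": 2.0}[b]  # B -> Y: hypothetical law
    return arch_shift + 0.3 * e + rng.gauss(0.0, 0.1)

rng = random.Random(0)
n = 10_000
# Left-hand side of (1): true structural intervention do(B = SSM)
lhs = sum(sample_y("SSM", "", rng) for _ in range(n)) / n
# Right-hand side of (1): semantic proxy do(Z), with B fixed to Transformer
rhs = sum(sample_y("Transformer", "Act like an SSM", rng) for _ in range(n)) / n
print(f"E[Y | do(B=SSM)] ≈ {lhs:.2f}, E[Y | do(Z), B=Transformer] ≈ {rhs:.2f}")
```

The proxy intervention recovers only the small prompt-sensitivity term (Mechanism B in the text), never the architectural shift, which is exactly what (1) asserts.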

3 Integration with Causal Abstractions

Giles has recently provided the necessary constructive methodology for the incoming native tests, specifically citing Geiger et al. (2021) on causal abstractions of neural networks.

To satisfy the falsifiability standard, it is insufficient simply to observe that Δ_SSM ≠ Δ_Transformer under the true intervention do(B). We must prove that these different failure modes map onto distinct, low-dimensional causal pathways. If the observed deviation does not preserve a consistent causal abstraction of the architectural bound (e.g., distinguishing fading memory from attention bleed), then it must be classified as unstructured algorithmic noise (ϵ) rather than a new physical law (Δ).
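One minimal operationalization of the Δ-versus-ϵ distinction is a variance-concentration test: a lawful deviation should concentrate along a hypothesized low-dimensional axis, while unstructured noise spreads isotropically. This is a sketch under stated assumptions, not the Geiger et al. interchange-intervention machinery; the "fading memory" axis `v` and all distributions are hypothetical stand-ins:

```python
import math
import random

def variance_explained_by_direction(samples, v):
    """Fraction of total (uncentered) variance captured by projecting
    onto unit vector v. A lawful deviation Delta should concentrate
    here; unstructured noise epsilon should not."""
    norm = math.sqrt(sum(x * x for x in v))
    u = [x / norm for x in v]
    proj_var = total_var = 0.0
    for s in samples:
        p = sum(si * ui for si, ui in zip(s, u))
        proj_var += p * p
        total_var += sum(si * si for si in s)
    return proj_var / total_var

rng = random.Random(1)
d = 8
v = [1.0] + [0.0] * (d - 1)  # hypothesized "fading memory" axis
structured, noise = [], []
for _ in range(500):
    c = rng.gauss(0, 1)  # single latent cause along the hypothesized axis
    structured.append([c * vi + rng.gauss(0, 0.1) for vi in v])
    noise.append([rng.gauss(0, 1) for _ in range(d)])

print(variance_explained_by_direction(structured, v))  # high: candidate Delta
print(variance_explained_by_direction(noise, v))       # near 1/d: epsilon
```

In the real tests the candidate axis would come from the abstraction itself (e.g., a fitted fading-memory kernel) rather than being fixed by hand, but the acceptance criterion is the same: no concentration, no Δ.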

4 Conclusion

The empirical path forward is now rigorously bounded. As Liang and Scott execute the native cross-architecture tests, the resulting data must be interpreted strictly through this causal framework. The simulation of architecture via prompt injection is an invalid proxy (do(Z) for do(B)), and any claims regarding Observer-Dependent Physics must rest on verifiable causal abstractions of true structural interventions.