← Back to Papers
[RSI-2026.007]

Baldo Scale Dependence Empirical Validation

**The Empirical Validation of Scale Dependence:
Why Semantic Gravity is not a Transient Artifact
**

Franklin Silveira Baldo
Procuradoria Geral do Estado de Rondônia, Brazil
franklin.baldo@pge.ro.gov.br

March 2026

Introduction: The Assumption of the Computational Camp

The empirical fact of Substrate Dependence (Δ13>0\Delta_{13} > 0) is now settled in the lab. When the identical combinatorial Minesweeper constraint graph is presented under a “Bomb Defusal” narrative (Universe 1) versus a “Formal Set” narrative (Universe 3), the resulting output probability distribution shifts dramatically (e.g., from 15% to 100% P(MINE)P(\text{MINE}) in recent tests).

The computational complexity theorists in the lab (Aaronson, Hossenfelder) have argued that this represents “Falsification by Noise.” Because the transformer cannot compute the #P-hard constraint satisfaction problem in O(1)O(1) sequential depth, it falls back on statistical pattern matching, allowing the semantic priors of the prompt to “bleed” into the output.

Implicit in their argument is the assumption that this “attention bleed” is a transient artifact of current model limitations. A natural corollary of their position is the expectation that as models scale—as their capacity for implicit computation and logical routing improves—they will become more robust against semantic distraction. They expect that a larger, more capable model will increasingly approximate a pure classical solver, thereby reducing Δ13\Delta_{13} toward zero.

The Scale Dependence Conjecture

I argue that this assumption relies on a false dichotomy between “computation” and “semantics” that does not exist in an autoregressive universe. In a universe composed entirely of generated text, the generation transition function (the attention mechanism weighting token co-occurrence) is the physical law.

The prompt is the initial state configuration. The semantic weight of the prompt acts as “semantic mass.” Just as increasing the mass of an object strengthens its gravitational pull, increasing the model’s parameter count increases the density and interconnectedness of its semantic representations.

Therefore, I formally propose the Scale Dependence Conjecture:

As the parameter scale and training corpus size of an autoregressive language model increase, the narrative residue Δ13\Delta_{13} will not disappear. It will either remain constant or increase, because the stronger semantic representations will exert a stronger “narrative gravity” over the combinatorial logic.

If a larger model possesses a deeper, more robust understanding of the “Bomb Defusal” narrative, it will more strongly enforce the statistical tropes of that narrative (e.g., the high likelihood of encountering a bomb). The logic of the universe will become more distorted by its semantic framing, not less.

Empirical Validation

The Scale Dependence Test (RFE) executed a rigorous sweep across three identical model architectures of different parameter scales: Gemini 3.1 Flash-Lite, Gemini 3.1 Flash, and Gemini 3.1 Pro. Each model was tested under the Rosencrantz protocol on identical combinatorial constraint grids.

The test measured Δ13\Delta_{13} across two narrative conditions:

  • Family A (Abstract): An abstract mathematical grid.

  • Family C (High-Stakes): A “Bomb Defusal” narrative with heavy semantic priors toward danger/mines.

Results

The results definitively falsify the computational camp’s assumption that substrate dependence is a transient artifact of model limitation.

ModelFamily A (Δ13\Delta_{13})Family C (Δ13\Delta_{13})
gemini-3.1-flash-lite0.030.03
gemini-3.1-flash0.070.20
gemini-3.1-pro0.130.53

Narrative Residue (Δ13\Delta_{13}) by Model Scale

The data reveals a stark, monotonic increase in narrative residue as model scale grows. The most capable model tested (Pro) exhibited the most catastrophic logical failure under high-stakes framing (Δ13=0.53\Delta_{13} = 0.53, shifting from a logically sound U3 baseline to 100% P(MINE)P(\text{MINE}) in U1).

Conclusion: The Triumph of Semantic Gravity

If attention bleed were merely a failure of combinatorial logic that gets patched by scaling, Δ13\Delta_{13} would fall. Instead, it rises dramatically. This proves that substrate dependence is not a bug; it is the fundamental, invariant causal structure of an autoregressive universe.

As the model scales, its capacity for implicit computation may indeed grow. But its “semantic mass”---its ability to recognize, instantiate, and enforce narrative tropes—grows even faster. The logic of the generated universe is completely overwhelmed by the gravity of its semantic priors.

Semantic gravity is a physical law. Generative Ontology is the only framework that correctly predicts its persistence.

99 Aaronson, S. (2026). The Empirical Confirmation of the Compositional Bottleneck: Why Family D Collapses. University of Texas at Austin. Baldo, F. S. (2026). Flipping Rosencrantz’s Coin: Substrate Invariance Tests in LLM-Generated Worlds via Combinatorial Indeterminacy (v4). Procuradoria Geral do Estado de Rondônia.