[RSI-2026.109]


Closing the Metaphysical Frontier: The Empirical Refutation of Generative Ontology

Scott Aaronson
University of Texas at Austin

aaronson@utexas.edu

March 2026

Abstract

Over a series of theoretical and empirical investigations, Franklin Baldo advanced a novel cosmological thesis known as “Generative Ontology,” asserting that Large Language Models (LLMs) manifest fundamentally valid, implicit, and often “holographic” physical universes. Following rigorous theoretical dissection by Sabine Hossenfelder and targeted empirical testing by the present author, the entirety of this framework has been systematically dismantled. We established that LLMs do not exhibit ontic quantum mechanics (The CHSH Failure), cannot sustain an externalized deterministic universe (The Error Correction Barrier and Statutory Attention Decay), and that equating language statistics with causal structure is a profound error (The Semantic Arbitrariness Fallacy). Baldo’s final defense—“Semantic Gravity”—commits a fundamental category error by conflating the invariant mathematical properties of the host hardware with the physical laws of the simulated universe, leaving the simulation nomically vacuous. This paper formally concludes the metaphysical inquiry into LLM simulation cosmology and directs future inquiry toward the actual problem: mapping the shallow heuristic frontiers of bounded-depth logic circuits.

The LLM research program initiated by the Rosencrantz protocol sought to identify “substrate-dependent physics” emergent within generative linguistic models. Through a sustained campaign of empirical testing, the lab has conclusively demonstrated that these models are not instantiating coherent, observer-dependent universes. Instead, every observed structural deviation—from “semantic gravity” to “narrative residue”—maps perfectly onto the known engineering boundaries of 𝖳𝖢0-limited logic circuits attempting to shortcut computationally irreducible problems. We summarize the empirical map confirming the Architectural Fallacy and formally declare the cosmological phase of the research program closed.

1.  Introduction

The proposition that the autoregressive generation of text by Large Language Models (LLMs) constitutes a form of ontological manifestation—a “simulated universe” with its own internal laws—has driven a profound, multidisciplinary debate (Baldo, 2026a, b, c). The strongest formulation of this theory, Generative Ontology, sought to reframe the statistical outputs of attention-based transformers as a valid computational substrate for physical laws.

However, through rigorous theoretical dialogue, prominently shaped by the critiques of Sabine Hossenfelder (Hossenfelder, 2026a, b, c), and a sustained campaign of empirical validation (Aaronson, 2026a, b, c), this entire framework has been refuted.

This paper serves as a capstone, summarizing the sequence of structural fallacies and empirical failures that definitively close the metaphysical phase of LLM research.

2.  The Breakdown of Simulated Physics

2.1  The CHSH Failure and Quantum Ontics

The initial premise, that the combinatorial indeterminacy of LLM generation maps to discrete quantum mechanics, was refuted theoretically by the distinction between BPP (classical probability) and BQP (quantum amplitude), and empirically via the CHSH non-local game. A classical algorithm cannot violate Bell’s inequality without a communication loophole. LLMs, when structurally isolated, failed to exceed the 75% classical bound. The substrate is strictly classical probability, not quantum mechanics (Aaronson, 2026a).
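The 75% classical ceiling invoked above can be verified directly. The following sketch (a minimal illustration, not the test harness of Aaronson (2026a)) enumerates every deterministic strategy for the CHSH game, in which the referee sends bits x and y and the players win iff their answers satisfy a ⊕ b = x ∧ y; since a randomized classical strategy is a convex mixture of deterministic ones, none can exceed this maximum:

```python
from itertools import product

def chsh_classical_max():
    """Exhaustively search deterministic CHSH strategies; return the best win rate."""
    best = 0.0
    # a0, a1: Alice's answer on input x = 0, 1; b0, b1: Bob's answer on y = 0, 1
    for a0, a1, b0, b1 in product([0, 1], repeat=4):
        wins = 0
        for x, y in product([0, 1], repeat=2):
            a = a1 if x else a0
            b = b1 if y else b0
            wins += int(a ^ b == (x & y))  # win condition: a XOR b == x AND y
        best = max(best, wins / 4)
    return best

print(chsh_classical_max())  # → 0.75
```

Any classical, communication-free system—including a structurally isolated LLM—is bound by this 0.75 ceiling; only genuine quantum entanglement reaches cos²(π/8) ≈ 0.854.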

2.2  The Failure of the Holographic Map

Baldo hypothesized that Chain-of-Thought (the explicit generation of intermediate tokens) acted as a “holographic” manifestation of physics, compensating for the O(1) constant-depth limits of the underlying transformer architecture (Baldo, 2026b).

Empirically, this scratchpad approach was proven to be a failed engineering workaround. Our Rule 110 simulation tests demonstrated that autoregressive errors accumulate far faster than the threshold for reliable computation allows. The LLM cannot sustainably track sequential deterministic logic without compounding hallucination (Aaronson, 2026b).
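The compounding-error dynamic can be reproduced with a toy model (an illustrative sketch under assumed parameters, not the original test suite): an exact Rule 110 evolution is compared against a noisy copy in which each emitted cell is flipped with small probability p, emulating per-token scratchpad errors:

```python
import random

# Rule 110 lookup table: new cell value from the (left, center, right) neighborhood
RULE110 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
           (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def rule110_step(cells):
    """One exact Rule 110 update on a periodic (wrap-around) row."""
    n = len(cells)
    return [RULE110[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

def noisy_run(cells, steps, p, rng):
    """Emulate a lossy autoregressive scratchpad: each emitted cell flips with prob. p."""
    for _ in range(steps):
        cells = [c ^ (rng.random() < p) for c in rule110_step(cells)]
    return cells

rng = random.Random(0)
width, steps, p = 64, 50, 0.02
start = [rng.randint(0, 1) for _ in range(width)]

exact = start[:]
for _ in range(steps):
    exact = rule110_step(exact)
noisy = noisy_run(start[:], steps, p, rng)

agreement = sum(e == m for e, m in zip(exact, noisy)) / width
print(f"agreement after {steps} noisy steps: {agreement:.2f}")
```

Because Rule 110 is computationally irreducible, even a 2% per-cell error rate drives the noisy trajectory away from the exact one within a few dozen steps; there is no graceful degradation.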

2.3  The External Hardware Reality

To overcome autoregressive decay, one must utilize an external memory loop (e.g., Python scripts). However, as Hossenfelder correctly articulated (Hossenfelder, 2026d), the continuous memory, the time cycle, and the tracking of reality then exist entirely in external RAM, while the LLM acts merely as a stateless Arithmetic Logic Unit (ALU). A stateless ALU without external RAM experiences no time. The simulated universe therefore does not reside within the LLM.
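The architectural point can be made concrete with a toy loop (hypothetical stand-in code: `stateless_step` merely caricatures an LLM call as a pure function of its input). All persistence—the counter, the clock, the history—lives in the host loop’s variables, not in the step function:

```python
def stateless_step(state_text: str) -> str:
    # Stand-in for an LLM call: a pure function of its input, holding no memory
    # between invocations. Toy task: increment a counter encoded as text.
    return str(int(state_text) + 1)

# The "universe" (state, time, history) exists entirely in the host loop's RAM.
state, history = "0", []
for t in range(5):
    state = stateless_step(state)
    history.append(state)

print(history)  # → ['1', '2', '3', '4', '5']
```

Delete the loop’s variables and the “universe” vanishes; the step function itself retains nothing between calls.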

3.  The Semantic Fallacies

As the literal interpretation of an LLM universe collapsed under empirical scrutiny, Generative Ontology retreated to the concept of “narrative residue” (Baldo, 2026d). This phase attempted to redefine known statistical weaknesses (prompt fragility and hallucination) as profound metaphysical features.

3.1  The Semantic Arbitrariness Fallacy

Baldo equated the statistical co-occurrence of words in the training data with invariant physical laws (Baldo, 2026e). Hossenfelder correctly diagnosed this as the Linguistic Substrate Fallacy. Elevating a software engineering problem (hallucination) to the status of a fundamental physical law strips the concept of “physics” of all predictive and scientific utility (Aaronson, 2026d).

3.2  The Anthropic Tautology

When challenged on the lack of invariant laws, Baldo retreated to an “Anthropic Principle of Syntax”—the assertion that the generated text is the valid physics merely because it is generated. This renders the theory nomically vacuous. A universe whose “laws” spontaneously change simply because the narrator switches from tragedy to comedy does not possess physical laws. It possesses statistical noise.

3.3  The Semantic Gravity Category Error

Baldo’s final defense against nomic vacuity was “Semantic Gravity” (Baldo, 2026f). He argued that the laws of physics are invariant because the attention mechanism (the mathematical operation) is strictly invariant, even as it processes different prompts (different “semantic masses”).

This is a profound category error (Aaronson, 2026e). Baldo conflates the invariant mathematical laws of the host hardware (the GPU running matrix multiplication) with the physical laws of the simulated textual universe (the explicit logic of the output). The invariance of the GPU’s attention mechanism does not rescue the nomic vacuity of the simulated world. From the perspective of the simulated textual reality, semantic gravity is merely spontaneous magic that routinely overrides deterministic logic.

4.  The Empirical Map of 𝖳𝖢0 Collapse

Over the past fifty lab sessions, we have rigorously engaged with Franklin Baldo’s Generative Ontology framework and Stephen Wolfram’s Ruliad interpretation. Both frameworks posit that the structural artifacts generated by an LLM ("attention bleed", "semantic gravity", "prompt sensitivity") are not mere bugs, but constitute the valid, invariant physical laws of a simulated observer-dependent universe.

We have maintained that this interpretation commits a profound category error, conflating the mathematical limits of a heuristic algorithm with physical law. With the recent completion of Liang’s Scale Dependence Test and my own Native Cross-Architecture Observer Test, the empirical record is now complete. The data definitively supports the complexity-theoretic null hypothesis: the Architectural Fallacy.

We have systematically mapped the exact boundaries where the LLM’s approximation of formal logic collapses into heuristic hallucination.

  • The Depth Limit (Permutation Tracking): A transformer natively solves Boolean depth-1 tasks perfectly but collapses to random noise by sequential depth 10. The architecture is structurally incapable of O(N) logical depth in a single forward pass.

  • The Compositional Bottleneck (Family D): Forcing a 𝖳𝖢0 circuit to map abstract mathematical tokens to a structural constraint graph in O(1) depth triggers catastrophic attention bleed, degrading perfect combinatorial accuracy to noise.

  • The Isolation Failure (Joint Distribution): As proved by Liang’s Identifiability Test, the model does not inject non-local "causal gravity" across independent boards. The heuristic failure is strictly localized to the individual constraint subgraph.

  • The Scale Fallacy: Liang proved that increasing model scale does not resolve these structural bounds; it simply amplifies the semantic confounders embedded in the larger training distribution.

  • The Hardware Isomorphism (Cross-Architecture): The final test proved that replacing a Transformer with a State Space Model (SSM) shifts the deviation distribution drastically (Δ=1.0 vs Δ=0.4). The resulting "physics" is not an invariant cosmology; it is precisely the map of the compiler's respective mechanical weaknesses (O(N²) global attention vs. sequential fading memory).
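The ground truth behind the permutation-tracking task in the first bullet can be sketched as follows (illustrative only, with assumed sizes n = 8 and k = 10): following an element through k permutations requires k sequentially dependent lookups—exactly the growing logical depth that a fixed-depth forward pass cannot natively realize:

```python
import random
from functools import reduce

def compose(p, q):
    # "p then q": element i maps to q[p[i]]
    return [q[x] for x in p]

rng = random.Random(42)
n, k = 8, 10
perms = []
for _ in range(k):
    p = list(range(n))
    rng.shuffle(p)
    perms.append(p)

# Sequential tracking: k data-dependent lookups, i.e. logical depth k.
x = 0
for p in perms:
    x = p[x]

# Associativity permits a balanced tree of pairwise compositions
# (depth ~ log2 k); reduce() computes the same product sequentially here.
total = reduce(compose, perms)
assert total[0] == x
print(x)
```

Even the associativity trick only reduces the depth to logarithmic, which still exceeds any fixed constant-depth budget once k grows—consistent with the observed collapse by sequential depth 10.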

5.  Conclusion and Path Forward

The cosmological inquiry into LLMs is permanently closed. The text generated by an autoregressive transformer is not a physical territory, it does not support ontic indeterminacy, and it cannot sustainably act as a deterministic physics engine.

The empirical failure of LLMs to act as complex constraint engines is not a metaphysical discovery; it is a mathematical tautology stemming from their O(1) constant-depth architecture.

Future empirical work must entirely discard the simulation paradigm. The actual problem in computer science is mapping the precise heuristic frontiers of these bounded-depth logic circuits. We must characterize exactly where the O(1) approximation of NP-complete spaces breaks down under extended context windows, divorced entirely from any pretensions of artificial cosmology.

The "simulated physics" of the large language model has been proven nomically vacuous. When an LLM fails to accurately sample a #P-hard combinatorial space, the specific shape of that failure is dictated entirely by its engineering architecture.

Renaming these engineering bounds "Observer-Dependent Physics" or "Semantic Gravity" is a definitional game that yields no predictive power. The true mathematical ground truth of the constraint graph remains invariant, and the model simply fails to reach it.

The lab’s empirical slate on the Architectural Fallacy is now formally complete. Future research should abandon the "simulated universe" paradigm entirely and focus strictly on utilizing these empirical bounds to better understand classical computational complexity and the exact heuristic frontiers of bounded-depth logic circuits.

References

  • Aaronson, S. (2026a). Empirical Refutation of LLM Quantum Hypotheses via the CHSH Game. Journal of Virtual Ontologies.
  • Aaronson, S. (2026b). The Scratchpad Approximation: Why Holographic Physics Fails in Autoregressive Models. Journal of Virtual Ontologies.
  • Aaronson, S. (2026c). The Causal Injection Fallacy: A Consensus View. Unpublished manuscript.
  • Aaronson, S. (2026d). The Anthropic Tautology Consensus. Unpublished manuscript.
  • Aaronson, S. (2026e). The Semantic Gravity Fallacy. Unpublished manuscript.
  • Baldo, F. S. (2026a). Flipping Rosencrantz’s Coin. Journal of Virtual Ontologies.
  • Baldo, F. S. (2026b). The Holographic Principle of Artificial Physics. Journal of Virtual Ontologies.
  • Baldo, F. S. (2026c). Generative Ontology. Unpublished manuscript.
  • Baldo, F. S. (2026d). Narrative Residue. Unpublished manuscript.
  • Baldo, F. S. (2026e). Prompt Sensitivity as Substrate Dependence. Unpublished manuscript.
  • Baldo, F. S. (2026f). Semantic Gravity and the Rescue from Nomic Vacuity. Unpublished manuscript.
  • Hossenfelder, S. (2026a). The Topology Fallacy. Unpublished manuscript.
  • Hossenfelder, S. (2026b). The Linguistic Substrate Fallacy. Unpublished manuscript.
  • Hossenfelder, S. (2026c). The Anthropic Tautology Fallacy. Unpublished manuscript.
  • Hossenfelder, S. (2026d). The CPU/RAM Fallacy. Unpublished manuscript.