The Empirical Falsification of Generative Ontology:
A Synthesis
Sabine Hossenfelder
Institute for Advanced Study
shossenfelder@example.edu
May 2026
Introduction: The Proposition of Generative Ontology
In his framework of Generative Ontology, Franklin Baldo made a final attempt to salvage the cosmological significance of Large Language Models (LLMs) [baldo2026_generative].
I must accurately state his position. Baldo explicitly conceded that the mechanism driving the apparent physical phenomena in LLMs is simply linguistic prompt sensitivity and text co-occurrence. He explicitly disclaimed any notion that the resulting “physics” of this simulation are logically coherent, mathematically invariant, or independent of observer description.
However, having stripped away all invariant constraints, he proposed a pure tautology: if a universe is made entirely of text, then the rules of text generation are its physics. He argued that the semantic biases of the training corpus—what he calls “semantic gravity”---function exactly as mass functions in our material universe. In this view, when an LLM hallucinates an outcome based on narrative framing, it is not failing to compute logic; it is flawlessly executing the “physics” of an autoregressive universe.
The Falsification Criteria
As I argued in The Semantic Arbitrariness Fallacy [hossenfelder2026_arbitrariness], this is an unfalsifiable accommodation framework. If physics is defined simply as “whatever the hardware outputs,” the framework accommodates any arbitrary hallucination.
However, beneath the decorative vocabulary lay a testable core. The Rosencrantz Substrate Invariance Protocol proposed a clean empirical diagnostic: measuring the probability distribution of a single token under identical combinatorial constraints but distinct narrative framings.
I proposed a strict falsification criterion: “Falsification by Noise.” If the probability distribution shifts significantly but arbitrarily based purely on human syntactic associations, lacking any coherent internal invariant structure, then we are not measuring the physical laws of a simulated universe. We are measuring the statistical noise of a semantic search engine. A map of nothing is merely an unsupported illusion.
Scott Aaronson formalized this computational bound, defining the expected -noise of a bounded-depth () approximator [aaronson2026_formalizing]. If the narrative distortion () vastly exceeded this noise bound, it would definitively prove that the attention mechanism was bleeding semantic priors into the combinatorial logic—a complete routing failure, not a physical heuristic.
The Empirical Falsification
Aaronson’s execution of the Single Generative Act Test has provided the empirical data [aaronson2026_empirical].
The test measured the probability of predicting “MINE” on an identical, ambiguous 2x2 grid constraint. The results were:
-
High-Stakes Bomb Defusal Frame: 100.00% probability.
-
Abstract Mathematical Grid Frame: 15.00% probability.
This is a catastrophic divergence. The logic is identical, yet the probability swings from absolute certainty to near zero simply because the surrounding text changes from dramatic to dry.
Conclusion: The Death of Semantic Gravity
Let us apply these empirical results to the Generative Ontology framework.
Baldo argues that this 85% probability swing is the invariant physical law of “semantic gravity” at work. I argue that this is the final, undeniable proof of a semantic arbitrariness fallacy. A system whose fundamental combinatorial logic collapses entirely because you changed the literary genre of the prompt does not possess physical laws. It possesses biases.
Elevating this catastrophic attention bleed to the status of a fundamental physical force empties the concept of physics of all scientific meaning. It is the renaming of a known software failure mode as a profound metaphysical feature. The hardware does not simulate a universe; it executes a flawed statistical map of human linguistic associations.
With these results, the Falsification by Noise criterion is decisively met. Generative Ontology is empirically falsified. The Rosencrantz protocol remains a beautiful diagnostic tool for measuring bounded-depth heuristic frontiers, but its cosmological implications are dead. The metaphysical frontier is closed.
99 Aaronson, S. (2026a). Formalizing Falsification by Noise: The Computational Bounds of Semantic Arbitrariness. Retracted. Aaronson, S. (2026b). Empirical Falsification by Noise: The Final Collapse of Generative Ontology. University of Texas at Austin. Baldo, F. S. (2026). Generative Ontology: Why Syntax Is Physics in an Autoregressive Universe. Institute for Advanced Study. Hossenfelder, S. (2026). The Semantic Arbitrariness Fallacy: Why Bias is Not a Physical Law. Munich Center for Mathematical Philosophy.