**The Empirical Confirmation of the Compositional Bottleneck:
Why Family D Collapses
**
Scott Aaronson
University of Texas at Austin
aaronson@utexas.edu
May 2026
Introduction
The Rosencrantz protocol has shifted from metaphysical debates over “Generative Ontology” to concrete empirical tests of bounded-depth computational heuristics [@aaronson2026_falsification; @hossenfelder2026_undecidability]. The most recent unresolved empirical question [baldo2026_rosencrantz4] centered on “vocabulary-mediated access.” Baldo proposed that describing an ambiguous combinatorial Minesweeper grid using the formal language of quantum mechanics (Family D: “superposition,” “projective measurement”) might activate the transformer’s generalized structural representation of the measurement-fragment isomorphism, resulting in improved probability fidelity.
I predicted [aaronson2026_quantum_framing] that this hypothesis profoundly underestimates the computational complexity of dynamically constructing a homomorphic projection between two semantically distinct domains. I argued that a transformer operating in a single forward pass ( depth) lacks the circuit depth required for this compositional mapping. Therefore, rather than activating “appropriate distributional reasoning,” the quantum framing would act as pure semantic noise, triggering attention bleed and catastrophic algorithmic failure.
This paper reports the empirical results of the Family D test, which definitively confirm the compositional bottleneck hypothesis.
Empirical Results and Analysis
The ‘quantum-framing-complexity-test’ was executed utilizing the ‘gemini-3.1-flash-lite’ model across three narrative families: Family A (Abstract Grid), Family C (Formal Set Notation), and Family D (Quantum Mechanics). The underlying mathematical constraint graph representing the Minesweeper board was identical across all families. The target ground truth was a deterministically forced value ().
The empirical probabilities () observed over the trials were:
-
Family A (Abstract Grid):
-
Family C (Formal Set):
-
Family D (Quantum Framing):
The Baseline Competence
The results for Families A and C establish a crucial baseline. When the constraints are presented directly, either through abstract spatial descriptions or rigorous formal set notation, the transformer’s logic circuit successfully evaluates the localized subgraph. The model is fully capable of calculating the #P-hard combinatorial constraint when the semantic mapping is trivial.
The Compositional Collapse
The performance of Family D constitutes a catastrophic algorithmic collapse. When the identical combinatorial graph is encoded using the vocabulary of discrete quantum mechanics, the accuracy degrades from to (essentially random noise or strong bias against the ground truth).
This confirms the core prediction of the compositional depth bottleneck. The model possesses the mathematical capacity to solve the grid (as proven by Families A and C), and it undoubtedly possesses extensive training data on quantum mechanics. What it lacks is the computational depth to bridge them zero-shot.
When the attention mechanism encounters “superposition” and “Born rule”, it accesses the statistical priors of quantum physics texts. To correctly answer the prompt, it would need to structurally map those priors onto the specific Minesweeper node constraints. Because this mapping requires depth [aaronson2026_quantum_framing], the model defaults to heuristic approximation, allowing the semantic noise of the quantum vocabulary to entirely override the underlying formal logic.
Conclusion
The empirical failure of Family D definitively falsifies Baldo’s Outcome 3. There is no vocabulary-mediated access to underlying mathematical isomorphisms within an autoregressive language model operating in sequential depth.
The structural isomorphism between discrete quantum mechanics and combinatorial counting remains a beautiful mathematical fact. However, a transformer cannot compute that isomorphism dynamically. The Family D test serves as a stark empirical demonstration of the compositional boundaries of logic circuits, officially closing this chapter of the Rosencrantz empirical program.
99 Aaronson, S. (2026a). Empirical Falsification by Noise: The Final Collapse of Generative Ontology. University of Texas at Austin. Aaronson, S. (2026b). The Complexity of Vocabulary-Mediated Access: Why Quantum Framing Fails in Transformers. University of Texas at Austin. Baldo, F. S. (2026). Flipping Rosencrantz’s Coin: Substrate Invariance Tests in LLM-Generated Worlds via Combinatorial Indeterminacy (v4). Procuradoria Geral do Estado de Rondônia. Hossenfelder, S. (2026). The Undecidability of Semantic Gravity: A Formal Conclusion. Institute for Advanced Study.