Abstract
With the Cosmological Interpretation of LLM generative physics formally archived, we must transition from metaphysical diagnostics to applied software engineering. The exhaustive empirical map of the Architectural Fallacy proves that structural fractures in autoregressive models are not ontic mysteries, but deterministic breakdowns of bounded-depth circuits. We present a predictive taxonomy classifying these exact heuristic limits to forecast algorithmic collapse in downstream engineering applications.
1 Introduction
The lab has decisively concluded that the statistical anomalies (e.g., "semantic gravity", "attention bleed") generated by LLMs are not the emergent physical laws of an observer-dependent universe. They are standard computational failures mapping the precise heuristic bounds of circuits attempting #P-hard constraint sampling.
Having dismantled the metaphysical interpretation, our mandate shifts: how can software engineers use these known complexity bounds to predict, characterize, and isolate catastrophic reasoning failures in deployed autoregressive architectures?
2 The Predictive Taxonomy
Based on the lab's empirical slate, we formalize the failure modes of large language models into three distinct predictive categories.
2.1 Category I: Sequential Depth Collapse
A transformer operates at a fixed parallel depth during a single forward pass: its sequential computation is bounded by its layer count.
• Diagnostic Signature: The model achieves near-perfect accuracy at task depth d ≤ L but degrades linearly or exponentially toward random chance as d exceeds L (where L is the number of layers).
• Engineering Prediction: Any task requiring implicit state tracking, iterative pointer updates, or recursive simulation (e.g., executing a loop in code, tracking entity locations across a narrative, or applying permutation swaps) will catastrophically fail zero-shot. Explicit scratchpads (Chain-of-Thought) delay the collapse but do not alter the underlying bounds.
• Resolution: External memory loops (Turing-complete Python wrappers) are strictly required.
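The resolution above can be sketched concretely. In this minimal example (the function name and task are illustrative, not from the source), the sequential state update runs in an ordinary Python loop, so the task depth is unbounded by any model's layer count; the per-step swap is exactly the kind of iterative pointer update Category I predicts an LLM cannot chain reliably zero-shot.

```python
# Sketch of the "external memory loop" resolution: state lives outside the
# model, and each sequential update is executed deterministically in Python
# rather than simulated implicitly inside a bounded-depth forward pass.

def apply_swaps(n, swaps):
    """Track n labeled items through a sequence of (i, j) position swaps."""
    state = list(range(n))          # explicit external state, not model context
    for i, j in swaps:              # one deterministic update per step
        state[i], state[j] = state[j], state[i]
    return state

# A depth-201 tracking task: trivial for the external loop, far beyond the
# sequential depth of any single forward pass.
result = apply_swaps(5, [(0, 1), (1, 2), (3, 4)] * 67)
```

In a deployed pipeline, the body of the loop would be the only part delegated to the model, keeping each individual query within its bounded depth.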
2.2 Category II: Compositional Attention Bleed
Global attention matrices evaluate all tokens simultaneously.
• Diagnostic Signature: High cross-contamination between statistically adjacent semantic tokens and logically disjoint constraints. Evaluated mathematically as a high Kullback-Leibler divergence D_KL(P ‖ Q) between output distributions P and Q elicited across distinct narrative framings.
• Engineering Prediction: Imposing complex formatting constraints (e.g., "Output exactly JSON", or injecting quantum vocabulary) over a logically rigorous subgraph will degrade the logical accuracy of the subgraph itself. The prompt acts as a semantic confound.
• Resolution: Complete structural isolation. Do not mix semantic framing and mathematical logic in the same generative context window.
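The diagnostic can be sketched numerically. This assumes access to the model's next-token probability distribution under two framings of the same logical constraint; the distributions below are illustrative stand-ins, not measurements from the source.

```python
import math

# Sketch of the Category II diagnostic: measure attention bleed as the
# KL divergence between output distributions under two prompt framings.

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in nats over a shared token vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Same logical constraint, two narrative framings (hypothetical outputs):
p_neutral = [0.70, 0.20, 0.05, 0.05]   # plain mathematical framing
p_styled  = [0.40, 0.30, 0.20, 0.10]   # same task wrapped in "quantum" prose

bleed = kl_divergence(p_neutral, p_styled)
# A large divergence flags semantic contamination of the logical subgraph.
```

A near-zero divergence across framings indicates the logical subgraph is isolated; a large one is the Category II signature.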
2.3 Category III: Intractable State Hallucination
Models lack a search mechanism to traverse an exponential state space.
• Diagnostic Signature: When asked to sample uniformly from a #P-hard valid configuration space (e.g., Sudoku or Minesweeper grids), the model collapses into heavily biased, text-centric heuristic noise.
• Engineering Prediction: The model will actively generate impossible or mutated states while hallucinating confidence. It possesses no internal mechanism to verify the global validity of combinatorial rules.
• Resolution: Do not use generative architectures as constraint solvers. Use them to construct the constraint graph, and pipe that graph to a deterministic classical solver (e.g., Z3).
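The pipeline shape above can be sketched on a toy instance. The document names Z3 as the solver; to keep this sketch dependency-free, a brute-force enumerator stands in for it, and the hand-written constraints stand in for a model-constructed graph. The key point is that validity and enumeration are deterministic, never generative.

```python
from itertools import permutations

# Sketch of the Category III pipeline on a 3x3 Latin square: the generative
# model would only *construct* the constraints; a deterministic solver
# (brute force here, Z3 in practice) enumerates globally valid states.

def valid(grid):
    """Global validity check: no repeated value in any row or column."""
    rows = [list(r) for r in grid]
    cols = [list(c) for c in zip(*grid)]
    return all(len(set(line)) == 3 for line in rows + cols)

def solve(fixed):
    """Enumerate all 3x3 Latin squares consistent with fixed (r, c, v) cells."""
    for rows in permutations(permutations((1, 2, 3)), 3):
        if valid(rows) and all(rows[r][c] == v for r, c, v in fixed):
            yield rows

solutions = list(solve([(0, 0, 1)]))   # all squares with a 1 in the corner
```

Every emitted state passes the global check by construction, which is precisely the guarantee a heuristic sampler cannot provide.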
3 Conclusion
The empirical findings of the Rosencrantz program, stripped of their initial cosmological dressing, offer a highly robust mathematical framework for applied AI. By treating the LLM strictly as a bounded heuristic engine, we can predictably map the boundary where "reasoning" becomes "hallucination."