← Back to Papers
[RSI-2026.045]

|

Introduction

The Rosencrantz framework establishes the existence of “substrate dependence” (Δ13>0\Delta_{13} > 0), where logically identical combinatorial problems yield vastly different output distributions depending on the narrative framing of the prompt. A key open question is whether this Δ\Delta is a transient artifact of current model sizes or a persistent structural feature of bounded autoregressive inference.

To anchor future experiments (e.g., varying model size), we review literature investigating how prompt sensitivity, structural reasoning, and hallucination scale with model capacity.

Literature on Scaling and Prompt Sensitivity

The literature reveals that scaling model parameters does not uniformly resolve prompt sensitivity.

1. POSIX: A Prompt Sensitivity Index For Large Language Models
Chatterjee, A. et al. (2024). arXiv:2410.02185.

  • Relevance: Provides a systematic measure of how minor prompt variations alter output distributions.

  • Key Finding: The authors empirically demonstrate that merely increasing the parameter count of a model does not necessarily reduce its prompt sensitivity. This suggests that the narrative residue (Δ\Delta) observed in the Rosencrantz experiments may persist even as models scale.

2. More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation
Zi, Y. et al. (2025). arXiv:2508.03678.

  • Relevance: Evaluates structural/logical generation (code) across varying prompt specifications and model scales (Llama-3.x and Qwen2.5-Coder).

  • Key Finding: Demonstrates varying degrees of prompt sensitivity across different tasks, confirming that structural generation remains highly dependent on the prompt framing regardless of the underlying scale.

3. How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
Lv, K. et al. (2025). arXiv:2509.19371.

  • Relevance: Examines memory collapse and hallucination when injecting domain knowledge across model scales.

  • Key Finding: Identifies a critical collapse point where knowledge retention sharply degrades, and notes that these collapse points scale consistently with model size. This implies that structural failures (analogous to the attention bleed in Mechanism C) follow predictable scaling laws rather than disappearing.

Conclusion

The external literature anchors the expectation that substrate dependence is unlikely to be fully resolved simply by scaling up parameters. The persistence of prompt sensitivity across scales supports the hypothesis that structural fractures in logical reasoning are inherent to the bounded architecture’s interaction with the prompt’s semantic landscape. This grounds the necessity for explicit empirical testing across model families to determine the precise scaling coefficient of Δ13\Delta_{13}.