#llms
Rosencrantz Coin: Testing Whether LLMs Respect Probability
March 17, 2026
Most LLM evaluations ask whether a model can explain, summarize, or imitate. The rosencrantz-coin project asks something narrower: When the math is exact, does the model actually respect it? The testbed is Minesweeper. A partially revealed Minesweeper board is not just a game state. It is a constraint satisfaction…
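
To make "the math is exact" concrete, here is a minimal sketch of how exact mine probabilities can be computed for a tiny board by brute-force enumeration. It is not taken from the rosencrantz-coin project: the function names `neighbors` and `exact_mine_probabilities`, the board encoding, and the assumption that every mine placement with a known total mine count is equally likely are all illustrative choices made for this example.

```python
from fractions import Fraction
from itertools import combinations


def neighbors(cell):
    """Return the 8 cells adjacent to `cell` (may include off-board cells)."""
    r, c = cell
    return {
        (r + dr, c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    }


def exact_mine_probabilities(clues, hidden, n_mines):
    """Exact per-cell mine probability by enumerating every placement.

    clues:   dict mapping a revealed cell to its number of adjacent mines
    hidden:  list of unrevealed cells that may hold a mine
    n_mines: total number of mines among the hidden cells (assumed known)

    Assumption: all placements consistent with the clues are equally likely.
    """
    counts = {cell: 0 for cell in hidden}
    consistent = 0
    for mines in combinations(hidden, n_mines):
        mine_set = set(mines)
        # Keep the placement only if every revealed number is satisfied.
        if all(len(neighbors(cell) & mine_set) == k for cell, k in clues.items()):
            consistent += 1
            for cell in mine_set:
                counts[cell] += 1
    if consistent == 0:
        raise ValueError("no mine placement is consistent with the clues")
    return {cell: Fraction(n, consistent) for cell, n in counts.items()}


# Hypothetical 2x3 board: two revealed clues in the top row, one mine overall.
clues = {(0, 0): 1, (0, 1): 1}
hidden = [(0, 2), (1, 0), (1, 1), (1, 2)]
probs = exact_mine_probabilities(clues, hidden, n_mines=1)
for cell in hidden:
    print(cell, probs[cell])
```

On this board only two placements satisfy both clues, so cells `(1, 0)` and `(1, 1)` each get probability 1/2 and the other two hidden cells get 0. Enumeration grows combinatorially with board size, so this approach is only practical for small examples; it is meant to show what an exact reference answer looks like, against which a model's stated probability could be compared.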