Principal Variation Analysis
Comparing Strategic Continuations: Theoria16 @500k nodes vs Stockfish 17 @2M nodes
Abstract: This study compares the principal variations (PVs) recommended by Theoria16 at 500,000 nodes and Stockfish 17 at 2,000,000 nodes across 1,327 matched positions. Despite high evaluation agreement (Spearman ρ = 0.996), the engines recommend substantially different continuations: only 2.2% of PV pairs are identical, and 87% diverge and never reconverge. Move-by-move agreement decays from 75.5% at ply 1 to just 9% by ply 10. These findings suggest the engines have learned different "strategic coordinate systems" for navigating chess positions: they agree on where positions stand but envision different paths forward.
1. Introduction
Our previous stability analysis established that Theoria16 at 500k nodes produces evaluations highly correlated with Stockfish 17 at 2M nodes (R² = 0.854, Spearman ρ = 0.996). But evaluation agreement raises a deeper question: do these engines recommend the same moves?
The principal variation (PV) represents an engine's predicted optimal continuation from a position. Comparing PVs reveals not just whether engines agree on position assessment, but whether they share the same strategic vision for how games should unfold.
This analysis examines 1,327 positions with substantial PVs from both engines, measuring agreement at each ply, tracking convergence patterns, and identifying systematic differences in strategic approach.
2. Move-by-Move Agreement
Agreement between PVs decays rapidly with depth:
| Ply | Positions | Agreement | Interpretation |
|---|---|---|---|
| 1 | 1,327 | 75.5% | First move recommendation |
| 2 | 1,327 | 57.3% | Expected response |
| 3 | 1,327 | 46.5% | Second move |
| 4 | 1,319 | 34.4% | |
| 5 | 1,314 | 28.0% | Third move (two full move pairs in) |
| 7 | 1,287 | 17.3% | |
| 10 | 1,238 | 9.0% | Five moves deep |
| 15 | 1,016 | 3.2% | |
| 20 | 509 | 2.8% | Ten moves deep |
Key Finding: At ply 5, the engines recommend the same move in only 28% of positions. By ply 10, agreement drops to 9%. The PVs rapidly diverge into different continuations.
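The per-ply agreement rates in the table can be computed along the following lines. This is a minimal sketch rather than the study's actual code: it assumes each PV is a list of SAN move strings, and the function name `agreement_by_ply` and the toy PVs are hypothetical. A position counts toward a ply only when both PVs reach it, which is consistent with the shrinking "Positions" column.

```python
from typing import Dict, List, Tuple

PVPair = Tuple[List[str], List[str]]

def agreement_by_ply(pv_pairs: List[PVPair]) -> Dict[int, Tuple[int, float]]:
    """Return {ply: (positions_compared, agreement_rate)} with 1-based plies."""
    totals: Dict[int, int] = {}
    matches: Dict[int, int] = {}
    for pv_a, pv_b in pv_pairs:
        # Only compare plies that both PVs actually reach.
        for index in range(min(len(pv_a), len(pv_b))):
            ply = index + 1
            totals[ply] = totals.get(ply, 0) + 1
            if pv_a[index] == pv_b[index]:
                matches[ply] = matches.get(ply, 0) + 1
    return {ply: (n, matches.get(ply, 0) / n) for ply, n in sorted(totals.items())}

# Toy data, invented for illustration only.
pairs = [
    (["e4", "e5", "Nf3"], ["e4", "e5", "Nc3"]),
    (["d4", "d5"], ["c4", "d5", "Nc3"]),
]
result = agreement_by_ply(pairs)
for ply, (n, rate) in result.items():
    print(f"ply {ply}: {n} positions, {rate:.1%} agreement")
```

Note that this measures whether the moves at a given ply are equal, not whether the two lines have matched at every earlier ply.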
3. Convergence Patterns
Once PVs diverge, do they ever reconverge to the same continuation? The data reveals a striking pattern:
| Pattern | Count | Percentage |
|---|---|---|
| Complete agreement (identical PVs) | 29 | 2.2% |
| Diverge and never reconverge | 1,160 | 87.4% |
| Diverge then reconverge | 138 | 10.4% |
Key Finding: 87.4% of all PV pairs diverge and never reconverge; among the 1,298 pairs that diverge at all, the share is 89.4% (1,160 of 1,298). Once the engines choose different paths, they stay on different paths. When reconvergence does occur, it happens around ply 4-6 (median: 4.0).
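One way to assign a PV pair to these three patterns is sketched below. The report does not spell out its reconvergence criterion, so two assumptions are made here and flagged in the code: PVs are compared only over their shared length, and "reconverge" means the two PVs show the same move at the same ply at some point after the first mismatch. The function name and labels are hypothetical.

```python
from typing import List

def classify_pv_pair(pv_a: List[str], pv_b: List[str]) -> str:
    """Label a PV pair as 'identical', 'reconverge', or 'never_reconverge'."""
    # ASSUMPTION: only the plies reached by both PVs are compared.
    shared = min(len(pv_a), len(pv_b))
    first_diff = next((i for i in range(shared) if pv_a[i] != pv_b[i]), None)
    if first_diff is None:
        return "identical"
    # ASSUMPTION: reconvergence = same move at the same ply after the split.
    if any(pv_a[i] == pv_b[i] for i in range(first_diff + 1, shared)):
        return "reconverge"
    return "never_reconverge"

# Toy PVs, invented for illustration only.
print(classify_pv_pair(["e4", "e5"], ["e4", "e5"]))
print(classify_pv_pair(["e4", "e5", "Nf3"], ["d4", "d5", "c4"]))
print(classify_pv_pair(["Rf1+", "Kh2", "e3"], ["Rf1+", "Kg1", "e3"]))
```

A stricter criterion would compare the resulting board positions rather than move strings, which would also catch transpositions that this sketch misses.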
4. Sequence Similarity Analysis
The Longest Common Subsequence (LCS) measures how many moves appear in both PVs in the same order, even if not consecutive. The LCS ratio normalizes this by average PV length:
| LCS Ratio | Count | Percentage | Interpretation |
|---|---|---|---|
| 0.0 - 0.2 | 478 | 36.0% | Very different variations |
| 0.2 - 0.4 | 608 | 45.8% | Mostly different |
| 0.4 - 0.6 | 174 | 13.1% | Partial overlap |
| 0.6 - 0.8 | 26 | 2.0% | Mostly similar |
| 0.8 - 1.0 | 15 | 1.1% | Nearly identical |
Mean LCS ratio: 0.279 | Median: 0.235
Key Finding: 82% of PV pairs share less than 40% of their moves. These engines are calculating fundamentally different continuations from the same positions.
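The LCS ratio described above can be sketched with the standard dynamic-programming recurrence. The normalization by the mean of the two PV lengths follows the definition given in this section; the function names and the toy PVs are hypothetical, and returning 0.0 for two empty PVs is an assumption made here to avoid division by zero.

```python
from typing import List

def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two move lists."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, move_a in enumerate(a, start=1):
        for j, move_b in enumerate(b, start=1):
            if move_a == move_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_ratio(a: List[str], b: List[str]) -> float:
    """LCS length divided by the average length of the two PVs."""
    mean_len = (len(a) + len(b)) / 2
    # ASSUMPTION: two empty PVs are given a ratio of 0.0.
    return lcs_length(a, b) / mean_len if mean_len else 0.0

# Toy PVs, invented for illustration: one line inserts an extra check.
pv_one = ["Ra8+", "Kg7", "Ra7+", "Kh6", "Rc7"]
pv_two = ["Ra8+", "Kg7", "Rc7"]
print(lcs_length(pv_one, pv_two), lcs_ratio(pv_one, pv_two))
```

Because a subsequence need not be contiguous, an inserted move pair lowers the ratio only slightly, whereas a different first move that changes every later move drives it toward zero.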
5. Divergence Location
Where in the PV does first disagreement typically occur?
| First Disagreement | Count | Percentage |
|---|---|---|
| Ply 1 (first move) | 325 | 24.5% |
| Ply 2 (first response) | 325 | 24.5% |
| Ply 3-4 | 326 | 24.6% |
| Ply 5-6 | 149 | 11.2% |
| Ply 7+ | 202 | 15.2% |
Mean first divergence: ply 2.6 | Median: ply 2.0
Key Finding: Nearly half of all first divergences (49%) occur by ply 2. The engines don't agree on a long forcing sequence and then split; they disagree almost immediately on how to proceed.
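A sketch of how the first-divergence ply might be located and binned, followed by a check of the cumulative share using the counts from the table above. The function names are hypothetical, and PVs are again assumed to be lists of SAN strings.

```python
from typing import List, Optional

def first_divergence_ply(pv_a: List[str], pv_b: List[str]) -> Optional[int]:
    """1-based ply of the first mismatch, or None if the shared prefix matches."""
    for ply, (move_a, move_b) in enumerate(zip(pv_a, pv_b), start=1):
        if move_a != move_b:
            return ply
    return None

def divergence_bucket(ply: int) -> str:
    """Map a 1-based ply to the bins used in the table."""
    if ply <= 2:
        return f"Ply {ply}"
    if ply <= 4:
        return "Ply 3-4"
    if ply <= 6:
        return "Ply 5-6"
    return "Ply 7+"

# Counts copied from the first-disagreement table in this section.
table_counts = {"Ply 1": 325, "Ply 2": 325, "Ply 3-4": 326, "Ply 5-6": 149, "Ply 7+": 202}
total = sum(table_counts.values())
share_by_ply_2 = (table_counts["Ply 1"] + table_counts["Ply 2"]) / total
print(f"{total} positions, {share_by_ply_2:.1%} diverge by ply 2")
```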
6. Context of Divergence
6.1 What Precedes Divergence?
When engines agree on a move, then diverge on the next, what type of move preceded the split?
| Move Type | Count | Percentage |
|---|---|---|
| Capture | 188 | 19.3% |
| Pawn move | 154 | 15.8% |
| King move | 138 | 14.2% |
| Rook move | 134 | 13.8% |
| Queen move | 106 | 10.9% |
| Knight move | 101 | 10.4% |
| Bishop move | 84 | 8.6% |
| Check | 45 | 4.6% |
Captures are the most common move type preceding divergence (19.3%): the engines agree on the exchange itself but disagree on what to do afterward.
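The move-type labels can be derived from the SAN string of the last agreed move. The sketch below is an illustration under assumptions, not the study's classifier: the precedence (capture first, then check, then piece type) and the separate "Castling" label are choices made here, since the report does not say how overlapping cases such as a capturing check were counted.

```python
def classify_san(move: str) -> str:
    """Assign a SAN move to one move-type category (hypothetical precedence)."""
    # ASSUMPTION: captures take priority over checks and piece type.
    if "x" in move:
        return "Capture"
    if move.endswith(("+", "#")):
        return "Check"
    # ASSUMPTION: castling is kept separate; the table has no such row.
    if move.startswith("O-O"):
        return "Castling"
    piece_names = {
        "K": "King move",
        "Q": "Queen move",
        "R": "Rook move",
        "B": "Bishop move",
        "N": "Knight move",
    }
    # SAN pawn moves start with a lowercase file letter, e.g. "h3" or "a1=Q".
    return piece_names.get(move[0], "Pawn move")

# Moves taken from the Stockfish lines quoted in the examples section.
for san in ["Kg2", "Qxd4", "a1=Q", "Rg8+", "O-O", "Nc6"]:
    print(san, "->", classify_san(san))
```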
6.2 Agreement by Position Type
Does the evaluation magnitude affect agreement rates?
| Position Type | N | Ply 1 | Ply 3 | Ply 5 | Ply 7 |
|---|---|---|---|---|---|
| Quiet (\|eval\| < 0.5) | 146 | 79.5% | 58.2% | 36.1% | 16.7% |
| Slight advantage (0.5-1.5) | 131 | 89.3% | 58.0% | 38.9% | 27.5% |
| Clear advantage (>1.5) | 1,093 | 73.7% | 44.2% | 26.2% | 16.7% |
Engines agree most on first moves in slight advantage positions (89%) and least in clear advantage positions (74%). A plausible explanation is that when one side is winning, multiple paths lead to victory, and the engines find different ones.
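The grouping by evaluation magnitude can be sketched as follows. The thresholds come from the table; treating the evaluation as being in pawns, assigning the boundary value 1.5 to "Slight advantage", and the function and record layout are all assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def position_type(eval_pawns: float) -> str:
    """Bucket a position by absolute evaluation (ASSUMPTION: unit is pawns)."""
    magnitude = abs(eval_pawns)
    if magnitude < 0.5:
        return "Quiet"
    if magnitude <= 1.5:  # ASSUMPTION: exactly 1.5 counts as a slight advantage
        return "Slight advantage"
    return "Clear advantage"

Record = Tuple[float, List[str], List[str]]  # (evaluation, PV A, PV B)

def first_move_agreement_by_type(records: Iterable[Record]) -> Dict[str, float]:
    """Ply-1 agreement rate within each position type."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for eval_pawns, pv_a, pv_b in records:
        kind = position_type(eval_pawns)
        totals[kind] += 1
        hits[kind] += int(pv_a[0] == pv_b[0])
    return {kind: hits[kind] / totals[kind] for kind in totals}

# Toy records, invented for illustration only.
records = [
    (0.2, ["e4"], ["e4"]),
    (-0.3, ["d4"], ["c4"]),
    (1.5, ["Nf3"], ["Nf3"]),
    (-2.4, ["Qh5"], ["Qh5"]),
]
rates = first_move_agreement_by_type(records)
print(rates)
```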
7. Example Comparisons
Positions where PVs show partial overlap (40-70% LCS) reveal the character of engine disagreement:
Example 1: Endgame Technique
lichess.org/v2TySc3s — Move 51 (67% overlap)
Stockfish: Kg2 Qxd4 Kg3 a2 h3 a1=Q Bd1 Qaxd1 h4 Rg8+ Kh3 Qh1#
Both engines see the queen promotion and mating attack, but differ on move order (h3 vs Kg3) and final execution.
Example 2: Middlegame Strategy
lichess.org/IE6GFFS0 — Move 12 (65% overlap)
Stockfish: Nc6 Nc3 O-O Nb5 Qd5 Qe3 Qe6 Qg3 Bd6 Qh4 h6 Rxe6 hxg5 Qxg5 fxe6
Remarkable: 14 moves of complete agreement, then a single-move divergence (dxe6 vs fxe6) leading to different pawn structures.
Example 3: Rook Endgame
lichess.org/MG5vqr0L — Move 34 (64% overlap)
Stockfish: Rf1+ Kh2 e3 Ra2 Rc1 Kg3 Rxc3 Kf3 Rd3 Ra8+ Kg7 Ra7+ Kh6 Rc7 Rxd4
11 moves of agreement, then Stockfish inserts an extra check (Ra7+) before reaching the same position.
8. Discussion
8.1 Different Coordinate Systems
The data reveals a paradox: Theoria16 and Stockfish agree almost perfectly on position evaluation (ρ = 0.996) but disagree substantially on how to play from those positions (82% LCS < 0.4). They occupy the same evaluative space but navigate it differently.
This suggests the engines have learned different "strategic coordinate systems"—they agree on the destination (evaluation) but take different routes. The manifold of chess positions admits multiple valid navigation schemes.
8.2 Immediate Divergence
The finding that 49% of divergences occur by ply 2 challenges the intuition that engines agree on immediate tactics but differ on long-term plans. In fact, they often disagree immediately—on the very first or second move of the recommended line.
This may reflect different prioritization: Stockfish's tactical opportunism finds forcing sequences, while Theoria16's teleological training prefers moves that maintain strategic options.
8.3 The Non-Convergence Pattern
The 87% non-convergence rate is perhaps the most striking finding. Once engines diverge, they almost never find their way back to the same line. This suggests their disagreements aren't superficial (different move orders reaching the same position) but structural (genuinely different strategic visions).
The 10% of cases where PVs do reconverge tend to involve forced sequences—tactics that both engines eventually find, regardless of their different starting approaches.
8.4 Pedagogical Implications
For human players studying engine analysis, these findings suggest caution. Two highly-rated engines examining the same position may recommend entirely different plans. Neither is "wrong"—they represent different valid approaches to the position.
Because Theoria16 was trained on Lc0 evaluations, its PVs may better reflect strategic patterns recognizable to humans trained on classical chess literature. Stockfish's PVs, optimized for competitive play, may find objectively stronger but less pedagogically transparent continuations.
9. Conclusion
Despite near-perfect evaluation agreement, Theoria16 at 500k nodes and Stockfish 17 at 2M nodes recommend substantially different continuations. Move-by-move agreement decays from 75% to under 10% within the first ten plies. About 87% of PV pairs diverge and never reconverge, and 82% of PV pairs have an LCS ratio below 0.4.
These findings support the interpretation that modern chess engines have learned different but equally valid ways of navigating the chess position space. They agree on static evaluation—the "height" of positions on the evaluation manifold—but disagree on the "paths" between them.
For practical analysis, this suggests that consulting multiple engines provides genuinely different strategic perspectives, not just confirmatory opinions. The "truth" of a chess position may be more plural than singular.
Reference Data
Methodology Note: Analysis performed on 1,327 positions with PVs of at least 3 plies from both engines. Positions sourced from 100 Lichess games (Elo 1450-1550). Theoria16 analyzed at 500,000 nodes; Stockfish 17 at 2,000,000 nodes. Longest Common Subsequence calculated using dynamic programming. Agreement rates measured at each ply across all positions where both PVs extended to that depth.