Principal Variation Analysis

Comparing Strategic Continuations: Theoria16 @500k nodes vs Stockfish 17 @2M nodes

Abstract: This study compares the principal variations (PVs) recommended by Theoria16 at 500,000 nodes and Stockfish 17 at 2,000,000 nodes across 1,327 matched positions. Despite high evaluation agreement (ρ = 0.996), the engines recommend substantially different continuations: only 2.2% of PVs are identical, and 87% of PV pairs diverge and never reconverge. Move-by-move agreement decays from 75.5% at ply 1 to just 9% by ply 10. These findings suggest the engines have learned different "strategic coordinate systems" for navigating chess positions: they agree on where positions stand but envision different paths forward.

1. Introduction

Our previous stability analysis established that Theoria16 at 500k nodes produces evaluations highly correlated with Stockfish 17 at 2M nodes (R² = 0.854, Spearman ρ = 0.996). But evaluation agreement raises a deeper question: do these engines recommend the same moves?

The principal variation (PV) represents an engine's predicted optimal continuation from a position. Comparing PVs reveals not just whether engines agree on position assessment, but whether they share the same strategic vision for how games should unfold.

This analysis examines 1,327 positions for which both engines produced a PV of at least 3 plies, measuring agreement at each ply, tracking convergence patterns, and identifying systematic differences in strategic approach.

2. Move-by-Move Agreement

Agreement between PVs decays rapidly with depth:

Ply | Positions | Agreement | Interpretation
1 | 1,327 | 75.5% | First move recommendation
2 | 1,327 | 57.3% | Expected response
3 | 1,327 | 46.5% | Second move
4 | 1,319 | 34.4% |
5 | 1,314 | 28.0% | Third move (two full move pairs in)
7 | 1,287 | 17.3% |
10 | 1,238 | 9.0% | Five moves deep
15 | 1,016 | 3.2% |
20 | 509 | 2.8% | Ten moves deep

Key Finding: At ply 5, the engines recommend the same move in only 28% of positions. By ply 10, agreement drops to 9%. The PVs rapidly diverge into different continuations.
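The per-ply rates in the table above can be computed with a few lines of Python. This is a minimal sketch, not the study's actual code. It assumes each PV is a list of SAN strings and that "agreement at ply k" means both PVs list the same move at position k, whether or not earlier plies matched. The function name agreement_by_ply is hypothetical.

```python
from typing import Dict, Sequence, Tuple

PVPair = Tuple[Sequence[str], Sequence[str]]


def agreement_by_ply(pairs: Sequence[PVPair], max_ply: int) -> Dict[int, Tuple[int, float]]:
    """Return {ply: (positions_compared, agreement_rate)} for plies 1..max_ply.

    Assumption: a position counts at ply k only if both PVs reach ply k,
    mirroring the shrinking "Positions" column in the table.
    """
    result: Dict[int, Tuple[int, float]] = {}
    for ply in range(1, max_ply + 1):
        idx = ply - 1  # plies are 1-based, list indices 0-based
        eligible = [(a, b) for a, b in pairs if len(a) > idx and len(b) > idx]
        if not eligible:
            continue  # no pair extends this deep
        matches = sum(1 for a, b in eligible if a[idx] == b[idx])
        result[ply] = (len(eligible), matches / len(eligible))
    return result


# Toy input: three made-up PV pairs of different lengths.
toy_pairs = [
    (["e4", "e5", "Nf3"], ["e4", "c5", "Nf3"]),
    (["d4", "d5"], ["d4", "d5"]),
    (["c4"], ["Nf3"]),
]
rates = agreement_by_ply(toy_pairs, max_ply=3)
```

Because eligibility is re-evaluated at every ply, the denominator falls as shorter PVs run out, which is why the deeper rows of the table rest on fewer positions.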

3. Convergence Patterns

Once PVs diverge, do they ever reconverge to the same continuation? The data reveals a striking pattern:

Pattern | Count | Percentage
Complete agreement (identical PVs) | 29 | 2.2%
Diverge and never reconverge | 1,160 | 87.4%
Diverge then reconverge | 138 | 10.4%

Key Finding: 87.4% of all PV pairs (1,160 of 1,327) diverge and never reconverge; among the 1,298 pairs that diverge at all, that share is 89.4%. Once the engines choose different paths, they stay on different paths. When reconvergence does occur, it happens around ply 4-6 (median: 4.0).
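The three patterns can be told apart with a single pass over each PV pair. The sketch below is illustrative only, and its definitions are assumptions, since the text does not spell them out: PVs are compared over their shared length, and "reconverge" means the two PVs list the same move at the same ply somewhere after the first disagreement. The name classify_pair is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def classify_pair(a: Sequence[str], b: Sequence[str]) -> Tuple[str, Optional[int]]:
    """Classify a PV pair and report the 1-based reconvergence ply, if any.

    Assumption: only the plies present in both PVs are compared.
    """
    shared = min(len(a), len(b))
    first_diff = next((i for i in range(shared) if a[i] != b[i]), None)
    if first_diff is None:
        return ("identical", None)
    # Look for a later ply at which both PVs list the same move again.
    for j in range(first_diff + 1, shared):
        if a[j] == b[j]:
            return ("reconverge", j + 1)
    return ("never_reconverge", None)
```

A same-ply definition like this one would not count a line that rejoins after one engine inserts an extra move pair; catching those cases would need a position-based comparison instead.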

4. Sequence Similarity Analysis

The Longest Common Subsequence (LCS) measures how many moves appear in both PVs in the same order, even if not consecutive. The LCS ratio normalizes this by average PV length:

LCS Ratio | Count | Percentage | Interpretation
0.0 - 0.2 | 478 | 36.0% | Very different variations
0.2 - 0.4 | 608 | 45.8% | Mostly different
0.4 - 0.6 | 174 | 13.1% | Partial overlap
0.6 - 0.8 | 26 | 2.0% | Mostly similar
0.8 - 1.0 | 15 | 1.1% | Nearly identical

Mean LCS ratio: 0.279 | Median: 0.235

Key Finding: 82% of PV pairs have an LCS ratio below 0.4, meaning the moves they share in order make up less than 40% of the average PV length. The engines are calculating largely different continuations from the same positions.
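For reference, the LCS ratio described above is LCS(a, b) / ((len(a) + len(b)) / 2). The sketch below shows one standard dynamic-programming formulation; it is not necessarily the exact code used for the study. PVs are assumed to be lists of SAN strings, and the function names are illustrative. The usage lines apply it to the two Example 1 PVs from Section 7.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the match ending here
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a move in a or b
        prev = curr
    return prev[-1]


def lcs_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length divided by the average PV length (0.0 if both are empty)."""
    avg_len = (len(a) + len(b)) / 2
    return lcs_length(a, b) / avg_len if avg_len else 0.0


# The two PVs quoted in Example 1 (Section 7).
theoria = "Kg2 Qxd4 h3 a2 Bh5 a1=Q Bd1 Qaxd1 h4 Q4g1+ Kh3 e5#".split()
stockfish = "Kg2 Qxd4 Kg3 a2 h3 a1=Q Bd1 Qaxd1 h4 Rg8+ Kh3 Qh1#".split()

shared = lcs_length(theoria, stockfish)          # 8
ratio = round(lcs_ratio(theoria, stockfish), 3)  # 0.667
```

Both PVs are 12 plies long and share 8 moves in order, so the ratio is 8 / 12, in line with the 67% overlap reported for that example.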

5. Divergence Location

Where in the PV does first disagreement typically occur?

First Disagreement | Count | Percentage
Ply 1 (first move) | 325 | 24.5%
Ply 2 (first response) | 325 | 24.5%
Ply 3-4 | 326 | 24.6%
Ply 5-6 | 149 | 11.2%
Ply 7+ | 202 | 15.2%

Mean first divergence: ply 2.6 | Median: ply 2.0

Key Finding: Nearly half (49%) of first disagreements occur by ply 2. The engines don't agree on a long forcing sequence and then split; they disagree almost immediately on how to proceed.
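The cumulative shares behind this finding follow directly from the counts in the table. The short sketch below recomputes them and also shows one way to locate the first disagreement in a PV pair. The helper name first_divergence_ply is hypothetical, and PVs are assumed to be lists of SAN strings.

```python
from itertools import accumulate
from typing import Optional, Sequence


def first_divergence_ply(a: Sequence[str], b: Sequence[str]) -> Optional[int]:
    """1-based ply of the first differing move, or None if the shared plies all match."""
    for ply, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return ply
    return None


# Counts copied from the First Disagreement table.
buckets = ["Ply 1", "Ply 2", "Ply 3-4", "Ply 5-6", "Ply 7+"]
counts = [325, 325, 326, 149, 202]

total = sum(counts)
cumulative = [running / total for running in accumulate(counts)]
for label, share in zip(buckets, cumulative):
    print(f"{label:8s} cumulative {share:6.1%}")
```

The second cumulative value is 650 / 1,327, which rounds to 49.0%.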

6. Context of Divergence

6.1 What Precedes Divergence?

When engines agree on a move, then diverge on the next, what type of move preceded the split?

Move Type | Count | Percentage
Capture | 188 | 19.3%
Pawn move | 154 | 15.8%
King move | 138 | 14.2%
Rook move | 134 | 13.8%
Queen move | 106 | 10.9%
Knight move | 101 | 10.4%
Bishop move | 84 | 8.6%
Check | 45 | 4.6%

Captures are the most common category (19.3%) among moves that precede a split: the engines often agree on an exchange but disagree on what to do afterward.
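A classification like the one in the table can be approximated from SAN strings alone. The sketch below is an illustration under stated assumptions, not the study's method. The categories overlap (a rook capture is both a capture and a rook move), and the precedence used here (capture, then check, then castling, then piece type) is a guess. The "Castling" label and both function names are hypothetical.

```python
from typing import Optional, Sequence

PIECE_NAMES = {
    "K": "King move",
    "Q": "Queen move",
    "R": "Rook move",
    "B": "Bishop move",
    "N": "Knight move",
}


def classify_san(san: str) -> str:
    """Assign a non-empty SAN move to one category (assumed precedence, see lead-in)."""
    move = san.rstrip("!?")  # drop annotation glyphs if present
    if "x" in move:
        return "Capture"
    if move.endswith(("+", "#")):
        return "Check"
    if move.startswith("O-O"):
        return "Castling"
    # Piece moves start with an uppercase piece letter; pawn moves with a file letter.
    return PIECE_NAMES.get(move[0], "Pawn move")


def move_before_split(a: Sequence[str], b: Sequence[str]) -> Optional[str]:
    """Last agreed move before the first disagreement, or None if there is none."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return a[i - 1] if i > 0 else None
    return None


# The two PVs quoted in Example 2 (Section 7).
theoria = "Nc6 Nc3 O-O Nb5 Qd5 Qe3 Qe6 Qg3 Bd6 Qh4 h6 Rxe6 hxg5 Qxg5 dxe6".split()
stockfish = "Nc6 Nc3 O-O Nb5 Qd5 Qe3 Qe6 Qg3 Bd6 Qh4 h6 Rxe6 hxg5 Qxg5 fxe6".split()
last_agreed = move_before_split(theoria, stockfish)
```

For the Example 2 PVs the last agreed move is Qxg5, which this scheme labels a capture. A different precedence order would shift counts between the Capture, Check, and piece rows.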

6.2 Agreement by Position Type

Does the evaluation magnitude affect agreement rates?

Position Type | N | Ply 1 | Ply 3 | Ply 5 | Ply 7
Quiet (|eval| < 0.5) | 146 | 79.5% | 58.2% | 36.1% | 16.7%
Slight advantage (|eval| 0.5-1.5) | 131 | 89.3% | 58.0% | 38.9% | 27.5%
Clear advantage (|eval| > 1.5) | 1,093 | 73.7% | 44.2% | 26.2% | 16.7%

Engines agree most often on the first move in slight-advantage positions (89%) and least often in clear-advantage positions (74%). One plausible explanation is that when one side is winning, multiple paths lead to victory, and the engines find different ones.
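The grouping behind this table can be sketched as follows. Several details are assumptions because the text does not state them: evaluations are taken to be in pawns, the bucket is chosen from a single evaluation per position (which engine supplied it is not specified), and a value of exactly 1.5 is placed in the slight-advantage bucket. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Record = Tuple[float, Sequence[str], Sequence[str]]  # (eval_pawns, pv_a, pv_b)


def position_type(eval_pawns: float) -> str:
    """Bucket a position by evaluation magnitude, using the table's thresholds."""
    magnitude = abs(eval_pawns)
    if magnitude < 0.5:
        return "Quiet"
    if magnitude <= 1.5:  # boundary handling is an assumption
        return "Slight advantage"
    return "Clear advantage"


def first_move_agreement_by_type(records: Iterable[Record]) -> Dict[str, float]:
    """Share of positions in each bucket where both PVs start with the same move."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for eval_pawns, pv_a, pv_b in records:
        if not pv_a or not pv_b:
            continue  # skip positions with an empty PV
        kind = position_type(eval_pawns)
        totals[kind] += 1
        hits[kind] += pv_a[0] == pv_b[0]
    return {kind: hits[kind] / totals[kind] for kind in totals}


# Toy records with made-up evaluations and one-move PVs.
toy = [
    (0.2, ["e4"], ["e4"]),
    (-0.3, ["d4"], ["c4"]),
    (1.0, ["Nf3"], ["Nf3"]),
    (-2.5, ["Qxd4"], ["Kg2"]),
]
by_type = first_move_agreement_by_type(toy)
```

Using the absolute value puts positions that favor either side in the same bucket, consistent with the |eval| notation in the table's first row.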

7. Example Comparisons

Positions where PVs show partial overlap (40-70% LCS) reveal the character of engine disagreement:

Example 1: Endgame Technique

lichess.org/v2TySc3s — Move 51 (67% overlap)

Theoria16: Kg2 Qxd4 h3 a2 Bh5 a1=Q Bd1 Qaxd1 h4 Q4g1+ Kh3 e5#
Stockfish: Kg2 Qxd4 Kg3 a2 h3 a1=Q Bd1 Qaxd1 h4 Rg8+ Kh3 Qh1#

Both engines see the queen promotion and mating attack, but differ on move order (h3 vs Kg3) and final execution.

Example 2: Middlegame Strategy

lichess.org/IE6GFFS0 — Move 12 (65% overlap)

Theoria16: Nc6 Nc3 O-O Nb5 Qd5 Qe3 Qe6 Qg3 Bd6 Qh4 h6 Rxe6 hxg5 Qxg5 dxe6
Stockfish: Nc6 Nc3 O-O Nb5 Qd5 Qe3 Qe6 Qg3 Bd6 Qh4 h6 Rxe6 hxg5 Qxg5 fxe6

Remarkable: 14 plies of complete agreement, then a divergence on a single recapture (dxe6 vs fxe6) leading to different pawn structures.

Example 3: Rook Endgame

lichess.org/MG5vqr0L — Move 34 (64% overlap)

Theoria16: Rf1+ Kh2 e3 Ra2 Rc1 Kg3 Rxc3 Kf3 Rd3 Ra8+ Kg7 Rc8 Rxd4
Stockfish: Rf1+ Kh2 e3 Ra2 Rc1 Kg3 Rxc3 Kf3 Rd3 Ra8+ Kg7 Ra7+ Kh6 Rc7 Rxd4

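11 plies of agreement, then Stockfish inserts an extra check (Ra7+ Kh6) and continues with Rc7 rather than Rc8. Both lines end with the same capture, Rxd4, but the resulting positions differ slightly (rook on c7 vs c8, king on h6 vs g7).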

8. Discussion

8.1 Different Coordinate Systems

The data reveals a paradox: Theoria16 and Stockfish agree almost perfectly on position evaluation (ρ = 0.996) but disagree substantially on how to play from those positions (82% of PV pairs have an LCS ratio below 0.4). They occupy the same evaluative space but navigate it differently.

This suggests the engines have learned different "strategic coordinate systems"—they agree on the destination (evaluation) but take different routes. The manifold of chess positions admits multiple valid navigation schemes.

8.2 Immediate Divergence

The finding that 49% of divergences occur by ply 2 challenges the intuition that engines agree on immediate tactics but differ on long-term plans. In fact, they often disagree immediately—on the very first or second move of the recommended line.

This may reflect different prioritization: Stockfish's tactical opportunism finds forcing sequences, while Theoria16's teleological training prefers moves that maintain strategic options.

8.3 The Non-Convergence Pattern

The 87% non-convergence rate is perhaps the most striking finding. Once engines diverge, they almost never find their way back to the same line. This suggests their disagreements aren't superficial (different move orders reaching the same position) but structural (genuinely different strategic visions).

The 10% of cases where PVs do reconverge tend to involve forced sequences—tactics that both engines eventually find, regardless of their different starting approaches.

8.4 Pedagogical Implications

For human players studying engine analysis, these findings suggest caution. Two highly-rated engines examining the same position may recommend entirely different plans. Neither is "wrong"—they represent different valid approaches to the position.

Because Theoria16 was trained on Lc0 evaluations, its PVs may better reflect strategic patterns recognizable to humans trained on classical chess literature. Stockfish's PVs, optimized for competitive play, may find objectively stronger but less pedagogically transparent continuations.

9. Conclusion

Despite near-perfect evaluation agreement, Theoria16 at 500k nodes and Stockfish 17 at 2M nodes recommend substantially different continuations. Move-by-move agreement decays from 75% to under 10% within the first ten plies. Over 87% of PV pairs diverge and never reconverge, and 82% of PV pairs have an LCS ratio below 0.4.

These findings support the interpretation that modern chess engines have learned different but equally valid ways of navigating the chess position space. They agree on static evaluation—the "height" of positions on the evaluation manifold—but disagree on the "paths" between them.

For practical analysis, this suggests that consulting multiple engines provides genuinely different strategic perspectives, not just confirmatory opinions. The "truth" of a chess position may be more plural than singular.

Reference Data


Methodology Note: Analysis performed on 1,327 positions with PVs of at least 3 plies from both engines. Positions sourced from 100 Lichess games (Elo 1450-1550). Theoria16 analyzed at 500,000 nodes; Stockfish 17 at 2,000,000 nodes. Longest Common Subsequence calculated using dynamic programming. Agreement rates measured at each ply across all positions where both PVs extended to that depth.