UsefulData - xoto10/fishtest GitHub Wiki
Collection of useful data concerning SF
Contempt measurements
Data showing how the Elo difference between SFdev of October 2018 and older versions of Stockfish depends on the contempt value (the SFdev used is approx. 40 Elo above SF9). Upper and lower bounds represent the value with maximum error.
[Graphs: Elo difference vs. contempt value for opponents 7, 8 and 9, at STC and LTC]
Full data with values https://docs.google.com/spreadsheets/d/1R_eopD8_ujlBbt_Q0ygZMvuMsP1sc4UyO3Md4qL1z5M/edit#gid=1878521689
Depth vs. TC
Roughly speaking, completedDepth is ~16 at STC and ~21 at LTC around move 25.
This depends somewhat on hardware, and the spread is fairly large (sigma ~ 3).
Elo change with respect to TC
Results of scaling tests with the 2moves book, 40000 games each (STC = 10+0.1, LTC = 60+0.6):
TC | sf7->sf8 | sf8->sf9 | sf9->sf10 |
---|---|---|---|
elo STC | 95.91 +-2.3 | 58.28 +-2.3 | 71.03 +-2.4 |
elo LTC | 100.40 +-2.1 | 68.55 +-2.1 | 65.55 +-2.2 |
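The common wisdom that increased TC causes Elo compression is therefore not always true: for sf7->sf8 and sf8->sf9 the Elo difference is larger at LTC than at STC, and only for sf9->sf10 is it smaller.
See https://github.com/official-stockfish/Stockfish/issues/1859#issuecomment-449624976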
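The comparison can be made explicit by differencing the table values per version step. The following is a minimal sketch, assuming the quoted +- values are independent error bars that may be combined in quadrature (the source does not say how they were obtained); the dictionary and function names are illustrative only.

```python
import math

# (Elo, error) pairs copied from the scaling table.
stc = {"sf7->sf8": (95.91, 2.3), "sf8->sf9": (58.28, 2.3), "sf9->sf10": (71.03, 2.4)}
ltc = {"sf7->sf8": (100.40, 2.1), "sf8->sf9": (68.55, 2.1), "sf9->sf10": (65.55, 2.2)}

def ltc_minus_stc(step):
    """Return (LTC Elo - STC Elo, combined error) for one version step."""
    elo_s, err_s = stc[step]
    elo_l, err_l = ltc[step]
    # Assumption: the two error bars are independent, so add them in quadrature.
    return elo_l - elo_s, math.hypot(err_s, err_l)

for step in stc:
    diff, err = ltc_minus_stc(step)
    print(f"{step}: {diff:+.2f} +- {err:.2f}")
```

A positive difference means the gap widened at the longer time control, a negative one means it compressed.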
TC dependence of certain terms in search
Discussed here: https://github.com/official-stockfish/Stockfish/pull/2401#issuecomment-552768526
Elo contributions from various evaluation terms
See spreadsheet at: https://github.com/official-stockfish/Stockfish/files/3828738/Stockfish.Feature.s.Estimated.Elo.worth.1.xlsx
Note: the estimated Elo worth of the various features may already be outdated, or may become outdated soon.
Elo gain using syzygy
Tested at 10+0.1, with all syzygy WDL files on tmpfs (i.e. RAM), testing using none(0), 4, 5, and 6 man TB in a round-robin tournament (SF10dev).
Rank | Name | Elo | +/- | Games | Score | Draws |
---|---|---|---|---|---|---|
1 | syzygy6 | 13 | 2 | 82591 | 51.8% | 59.5% |
2 | syzygy5 | 2 | 2 | 82590 | 50.3% | 59.4% |
3 | syzygy4 | -7 | 2 | 82591 | 49.0% | 59.3% |
4 | syzygy0 | -7 | 2 | 82592 | 48.9% | 59.4% |
Tested at 60+0.6, with all syzygy WDL files on tmpfs (i.e. RAM), testing using none(0) against 6 man TB:
Score of syzygy6 vs syzygy0: 4084 - 3298 - 18510 [0.515] 25892
Elo difference: 10.55 +/- 2.25
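For readers who want to check such result lines, the Elo difference can be recomputed from the win-loss-draw counts. This is a sketch of the usual logistic Elo calculation, not necessarily the exact code of the match tool; the function name and the 95% interval (z = 1.96) are assumptions.

```python
import math

def elo_from_wld(wins, losses, draws, z=1.96):
    """Return (elo, half_width) for a match result, using the logistic Elo model."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    stderr = math.sqrt(var / n)

    def to_elo(s):
        return -400 * math.log10(1 / s - 1)

    # Assumption: z = 1.96 gives an approximate 95% interval on the score.
    low, high = to_elo(score - z * stderr), to_elo(score + z * stderr)
    return to_elo(score), (high - low) / 2

elo, err = elo_from_wld(4084, 3298, 18510)
print(f"Elo difference: {elo:.2f} +/- {err:.2f}")
```

With the counts from the 60+0.6 syzygy match this comes out close to the quoted 10.55 +/- 2.25.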
Elo from threading
LTC
Playing 8 threads vs 1 thread at LTC (60+0.6, 8moves_v3.pgn):
Score of t8 vs seq: 476 - 3 - 521 [0.737] 1000
Elo difference: 178.6 +/- 14.0, LOS: 100.0 %, DrawRatio: 52.1 %
Playing 1 thread at 8xLTC (480+4.8) vs (60+0.6) (8moves_v3.pgn):
Score of seq8 vs seq: 561 - 5 - 434 [0.778] 1000
Elo difference: 217.9 +/- 15.8, LOS: 100.0 %, DrawRatio: 43.4 %
This corresponds to roughly 82% scaling efficiency (178.6 / 217.9).
STC
Playing 8 threads vs 1 thread at STC (10+0.1):
Score of threads vs serial: 1606 - 15 - 540 [0.868] 2161
Elo difference: 327.36 +/- 14.59
Playing 8 threads @ 10+0.1 vs 1 thread @ 80+0.8:
Score of threads vs time: 348 - 995 - 2104 [0.406] 3447
Elo difference: -66.00 +/- 7.15
So, 1 -> 8 threads has about 83% scaling efficiency (327 / (327 + 66)) using this test.
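The two efficiency figures use slightly different inputs, which the sketch below spells out. It assumes that Elo differences add up across matches, which is what the STC estimate implicitly relies on; the function name is illustrative.

```python
def efficiency_vs_time(elo_threads, elo_time):
    """Elo gained from N threads divided by Elo gained from N times more time."""
    return elo_threads / elo_time

# LTC: both configurations were measured against the same 1-thread baseline.
ltc_eff = efficiency_vs_time(178.6, 217.9)

# STC: the 8x-time gain was not measured directly. It is taken as the 8-thread
# gain plus the 66 Elo deficit of 8 threads against 1 thread with 8x the time.
# Assumption: Elo differences are additive across the two matches.
stc_eff = efficiency_vs_time(327.36, 327.36 + 66.00)

print(f"LTC: {ltc_eff:.1%}, STC: {stc_eff:.1%}")
```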
Elo from speedups
For small speedups (<~5%), a linear estimate gives the Elo gain as a function of the speedup percentage (x):
Elo_stc(x) = 2.10 x
Elo_ltc(x) = 1.43 x
To have a 50% passing chance at STC{-0.5,1.5} we need a 0.24% speedup, while at LTC{0.25,1.75} we need a 0.70% speedup. A 1% speedup has a nearly 85% passing chance at LTC.
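The 50% figures follow from the linear model if one assumes that an SPRT passes half the time when the true Elo gain sits at the midpoint of its bounds (an assumption stated here, not taken from the source). A minimal sketch with illustrative names:

```python
# Linear model from the text: Elo gain per percent of speedup.
ELO_PER_PERCENT = {"STC": 2.10, "LTC": 1.43}

# SPRT bounds {elo0, elo1} quoted in the text.
SPRT_BOUNDS = {"STC": (-0.5, 1.5), "LTC": (0.25, 1.75)}

def elo_gain(tc, speedup_percent):
    """Estimated Elo gain for a small speedup (valid below roughly 5%)."""
    return ELO_PER_PERCENT[tc] * speedup_percent

def speedup_for_even_odds(tc):
    """Speedup (in %) whose estimated Elo gain equals the midpoint of the bounds."""
    elo0, elo1 = SPRT_BOUNDS[tc]
    # Assumption: the pass probability is 50% when the true Elo is (elo0 + elo1) / 2.
    return (elo0 + elo1) / 2 / ELO_PER_PERCENT[tc]

for tc in ("STC", "LTC"):
    print(f"{tc}: {speedup_for_even_odds(tc):.2f}% speedup for a 50% pass chance")
```

The 85% figure for a 1% speedup depends on the full SPRT pass-probability curve and is not reproduced by this sketch.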
Raw data (the columns appear to be speedup in %, measured Elo gain, and Elo error):
tc 10+0.1:
16 32.42 3.06
8 13.67 3.05
4 8.99 3.04
2 3.52 3.05
tc 60+0.6:
16 20.85 2.59
8 12.20 2.57
4 4.67 2.57
Note that the numbers will depend on the precise hardware. The model was verified quite accurately on fishtest, see https://github.com/locutus2/Stockfish/commit/82958c97214b6d418e5bc95e3bf1961060cd6113#commitcomment-38646654
Distribution of lengths of games at LTC (60+0.6) on fishtest
In a collection of a few million games, the longest was 902 plies.
Win-Loss-Draw statistics of LTC games on fishtest
The following graphs give information on the Win-Loss-Draw (WLD) statistics, relating them to score, move number, and material count. They answer the question 'What fraction of positions with a given score and move number (or material count) in fishtest LTC games end in a win, a loss, or a draw?'.
[Graph: WLD fractions for all positions]
[Graphs: Win and Draw fractions for positions grouped by move number]
[Graphs: Win and Draw fractions for positions grouped by material value (summing pieces using values 1, 3, 3, 5, 9)]
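The material value used for the grouping can be computed directly from a position. The sketch below assumes that the pieces of both sides are summed and that the position is given as a FEN string; both are assumptions, and the names are illustrative.

```python
# Piece values named in the text: pawn, knight, bishop, rook, queen.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def material_count(fen):
    """Sum the piece values of both sides from the piece-placement field of a FEN."""
    placement = fen.split()[0]
    # Kings, digits and '/' separators are not in PIECE_VALUES and contribute 0.
    return sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement)

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_count(START))  # 78: 39 per side
```

Under these assumptions the value runs from 78 at the start position down to 0 with bare kings.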
Elo gain with time odds
See also https://github.com/official-stockfish/Stockfish/discussions/3402
One year of NNUE speed improvements
Presents nodes per second (nps) measurements for all SF versions between the first NNUE commit (SF_NNUE, Aug 2nd 2020) and the end of July 2021 on an AMD Ryzen 9 3950X, compiled with `make -j ARCH=x86-64-avx2 profile-build`. The last nps reported for a depth 22 search from startpos using NNUE (best over about 20 measurements) is shown in the graph. For reference, the last classical evaluation (SF_classical, July 30 2020) has 2.30 Mnps.