UsefulData - xoto10/fishtest GitHub Wiki

A collection of useful data concerning Stockfish (SF).

Contempt measurements

Data showing how the Elo difference between the SFdev of October 2018 and older versions of Stockfish depends on the contempt value (the SFdev used is approx. 40 Elo above SF9). Upper and lower bounds represent the value with maximum error.

[Plots: Elo difference vs. contempt, with SF7, SF8 and SF9 as opponents, at STC and LTC]

Full data, with values: https://docs.google.com/spreadsheets/d/1R_eopD8_ujlBbt_Q0ygZMvuMsP1sc4UyO3Md4qL1z5M/edit#gid=1878521689

Depth vs. TC

Roughly speaking, completedDepth is ~16 at STC and ~21 at LTC around move 25.

This depends somewhat on hardware, and the spread is large (sigma = +-3).

Elo change with respect to TC

Results of some scaling tests with the 2moves book, 40000 games each (STC = 10+0.1, LTC = 60+0.6):

	sf7->sf8	sf8->sf9	sf9->sf10
Elo STC	95.91 +-2.3	58.28 +-2.3	71.03 +-2.4
Elo LTC	100.40 +-2.1	68.55 +-2.1	65.55 +-2.2

So the common wisdom that a longer TC compresses Elo differences is not always true: sf7->sf8 and sf8->sf9 gained more Elo at LTC than at STC, and only sf9->sf10 gained less.

See https://github.com/official-stockfish/Stockfish/issues/1859#issuecomment-449624976
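To check whether the STC/LTC differences in the table are larger than the noise, the Elo change from STC to LTC can be compared with the combined error bar. The sketch below is only an illustration: it assumes the two +- values are independent and given at the same confidence level, so they add in quadrature; the function name ltc_minus_stc is hypothetical.

```python
import math

# (pair, Elo STC, +- STC, Elo LTC, +- LTC), copied from the table above
RESULTS = [
    ("sf7->sf8", 95.91, 2.3, 100.40, 2.1),
    ("sf8->sf9", 58.28, 2.3, 68.55, 2.1),
    ("sf9->sf10", 71.03, 2.4, 65.55, 2.2),
]


def ltc_minus_stc(elo_stc, err_stc, elo_ltc, err_ltc):
    """Return (Elo LTC - Elo STC, error bar of that difference).

    Assumes the two error bars are independent and at the same confidence
    level, so they combine in quadrature.
    """
    return elo_ltc - elo_stc, math.hypot(err_stc, err_ltc)


for name, elo_s, err_s, elo_l, err_l in RESULTS:
    diff, err = ltc_minus_stc(elo_s, err_s, elo_l, err_l)
    trend = "expands" if diff > 0 else "compresses"
    print(f"{name}: {diff:+.2f} +- {err:.2f} (Elo {trend} at LTC)")
```

With these numbers, sf7->sf8 and sf8->sf9 both come out positive and larger than their combined error bar, while sf9->sf10 comes out negative.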

TC dependence of certain terms in search

Discussed here: https://github.com/official-stockfish/Stockfish/pull/2401#issuecomment-552768526

Elo contributions from various evaluation terms

See spreadsheet at: https://github.com/official-stockfish/Stockfish/files/3828738/Stockfish.Feature.s.Estimated.Elo.worth.1.xlsx

*The estimated Elo worth of the various features may be outdated, or may soon become outdated.

Elo gain using syzygy

Tested at 10+0.1 with SF10dev, with all Syzygy WDL files on tmpfs (i.e. in RAM), in a round-robin tournament between no TB (0) and 4-, 5- and 6-man TBs.

Rank	Name	Elo	+/-	Games	Score	Draws
1	syzygy6	13	2	82591	51.8%	59.5%
2	syzygy5	2	2	82590	50.3%	59.4%
3	syzygy4	-7	2	82591	49.0%	59.3%
4	syzygy0	-7	2	82592	48.9%	59.4%

Tested at 60+0.6, with all Syzygy WDL files on tmpfs (i.e. in RAM), playing no TB (0) against 6-man TB:

Score of syzygy6 vs syzygy0: 4084 - 3298 - 18510  [0.515] 25892
Elo difference: 10.55 +/- 2.25
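The score and Elo figures in match results like the one above follow from the win-loss-draw counts: the score counts a draw as half a point, and the Elo difference is the logistic conversion -400 * log10(1/score - 1). A minimal sketch (the function names are illustrative, and error bars are not computed here):

```python
import math


def score_from_wld(wins, losses, draws):
    """Score fraction: a win counts 1, a draw 1/2, a loss 0."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# syzygy6 vs syzygy0 at 60+0.6: 4084 wins, 3298 losses, 18510 draws
s = score_from_wld(4084, 3298, 18510)
print(f"score {s:.3f}, Elo {elo_from_score(s):+.2f}")
```

The same two functions reproduce the Elo values of the threading matches on this page from their W-L-D counts.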

Elo from threading

LTC

Playing 8 threads vs 1 thread at LTC (60+0.6, 8moves_v3.pgn):

Score of t8 vs seq: 476 - 3 - 521  [0.737] 1000
Elo difference: 178.6 +/- 14.0, LOS: 100.0 %, DrawRatio: 52.1 %

Playing 1 thread at 8xLTC (480+4.8) vs 1 thread at LTC (60+0.6) (8moves_v3.pgn):

Score of seq8 vs seq: 561 - 5 - 434  [0.778] 1000
Elo difference: 217.9 +/- 15.8, LOS: 100.0 %, DrawRatio: 43.4 %

This corresponds to roughly 82% efficiency (178/218).

STC

Playing 8 threads vs 1 thread at STC (10+0.1):

Score of threads vs serial: 1606 - 15 - 540  [0.868] 2161
Elo difference: 327.36 +/- 14.59

Playing 8 threads @ 10+0.1 vs 1 thread @ 80+0.8:

Score of threads vs time: 348 - 995 - 2104  [0.406] 3447
Elo difference: -66.00 +/- 7.15

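So, in this test, going from 1 to 8 threads has about 83% scaling efficiency: 8 threads at 10+0.1 gain 327 Elo over 1 thread but lose 66 Elo to 1 thread at 80+0.8, so 8x the time is worth about 327 + 66 = 393 Elo, and 327 / 393 is about 0.83.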
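Both efficiency figures compare the Elo gained from 8 threads with the Elo gained from 8x the thinking time. At LTC both gains were measured directly; at STC the gain from 8x time is inferred by chaining the two matches, which assumes Elo differences add up transitively across the three engines. A small sketch of the two estimates (function names are illustrative):

```python
def efficiency_direct(elo_threads, elo_time):
    """Elo from N threads divided by Elo from N x time, both measured against
    1 thread at the base TC."""
    return elo_threads / elo_time


def efficiency_transitive(elo_threads_vs_serial, elo_threads_vs_time):
    """Same ratio when 'N x time' was only played against 'N threads'.
    Assumes Elo is transitive: Elo(time vs serial) =
    Elo(threads vs serial) - Elo(threads vs time)."""
    elo_time = elo_threads_vs_serial - elo_threads_vs_time
    return elo_threads_vs_serial / elo_time


# LTC: 8 threads vs 1 thread, and 1 thread at 8xLTC vs 1 thread at LTC
print(f"LTC: {efficiency_direct(178.6, 217.9):.0%}")
# STC: 8 threads vs 1 thread, and 8 threads @ 10+0.1 vs 1 thread @ 80+0.8
print(f"STC: {efficiency_transitive(327.36, -66.00):.0%}")
```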

Elo from speedups

For small speedups (below ~5%), the following linear estimates give the Elo gain as a function of the speedup percentage x:

Elo_stc(x) = 2.10 x
Elo_ltc(x) = 1.43 x

To have a 50% passing chance at STC{-0.5,1.5}, a 0.24% speedup is needed, while at LTC{0.25,1.75} a 0.70% speedup is needed. A 1% speedup has a passing chance of nearly 85% at LTC.

Raw data (columns: speedup in %, Elo gain, error):

TC 10+0.1:
16   32.42  3.06
 8   13.67  3.05
 4    8.99  3.04
 2    3.52  3.05

TC 60+0.6:
16   20.85  2.59
 8   12.20  2.57
 4    4.67  2.57

Note that the numbers depend on the precise hardware. The model was verified quite accurately on fishtest; see https://github.com/locutus2/Stockfish/commit/82958c97214b6d418e5bc95e3bf1961060cd6113#commitcomment-38646654
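The passing-chance thresholds quoted above come from the linear estimates: the speedup whose estimated Elo gain sits at the midpoint of the SPRT bounds has roughly a 50% chance to pass, assuming the SPRT uses symmetric error rates. A minimal sketch using only the slopes 2.10 and 1.43 from this section (function names are illustrative; the 85% figure would need the full SPRT operating characteristic and is not reproduced here):

```python
# Linear estimates quoted above: Elo gain per 1% speedup (small speedups only, < ~5%)
ELO_PER_PERCENT = {"STC": 2.10, "LTC": 1.43}


def elo_gain(speedup_percent, tc):
    """Estimated Elo gain of a speedup of speedup_percent % at the given TC."""
    return ELO_PER_PERCENT[tc] * speedup_percent


def speedup_for_even_odds(elo0, elo1, tc):
    """Speedup (%) whose estimated Elo gain lies at the midpoint of the SPRT
    bounds {elo0, elo1}; with symmetric SPRT error rates this is where the
    passing chance is about 50%."""
    return (elo0 + elo1) / 2 / ELO_PER_PERCENT[tc]


print(f"STC{{-0.5,1.5}}: {speedup_for_even_odds(-0.5, 1.5, 'STC'):.2f}% speedup")
print(f"LTC{{0.25,1.75}}: {speedup_for_even_odds(0.25, 1.75, 'LTC'):.2f}% speedup")
print(f"1% speedup at LTC: about {elo_gain(1.0, 'LTC'):.2f} Elo")
```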

Distribution of lengths of games at LTC (60+0.6) on fishtest

In a collection of a few million games, the longest was 902 plies.

Win-Loss-Draw statistics of LTC games on fishtest

The following graphs show Win-Loss-Draw (WLD) statistics of fishtest LTC games, relating them to score, move number, and material count. They answer the question: 'What fraction of positions with a given score (and move number or material) in fishtest LTC games end in a win, a loss, or a draw?'

For all positions: [graph]

For positions grouped by move number: [graphs: Win, Draw]

For positions grouped by material value (summing pieces using values 1, 3, 3, 5, 9): [graphs: Win, Draw]
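The grouping behind these graphs can be sketched as counting, per bucket, how often positions ended in a win, draw or loss. The example below assumes positions are available as FEN strings, that material is counted for both sides with the values 1, 3, 3, 5, 9 (kings excluded), and that results are already expressed from the same side as the score; the bucket keys and sample records are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter, defaultdict

# Piece values used for the material grouping above; kings are not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_from_fen(fen):
    """Total material of both sides, read from the piece-placement field of a FEN."""
    placement = fen.split()[0]
    # Digits, '/' and kings are not in PIECE_VALUES and contribute 0.
    return sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement)


def wld_fractions(records):
    """records: iterable of (key, result), where key is a bucket such as
    (score, material) or (score, move number) and result is 'W', 'D' or 'L'.
    Returns {key: {'W': fraction, 'D': fraction, 'L': fraction}}."""
    counts = defaultdict(Counter)
    for key, result in records:
        counts[key][result] += 1
    fractions = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        fractions[key] = {r: counter[r] / total for r in "WDL"}
    return fractions


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_from_fen(START))  # 78: 39 per side

# Hypothetical ((score bucket in cp, material), result) records, for illustration only
sample = [((100, 78), "W"), ((100, 78), "D"), ((100, 78), "D"), ((0, 40), "D")]
print(wld_fractions(sample)[(100, 78)])
```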

Elo gain with time odds

One year of NNUE speed improvements

Presents nodes per second (nps) measurements for all SF versions between the first NNUE commit (SF_NNUE, Aug 2nd 2020) and the end of July 2021, on an AMD Ryzen 9 3950X, compiled with make -j ARCH=x86-64-avx2 profile-build. The graph shows the last nps reported for a depth 22 search from startpos using NNUE (best of about 20 measurements). For reference, the last version with classical evaluation (SF_classical, July 30 2020) reaches 2.30 Mnps.
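The "last nps reported, best over about 20 measurements" procedure can be sketched as: take the final UCI info line that carries an nps field in each run, then keep the maximum over runs. This assumes the engine output of each run (e.g. after the UCI commands position startpos and go depth 22) has been captured as text; the capture itself is not shown, and the sample output below is made up.

```python
import re

# Matches the "nps <number>" field of a UCI "info" line.
NPS_RE = re.compile(r"\bnps (\d+)\b")


def last_nps(uci_output):
    """nps of the last UCI 'info' line in uci_output that reports one (None if none does)."""
    value = None
    for line in uci_output.splitlines():
        if line.startswith("info"):
            match = NPS_RE.search(line)
            if match:
                value = int(match.group(1))
    return value


def best_nps(runs):
    """Best (maximum) final nps over several runs; raises ValueError if no run reports nps."""
    values = [v for v in (last_nps(run) for run in runs) if v is not None]
    return max(values)


# Hypothetical captured outputs of two runs (numbers made up for illustration)
run_a = ("info depth 21 nodes 900000 nps 1500000 time 600\n"
         "info depth 22 nodes 1200000 nps 1600000 time 750\n"
         "bestmove e2e4")
run_b = "info depth 22 nodes 1200000 nps 1550000 time 774\nbestmove e2e4"
print(best_nps([run_a, run_b]))
```

Taking the best rather than the mean of the runs reduces the influence of background load on the machine, at the cost of reporting an optimistic figure.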