Fishtest Mathematics - official-stockfish/fishtest GitHub Wiki

Statistical Methods and Algorithms in Fishtest

This document outlines the core statistical models, testing methodologies, and algorithms employed by Fishtest for chess engine evaluation and parameter tuning.

Engine Match Modeling

Fishtest utilizes advanced models to accurately represent the outcome probabilities in chess engine matches:

  • Pentanomial Model:
    • Fishtest employs the pentanomial model for analyzing engine match results.
    • This model considers five outcomes (e.g., ll, ld, dd/wl, wd, ww), providing a more accurate representation than the traditional trinomial model (win, draw, loss).
    • Benefit: Leads to a substantial saving of testing resources compared to the trinomial model.
  • Opening Book Bias Estimation:
    • The difference between the results predicted by the pentanomial and trinomial models allows for an estimation of the Root Mean Square (RMS) value of biases present in the opening book.
    • This calculation is based on the accounting identity [archive].

Sequential Testing Framework (GSPRT)

For efficient sequential testing (determining if a change is statistically significant as quickly as possible), Fishtest uses the Generalized Sequential Probability Ratio Test (GSPRT):

  • Core Principle (GSPRT):
    • Fishtest uses the GSPRT, a generalization of the standard SPRT.
    • In GSPRT, unknown parameters within the statistical hypotheses (H0 and H1) are replaced by their maximum likelihood estimates.
    • It's shown (loc. cit.) that GSPRT asymptotically behaves like SPRT when the error probabilities approach zero.
    • The asymptotic guarantees of GSPRT do not depend on the Elo differences being tested being small.
  • Normalized Elo for Bounds:
    • Test bounds in Fishtest are expressed in terms of "normalized Elo".
    • Advantage: This normalization makes the expected duration of a test dependent only on the chosen bounds, independent of the specific draw ratio or opening book used.
    • See this document [archive] for details.
  • Elo Estimation from GSPRT Results:
    • To estimate Elo differences from completed GSPRT tests, Fishtest uses formula (6.1) in this document [archive].
    • The underlying model approximates the GSPRT process as a random walk, specifically using formula (2.1) from this document [archive].
    • This approximation essentially treats the GSPRT like an SPRT for the drift of a Brownian motion, where the infinitesimal variance is estimated from the sample data.
  • SPRT Calculator and Resource Estimation:

Parameter Tuning

Fishtest uses the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm for parameter tuning tasks.

Quality Control

To ensure the reliability of test results contributed by distributed workers, Fishtest uses a χ2 (Chi-squared) test to detect statistically anomalous workers whose results deviate significantly from the norm.

Validation and Simulation

Many of the formulas and statistical models used in Fishtest have been validated through simulation:

⚠️ **GitHub.com Fallback** ⚠️