Database codec SCID5 - benini/scid GitHub Wiki

A database for chess games can be stored in different ways. This page describes the SCID5 codec, a high-performance codec aiming for maximum speed and minimum size.

SCID5 database structure

Index file (extension .si5)

This file stores the Game table plus StoredLineID, FinalMaterial and HomePawn.
Each game is packed into a fixed-size record (56 bytes). Using a fixed size has some advantages, mainly the ability to overwrite a record in the middle of the file, but it imposes a few limits:

  • Maximum number of games: 4 billion
  • Maximum size of the games' data file (.sg5): 128TB
  • Maximum size of each game's data (extra tags, moves, comments): 128KB
  • Maximum number of unique player names: 268 million (2^28 = 268435456)
  • Maximum number of unique values for the tag "Event": 268 million (2^28)
  • Maximum number of unique values for the tag "Round": 2 billion (2^31)
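
Because every record has the same size, the position of a record follows from the game number alone, which is what makes in-place overwrites possible. A minimal Python sketch of that idea, assuming the index file has no header (the `header_size` parameter is hypothetical and defaults to 0) and that the rounded limits above are powers of two (only 2^28 and 2^31 are stated on this page; 2^32, 2^47 and 2^17 are inferred from the figures):

```python
import io

RECORD_SIZE = 56  # bytes per game record, as stated on this page

# Limits expressed as powers of two. Only 2**28 and 2**31 are given
# explicitly above; the other exponents are inferred (assumption).
MAX_GAMES = 2**32           # about 4 billion
MAX_GAMES_FILE = 2**47      # 128 TB, counted in binary units
MAX_GAME_DATA = 2**17       # 128 KB, counted in binary units
MAX_PLAYER_NAMES = 2**28    # 268435456
MAX_ROUND_VALUES = 2**31    # about 2 billion


def record_offset(game_index, header_size=0):
    """Byte offset of a game's record (header_size is an assumption)."""
    return header_size + game_index * RECORD_SIZE


def overwrite_record(f, game_index, record, header_size=0):
    """Replace one record in place without touching its neighbours."""
    if len(record) != RECORD_SIZE:
        raise ValueError("a record must be exactly 56 bytes")
    f.seek(record_offset(game_index, header_size))
    f.write(record)


# Usage: an in-memory stand-in for an index file holding three empty records.
index = io.BytesIO(bytes(3 * RECORD_SIZE))
overwrite_record(index, 1, b"\x01" * RECORD_SIZE)
```

The contents of the 56 bytes are not modelled here; the sketch only shows why a fixed size allows a record in the middle of the file to be rewritten without moving the others.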

NameBase file (extension .sn5)

This file stores the DatabaseInfo, Player, Event, Round, Site tables.
The data is just a sequence of strings with an associated type (PLAYER, EVENT, SITE, ROUND, DB_INFO).
Each string is encoded as a varint (length * 8 + type) followed by the data.
For example, the tag pair [Event "Olympic games"] is stored as iOlympic games (length 13 * 8 + type 1 = 105 = ASCII character i).
It is not necessary to store the IDs: new strings are only appended to the file, and older records are never changed.
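
The record layout described above can be sketched in Python. This is an illustration under stated assumptions, not the codec's actual implementation: the page does not say which varint format is used, so a LEB128-style encoding (7 bits per byte, low-order group first) is assumed; UTF-8 text is assumed; and only EVENT = 1 is confirmed by the example, while the other type values are guessed from the order in which the types are listed.

```python
# Type codes: only EVENT = 1 is confirmed by the example on this page.
# The remaining values are assumptions based on the listing order.
PLAYER, EVENT, SITE, ROUND, DB_INFO = 0, 1, 2, 3, 4


def encode_varint(n):
    """LEB128-style varint (assumed format): 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name(name_type, text):
    """One record: varint(length * 8 + type) followed by the string data."""
    data = text.encode("utf-8")  # text encoding is an assumption
    return encode_varint(len(data) * 8 + name_type) + data


def decode_names(buf):
    """Yield (type, text) pairs from a sequence of records."""
    pos = 0
    while pos < len(buf):
        value = shift = 0
        while True:
            byte = buf[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        length, name_type = divmod(value, 8)
        yield name_type, buf[pos:pos + length].decode("utf-8")
        pos += length


record = encode_name(EVENT, "Olympic games")
print(record)  # b'iOlympic games'
```

For strings shorter than 16 bytes the header fits in a single byte, so the example from this page is reproduced regardless of the varint format assumed; longer strings depend on that assumption.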

Games file (extension .sg5) [TODO]

Compacting the database [TODO]