Database codec SCID5 - benini/scid GitHub Wiki
A database for chess games can be stored in different ways. This page describes the SCID5 codec, a high-performance codec aiming for maximum speed and minimum size.
Index file (extension .si5)
This file stores the Game table plus StoredLineID, FinalMaterial and HomePawn.
Each game is packed into a fixed-size record (56 bytes). A fixed size has some advantages, mainly the ability to overwrite a record in the middle of the file, but it also imposes a few limits:
- Maximum number of games: 4 billion
- Maximum size of the games' data file (.sg5): 128TB
- Maximum size of each game's data (extra tags, moves, comments): 128KB
- Maximum number of unique player names: 268 million (2^28 = 268435456)
- Maximum number of unique values for the tag "Event": 268 million (2^28)
- Maximum number of unique values for the tag "Round": 2 billion (2^31)
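Because every record is exactly 56 bytes, the position of a game's record follows directly from its number, and the limits above correspond to power-of-two ranges. The following is a minimal sketch in Python, not the SCID implementation. It assumes zero-based game numbers and a hypothetical `header_size` offset (this page does not say whether the index file starts with a header). Only the 2^28 and 2^31 values are stated above; the other bit widths are assumptions that match the rounded figures.

```python
RECORD_SIZE = 56  # fixed size of one game record, in bytes (from the text above)


def record_offset(game_number: int, header_size: int = 0) -> int:
    """Byte offset of a game's record in the index file.

    header_size is a hypothetical parameter: this page does not say
    whether the .si5 file starts with a header.
    """
    if game_number < 0:
        raise ValueError("game_number must be non-negative")
    return header_size + game_number * RECORD_SIZE


# The stated limits as powers of two. The 2^28 and 2^31 values come from the
# list above; the other bit widths are assumptions matching the rounded figures.
LIMITS = {
    "games": 2**32,            # about 4 billion (assumed 32 bits)
    "data_file_bytes": 2**47,  # 128 TB (assumed 47-bit offset)
    "game_data_bytes": 2**17,  # 128 KB (assumed 17-bit length)
    "player_names": 2**28,     # 268435456
    "event_names": 2**28,
    "round_names": 2**31,
}
```

Under these assumptions, `record_offset(10)` returns 560, so overwriting game 10 touches only bytes 560 to 615 and leaves every other record in place.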
NameBase file (extension .sn5)
This file stores the DatabaseInfo, Player, Event, Round, Site tables.
The data is just a sequence of strings with an associated type (PLAYER, EVENT, SITE, ROUND, DB_INFO).
Each string is encoded as a varint header (length * 8 + type) followed by the string data.
For example, the tag pair [Event "Olympic games"] is stored as iOlympic games (length 13 * 8 + type 1 = 105 = ASCII character i).
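The encoding of a single record can be sketched in Python. This is an illustration, not the SCID implementation: it assumes a LEB128-style varint (7 bits per byte, least-significant group first, high bit set on every byte except the last) and UTF-8 string data, neither of which is specified on this page. Only the value 1 for the Event type is confirmed by the example; `TYPE_EVENT`, `encode_varint` and `encode_name` are hypothetical names.

```python
TYPE_EVENT = 1  # the only type value confirmed by the example above


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a LEB128-style varint (assumed format)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name(name_type: int, text: str) -> bytes:
    """Encode one record: varint(length * 8 + type) followed by the string bytes."""
    if not 0 <= name_type < 8:
        raise ValueError("type must fit in 3 bits")
    data = text.encode("utf-8")  # assumption: the text encoding is not stated
    return encode_varint(len(data) * 8 + name_type) + data


print(encode_name(TYPE_EVENT, "Olympic games"))  # b'iOlympic games'
```

Under this varint assumption, any string of 16 bytes or more needs a header of at least two bytes, because length * 8 then reaches 128 and no longer fits in 7 bits.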
The IDs do not need to be stored: new strings are appended to the file, and older records are never changed.
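One way to read this: if records are only appended and never rewritten, an ID can be derived from a record's position in the file instead of being stored. The sketch below is an assumption-laden illustration, not SCID's reader. It assumes that each type has its own counter starting at 0 in order of appearance, that the header is a LEB128-style varint, and that the strings are UTF-8; `read_names` is a hypothetical helper.

```python
from collections import defaultdict


def read_names(buf: bytes) -> dict[int, list[str]]:
    """Decode a sequence of records into {type: [strings in file order]}.

    The index of a string within its list plays the role of its ID
    (assumption: IDs are per type and follow the order of appearance).
    """
    names: dict[int, list[str]] = defaultdict(list)
    pos = 0
    while pos < len(buf):
        # Read the varint header (assumed LEB128-style, low 7 bits first).
        header = 0
        shift = 0
        while True:
            if pos >= len(buf):
                raise ValueError("truncated varint header")
            byte = buf[pos]
            pos += 1
            header |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        length, name_type = divmod(header, 8)  # header = length * 8 + type
        if pos + length > len(buf):
            raise ValueError("truncated string data")
        names[name_type].append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return dict(names)


# Two Event records (type 1): "Olympic games" and "Paris".
sample = b"iOlympic games" + bytes([5 * 8 + 1]) + b"Paris"
print(read_names(sample))  # {1: ['Olympic games', 'Paris']}
```

With this reading, appending a third Event string would give it ID 2 without touching the first two records.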