CB2 Database - lil-lab/cb2 GitHub Wiki

CB2 Database Overview

CB2 uses Sqlite as its database backend because it is fast and widely used in other open source software. We use the Peewee ORM as a high-level Python interface for querying the database.

CB2 Database location

By default, the CB2 database is located in a directory named cb2-game-dev under the user app directory returned by the Python library appdirs. You can also specify a custom location for the database by modifying the data_prefix field in the config file.

When starting a CB2 server, it prints out the location of the database. For example, the Database path line here:

❯ python3 -m cb2game.server.main --config_filepath="server/config/local-covers-config.yaml"
[2023-06-30 12:08:36,951] root INFO [main:main:1391] Config file parsed.
[2023-06-30 12:08:36,952] root INFO [main:main:1392] data prefix:
[2023-06-30 12:08:36,952] root INFO [main:main:1393] Log directory: /Users/username/Library/Application Support/cb2-game-dev/game_records
[2023-06-30 12:08:36,952] root INFO [main:main:1394] Assets directory: /Users/username/Library/Application Support/cb2-game-dev/assets
[2023-06-30 12:08:36,952] root INFO [main:main:1395] Database path: /Users/username/Library/Application Support/cb2-game-dev/game_data.db

You can also find the location of the database by invoking the provided db_location script:

❯ python3 -m cb2game.server.db_location --config_filepath="server/config/local-covers-config.yaml"
Database path: /Users/username/Library/Application Support/cb2-game-dev/game_data.db

Downloading CB2 data from the server.

The CB2 server provides a password-protected interface to download all server data. Navigate to http://hostname:port/data/download and enter the password. This will download a zip file containing the database and all other server data. You can then pair this with the appropriate config file to use with CB2's provided database utilities. To download a server's configuration file, see http://hostname:port/data/config. Note that you'll need to change the config file before using it locally so that it points to the local database (see next section).

Creating a config file for downloaded data.

Most CB2 utilities work with config files instead of taking a filepath to the database directly. The config file contains the path to the database, as well as other server configuration. To create a config file for downloaded data, you can start with:

❯ python3 -m cb2game.server.generate_config --all_defaults

This will create a config file named default.yaml in the current directory. You then want to modify the line:

data_prefix: ''

To instead point to the directory containing the database. For example, if you downloaded the database to /Users/username/Downloads/cb2-data/game_data.db, you would modify the line to:

data_prefix: '/Users/username/Downloads/cb2-data'

If you didn't save the database as game_data.db, you should also modify the database_path_suffix field to point to the correct file. For example, if you saved the database as mydb.db, you would modify the line to:

database_path_suffix: 'mydb.db'
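How the two fields combine can be sketched in a few lines of Python. This is an illustration only: resolve_database_path is a hypothetical helper, not part of CB2, and it assumes the database path is simply data_prefix joined with database_path_suffix, with game_data.db as the default suffix. It does not cover the empty data_prefix case, where CB2 falls back to the appdirs location.

```python
from pathlib import Path


def resolve_database_path(
    data_prefix: str, database_path_suffix: str = "game_data.db"
) -> Path:
    """Join the two config fields into a database path (hypothetical helper)."""
    return Path(data_prefix).expanduser() / database_path_suffix


# Default file name inside the downloaded data directory.
print(resolve_database_path("/Users/username/Downloads/cb2-data"))
# Custom file name, matching the database_path_suffix example above.
print(resolve_database_path("/Users/username/Downloads/cb2-data", "mydb.db"))
```

If the printed path does not exist on disk, one of the two fields likely needs adjusting.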

Using the server's original config can be preferable to creating your own. For example, when running Eval, the original server config helps make eval results reproducible. If you do download the config file from the server, you'll need to modify data_prefix to point to the local database.

Manually browsing the data.

There are a few ways to inspect the released data.

CB2 Server Instance

The best way to view the database is to use the /view/games URL endpoint on the original server instance. You can also always create your own config file that points to the downloaded DB and launch a server instance locally with that. See section titled Creating a config file for downloaded data for instructions on creating a config file.

Sqlite DB Browser

If you just want to browse the data, we highly recommend using Sqlite DB Browser. This can be used to view the database directly and makes it simple to manually peruse the records. The experience is much better than reading a raw JSON file. First, choose a game in the Game table, then go to the Event table and filter all records by the game ID.
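The same two-step lookup can also be scripted with Python's built-in sqlite3 module. The sketch below rests on assumptions: it assumes the events table is named event, that its foreign key column is game_id (Peewee's default naming for the schema described later on this page), and that game IDs are integers. Check the actual names in DB Browser first. events_for_game is a hypothetical helper, not a CB2 function.

```python
import sqlite3


def events_for_game(conn: sqlite3.Connection, game_id: int) -> list:
    """Return (tick, type) rows for one game, ordered by tick.

    Assumes a table named `event` with `game_id`, `tick` and `type` columns.
    """
    cursor = conn.execute(
        "SELECT tick, type FROM event WHERE game_id = ? ORDER BY tick",
        (game_id,),
    )
    return cursor.fetchall()


# Example usage (commented out because it needs a real database file).
# Opening with mode=ro keeps the browsing session read-only.
#
# conn = sqlite3.connect("file:game_data.db?mode=ro", uri=True)
# for tick, event_type in events_for_game(conn, game_id=1):
#     print(tick, event_type)
# conn.close()
```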

JSON

CB2 doesn't use JSON by default for a number of reasons:

  • A Sqlite DB file takes up far less space than JSON.
  • Sqlite is much faster than JSON.
  • Sqlite is more flexible than JSON (e.g. you can query it efficiently).
  • Sqlite integrates with CB2 better.

If you still want to use JSON, we released our dataset in both sqlite and JSON (in the same release). If you're using CB2 to collect your own dataset, you can easily convert from our DB format to JSON using the db_to_json.py script:

python3 -m cb2game.server.db_tools.db_to_json path/to/cb2-data-base/human_human/game_data.db OUTPUT.json  --pretty=True

The --pretty=True flag is optional and enables pretty printing.

Since the JSON format takes up so much space, you might want to filter games before exporting to JSON, via:

python3 -m cb2game.server.db_tools.filter_db_games path/to/cb2-data-base/human_human/game path/to/game_ids_to_keep.txt

Here, game_ids_to_keep.txt is a text file that contains a comma-separated list of the game IDs to keep in the database. Note that this script makes destructive changes to the Sqlite database, so you should make a copy of the database before running it.
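Both preparation steps, backing up the database and writing the ID file, can be scripted. The following is a minimal sketch using only the standard library. backup_database and write_game_ids are hypothetical helper names, and the exact file format expected by filter_db_games (for example, whether whitespace or a trailing newline is tolerated) is an assumption, so the IDs are written as one plain comma-separated line.

```python
import shutil
from pathlib import Path


def backup_database(db_path: str) -> Path:
    """Copy the database next to itself with a .bak suffix; return the copy's path."""
    source = Path(db_path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)
    return backup


def write_game_ids(ids, output_path: str) -> None:
    """Write game IDs to a text file as a single comma-separated line."""
    Path(output_path).write_text(",".join(str(game_id) for game_id in ids))
```

Calling backup_database on the database file before running the filter script leaves an untouched copy to fall back on.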

For more on the JSON format, see src/cb2game/server/db_tools/db_to_json.py. The JSON format is nearly identical to the Sqlite schema, which is described below.

Writing Software to access the database.

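To use the CB2 database from your own code, you first need to connect to it. The connection is set up from a CB2 config object (loaded from a config file, as described above) rather than from a raw database filepath:

from cb2game.server.schemas import base

# `config` is the parsed CB2 config object whose data_prefix points at the database.
base.SetDatabase(config)
base.ConnectDatabase()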

Sqlite Database Schema

The database schema is documented in depth in src/cb2game/server/schemas/game.py and src/cb2game/server/schemas/event.py.

In particular, each game is recorded as a series of events. The Event schema looks like:

class Event(BaseModel):
    """Game event record.

    In CB2, games are recorded as lists of events. Each event is a single
    atomic change to the game state. Events are stored in an sqlite database.

    This class is used to store events in the database. Events are generated by
    the game_recorder.py class. See server/game_recorder.py for that.

    """

    # A UUID uniquely identifying this event. Unique across the database.
    id = UUIDField(primary_key=True, default=uuid.uuid4, unique=True)
    # Pointer to the game this event occurred in
    game = ForeignKeyField(Game, backref="events")
    # Event type. See EventType enum above for the meaning of each value.
    type = IntegerField(default=EventType.NONE)
    # The current turn number. Each turn consists of a leader portion and a follower portion.
    turn_number = IntegerField(null=True)
    # A monotonically increasing integer. Events which are considered
    # simultaneous in-game are given the same tick. For example, a character
    # stepping on a card and the card set completion event occurring.
    tick = IntegerField()
    # Server time when the event was created. UTC.
    server_time = DateTimeField(default=datetime.datetime.utcnow)
    # Not currently populated. Local clock of the client when event occurred.
    # Determined by packet transmissions time which is reported by the client.
    # Nullable.  UTC.
    client_time = DateTimeField(null=True)
    # Who triggered the event. See EventOrigin enum above.
    origin = IntegerField(default=EventOrigin.NONE)
    # Whose turn it is, currently.
    role = TextField(default="")  # 'Leader' or 'Follower'
    # If an event references a previous event, it is linked here.  The exact
    # meaning depends on the type of the event. See EventType documentation
    # above for more.
    parent_event = ForeignKeyField("self", backref="children", null=True)
    # A JSON-parseable string containing specific data about the event that
    # occurred. For the format of each Event, see EventType documentation above.
    # For every event type with a data field, there's a python dataclass you can
    # import and use to parse data. We use mashumaro for parsing. Example for a
    # map update event:
    #
    #  from cb2game.server.schemas.event import Event, EventType
    #  from cb2game.server.messages.map_update import MapUpdate
    #  # Some peewee query.
    #  map_event = Event.select().where(Event.type == EventType.MAP_UPDATE, ...).get()
    #  map_update = MapUpdate.from_json(map_event.data)
    #
    #  This gives you a CB2 MapUpdate object, defined in
    #  server/messages/map_update.py.
    data = TextField(null=True)
    # A brief/compressed or human readable representation of the event, if
    # possible.  Only defined for some event types, see EventType above for more
    # documentation.
    short_code = TextField(null=True)
    # If applicable, the "location" of an event. For moves, this is the location
    # *before* the action occurred.  For live feedback, this is the follower
    # location during the live feedback.
    location = HecsCoordField(null=True)
    # If applicable, the "orientation" of the agent. For moves, this is the
    # orientation *before* the action occurred.  For live feedback, this is the
    # follower orientation during the live feedback.
    orientation = IntegerField(null=True)

Each event's type dictates which game action it refers to.

class EventType(IntEnum):
    """Each event is tagged with a type. The type determines what the event is trying to signal.

    ... (comments documenting this enum redacted)

    """

    NONE = 0
    MAP_UPDATE = 1
    INITIAL_STATE = 2
    TURN_STATE = 3
    START_OF_TURN = 4
    PROP_UPDATE = 5
    CARD_SPAWN = 6
    CARD_SELECT = 7
    CARD_SET = 8
    ...

In the source code, each member of this enum is heavily documented, with instructions on how to decode the Event.data field for each event type. As an example, let's look at some common message types:

Instructions

In the game of CB2, an instruction can generate up to 3 events:

  1. INSTRUCTION_SENT: When the instruction is sent by the leader. It then gets loaded into the queue of instructions.
  2. INSTRUCTION_ACTIVATED: When an instruction reaches the front of the instruction queue, it is activated.
  3. INSTRUCTION_DONE/INSTRUCTION_CANCELLED: Either the instruction is marked as completed by the follower or it is cancelled by the leader. Note that this last event is optional: if the game ends first, an instruction may be neither cancelled nor completed.

Here, the "parent_event" field of the Event schema is used to link the INSTRUCTION_ACTIVATED/INSTRUCTION_DONE/INSTRUCTION_CANCELLED events to the INSTRUCTION_SENT event.

Only INSTRUCTION_SENT contains data about the instruction. This data is stored in the Event.data field, and is a JSON-parseable string in the format of the objective dataclass (imported as ObjectiveMessage in the example below) in src/cb2game/server/messages/objective.py.

To decode the instruction data, you can use the dataclass's from_json("...") method.

Note that Objectives have their own UUIDs, separate from the UUID in the Event record. The Objective UUID is saved in the Event.short_code field of each instruction-related event.

Example:

    from cb2game.server.schemas.event import Event, EventType
    from cb2game.server.schemas.game import Game
    from cb2game.server.messages.objective import ObjectiveMessage

    # Assumes the database is already connected (see above).
    words = set()          # Vocabulary across all instructions.
    instruction_list = []  # Text of every instruction, in query order.

    # Iterate over all games and collect the instructions sent in each.
    for game in Game.select():
        instructions = Event.select().where(
            Event.type == EventType.INSTRUCTION_SENT,
            Event.game == game,
        )
        for instruction in instructions:
            # Event.data holds the JSON-encoded instruction.
            decoded_message = ObjectiveMessage.from_json(instruction.data)
            words.update(decoded_message.text.split(" "))
            instruction_list.append(decoded_message.text)
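The parent_event links described earlier can be used to work out how far each instruction got. The sketch below illustrates the idea with a plain stand-in dataclass rather than the real Peewee model, so it runs without a database. EventRecord and instruction_statuses are hypothetical names, and the event types are written as strings for readability, whereas the real Event.type column stores EventType integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EventRecord:
    """Stand-in for an Event row; only the fields used here."""

    id: str
    type: str  # Event type name, e.g. "INSTRUCTION_SENT".
    parent_event: Optional[str] = None  # id of the linked INSTRUCTION_SENT event.


def instruction_statuses(events: List[EventRecord]) -> Dict[str, str]:
    """Map each INSTRUCTION_SENT event id to the last lifecycle stage it reached."""
    statuses: Dict[str, str] = {}
    for event in events:  # Events are assumed to be in chronological order.
        if event.type == "INSTRUCTION_SENT":
            statuses[event.id] = "SENT"
        elif event.type.startswith("INSTRUCTION_") and event.parent_event in statuses:
            # ACTIVATED, DONE or CANCELLED: follow the link back to the SENT event.
            statuses[event.parent_event] = event.type[len("INSTRUCTION_"):]
    return statuses


events = [
    EventRecord("e1", "INSTRUCTION_SENT"),
    EventRecord("e2", "INSTRUCTION_ACTIVATED", parent_event="e1"),
    EventRecord("e3", "INSTRUCTION_SENT"),
    EventRecord("e4", "INSTRUCTION_DONE", parent_event="e1"),
]
print(instruction_statuses(events))
# {'e1': 'DONE', 'e3': 'SENT'}
```

An instruction left at SENT or ACTIVATED corresponds to the case where the game ended before the instruction was completed or cancelled.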

Actions

Actions record movements made by the leader and the follower. Actions are marked with an event type of ACTION. The Action dataclass is relatively straightforward and is located in src/cb2game/server/messages/action.py. As with instructions, it provides an Action.from_json("...") method for decoding the Event.data field.

Events containing actions also make use of the location and orientation fields of the Event record, which hold the agent's position and orientation before the action occurred. The displacement field of the Action dataclass keeps track of the hexagonal displacement of each move. For more on how hexagonal coordinates are handled in CB2, see the server-side implementation of HecsCoord in src/cb2game/server/hex.py. For background, see the Hexagonal Efficient Coordinate System (HECS).
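To give a feel for how a hexagonal displacement combines with a starting location, here is a minimal sketch of coordinate addition as defined by the Hexagonal Efficient Coordinate System. It uses a stand-in Hecs class, not CB2's HecsCoord, and it assumes that the location after a move is the location before it plus the displacement. Consult src/cb2game/server/hex.py for the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hecs:
    """Stand-in HECS coordinate: a is 0 or 1 (which array), r is the row, c is the column."""

    a: int
    r: int
    c: int

    def add(self, other: "Hecs") -> "Hecs":
        # HECS addition: XOR the array bits, and carry into r and c when both are 1.
        carry = self.a & other.a
        return Hecs(
            self.a ^ other.a,
            self.r + other.r + carry,
            self.c + other.c + carry,
        )


location_before = Hecs(1, 0, 0)
displacement = Hecs(1, 0, 0)
print(location_before.add(displacement))
# Hecs(a=0, r=1, c=1)
```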

Map Updates

Map updates are broadcast whenever the game changes. While a normal CB2 game only ever has 1 static map, the map can be changed in custom Scenarios by an attached monitoring script.

Map updates are marked with an event type of MAP_UPDATE. The MapUpdate dataclass is located in src/cb2game/server/messages/map_update.py.

To decode a map update, pass the Event.data field to MapUpdate.from_json("..."). The MapUpdate class looks like:

@dataclass
class MapUpdate(DataClassJSONMixin):
    rows: int
    cols: int
    # Tiles are flattened into a linear list.
    # Here's some useful things you can do with a tile:
    # tile.asset_id: Asset ID of the tile.
    # tile.cell.coord: HECS coordinate of the tile.
    # tile.cell.coord.to_offset_coordinates(): (x, y) coords in hex grid.
    # tile.cell.boundary: Walkable boundary of the tile (edges).
    # tile.cell.layer: Z-layer of the tile (water/ground/mountains).
    # tile.rotation_degrees: Rotation of the tile's asset.
    tiles: List[Tile]
    metadata: Optional[MapMetadata] = field(default_factory=MapMetadata)
    fog_start: Optional[int] = None
    fog_end: Optional[int] = None
    # Used in custom scenarios to tint the map.
    color_tint: Color = Color(0, 0, 0, 0)

The map information is stored in a list of Tile objects. Metadata from the map generation algorithm is stored in metadata, and contains a high-level description of the overall map structure. This is used, for example, by gpt_follower.py to create a text-only description of the map to feed to GPT-3.5 and GPT-4. This is specifically implemented in src/cb2game/pyclient/client_utils.py, and is very experimental at the moment.