Understanding The Data Model - internetarchive/openlibrary GitHub Wiki

Infogami

Each Infogami page (i.e. something with a URL) has an associated type. Each type contains a schema that states which fields can be used with it and what format those fields are in. These schemas are used to generate view and edit templates, which can then be further customized as a particular type requires.

Infogami provides a generic way, through its wiki, to create new types as needed.
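To make the idea of a schema concrete, here is a simplified sketch of a type definition expressed as JSON-style data, plus a hypothetical `validate` helper that checks a page's fields against it. The `properties`, `name`, `expected_type`, and `unique` fields follow the general shape of Open Library type definitions, but treat the exact layout as an assumption; `validate` and `PRIMITIVES` are illustrative only and are not part of Infogami.

```python
from typing import List

# Simplified, assumed layout of an Infogami-style type definition.
EDITION_TYPE = {
    "key": "/type/edition",
    "type": {"key": "/type/type"},
    "properties": [
        {"name": "title", "expected_type": {"key": "/type/string"}, "unique": True},
        {"name": "number_of_pages", "expected_type": {"key": "/type/int"}, "unique": True},
        {"name": "publishers", "expected_type": {"key": "/type/string"}, "unique": False},
    ],
}

# Hypothetical mapping from primitive type keys to Python types.
PRIMITIVES = {"/type/string": str, "/type/int": int}


def validate(page: dict, type_def: dict) -> List[str]:
    """Return a list of problems found when checking `page` against `type_def`."""
    problems = []
    props = {p["name"]: p for p in type_def["properties"]}
    for name, value in page.items():
        if name in ("key", "type"):
            continue  # bookkeeping fields every page carries
        prop = props.get(name)
        if prop is None:
            problems.append(f"unknown field: {name}")
            continue
        # Unique properties hold one value; non-unique ones hold a list of values.
        values = [value] if prop["unique"] else value
        if not isinstance(values, list):
            problems.append(f"{name} should be a list")
            continue
        expected = PRIMITIVES[prop["expected_type"]["key"]]
        for v in values:
            if not isinstance(v, expected):
                problems.append(f"{name} should be {prop['expected_type']['key']}")
    return problems
```

For example, a page whose `number_of_pages` is the string "300" would be reported as `number_of_pages should be /type/int`.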

Infogami Database Schema

Aside from the feature tables described below, Open Library's core data lives in just two database tables, thing and data. By default, everything stored in them gets the same basic functionality through Infogami.

Thing table

The thing table defines types such as editions, works, authors, users, and languages. It also keeps track of instances of those types by their identifiers: every object is registered in the table under its own ID.

Entries in a sample thing table

| id | key | type | latest_revision | created | last_modified |
|----|-----|------|-----------------|---------|---------------|
| 2 | /type/key | 1 | 1 | 2013-03-20 10:27:01.322813 | 2013-03-20 10:27:01.322813 |
| 3 | /type/string | 1 | 1 | 2013-03-20 10:27:01.322813 | 2013-03-20 10:27:01.322813 |
| 4 | /type/text | 1 | 1 | 2013-03-20 10:27:01.322813 | 2013-03-20 10:27:01.322813 |
| 5 | /type/int | 1 | 1 | 2013-03-20 10:27:01.322813 | 2013-03-20 10:27:01.322813 |
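The `type` column is itself a thing ID: every row in the sample has type 1, and the data sample below shows that ID 1 is `/type/type`. This sketch uses an in-memory SQLite database to show how a self-join turns that integer back into a type key. The columns match the sample, but the SQLite column types are assumptions (the real schema is Postgres), and the row for `/type/type` is added here based on the data sample.

```python
import sqlite3

# Simplified thing table with the columns from the sample (SQLite types assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE thing (
        id INTEGER PRIMARY KEY,
        key TEXT UNIQUE,
        type INTEGER REFERENCES thing(id),
        latest_revision INTEGER,
        created TEXT,
        last_modified TEXT
    )"""
)
ts = "2013-03-20 10:27:01.322813"
rows = [
    (1, "/type/type", 1, 1, ts, ts),  # /type/type is its own type
    (2, "/type/key", 1, 1, ts, ts),
    (3, "/type/string", 1, 1, ts, ts),
    (4, "/type/text", 1, 1, ts, ts),
    (5, "/type/int", 1, 1, ts, ts),
]
conn.executemany("INSERT INTO thing VALUES (?, ?, ?, ?, ?, ?)", rows)

# Self-join: resolve each thing's integer `type` to the key of the thing it points at.
query = """
    SELECT t.key, ty.key AS type_key
    FROM thing AS t
    JOIN thing AS ty ON ty.id = t.type
    ORDER BY t.id
"""
for key, type_key in conn.execute(query):
    print(key, "->", type_key)
```

Because types are stored as things, the same table and the same join work for user-defined types such as editions or works.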

Data table

The data table, on the other hand, maps each thing (by thing_id and revision) to all of the data associated with it, stored as a JSON document.

Entry in a sample data table

| thing_id | revision | data |
|----------|----------|------|
| 1 | 1 | {"created": {"type": "/type/datetime", "value": "2013-03-20T10:27:01.223351"}, "last_modified": {"type": "/type/datetime", "value": "2013-03-20T10:27:01.223351"}, "latest_revision": 1, "key": "/type/type", "type": {"key": "/type/type"}, "id": 1, "revision": 1} |
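Putting the two tables together, fetching a thing's current data means joining on `thing_id` and picking the data row whose `revision` matches the thing's `latest_revision`. This is a minimal sketch using SQLite and an abbreviated version of the sample row; the real Infobase schema, indexes, and column types differ, and `get_latest` is a hypothetical helper, not an Infobase function.

```python
import json
import sqlite3
from typing import Optional

# In-memory stand-in for the two tables (simplified; the real schema is Postgres).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE thing (id INTEGER PRIMARY KEY, key TEXT UNIQUE, type INTEGER,
                        latest_revision INTEGER);
    CREATE TABLE data (thing_id INTEGER REFERENCES thing(id), revision INTEGER,
                       data TEXT, PRIMARY KEY (thing_id, revision));
    """
)

# Abbreviated version of the sample row: /type/type at revision 1.
doc = {"key": "/type/type", "type": {"key": "/type/type"}, "id": 1, "revision": 1}
conn.execute("INSERT INTO thing VALUES (1, '/type/type', 1, 1)")
conn.execute("INSERT INTO data VALUES (1, 1, ?)", (json.dumps(doc),))

# Simulate an edit: a new revision row is added and latest_revision moves forward.
conn.execute("INSERT INTO data VALUES (1, 2, ?)", (json.dumps({**doc, "revision": 2}),))
conn.execute("UPDATE thing SET latest_revision = 2 WHERE id = 1")


def get_latest(conn: sqlite3.Connection, key: str) -> Optional[dict]:
    """Hypothetical helper: return the decoded JSON for the latest revision of `key`."""
    row = conn.execute(
        """SELECT d.data
           FROM thing AS t
           JOIN data AS d ON d.thing_id = t.id AND d.revision = t.latest_revision
           WHERE t.key = ?""",
        (key,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Older revisions stay in the data table, so history remains available while the common lookup only ever reads one row per thing.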

Read more about Infogami and types at:

https://openlibrary.org/dev/docs/infogami

Open Library Feature Tables

Open Library has a number of additional tables that are used to support a variety of features. The DDL for these tables can be found here.

bookshelves and bookshelves_books

These tables are used to store the books that patrons have on their "Want to Read", "Currently Reading", and "Already Read" reading log shelves. The bookshelves_books table holds most of this data, with bookshelves acting as a look-up table for shelf names.

bookshelves.py provides functions which interact with the reading log tables.
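As a rough illustration of how the look-up table works, this sketch joins a patron's reading log rows to shelf names. The column names (`username`, `work_id`, `bookshelf_id`), the shelf IDs 1 to 3, and the one-shelf-per-work rule are assumptions for illustration; check the DDL and bookshelves.py for the real definitions. `reading_log` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Look-up table: one row per shelf name (IDs assumed).
    CREATE TABLE bookshelves (id INTEGER PRIMARY KEY, name TEXT);
    -- Reading log entries (columns assumed); one shelf per work per patron.
    CREATE TABLE bookshelves_books (
        username TEXT,
        work_id INTEGER,
        bookshelf_id INTEGER REFERENCES bookshelves(id),
        PRIMARY KEY (username, work_id)
    );
    INSERT INTO bookshelves VALUES
        (1, 'Want to Read'), (2, 'Currently Reading'), (3, 'Already Read');
    INSERT INTO bookshelves_books VALUES
        ('alice', 101, 1), ('alice', 102, 3), ('bob', 101, 2);
    """
)


def reading_log(conn: sqlite3.Connection, username: str) -> dict:
    """Return {shelf name: [work IDs]} for one patron."""
    shelves: dict = {}
    rows = conn.execute(
        """SELECT b.name, bb.work_id
           FROM bookshelves_books AS bb
           JOIN bookshelves AS b ON b.id = bb.bookshelf_id
           WHERE bb.username = ?
           ORDER BY bb.work_id""",
        (username,),
    )
    for name, work_id in rows:
        shelves.setdefault(name, []).append(work_id)
    return shelves
```

Keeping shelf names in a separate table means each reading log row stores only a small integer, and renaming a shelf touches a single row.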

yearly_reading_goals

This table stores the target number of books that a patron commits to reading in a given year. Functions which interact with the yearly_reading_goals table can be found in yearly_reading_goals.py.
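A minimal sketch of the idea, assuming one goal row per patron per year keyed on `(username, year)`. The column names and the `set_goal` and `percent_complete` helpers are hypothetical and do not mirror yearly_reading_goals.py; the number of books read is simply passed in by the caller.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Assumed layout: one target per patron per year.
conn.execute(
    """CREATE TABLE yearly_reading_goals (
        username TEXT,
        year INTEGER,
        target INTEGER,
        PRIMARY KEY (username, year)
    )"""
)


def set_goal(conn: sqlite3.Connection, username: str, year: int, target: int) -> None:
    """Create or overwrite a patron's goal for a given year."""
    conn.execute(
        "INSERT OR REPLACE INTO yearly_reading_goals VALUES (?, ?, ?)",
        (username, year, target),
    )


def percent_complete(conn: sqlite3.Connection, username: str, year: int,
                     books_read: int) -> Optional[int]:
    """Whole-number progress toward the year's goal, capped at 100; None if no goal is set."""
    row = conn.execute(
        "SELECT target FROM yearly_reading_goals WHERE username = ? AND year = ?",
        (username, year),
    ).fetchone()
    if row is None or row[0] <= 0:
        return None
    return min(100, (100 * books_read) // row[0])
```

The composite primary key makes "change my goal" an overwrite rather than a second row for the same year.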

bookshelves_events

A patron can record the date on which they last finished any book on their "Already Read" shelf. The bookshelves_events table stores these dates, and may later be used to store other dates that a patron wants to track (the date they started reading a book, start and finish dates of rereads, etc.).

Related code can be found in bookshelves_events.py.
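Since the table stores dated events per book, with finish dates in use today and other kinds of dates possibly added later, one natural model is an event-type field plus a date. This sketch picks a patron's most recent finish date for a work. The `EventType` values, field names, and `last_finished` helper are hypothetical; see bookshelves_events.py for the real definitions.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Iterable, Optional


class EventType(IntEnum):
    """Hypothetical event kinds; only FINISH corresponds to the current feature."""
    START = 1
    FINISH = 2


@dataclass(frozen=True)
class BookshelfEvent:
    username: str
    work_id: int
    event_type: EventType
    event_date: date


def last_finished(events: Iterable[BookshelfEvent], username: str,
                  work_id: int) -> Optional[date]:
    """Return the most recent FINISH date for this patron and work, if any."""
    dates = [
        e.event_date
        for e in events
        if e.username == username and e.work_id == work_id
        and e.event_type == EventType.FINISH
    ]
    return max(dates, default=None)
```

Storing a type per event, rather than a dedicated "finished on" column, is what would let the same table later hold start dates and rereads.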

observations

Patrons can give structured reviews of books by attaching any number of pre-defined tags to a work. These are stored in the observations table.

The code that interacts with this table, along with the definitions for the tags, is in observations.py.
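Because the tags are pre-defined, storing an observation comes down to checking each (tag type, value) pair against a fixed vocabulary and writing one row per pair. The tag types and values in `OBSERVATION_TAGS` are placeholders, not the real definitions in observations.py, and `build_rows` and its row layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Placeholder vocabulary: tag type -> allowed values (NOT the real observations.py list).
OBSERVATION_TAGS: Dict[str, set] = {
    "pace": {"slow", "medium", "fast"},
    "mood": {"hopeful", "dark", "funny"},
}


def build_rows(username: str, work_id: int,
               tags: Dict[str, List[str]]) -> List[Tuple[str, int, str, str]]:
    """Turn {tag type: [values]} into (username, work_id, tag_type, value) rows.

    Raises ValueError for any tag type or value outside the pre-defined vocabulary.
    """
    rows = []
    for tag_type, values in tags.items():
        allowed = OBSERVATION_TAGS.get(tag_type)
        if allowed is None:
            raise ValueError(f"unknown tag type: {tag_type}")
        for value in values:
            if value not in allowed:
                raise ValueError(f"{value!r} is not a valid {tag_type} tag")
            rows.append((username, work_id, tag_type, value))
    return rows
```

Validating against a closed list is what keeps these reviews "structured": every stored value can be counted and compared across works.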

booknotes

A patron can add private notes that only they can read to any work. The booknotes table stores these notes. booknotes.py contains the code that interacts with this table.
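The defining property of booknotes is that only the author can read a note. A minimal sketch of that access rule, using a hypothetical in-memory `BooknoteStore` keyed by `(username, work_id)`; the real table, its columns, and the functions in booknotes.py may look different.

```python
from typing import Dict, Optional, Tuple


class BooknoteStore:
    """Hypothetical stand-in for the booknotes table: one note per (patron, work)."""

    def __init__(self) -> None:
        self._notes: Dict[Tuple[str, int], str] = {}

    def save(self, username: str, work_id: int, note: str) -> None:
        """Create or replace the patron's note on a work."""
        self._notes[(username, work_id)] = note

    def note_for(self, viewer: str, work_id: int) -> Optional[str]:
        """Look up a note scoped to the requesting patron, so other patrons' notes are unreachable."""
        return self._notes.get((viewer, work_id))
```

Scoping every lookup to the requesting patron, instead of fetching a note and then checking who owns it, makes an accidental leak of someone else's note much harder.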

ratings

Patrons can submit a star rating for a work. The ratings table holds these star ratings. Consult ratings.py for related code.
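A sketch of the aggregation a ratings table makes possible: one rating per patron per work, averaged per work. The 1 to 5 star range, the one-rating-per-patron rule, and the `rate` and `summary` helpers are assumptions; consult ratings.py for the actual logic.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in: (username, work_id) -> stars.
ratings: Dict[Tuple[str, int], int] = {}


def rate(username: str, work_id: int, stars: int) -> None:
    """Record or replace a patron's rating; stars assumed to range from 1 to 5."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    ratings[(username, work_id)] = stars


def summary(work_id: int) -> Tuple[Optional[float], int]:
    """Return (average stars rounded to 2 places, number of ratings) for a work."""
    stars = [s for (_, w), s in ratings.items() if w == work_id]
    return (round(mean(stars), 2) if stars else None, len(stars))
```

Keying on the patron and work means a second rating from the same patron replaces the first instead of skewing the average.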

community_edits_queue

This table holds librarian requests, which in turn populate the librarian request table at https://openlibrary.org/merges. Code that interacts directly with this table can be found in edits.py.
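Conceptually, each row is a request with a status that a librarian eventually resolves, and the requests page lists the ones still open. This sketch captures that idea only; the `Status` values, the `MergeRequest` fields, the `open_requests` and `resolve` helpers, and the assumption that lower IDs are older are all hypothetical. The real columns and status codes are defined in the DDL and edits.py.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Status(IntEnum):
    """Hypothetical request states."""
    PENDING = 1
    APPROVED = 2
    DECLINED = 3


@dataclass
class MergeRequest:
    id: int
    submitter: str
    status: Status = Status.PENDING


def open_requests(queue: List[MergeRequest]) -> List[MergeRequest]:
    """Requests still awaiting a librarian, oldest first (assumes IDs increase over time)."""
    return sorted((r for r in queue if r.status == Status.PENDING), key=lambda r: r.id)


def resolve(request: MergeRequest, approved: bool) -> None:
    """Close a pending request; resolved requests drop off the open list."""
    if request.status != Status.PENDING:
        raise ValueError("request already resolved")
    request.status = Status.APPROVED if approved else Status.DECLINED
```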

Understanding web.ctx.site

web.py (the Python micro web framework we use, similar to Flask) maintains a ctx variable that holds the context for the current request. web.py also provides a web.db connection to our Postgres database.
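web.py implements ctx as a thread-local dictionary, so each request handled on its own thread sees only its own values. The sketch below imitates that behaviour with the standard library's `threading.local` rather than web.py itself, so it runs without web.py installed; it is an analogy, not web.py's code. The barrier forces both simulated requests to set their state before either one reads it back.

```python
import threading

# Stand-in for web.ctx: attributes set on a threading.local are per-thread.
ctx = threading.local()
seen = {}
barrier = threading.Barrier(2)


def handle_request(path: str) -> None:
    """Simulate one request running on its own thread."""
    ctx.path = path         # per-request state, similar in spirit to web.ctx.path
    barrier.wait()          # both "requests" have now set their own ctx.path
    seen[path] = ctx.path   # still this thread's value, not the other thread's


threads = [
    threading.Thread(target=handle_request, args=(p,))
    for p in ("/works/OL1W", "/authors/OL1A")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After both threads finish, each path maps to itself, and the main thread never sees a `path` attribute at all.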

Infogami sits on top of web.py; think of it as a battery pack. One piece of Infogami, called Infobase, behaves like an ORM (database wrapper) and lets us define arbitrary data types such as works, editions, and authors.

At the simplest level, Infobase relies on two tables: thing and data.

The thing table gives every object in our system an ID, a type, and a reference to its data in the data table.

The data table is a massive catalog of JSON data that can be retrieved by querying and joining against the thing table.

Infogami injects a utility called site into web.py's ctx variable (https://webpy.org/cookbook/ctx), which maintains information and connections specific to the current client. The site utility handles all the joins for you, so you can request any key from the thing table, fetch all of its corresponding data, and use any models and functions we have defined for that thing's type.
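In Open Library code this usually looks like `web.ctx.site.get(key)`, which returns the object for that key. The sketch below is a toy stand-in, not Infogami's implementation: a hypothetical `Site` class over plain dicts that performs the thing/data lookup for you and wraps the result in a model class registered for the thing's type. `Site`, `Thing`, `Work`, and `register` are illustrative names only.

```python
from typing import Dict, Optional, Tuple, Type


class Thing:
    """Generic model: exposes the fields of one JSON revision as attributes."""

    def __init__(self, doc: dict) -> None:
        self._doc = doc

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails, i.e. for JSON fields.
        try:
            return self._doc[name]
        except KeyError:
            raise AttributeError(name) from None


class Work(Thing):
    """Hypothetical model adding behaviour for things of type /type/work."""

    def url(self) -> str:
        return "https://openlibrary.org" + self.key


class Site:
    """Toy stand-in for Infogami's site object; NOT the real implementation or API."""

    def __init__(self, things: Dict[str, dict], data: Dict[Tuple[int, int], dict]) -> None:
        self.things = things  # key -> {"id": ..., "latest_revision": ...}
        self.data = data      # (thing_id, revision) -> JSON document
        self.models: Dict[str, Type[Thing]] = {}

    def register(self, type_key: str, cls: Type[Thing]) -> None:
        """Associate a model class with a type key such as /type/work."""
        self.models[type_key] = cls

    def get(self, key: str) -> Optional[Thing]:
        """Resolve key -> thing row -> latest data row, then wrap it in its model."""
        row = self.things.get(key)
        if row is None:
            return None
        doc = self.data[(row["id"], row["latest_revision"])]
        cls = self.models.get(doc["type"]["key"], Thing)
        return cls(doc)


site = Site(
    things={"/works/OL1W": {"id": 10, "latest_revision": 2}},
    data={
        (10, 1): {"key": "/works/OL1W", "type": {"key": "/type/work"}, "title": "Draft"},
        (10, 2): {"key": "/works/OL1W", "type": {"key": "/type/work"}, "title": "Final"},
    },
)
site.register("/type/work", Work)
```

Here `site.get("/works/OL1W")` returns a `Work` built from the latest revision, so its `title` is "Final" and its `url()` method is available; callers never touch the join themselves.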