Loading Production Book Data - internetarchive/openlibrary GitHub Wiki

How to Import Production Data into Your Local Environment

Your local development environment contains a minimal dataset, which is insufficient for testing features on real-world data or debugging issues with specific records.

This guide provides a step-by-step process for using the copydocs.py script to import live data for authors, works, and editions from openlibrary.org into your local instance.

Prerequisites

Before you begin, ensure you have:

  • A running local development environment (Docker).
  • The ID of the record you want to import. You can find this in the URL of the record's page on openlibrary.org (e.g., /authors/OL1385865A).
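If you are starting from a full page URL, the record key can be pulled out with a small helper. This is only a sketch: record_key is a hypothetical function name, the trailing Some_Name slug is illustrative, and the pattern assumes keys follow the /authors/…A, /works/…W, and /books/…M forms used in the examples on this page.

```shell
# Extract the record key (e.g. /authors/OL1385865A) from an Open Library URL.
# Any trailing slug or query string after the ID is dropped.
# Prints nothing if the input does not look like an author, work, or edition URL.
record_key() {
  printf '%s\n' "$1" |
    sed -nE 's#^(https?://[^/]+)?(/(authors|works|books)/OL[0-9]+[AWM]).*$#\2#p'
}

record_key "https://openlibrary.org/authors/OL1385865A/Some_Name"
# prints /authors/OL1385865A
```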

How to Import a Record

The import process requires two commands: one to enter your web container and one to run the script.

1. Connect to the Web Container

From the root of your project directory, open an interactive shell inside the web Docker container:

docker compose exec -e PYTHONPATH=. web bash

2. Run the copydocs.py Script

Once you have a shell session inside the container, run the script with the ID of the record you want to import.

# General syntax
./scripts/copydocs.py <record_id>

Replace <record_id> with the record's key from the Open Library URL, including its type prefix (e.g., /authors/OL23919A), as in the examples below.
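If you would rather not keep a shell open inside the container, the two steps can probably be collapsed into one host-side command, since docker compose exec accepts a command in place of bash. The sketch below wraps that in a hypothetical helper named ol_import; the DOCKER variable is also an assumption, added only so the helper can be dry-run with DOCKER=echo. It assumes the container's default working directory is the one where ./scripts/copydocs.py resolves, as it does in step 2.

```shell
# Host-side helper: run copydocs.py in the web container without an interactive shell.
# DOCKER is a hypothetical override; it defaults to the real docker binary.
ol_import() {
  "${DOCKER:-docker}" compose exec -e PYTHONPATH=. web ./scripts/copydocs.py "$@"
}
```

Run it from the root of your project directory, for example: ol_import /authors/OL23919A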

Examples

When you import a work or an edition, the script also fetches the records it depends on, such as its authors. Importing an author fetches only that author.

Example 1: Import a Single Author

This imports only the author record, without their associated works.

# Imports the author record for J. K. Rowling (OL23919A)
./scripts/copydocs.py /authors/OL23919A

Example 2: Import a Work

This imports the work and automatically includes its authors and all its editions.

# Imports the work "The Hobbit", its author J. R. R. Tolkien, and all editions
./scripts/copydocs.py /works/OL27482W

Example 3: Import an Edition

This imports a specific edition and automatically includes its parent work and all associated authors.

# Imports an edition of "Dune", the parent work "Dune", and its author Frank Herbert
./scripts/copydocs.py /books/OL26242482M

Advanced: Importing Multiple Records at Once

To import a batch of records matching a specific query, use the --search and --search-limit flags.

# Imports up to 100 works by the author Edith Nesbit (OL18053A)
./scripts/copydocs.py --search 'author_key:OL18053A' --search-limit 100
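When the records you need do not share a search query, one alternative is to loop over a list of keys and import them one at a time. This is a sketch, not a feature of the script: import_keys and the COPYDOCS override are hypothetical names, and the loop calls the script once per key because this page does not say whether several keys can be passed in one invocation.

```shell
# Import every record key read from stdin, one key per line.
# Blank lines and lines starting with '#' are skipped.
# COPYDOCS is a hypothetical override for the script path, handy for dry runs.
import_keys() {
  while IFS= read -r key; do
    case "$key" in
      ''|'#'*) continue ;;
    esac
    # </dev/null keeps the script from consuming the remaining keys on stdin.
    "${COPYDOCS:-./scripts/copydocs.py}" "$key" </dev/null ||
      echo "failed: $key" >&2
  done
}
```

Inside the web container, usage would look like import_keys < keys.txt, where keys.txt is a file you create with one key per line (for example /works/OL27482W). A failed import is reported on stderr and the loop moves on to the next key.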

Optional: Using Production Book Covers

By default, your local environment does not have access to the production book cover images. If you are testing functionality that involves covers, you can temporarily point your local instance to the production cover service.

  1. Open the configuration file: conf/openlibrary.yml.
  2. Locate the coverstore_url setting.
  3. Change its value to the production URL:
    coverstore_url: https://covers.openlibrary.org

⚠️ Important

To prevent accidental data corruption on the live site, do not attempt to upload new covers while connected to the production service.

Remember to revert this setting to its original value after you finish testing.
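To make the revert harder to forget, the edit can be done with a backup copy. The two helpers below are a sketch under assumptions: the function names are hypothetical, and the sed pattern assumes the setting sits on a single line beginning with coverstore_url: in the file.

```shell
# Point coverstore_url at a new value, keeping the original file as FILE.bak.
# Refuses to run if a backup already exists, so the original is never overwritten.
set_coverstore_url() {
  file=$1
  url=$2
  if [ -e "$file.bak" ]; then
    echo "backup $file.bak already exists; restore it first" >&2
    return 1
  fi
  cp "$file" "$file.bak" &&
    sed -E "s#^([[:space:]]*coverstore_url:).*#\1 $url#" "$file.bak" > "$file"
}

# Put the backed-up file back once testing is done.
restore_coverstore_url() {
  mv "$1.bak" "$1"
}
```

For example, set_coverstore_url conf/openlibrary.yml https://covers.openlibrary.org before testing, then restore_coverstore_url conf/openlibrary.yml afterwards.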
