Loading Production Book Data - internetarchive/openlibrary GitHub Wiki
Your local development environment contains a minimal dataset, which is insufficient for testing features on real-world data or debugging issues with specific records.
This guide provides a step-by-step process for using the copydocs.py script to import live data for authors, works, and editions from openlibrary.org into your local instance.
Before you begin, ensure you have:
- A running local development environment (Docker).
- The ID of the record you want to import. You can find this in the URL of any page on openlibrary.org (e.g., /authors/**OL1385865A**).
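If you are copying IDs straight from the browser, a small helper can trim a full URL down to the record key. This is a minimal sketch: `ol_key_from_url` is a hypothetical name, not part of the Open Library tooling, and it assumes keys follow the /authors/OL…A, /works/OL…W, and /books/OL…M shapes used in this guide.

```shell
# Hypothetical helper (not part of the Open Library repo): reduce a full
# openlibrary.org URL to its record key, e.g. /authors/OL1385865A.
# Assumption: keys look like /authors/OL<digits>A, /works/OL<digits>W,
# or /books/OL<digits>M. Other input is returned with only the
# scheme and host stripped.
ol_key_from_url() {
  printf '%s\n' "$1" |
    sed -E 's#^https?://[^/]+##; s#^(/(authors|works|books)/OL[0-9]+[AWM]).*#\1#'
}

# Example:
#   ol_key_from_url "https://openlibrary.org/authors/OL1385865A/Some_Name"
#   prints /authors/OL1385865A
```

Trailing title slugs and query strings are dropped, so the result can be passed to the script as-is.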
The import process requires two commands: one to enter your web container and one to run the script.
1. Connect to the Web Container
From the root of your project directory, open an interactive shell inside the web Docker container:
docker compose exec -e PYTHONPATH=. web bash
2. Run the copydocs.py Script
Once you have a shell session inside the container, run the script with the ID of the record you want to import.
# General syntax
./scripts/copydocs.py <record_id>
Replace <record_id> with the identifier from the Open Library URL.
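If you would rather not keep an interactive shell open, the two steps can probably be combined into a single command run from the host. The wrapper below is a sketch under that assumption: `copydocs_host` is a hypothetical name, and it relies on `docker compose exec` starting in the same working directory that the interactive shell does, so that the relative script path still resolves.

```shell
# Hypothetical wrapper (not part of the Open Library repo): run copydocs.py
# inside the web container without opening an interactive shell first.
# Assumption: the container's default working directory is the project
# root, so ./scripts/copydocs.py resolves as it does in the bash session.
copydocs_host() {
  docker compose exec -e PYTHONPATH=. web ./scripts/copydocs.py "$@"
}

# Example, run from the root of the project directory on the host:
#   copydocs_host /works/OL27482W
```

All arguments are forwarded unchanged, so any flags the script accepts can be passed the same way.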
When you import a work or an edition, the script automatically fetches its related records, such as its authors.
Importing an author copies only the author record, without their associated works.
# Imports the author record for J. K. Rowling (OL23919A)
./scripts/copydocs.py /authors/OL23919A
Importing a work automatically includes its authors and all its editions.
# Imports the work "The Hobbit", its author J. R. R. Tolkien, and all editions
./scripts/copydocs.py /works/OL27482W
Importing an edition automatically includes its parent work and all associated authors.
# Imports an edition of "Dune", the parent work "Dune", and its author Frank Herbert
./scripts/copydocs.py /books/OL26242482M
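The three cases above differ only in the type of key you pass. As a quick reminder of what each key type pulls in, the sketch below maps a key to the behavior described in this guide. `describe_import` is a hypothetical helper written for illustration; it does not call copydocs.py or inspect any data.

```shell
# Hypothetical helper (illustration only): report what this guide says
# copydocs.py will import for a given key, based on the key's shape.
describe_import() {
  case $1 in
    /authors/OL*A) echo "author record only" ;;
    /works/OL*W)   echo "work, its authors, and all its editions" ;;
    /books/OL*M)   echo "edition, its parent work, and associated authors" ;;
    *)
      # Anything else is not one of the three key types covered here.
      echo "unrecognized key: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   describe_import /works/OL27482W
```

A check like this can also catch a mistyped key before you start an import.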
To import a batch of records matching a specific query, use the --search and --search-limit flags.
# Imports up to 100 works by the author Edith Nesbit (OL18053A)
./scripts/copydocs.py --search 'author_key:OL18053A' --search-limit 100
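When the records you need do not share a single query, one alternative is to loop over a list of keys and import them one at a time. The sketch below assumes a plain text file with one key per line; `import_keys` and the `COPYDOCS` override are hypothetical names introduced here, and the loop deliberately calls the script once per key rather than assuming it accepts several keys at once.

```shell
# Hypothetical helper (not part of the Open Library repo): import every
# record key listed in a file, one key per line.
# COPYDOCS defaults to the script path used in this guide; set it to
# something else (e.g. COPYDOCS=echo) for a dry run.
import_keys() {
  keys_file=$1
  while IFS= read -r key || [ -n "$key" ]; do
    # Skip blank lines and lines starting with '#'.
    case $key in
      ''|'#'*) continue ;;
    esac
    # Redirect stdin so the command cannot consume the rest of the list.
    "${COPYDOCS:-./scripts/copydocs.py}" "$key" < /dev/null ||
      echo "failed: $key" >&2
  done < "$keys_file"
}

# Example, run inside the web container:
#   import_keys my_keys.txt
```

A failed import is reported on standard error and the loop moves on to the next key, so one bad record does not stop the batch.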
By default, your local environment does not have access to the production book cover images. If you are testing functionality that involves covers, you can temporarily point your local instance to the production cover service.
- Open the configuration file: conf/openlibrary.yml.
- Locate the coverstore_url setting.
- Change its value to the production URL:
coverstore_url: https://covers.openlibrary.org
⚠️ Important: To prevent accidental data corruption on the live site, do not attempt to upload new covers while connected to the production service.
Remember to revert this setting to its original value after you finish testing.
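To make the revert harder to forget, the change can be scripted so that the original file is saved before it is edited. This is a minimal sketch: `set_coverstore_url` and `restore_conf` are hypothetical helpers, and they assume the setting appears on a single line of the form `coverstore_url: <value>`.

```shell
# Hypothetical helpers (not part of the Open Library repo).
# Assumption: the config file holds the setting on one line,
# "coverstore_url: <value>", optionally indented.

# Save a backup next to the file, then rewrite the coverstore_url value.
set_coverstore_url() {
  conf=$1
  url=$2
  cp "$conf" "$conf.bak"
  sed -E "s#^([[:space:]]*coverstore_url:[[:space:]]*).*#\1$url#" "$conf.bak" > "$conf"
}

# Put the saved original back in place.
restore_conf() {
  mv "$1.bak" "$1"
}

# Example:
#   set_coverstore_url conf/openlibrary.yml https://covers.openlibrary.org
#   ... test cover-related functionality ...
#   restore_conf conf/openlibrary.yml
```

Because the backup is a full copy of the file, restoring it returns the setting to whatever value it had before, without your needing to remember it.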