Data Importing

Do you have a data set of 10 or even 10,000 books you'd like to import into the Open Library catalog?

This documentation is intended for developers writing scripts to programmatically import books and authors into Open Library using the Import APIs.

Tip

If you are a librarian and you want to add a new book catalog entry, refer to the guide on Importing a Book Record Manually.

Tip

If you're a maintainer and you want to learn more about how to work on the import system, please refer to the Import Pipeline docs.

Import Options

Many of the following options may require applying for a bot account on openlibrary.org:

Import by ISBN or ASIN
Import by JSON
Batch import by JSONL
Import by Archive.org Identifier (Staff only)
Import by MARC
ONIX and OPDS coming soon

Import by ISBN or ASIN

Note

As of 2025-03, its possible the Bookworm (i.e. affiliate) Service is not enabled for importing by ISBN. Check with staff if you have questions.

If you know the ISBN or ASIN of a book that is missing from Open Library, you may be able to import it "automagically" using the https://openlibrary.org/isbn/:identifier: API.

Note

This API only works for ISBN-10 and thus any ISBN-13 provided must be convertible to an ISBN 10.

How ISBN Fetching API Works

First, the Open Library database is searched for the identifier in question, and if a match is found, that matched edition is returned;
If no matching edition is found, the import_item table will be checked for staged entries that match, an import will be attempted based on that match, and a new edition if any will be returned;
Finally, if no matches have been found yet, Bookworm (the "affiliate-server") will be queried for matches; if a match is found, the associated metadata is staged for import at a later date. At this step the server will simply reply that the page, /isbn/<some isbn>, does not exist. If high_priority=true is passed with the query to /isbn (e.g. (/isbn/1234567890?high_priority=true), then an import will be attempted and the resulting edition returned, if any.

Importing JSON

The JSON Import API is the most common and flexible way for sufficiently privileged Open Library accounts to import a book into the Open Library. This is achieved by performing am authenticated POST with a JSON payload that matches our book schema to the /api/import endpoint. Any developer can (and should) first test their import scripts locally by POSTing the same JSON to their local development server at http://localhost:8080/api/import.

Tip

When switching between development and production to run imports, not only will the URL change from from http://localhost:8080/api/import to https://openlibrary.org/api/import, but also the usernames, passwords, and other keys will need to be updated.

Using Javascript

Tip

This approach works well if you're planning to write a bookmarklet to write import logic in javascript

If you are logged-in to Open Library within your web browser (your JavaScript environment should be successfully setup and will use the session from the cookie in your browser) and a member of staff has added your account to the API usergroup, you may enter the following example code within your developer console (by pressing F12 on the keyboard) to attempt an import on development or production:

var bookdata = {'title': 'Ikigai', 'authors': [{'name': 'Souen, Chiemi'}, {'name': 'Kaneshiro, Flor'}], 'publish_date': '2021', 'source_records': ['amazon:195302114X'], 'number_of_pages': 40, 'publishers': ['Brandylane Publishers, Inc.'], 'isbn_10': ['195302114X'], 'isbn_13': ['9781953021144']}

fetch("https://openlibrary.org/api/import?debug=true", {
    method: 'POST',
    headers: {
        'Accept': 'application/json',
        'Content-Type': 'application/json'
    },
    body: JSON.stringify(bookdata)
}).then(response => response.json()) .then(response => console.log(JSON.stringify(response)))

Python import using `requests`:

Note: this requires you to have an existing requests.session to use. See API Authentication details.

This is almost identical to POSTing with openlibrary-client, except session is not a method on olclient.

For more about the schema, see https://github.com/internetarchive/openlibrary-client/blob/master/olclient/schemata/import.schema.json

import json
import_url = "http://localhost:8080/api/import"
import_data = {
    "title": "Beyond The Hundredth Meridian: John Wesley Powell And The Second Opening Of The West",
    "authors": [{"name": "Wallace Stegner"}],
    "isbn_13": ["9780803291287"],
    "languages": ["eng"],
    "publishers": ["University of Nebraska Press"],
    "publish_date": "1954",
    "source_records": ["amazon:B0016VINB0"],
}
r = session.post(import_url, data=json.dumps(data))
r.text  # Verify import

Using openlibrary-client:

First, make sure the olclient library is installed and then configured (see API Authentication) with your archive.org/openlibrary.org credentials (that have been granted API usergroup access):

git clone https://github.com/internetarchive/openlibrary-client.git && cd openlibrary-client && pip -q install .
ol --configure --email [email protected]

Tip

Check out this colab for a full example of using the olclient to update or import records on Open Library.

This is almost identical to POSTing with requests, except here the session is a method of olclient.

For more about the schema, see https://github.com/internetarchive/openlibrary-client/blob/master/olclient/schemata/import.schema.json

from olclient.openlibrary import OpenLibrary
ol = OpenLibrary()
import json
import_url = "http://localhost:8080/api/import"
import_data = {
    "title": "Beyond The Hundredth Meridian: John Wesley Powell And The Second Opening Of The West",
    "authors": [{"name": "Wallace Stegner"}],
    "isbn_13": ["9780803291287"],
    "languages": ["eng"],
    "publishers": ["University of Nebraska Press"],
    "publish_date": "1954",
    "source_records": ["amazon:B0016VINB0"],
}
r = ol.session.post(import_url, data=json.dumps(data))
r.text  # Verify import

# Now we can fetch the book object and make changes!
ol_book = ol.Edition.get(isbn="9780803291287")

Using `curl` (local development example):

This assumes you have a cookies.txt with your session cookie. See API Authentication for details.

❯ curl -X POST http://localhost:8080/api/import \
    -H "Content-Type: application/json" \
    -b cookies.txt \
    -d '{
    "title": "Super Great Book",
    "authors": [{"name": "Author McAuthor", "birth_date": "1806", "death_date": "1871"}],
    "isbn_10": ["1234567890"],
    "publishers": ["Burgess and Hill"],
    "source_records": ["amazon:123456790"],
    "subjects": ["Trees", "Peaches"],
    "description": "A book with a description",
    "publish_date": "1829"
}'
{"authors": [{"key": "/authors/OL15A", "name": "Author McAuthor", "status": "created"}], "success": true, "edition": {"key": "/books/OL19M", "status": "created"}, "work": {"key": "/works/OL9W", "status": "created"}}

Procedure for Bulk Imports

Warning

This section needs to be updated to reflect our new batch-importing system.

Make sure your source data is archived and uploaded to https://archive.org/details/ol_data
Add/register your source to the https://openlibrary.org/dev/docs/data document
Apply for a bot account with API access
Create a new bot script for https://github.com/internetarchive/openlibrary-bots (each bot gets its own directory and contains a README so others in the future can see exactly how your data was imported)
Follow the Bots Getting Started instructions before running your bot.

Historical Reference

Watch this video by @hornc about the foundations of our import pipeline.

Here's a list of sourced we've historically imported. Many of the data sets themselves are archived within the https://archive.org/details/ol_data collection.

https://openlibrary.org/dev/docs/data - a list of major sources we've imported
https://openlibrary.org/about/help

API Authentication for Imports

If you're trying to write a script which uses the import API endpoints on Open Library to import book data, your script will need to first perform authentication.

The manner of your app's authentication depends:

on what mechanism you're using (e.g. curl, JavaScript, python, etc.); and
whether you're importing into the production or the local development environment.

The API examples below assume you have followed the directions in this section to do whatever steps are necessary for authentication.

NEEDS CLEANUP

Python and `requests`

The only difference between production and development is the URL of the host: https://openlibrary.org for production, and http://localhost:8080 for development.

Using requests.session, it's possible to authenticate once, save the cookie, and use the same session object for multiple requests.

Note: this example uses the default credentials for the local development environment. If authenticating against production, you should NOT put your credentials in code you are going to commit somewhere. Instead, consider reading your credentials from the environment, a .env file, etc. There are many ways to do this, including using docker compose to set environment variables and python-dotenv.

# Get a logged in session
import requests
from urllib import parse
login_url = "http://localhost:8080/account/login"
login_headers = {'Content-Type': 'application/x-www-form-urlencoded'}
login_data = parse.urlencode({
    "username": "[email protected]",  # Do not store your real credentials in code you may commit.
    "password": "admin123",
})
session = requests.Session()
session.post(login_url, data=login_data, headers=login_headers)

At this point, you should be able to see your cookie in session.cookies:

>>> session.cookies.get('session')
'/people/openlibrary%2C2024-03-21T05%3A03%3A00%2C10172%5a657abb15a5c469936ec86f420f7b39'

If you see a cookie corresponding your user, you can now use session.post() to POST to the appropriate API.

openlibrary-client

See openlibrary-client configuration for directions on how to install openlibrary-client and log in on both production and development.

Similar to plain requests, olclient will store your cookie in the session object to allow multiple authenticated requests using the same session object.

Follow the configuration directions on the openlibrary-client GitHub repository. Once you see your cookie in ol.session.cookies, you can use ol.session.post() to POST to the appropriate API.

>>> ol.session.cookies.get("session")
'/people/openlibrary%2C2024-03-21T05%3A03%3A00%2C10172%5a657abb15a5c469936ec86f420f7b39'

`curl`

Although curl lacks the concept of a session, it can store a cookie for later use.

This example is for the local development and uses the default credentials. You do not want to store your credentials in your shell history. oh-my-zsh users may wish to use the dotenv plugin.

curl -c cookies.txt -X POST "http://localhost:8080/account/login" -d "[email protected]&password=admin123"

cookies.txt should now have your session cookie within it:

❯ cat cookies.txt  
# Netscape HTTP Cookie File
# https://curl.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

localhost       FALSE   /       FALSE   0       session /people/openlibrary%2C2024-03-21T05%3A03%3A00%2C10172%5a657abb15a5c469936ec86f420f7b39
localhost       FALSE   /       FALSE   0       pd

Once this is done, call curl with -b cookies.txt to use the cookie.

Import APIs

Submitting ISBNs in Bulk (not currently an option)

Ultimately, we'd like to create an ongoing ISBN import pipeline which discovers book isbns as they are published and imports their metadata into openlibrary.org. This requires 3 steps:

Discovery - writing bots to find isbns in web content such as catalogs and APIs
Submission - uploading these isbns to https://archive.org/details/ol_data and processing them with @hornc's modern-import-bot
Importing - updating isbn import endpoints to prevent duplicating existing authors and importing undesirable content (like CDs)

improve a general purpose openlibrary.org endpoint which takes isbn (10, 13) and imports an item onto Open Library using our Amazon Affiliate API -- this includes logic to make sure only books (i.e. non-book items on amazon, like CDs) get imported and we avoid duplicating existing authors, works, and editions.

Discovering ISBNS

We'll want to write bots to discover isbns in web content, such as catalogs and APIs. These bots will live in a dedicated repo called openlibrary-bots (https://github.com/internetarchive/openlibrary-bots). Each bot will have its own directory in the project with a README.md explaining how to run the bot to harvest isbns. I imagine the community will work on several bots (Amazon bots, Goodreads bots, Google Books bots, Better World Books bots, etc) which continuously look at sources at some regular interval to discover new isbns.

If we plan on creating a new bot, it's probably best practice to first open an issue on the openlibrary-bots repo explaining:

what data source we plan on using
what technology we’d like to use

For instance, let’s say we want to create a bot which discovers isbns from Amazon’s new arrivals:

You’d create an issue on https://github.com/internetarchive/openlibrary-bots/issues describing how it will work, what tech we will use, what the directory will be called
Once we get feedback, we’d create a directory in the project (e.g. amazon-new-arrivals-bot)
We’d add a README.md and a code a bot which discovers isbns
Create a new Archive.org item within the https://archive.org/details/ol_data collection named after your bot or data source, where your isbns will live (e.g. for our example amazon-new-arrivals-bot consider making a new item on Archive.org called amazon-new-arrivals-isbns within the ol_data collection). Add the url of this bot's archive.org item (where isbns will get uploaded/submitted) to the bot README.
Add a few basic unit tests to ensure the software is working correctly (e.g. it works correctly against a static amazon page)

Submitting ISBNs in Bulk

Please do not run this list of isbns directly against our https://openlibrary.org/isbn/:isbn` API because it is highly rate-limited. Instead, @hornc is writing a modern-import-bot designed to take safely lists of isbns (from a specified archive.org item) and queue the isbns up to be safely imported into Open Library.

As a result, once your bot has collected a list of isbns for a certain time period, the first step is to upload the isbns to its Archive.org item (which should have been created in step #1 and named according to the bot which generated the isbns -- e.g. https://archive.org/details/amazon-new-arrivals-isbns within the https://archive.org/details/ol_data collection) as <date>_<source>_isbns.csv -- e.g. 2019-06-25_amazon_isbns.csv

Updating ISBN Import

In order for this entire process to work correctly and not produce bad data, we'll need to first make change to the isbn import endpoint to prevent duplication. For instance, to make sure only books (i.e. non-book items on amazon, like CDs) get imported and we avoid duplicating existing authors, works, and editions.

Import by OCAID

Background

ocaid stands for Open Content Alliance Identifier. The name was coined when Internet Archive and other groups institutions around the world worked together to create an ID system they could universally share for accessing books. Today, ocaid means the equivalent of Archive.org identifier. When you go to, e.g., https://archive.org/details/ol_data, ol_data is the ocaid / Archive.org identifier. This section deals with importing Archive.org book items (by their ocaid / Archive.org identifier) into Open Library.

Ongoing Automatic Imports

Most qualifying Archive.org books are automatically imported into Open Library through a process calls ImportBot. ImportBot is a script in the scripts/ directory which is continuously run via a supervisorctl task on an internal machine called ol-home. The script makes a privileged call to core/ia.py which queries Archive.org for books which meet import criteria (i.e. have a MARC record, have the right digitization repub-state, etc). The call to core/ia.py's get_candidate_ocaids returns a list of ocaids which are batch-submitted using a queue to https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/importapi/code.py -- See #3 on Import Code-Paths above.

Direct Import by OCAID

If you are sufficiently privileged, you can import an Archive.org id into Open Library using the https://openlibrary.org/api/import/ia endpoint (which will require authentication). This endpoint works both on production and in the local development environment, with the development endpoint being located at http://localhost:8080/api/import/ia.

The code for this endpoint is found within the ia_importapi class within openlibrary/plugins/importapi/code.py.

By default, the only required argument is identifier, which specifies the ocaid. The optional arguments include:

require_marc: requires a MARC record. Defaults to true.
force_import: force the record to import. Defaults to false.

JavaScript import using `fetch`

Once you have authenticated and logged into the relevant account on production/development via the web browser in which you're running the JavaScript, you can import an ocaid using the following recipe:

Note: for a production import, simply change the URL to https://openlibrary.org/api/import/ia.

First open your browser development tools (F12 key). Then to import whitenightsother00dostiala for example:

fetch("http://localhost:8080/api/import/ia?identifier=whitenightsother00dostiala", {
  method: 'POST',
})
.then(response => response.json())
.then(response => console.log(JSON.stringify(response)))

Python import using `requests`:

Note: this requires you to have an existing requests.session to use. See API Authentication details.

Importing whitenightsother00dostiala into the local development environment as an example:

import_url = "http://localhost:8080/api/import/ia"
import_data = {
    "identifier": "whitenightsother00dostiala",
    "force_import": "false"
}
import_headers = {'Content-Type': 'application/json'}
response = session.post(import_url, data=import_data, headers=import_headers)
response.text  # Verify output
'{"success": true, "edition": {"key": "/books/OL20M", "status": "matched"}, "work": {"key": "/works/OL10W", "status": "matched"}}'

Using openlibrary-client:

Note: this assumes you already have an authenticated session with olclient. See API Authentication for details.

import json
import_url = "http://localhost:8080/api/import/ia"
import_data = {
    "identifier": "whitenightsother00dostiala",
    "force_import": "false"
}
r = ol.session.post(import_url, data=import_data)
r.text  # Verify import

Shell import using `curl`:

Note: this requires your session cookie in cookies.txt. see API Authentication for details.

Importing whitenightsother00dostiala as an example and finding it already exists in Open Library:

❯ curl -X POST "http://localhost:8080/api/import/ia" \ 
     -b cookies.txt \
     -d "identifier=whitenightsother00dostiala"

{"success": true, "edition": {"key": "/books/OL20M", "status": "matched"}, "work": {"key": "/works/OL10W", "status": "matched"}}

Bulk Import by OCAID

Assuming you have API write access, you can use ImportBot's import_ocaids method directly to import a list of Archive.org items into Open Library by ocaid.

NB As of 2019-07, the ocaid import process is being revised by @charles to relax some outdated ocaid import constraints which are currently preventing Open Library from importing tens of thousands of valid book items and their records. e.g. right now MARC records and repub-states are required, which we may want to relax. Also, we run into problems when Archive.org has multiple books for the same isbn (this causes a collision where a valid duplicate editions is rightfully caught by our duplication detector and prevented from being imported as a new edition). Ideally, perhaps, editions on Open Library would have multiple ocaids, but we're not there yet in our data model.

Batch importing JSONL

Sufficiently privileged patrons can make use of a bulk import endpoint at /import/batch/new, which accepts a plain text file of JSONL, with one import per line, e.g.:

{"identifiers": {"open_textbook_library": ["1581"]}, "source_records": ["open_textbook_library:1581"], "title": "Legal Fundamentals of Healthcare Law", "languages": ["eng"], "subjects": ["Medicine", "Law"], "publishers": ["University of West Florida Pressbooks"], "publish_date": "2024", "authors": [{"name": "Tiffany Jackman"}], "lc_classifications": ["RA440", "KF385.A4"]}
{"identifiers": {"open_textbook_library": ["1580"]}, "source_records": ["open_textbook_library:1580"], "title": "Introduction to Literature: Fairy Tales, Folk Tales, and How They Shape Us", "languages": ["eng"], "subjects": ["Humanities", "Literature, Rhetoric, and Poetry"], "publishers": ["University of West Florida Pressbooks"], "publish_date": "2023", "authors": [{"name": "Judy Young"}], "lc_classifications": ["PE1408"]}

Such imports will be pre-validated so one can see in advance which records may be problematic. # characters are also allowed at the start of the line as a comment, even though this is not valid JSONL. Imports should be formatted according to the import schema. For the notes associating the PR that added this feature, see https://github.com/internetarchive/openlibrary/pull/8122.

MARC Records

Bulk MARC Import

We are in the process of expanding the api/import/ia endpoint functionality.

From a conversation with @mek + @hornc We're going to write a ImportBulkMarcIABot in openlibrary-bots

For a given bulk archive.org item within ol_data collection, iterate over all its .mrc files.
For each .mrc file, retrieve the length of the first binary marc using the length (i.e. 1st) component of the LEADER field so we know when the first MARC records stops e.g. http://www.archive.org/download/trent_test_marc/180221_ItemID_Test_001.mrc?range=0:5
We'll use ^ with the correct item + file to get the first MARC and then use the LEADER to process the rest
We'll take the MARC, call the MarcBinary code in openlibrary/catalog/marc/marc_binary.py
We'll have to exposed a new endpoint within openlibrary/plugins/importapi/code.py for MARC import (w/o IA book item)
Add "local_id": ["urn:trent:12345"]key to openlibrary book metadata in addition to correctly formatted source_record so archive.org can search for the records using the OL books API from

Importing Partner MARCs with Local ID Barcodes

Cold Start: If we’re creating an instance of Open Library service from scratch, before the following instructions, we’ll first need to create a new infogami thing called local_ids which is of type page. Next:

A new partner named foo appears and their local_id barcode is in field 999$a within their MARCs
As an admin, navigate to https://openlibrary.org/local_ids/foo.yml?m=edit and set the following metadata. The following example presumes that foo-archive-item is the name of an archive.org item (preferably within the archive.org/details/ol_data collection) which contains one or more .mrc file(s). You can refer to https://openlibrary.org/local_ids/trent.yml for a live example.

id_location: 999$a
key: /local_ids/foo
name: Foo
source_ocaid: foo-archive-item
type:
    key: /type/local_id
urn_prefix: foo

save, and you should be ready to start importing MARCs using the following instructions: https://docs.openlibrary.org/2_Developers/3_Backend/Endpoints.html#import-by-archiveorg-reference

Goals

Here is a list of generally expressed goals for importing:

More imported books
Quality imports
Newer, more recent books
More useful items, what are our users searching for and not finding?
Ongoing, automated pipeline so OL stays (or becomes..) current
Flexible, extensible import process that can import a wide range of available and new data sources
Synchronisation with current and potential data partners (BWB, Wikidata, etc.)

These are high level areas. To make concrete progress we need to identify some specific goals, collect some data, identify tradeoffs, and prioritise between these categories, then break down into specific tasks.

Metrics

In order to measure progress and gaps, we need some metrics...

Basic starting point -- where are we?

How many books are we currently importing now and over the recent past?
- graph imports per month using add_book query?
- Extra details:
  - Split by source, user added vs. bot added, MARC import, ISBN import
  - Distribution by published date for current imports and sources
What is the current distribution of published dates in Open Library?
- TASK: create a script to extract this data (for graphing) from a monthly ol_edition dump.
What is being searched for and not found by users?
Overlap stats with partner data (needs requirement details)

Published dates in Open Library -- books with ISBN (14M out of 25M) @ July 2019

Editions by publication year graph for July 2019

Sources of import metadata

Libraries, MARC data
- PROs:
  - Better metadata
  - More detail
  - Author distinction with dates
- CONs:
  - Tend to be older books... primarily because we are not actively seeking new dumps? SFPL was the most recently extracted catalog imported (https://archive.org/details/marc_openlibraries_sanfranciscopubliclibrary).
Booksellers, ISBN lookup
- PROs
  - Newer books, incentive to keep up, market driven, popular modern books emphasised.
- CONs
  - Lighter metadata
  - Frequently include notebooks and other products with ISBNs for sale that we might want to exclude
  - Frequently include non-book products with other identifiers (or incorrectly converted data in ISBN field) in their catalogs
  - Bookseller catalogs are not concerned with overall catalog coherence or quality -- maximising sales goal not impacted by junk metadata entries. Offer it for sale even if it does not make sense. We OTOH want ALL items to be coherent.

IDEA: Define a bookseller metadata quality threshold. Heuristics to decide whether we want to import a record?

Quality issues

Duplicate records (Author, Work, Edition)
Conflated records
Junk records
- non-book items
- other non-record items the are generated due to errors
Corrupt data
- Unicode conversion
- Dates, names, ISBN10 -> 13 conversion etc.
- Misaligned metadata (good values in incorrect fields, multiple values in one field etc.)
- Mismatched identifiers contribute to conflation issues

Current state

.. notes written, pending ..

Other issues related to importing

Add book UI
- Consolidate add_book endpoints
- UI add book should use and leverage the import API checks and available fields for consistency and code maintainability
- https://github.com/internetarchive/openlibrary/issues/1163

Related resources

Historical -- we likely want to resume the intent of these efforts:

Unified Import System

How we'd like an unified import system to behave:

We have a single cron container on ol-home0 which queries for, or fetches, data from sources (archive.org, partners / nytimes / openstax / bwb) and submit either archive.org IDs or json book records to a queue managed by scripts/manage-imports.py
A service container on ol-home ImportBot continuously listens for new batches and imports them into Open Library.

Challenges

manage-imports.py currently only works on archive.org IDs (imports IA items) rather than on json data.
When we fetch a day of changes from Archive.org during our daily cron-check, it's easy for us to submit a batch to the ImportBot service. But when we fetch large volumes of monthly partner data which has millions of records, it's unclear how we should batch this and submit it to the ImportBot and how to know which [batches] are failing or succeeding. Do we submit 5000 batches of 1000 (e.g. 5M)?
import_item table only supports archive_id right now, what if it supported data as an alternative column and what would happen if this table had hundreds of millions of records.

IA Import Bot

Video walkthrough here: (how OL Imports work) https://archive.org/details/openlibrary-tour-2020/ol_imports_comprehensive.mp4 See code: https://github.com/internetarchive/openlibrary/blob/master/openlibrary/core/ia.py#L313-L399

There is a thing called the ImportBot daemon. It runs 24/7 checking the ImportQueue for new batches of ~1,000 records. These batches can include ia or any number of partner records.
There's also a cron job (i.e. a timed linux job) which runs IA Imports once a day. This cron job calls the code above (i.e. get_candidate_ocaids) to fetch archive.org IDs (i.e. ocaids) and submits them to the ImportQueueto be processed by [1]

Overview of Known Data Sources

Penguin Books https://developer.penguinrandomhouse.com/docs/read/enhanced_prh_api

Google Books https://developers.google.com/books/docs/v1/getting_started

OCLC Worldcat Book Lookup https://platform.worldcat.org/api-explorer/apis/Classify/ClassificationResource/Search

Simon and Schuster http://www.simonandschuster.biz/c/biz-seasonal-catalogs https://catalog.simonandschuster.com/?cid=16322

Library of Congress 25M MARCs https://www.loc.gov/cds/products/MDSConnect-books_all.html

Hathi Trust 16.8M volumes, 8M book titles, 38% public domain https://www.hathitrust.org/bib_api

Wikidata lookup any identifier (author, subject, language,...) and find other identifiers for the same entity https://www.wikidata.org/

Developer's Guide to Data Importing - internetarchive/openlibrary GitHub Wiki

Data Importing

Import Options

Import by ISBN or ASIN

How ISBN Fetching API Works

Importing JSON

Using Javascript

Python import using requests:

Using openlibrary-client:

Using curl (local development example):

Procedure for Bulk Imports

Historical Reference

API Authentication for Imports

NEEDS CLEANUP

Python and requests

openlibrary-client

curl

Import APIs

Submitting ISBNs in Bulk (not currently an option)

Discovering ISBNS

Submitting ISBNs in Bulk

Updating ISBN Import

Import by OCAID

Background

Ongoing Automatic Imports

Direct Import by OCAID

JavaScript import using fetch

Python import using requests:

Using openlibrary-client:

Shell import using curl:

Bulk Import by OCAID

Batch importing JSONL

MARC Records

Bulk MARC Import

Importing Partner MARCs with Local ID Barcodes

Goals

Metrics

Basic starting point -- where are we?

Published dates in Open Library -- books with ISBN (14M out of 25M) @ July 2019

Sources of import metadata

Quality issues

Current state

Other issues related to importing

Related resources

Historical -- we likely want to resume the intent of these efforts:

Unified Import System

Challenges

IA Import Bot

Overview of Known Data Sources

⚠️ **GitHub.com Fallback** ⚠️

Python import using `requests`:

Using `curl` (local development example):

Python and `requests`

`curl`

JavaScript import using `fetch`

Python import using `requests`:

Shell import using `curl`:

⚠️ GitHub.com Fallback ⚠️