Guide to Identifiers - internetarchive/openlibrary GitHub Wiki


Best practices for identifying resources were discussed at http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/IdentifyingTheResource, although that site has since been frozen. In mid-2007, the Digital Library Federation (DLF) was reabsorbed into the Council on Library and Information Resources (CLIR); the DLF's site is now https://www.diglib.org. It is unclear where current best practices are maintained.

The two tables below cover author identifiers and edition identifiers. Each row lists the identifier, a brief description, an example, and a link to further documentation.

Author Identifiers

| ID | Description | Example | Docs |
|---|---|---|---|
| OLID | Open Library author record | OL1234A | Open Library API |
| LCAuth | Library of Congress authority ID | https://id.loc.gov/authorities/no2013090983 | Library of Congress Authorities |
| VIAF ID | Virtual International Authority File identifier | 12345678 | VIAF |
| ISNI | International Standard Name Identifier | 000000012146438X | ISNI |
| ORCID | Open Researcher and Contributor ID | 0000-0001-2345-6789 | ORCID |
| WD ID | Wikidata item identifier | Q6290611 | Wikidata |
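ISNI and ORCID identifiers both end in a check character computed with ISO/IEC 7064 MOD 11-2, so a mistyped ID can often be caught before any lookup. The Python sketch below verifies that character; the function names are illustrative and are not part of any Open Library, ISNI, or ORCID library.

```python
def mod11_2_check_char(base_digits: str) -> str:
    """Return the ISO/IEC 7064 MOD 11-2 check character for a digit string."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_isni_or_orcid(identifier: str) -> bool:
    """Validate a 16-character ISNI or ORCID, ignoring hyphens and spaces."""
    compact = identifier.replace("-", "").replace(" ", "").upper()
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not all(c in "0123456789" for c in base):
        return False
    return mod11_2_check_char(base) == check


print(is_valid_isni_or_orcid("000000012146438X"))     # True
print(is_valid_isni_or_orcid("0000-0001-2345-6789"))  # True
print(is_valid_isni_or_orcid("0000-0001-2345-6780"))  # False (wrong check digit)
```

Both example values in the table above pass this check.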

Edition Identifiers

| ID | Description | Example | Docs |
|---|---|---|---|
| OCLCid | OCLC control number | 12345678 | OCLC Help |
| LCCN | Library of Congress Control Number | 2002022641 | LCCN Namespace |
| OLID | Open Library ID | OL234M | Open Library API |
| OCAID | Open Content Alliance ID (Archive.org item identifier) | jungleauthoritat00sinc | Archive.org |
| HTID | HathiTrust ID | HT123456789 | HathiTrust Data |
| Google ID | Google Books ID | zyTCAlFPjgYC | Google Books Partner Help |
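Several of the edition identifiers above can be resolved against Open Library through its Books API (/api/books), whose bibkeys parameter takes comma-separated, prefixed keys such as ISBN, OCLC, LCCN, and OLID. The helper below is a hypothetical sketch that only builds such a URL. It deliberately rejects the other identifiers in the table (OCAID, HTID, Google ID), which this sketch assumes are not accepted as bibkeys.

```python
from urllib.parse import urlencode

# Bibkey prefixes this sketch allows for the Open Library Books API.
SUPPORTED_PREFIXES = ("ISBN", "OCLC", "LCCN", "OLID")


def books_api_url(bibkeys: dict[str, str]) -> str:
    """Build a Books API URL from a mapping such as {"LCCN": "2002026536"}."""
    keys = []
    for prefix, value in bibkeys.items():
        if prefix not in SUPPORTED_PREFIXES:
            raise ValueError(f"unsupported bibkey prefix: {prefix}")
        keys.append(f"{prefix}:{value}")
    # Keep ':' and ',' readable instead of percent-encoding them.
    query = urlencode(
        {"bibkeys": ",".join(keys), "format": "json", "jscmd": "data"},
        safe=":,",
    )
    return f"https://openlibrary.org/api/books?{query}"


print(books_api_url({"LCCN": "2002026536", "OLID": "OL3561303M"}))
# https://openlibrary.org/api/books?bibkeys=LCCN:2002026536,OLID:OL3561303M&format=json&jscmd=data
```

Fetching the resulting URL should return a JSON object keyed by each bibkey that Open Library could match.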

Open Library identifiers on Archive.org

We used to connect Archive.org and OpenLibrary.org items using a metadata field called openlibrary on Archive.org and a field called ocaid on OpenLibrary.org. The name ocaid stands for Open Content Alliance ID (see https://en.wikipedia.org/wiki/Open_Content_Alliance).

@mek noticed that these openlibrary IDs were very stale and had not been used for a while, so two new fields, openlibrary_edition and openlibrary_work, were added in an attempt to deprecate the openlibrary field on Archive.org.

If I recall correctly, @hank and @judec informed me that the openlibrary metadata key is still used in certain places within our derive pipelines, and that this code has never been updated to use openlibrary_edition and openlibrary_work.
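Given this history, a consumer of Archive.org metadata might prefer openlibrary_edition and fall back to the legacy openlibrary field. The sketch below is hypothetical: it assumes the metadata arrives as a dict (for example the "metadata" object returned by https://archive.org/metadata/<identifier>, where repeated fields may be lists), and the fallback order is an illustration, not what the derive pipelines actually do. It also relies on the OLID suffix convention visible in the tables (A for authors, M for editions) plus W for works.

```python
import re
from typing import Optional

# OLID suffixes: A = author, M = edition, W = work.
OLID_RE = re.compile(r"OL\d+([AMW])")
KINDS = {"A": "author", "M": "edition", "W": "work"}


def olid_kind(olid: str) -> Optional[str]:
    """Return 'author', 'edition' or 'work' for a well-formed OLID, else None."""
    match = OLID_RE.fullmatch(olid.strip())
    return KINDS[match.group(1)] if match else None


def edition_olid(ia_metadata: dict) -> Optional[str]:
    """Pick an item's Open Library edition ID from Archive.org metadata.

    Prefers openlibrary_edition and falls back to the legacy openlibrary
    field, which may be stale.
    """
    for field in ("openlibrary_edition", "openlibrary"):
        value = ia_metadata.get(field)
        if isinstance(value, list):  # repeated fields may arrive as lists
            value = value[0] if value else None
        if isinstance(value, str) and olid_kind(value) == "edition":
            return value.strip()
    return None


print(edition_olid({"openlibrary": "OL3561303M"}))  # OL3561303M
print(edition_olid({"openlibrary_work": "OL1W"}))   # None
```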

Retrieving Archive.org data for an Open Library identifier

Given an openlibrary_edition, one should be able to use that ID to pull the MARC record from Open Library and confirm whether the metadata matches. There is an even easier route. Suppose you have a book on Archive.org with the identifier jungleauthoritat00sinc, and its metadata lists its openlibrary_edition as OL3561303M.

Requesting https://openlibrary.org/books/OL3561303M.json returns the JSON API data for that book, so you can check the identifier values without necessarily needing the MARC record at all.

Open Library JSON (abbreviated)

```json
{
  "covers": [253146],
  "ocaid": "jungleauthoritat00sinc",
  "key": "/books/OL3561303M",
  "identifiers": {
    "goodreads": ["54855"],
    "librarything": ["3414"]
  },
  "lccn": ["2002026536"],
  "isbn_10": ["039397779X"]
}
```
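A minimal sketch of that check using only the Python standard library: it fetches the edition JSON and confirms that its ocaid points back at the Archive.org item. The helper names are hypothetical, and error handling (HTTP failures, missing records) is omitted.

```python
import json
from urllib.request import urlopen


def edition_url(olid: str) -> str:
    """URL of the JSON record for an Open Library edition."""
    return f"https://openlibrary.org/books/{olid}.json"


def fetch_edition(olid: str, timeout: float = 30.0) -> dict:
    """Download and decode the edition record (network access required)."""
    with urlopen(edition_url(olid), timeout=timeout) as response:
        return json.load(response)


def links_back(edition: dict, ia_identifier: str) -> bool:
    """True if the edition's ocaid names the given Archive.org item."""
    return edition.get("ocaid") == ia_identifier
```

With the record shown above, links_back(fetch_edition("OL3561303M"), "jungleauthoritat00sinc") should return True.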

Archive.org identifiers

```xml
<isbn>9780393977790</isbn>
<isbn>039397779X</isbn>
<openlibrary>OL3561303M</openlibrary>
<external-identifier>urn:asin:039397779X</external-identifier>
<external-identifier>urn:acs6:jungleauthoritat00sinc:pdf:0c7cfd74-4178-4606-aad6-82c7dd226004</external-identifier>
<external-identifier>urn:acs6:jungleauthoritat00sinc:epub:e2b0f076-2d77-44b4-bbbd-75e14c654c2c</external-identifier>
<external-identifier>urn:oclc:record:1035882283</external-identifier>
<boxid>IA1138320</boxid>
<identifier>jungleauthoritat00sinc</identifier>
<containerid>S0022</containerid>
<identifier-access>http://archive.org/details/jungleauthoritat00sinc</identifier-access>
<identifier-ark>ark:/13960/t9n33ns5x</identifier-ark>
<oclc-id>473022932</oclc-id>
<oclc-id>492015570</oclc-id>
<oclc-id>50143929</oclc-id>
<oclc-id>611841594</oclc-id>
<oclc-id>750905675</oclc-id>
<oclc-id>845516453</oclc-id>
<oclc-id>849008372</oclc-id>
<lccn>2002026536</lccn>
```
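The Archive.org fields can be cross-checked against the Open Library record in the same spirit. This hypothetical sketch parses a fragment like the one above (wrapping it in a synthetic root element, since the fragment has none), converts any ISBN-10 to ISBN-13 so the two forms compare equal, and reports ISBN, LCCN, or OLID values that Open Library has but Archive.org lacks. It only checks in one direction and ignores OCLC numbers, since the Open Library excerpt above carries none.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_ia_fragment(fragment: str) -> dict[str, list[str]]:
    """Group Archive.org metadata elements by tag, keeping repeated values.

    The fragment has no root element, so it is wrapped in a synthetic one.
    """
    root = ET.fromstring(f"<metadata>{fragment}</metadata>")
    fields: dict[str, list[str]] = defaultdict(list)
    for child in root:
        fields[child.tag].append((child.text or "").strip())
    return dict(fields)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 978-prefixed ISBN-13."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(isbn: str) -> str:
    """Strip separators and express every ISBN in its 13-digit form."""
    compact = isbn.replace("-", "").replace(" ", "").upper()
    return isbn10_to_isbn13(compact) if len(compact) == 10 else compact


def mismatches(edition: dict, ia: dict[str, list[str]]) -> list[str]:
    """List Open Library identifiers that the Archive.org metadata lacks."""
    problems = []
    ia_isbns = {normalize_isbn(i) for i in ia.get("isbn", [])}
    for isbn in edition.get("isbn_10", []) + edition.get("isbn_13", []):
        if normalize_isbn(isbn) not in ia_isbns:
            problems.append(f"ISBN {isbn} not on Archive.org")
    for lccn in edition.get("lccn", []):
        if lccn not in ia.get("lccn", []):
            problems.append(f"LCCN {lccn} not on Archive.org")
    olid = edition.get("key", "").rsplit("/", 1)[-1]
    ia_olids = ia.get("openlibrary_edition", []) + ia.get("openlibrary", [])
    if olid and olid not in ia_olids:
        problems.append(f"{olid} not referenced by Archive.org")
    return problems
```

Run against the XML fragment above and the Open Library record shown earlier, mismatches should return an empty list.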