Home - internetarchive/openlibrary GitHub Wiki
Welcome to the Open Library Handbook!
For a top-level executive summary of the Open Library project, please see the Main Open Library Index. This document contains a year-by-year breakdown of board reports, roadmaps, community call documents, a project index, and top-level team documents spanning engineering, design, communications, and more.
You can read more about the vision and mission of the Open Library here. In 2005, the Internet Archive co-founded the Open Content Alliance (OCA) to digitize and archive the world’s books. Open Library was born shortly after, circa 2006, as a card catalog for every published book. Since its inception, OpenLibrary.org has aspired to enable patrons to access readable digital editions (where available) of open access, public domain, and unrestricted books. In 2007 the Internet Archive received Special Library (Rush Brandis, California State Library) classification by the State of California. This same year, Aaron Swartz gave a seminal talk about Open Library and its roadmap at Harvard University's Berkman Klein Center. In 2011, the Internet Archive piloted an online lending program on OpenLibrary.org, in collaboration with several OCA partners. In 2014, the Internet Archive received a grant under the Library Services and Technology Act (LSTA) from the California State Library system to support the digitization of books and the development of OpenLibrary.org as an online lending library service.
After the completion of the grant and the passing of its co-founder Aaron Swartz, Open Library entered a period of hiatus, without full-time staff, and was considered for sunset. In 2015, the Open Library project was kept on life support through the significant shared efforts of Jessamyn West, Giovanni Damiola, and Brenton Cheng.
In 2016, the program transitioned leadership to Mek and a community of volunteers who added key improvements, such as the “Want to Read” button and a mobile redesign. Among these volunteers, Charles Horn, Drini Cami, Jim Champ, Lisa Seaberg, and Scott Barnes eventually transitioned to staff on Open Library or adjacent projects at the Internet Archive.
In 2019, Open Library launched the Book Sponsorship program, enabling the community to help fund the digitization of books missing from the lending program. In 2020, Open Library invested in sustainability (upgrading to Python3, establishing an import pipeline, and production Docker), simplifying the patron experience, and improving discovery & adoption by book lovers. In 2021, we focused on imports, partnerships, search, and becoming the Open Book Catalog for the Internet. In 2022, we focused on increasing direct value to patrons by improving the core usability and experience of the service: book page design, edition-level search, navigation, mobile, and performance. In 2023, having plucked low-hanging fruit and achieved feature parity with many different services, it became essential for Open Library to clarify its unique value proposition, which it did by conducting design research with its patrons. In 2024, the project is acting on these learnings to focus and align efforts to achieve a more defensible and bright library future.
As of 2024, the Open Library program is directed by Mek Karpeles and staffed by Drini Cami, Jim Champ, and Scott Barnes, with significant support from Lisa Seaberg from Patron Services. The project enjoys contributions from volunteers spanning more than 20 nations.
A 2022 Video of why we plan and how
- The internal staff ABC (Archive Book Catalog) Team call (1h, Monday @ 12pm PT) 2021
- The public Open Library Community Call
- M/W engineering standups at 11am (20min)
- Weekly 1:1's (45min)
Open Library plans on a yearly cadence and involves the community in its planning process. These yearly plans are distilled into executive summaries, which are then vetted by the community and ultimately presented to management and the board for approval.
In November/December, the Open Library community works on its yearly Planning Document.
This gets turned into a yearly Executive Priorities Roadmap, which gets submitted to the board.
Each week (presently on Monday [2021]) the internal team has an ABC (Archive Book Catalog) call. On the last week of each month, during our weekly staff ABC calls, Open Library staff meets to plan the upcoming monthly Milestone. This process is informed by (a) the yearly Roadmap, (b) comments from stakeholders -- including management, community members, and partners -- and (c) high priority items on our issue tracker. During the milestone planning call, we calculate a summary/snapshot of how we performed, rename the current milestone to `Sprint YYYY-MM`, close/archive it, and move all its remaining items to the rolling Next Proposed Milestone. We then create a new empty Milestone called "Active Milestone" for the upcoming sprint and go through Next Proposed to source issues. Each issue on the milestone must have an assignee and it must be clear to all parties what is expected/necessary in order for the issue to be resolved.
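The monthly rollover described above can be sketched in a few lines of Python. This is an illustrative in-memory model only: the `Milestone` class and `roll_over` function are hypothetical names, not part of Open Library or the GitHub API, and the sketch assumes `YYYY-MM` is the month in which the planning call takes place.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Milestone:
    """Hypothetical stand-in for a milestone on the issue tracker."""
    title: str
    open_issues: list[int] = field(default_factory=list)
    closed: bool = False


def roll_over(active: Milestone, next_proposed: Milestone, today: date) -> Milestone:
    """Close out the active milestone and return a fresh, empty one."""
    # Rename the finished milestone to "Sprint YYYY-MM" and archive it.
    active.title = f"Sprint {today:%Y-%m}"
    active.closed = True
    # Unfinished issues move to the rolling "Next Proposed Milestone".
    next_proposed.open_issues.extend(active.open_issues)
    active.open_issues = []
    # The upcoming sprint starts from an empty "Active Milestone".
    return Milestone(title="Active Milestone")
```

Issues for the new sprint would then be sourced by hand from the Next Proposed list, each with an assignee and a clear definition of done.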
Every three months (before the end of the quarter) we use our ABC weekly staff meeting to review our yearly planning document. This includes evaluating our progress, blockers or circumstances which may merit a change of prioritization, and communication of changes to stakeholders.
When we have large/epic projects spanning multiple issues or individuals, we typically use a google doc to flesh out the initial idea, as well as a placeholder github issue with the `epic` label. The Book Sponsorship Program, our Migration to Python 3, the Canonical Books Page Re-Design, and Community Book Tags Project are four such examples. Once the proposal outline has been reviewed with stakeholders, we update the placeholder Epic issue to include sub-issues which may be added to upcoming milestones, as deemed appropriate. Each project of this size is assigned a team Lead who is responsible for overseeing its progress and coordinating with contributors. Leads are invited to use whatever project management works best for their style, though a popular approach is getting permission to register a new Project Board.
In 2018, Mek, Drini Cami, and Charles Horn, went through the process of high-level, long-term strategic planning. Some internal documents are available, as well as copious internal notes. We also have years of notes from previous members of the Open Library team. At this point, they have not been compiled into a single publicly available 5-year proposal.
As it evolves, our 5-10yr vision will be documented here
We drive most of our work on the Open Library project off GitHub Issues in the InternetArchive/OpenLibrary GitHub repo.
Much of our work is done by volunteers. So we plan very lightly, by year and by quarter, given the resources we have available. As new volunteers join, plans may change to accommodate their skills and interests.
Things we'd like to get done during a given year get the milestone for that year, e.g. "2019". For things we expect to be done by the end of a given quarter, we apply the quarterly milestone, e.g. "2019 Q1" for things to get done on or before March 31, 2019.
Milestones are closed when their deadline arrives. Issues associated with that milestone that are not done get rescheduled (the milestone is changed) or backlogged (labeled `State: Backlogged`).
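The mapping from a milestone title to its deadline can be written down explicitly. The following is a minimal sketch, assuming titles follow exactly the "YYYY" or "YYYY Qn" pattern shown above; `milestone_deadline` is a hypothetical helper, not an existing Open Library function.

```python
import calendar
import re
from datetime import date


def milestone_deadline(title: str) -> date:
    """Return the last day covered by a "YYYY" or "YYYY Qn" milestone title."""
    match = re.fullmatch(r"(\d{4})(?: Q([1-4]))?", title.strip())
    if match is None:
        raise ValueError(f"unrecognized milestone title: {title!r}")
    year = int(match.group(1))
    # A yearly milestone runs through December; quarter n ends in month 3 * n.
    month = 3 * int(match.group(2)) if match.group(2) else 12
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)
```

For instance, `milestone_deadline("2019 Q1")` gives March 31, 2019, matching the example in the text.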
The assignee of an issue is the person responsible for its completion.
An example of a common search might be `assignee:cdrini label:"state: work in progress"` to see what Drini is working on.
Singular ownership is important to make sure things don't fall on the floor. We therefore avoid multiple assignees. Most issues have multiple individuals involved in various aspects of assessing and resolving them - those people are "mentioned" (e.g. "@hornc") in the issue comments.
If multiple folks are working together to solve a problem, use @mentions in the issue comments, or if it is really complicated, create a subissue to be owned by another person. Don't forget to mention the parent issue in the first comment.
Any open bug that is unowned is in need of triage.
We reserve a set of labels for use with Github issues, to assist with issue handling and project management for OpenLibrary.
Most generally, an issue evolves through a series of states, for example:
submitted -> assessed -> scheduled -> fixed -> closed.
On the way down that path, lots of things can happen. The labels below can be applied and removed during initial triage (either by the submitter or by whoever performs the triage), and thereafter by the owner of the bug.
The sections below contain:
- Owners
- Milestones
- Managed Labels By Group
- Personal Labels, Deprecated Labels
- Tips for Filtering Issues and Updating Their Labels
Note the following:
- The label's label (!) matters. Each label starts with a prefix that groups the labels into a sorted list. Labels prefixed with "~" are deprecated. Labels prefixed with a developer's initials are managed by that developer (for their own purposes).
- Label color matters. Each prefix has a common hue (developers, pick your hue!). Generally, the prefixes are orthogonal (an issue can only have one Priority, only one Type, from the managed set). So if you see an issue with more than one label of a given hue, something might be awry. Labels get reduced color saturation (greyed out) to indicate they are deprecated. Lower color values (whiter) typically suggest reduced urgency. The label text also has these indicators, for accessibility (since not everyone sees all the colors).
- Be careful touching labels on issues belonging to others! People are using the labels to actively manage their activity. You can add and remove labels from an issue:
  a. At the time you submit the issue, as the submitter.
  b. The assignee/owner of an issue can label it as they see fit, using their own labels, and using managed labels according to the guidelines.
  c. The Bug Triage Owner will adjust issues during triage.
  d. Deputies will review and adjust labels as needed for issues in their area.
  e. If you're not sure, ask the issue owner, the Bug Triage Owner, @brad2014, or @mek. (The comment stream of the issue and the slack channel are great places to raise these questions.)
Every issue on Open Library's tracker should be assigned a `Lead: @person` and a `Priority: #` label by a member of staff.
Issues which have `Needs: Lead` and/or `Needs: Triage` labels do not yet have a lead/mentor assigned and have not yet been vetted or evaluated for the community. Once an issue has been triaged, members of the community may request to be assigned to an issue and enjoy the mentorship of the designated `Lead`.
Priority describes how urgent the bug is. Very urgent bugs generally have an active conversation on the slack channel, so that they can be fixed right away.
Label | Description |
---|---|
Priority 0: Immediate | Issues that prevent users from using the site, or that corrupt site data. [managed] |
Priority 1: Urgent | Do this week, receiving emails, time sensitive. [managed] |
Priority 2: High | Important, as time permits. [managed] |
Priority 3: Normal | Issues that we can consider at our leisure. [managed] |
Priority 4: Low | An issue, but should be worked on when no other pressing work can be done. [managed] |
When a priority label is applied to an issue by the submitter, or on any issue without an owner, it represents a suggestion, not a decision. Priorities are not immutable - even while an issue is being worked on, the owner may decide to move the priority up or down.
Although priorities indicate urgency rather than timing, a helpful frame for assigning priority is to ask questions such as these - think of them as "rules of thumb":
- Is the issue preventing numerous users from successfully using the website? Is the issue related to leaking sensitive information? Is the issue related to usage or running processes actively corrupting our data? Does it really need to be fixed in the next 48 hours? Mark it `Priority 0: Immediate`. If it requires conversation or multiple developers to fix, they should all be talking in a slack channel right now.
- Should a developer interrupt and set aside their current activity to get this issue resolved? Does it need to be fixed in the next 14 days? Assign an owner who will label it `Priority 1: Urgent`. If you can't find a volunteer to own it, it can't be `Priority 1`.
- The other priority labels are available to memorialize your assessment. They mostly indicate, "I've looked at this, and it is not urgent."
At most one `Priority` label can be assigned to an issue. If there is no `Priority` label, consider the priority unassessed. It is good form to mark all your issues with some priority level, because it gives us a historical record of the distribution of issues by priority.
Every issue on Open Library must be assigned a lead label. A Lead is a member of the community with domain expertise who has been appointed by staff to help manage a specific aspect of Open Library (such as search, design, javascript, i18n, etc). Anyone may apply to be considered for a specific lead position.
The Lead is responsible for:
- Project Management: Defining and breaking down an issue to make sure it's actionable and labeled
- Mentoring: Monitoring the issue for new comments and helpfully responding to questions and comments
- Assignment: Assigning members of the community (or themselves) to an issue and committing to give them mentorship
- Escalation: Raising relevant questions, issues, or concerns about designs and requirements to members of staff
- Review: Overseeing the code review process for PRs addressing the issue
See the Team Leads Labels to get an idea of who to tag.
Labels are grouped by prefix and color. If you create a label outside the managed set, prefix it with your initials and give your personal labels a common color. We are continually evolving the managed set to meet our needs. If you think a label deserves to be in the managed set, just mention it.
The labels are grouped into different axes for slicing and dicing issues:
What kind of issue this is. Is it something that is broken that should (perhaps) be fixed, or is it a request for a new feature or enhancement, or is it a reminder to reorganize or clean up some aspect of the code base?
Label | Description |
---|---|
Type: Bug | Something isn't as intended. [managed] |
Type: Feature | Issue describes new functionality we'd like to implement. [managed] |
Type: Question | This issue doesn't require code. A question needs an answer. [managed] |
Type: Refactor/Clean-up | Issues related to reorganization/clean-up of data or code (e.g. for maintainability). Specifically, "restructuring of an existing body of code, altering its internal structure without changing its external behavior" (https://refactoring.com). [managed] |
Type: Epic | A feature or refactor that is big enough to require subissues. [managed] |
Type: Subtask of Epic | A subtask that is part of the work breakdown of an epic issue (see comments). [managed] |
Epics and subtasks are used when we want to separate out the ownership, comment stream, and timing of different parts of a large project. The Epic is closed when all its subtasks have been closed. For most issues, putting a checklist in the comment stream suffices (when everything is checked off, the issue can be closed). The "Needs: Breakdown" label can be used for any issue (epic or not) that needs a decision identifying the list of steps that will be taken in order to close the issue.
Note that `Bug`, `Feature`, `Question`, `Refactor`, `Subtask` are mutually exclusive. Every issue (post-triage) should have one of these. If an issue is labeled `Epic`, it probably also should have a `Feature` or `Refactor` label.
Use these labels to distinguish between issues that we're actively working on, those that we plan to work on, and those that seem to be good ideas that we'll consider when we have the additional time and resources required.
Label | Description |
---|---|
State: Backlogged | No one working on it, not in any milestone, but want to leave open to consider later. [managed] |
State: Blocked | Progress has stopped, we are waiting for something. [managed] |
State: Scheduled | A decision has been made that this issue should be addressed. [managed] |
State: Work In Progress | This issue is being actively worked on. [managed] |
If no state label is present, the issue needs assessment.
If someone was working on an issue but had to set it aside, the state label might be changed to "Backlogged," or the current owner might find someone to hand it off to, or it might even be closed (if we decide it didn't need to be addressed after all).
If an issue is "State: Scheduled", it must have a milestone that indicates by when it is scheduled to be completed. We plan by quarters, so "2019 Q1" means it is an issue we expect to resolve on or before March 31, 2019.
If an issue is `Priority 0` or `Priority 1`, and the state is not `Work In Progress`, something is wrong, and alarms should sound.
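The labeling rules stated so far (at most one Priority, mutually exclusive Types, a milestone for scheduled issues, and urgent issues being actively worked on) lend themselves to a simple consistency check. The sketch below is illustrative only: `check_labels` and `EXCLUSIVE_TYPES` are hypothetical names, and the label strings are taken from the tables in this document rather than from any live configuration.

```python
from typing import Optional

# Type labels the document describes as mutually exclusive (Epic may be combined).
EXCLUSIVE_TYPES = {
    "Type: Bug",
    "Type: Feature",
    "Type: Question",
    "Type: Refactor/Clean-up",
    "Type: Subtask of Epic",
}


def check_labels(labels: list[str], milestone: Optional[str] = None) -> list[str]:
    """Return human-readable problems; an empty list means no rule was broken."""
    problems = []
    priorities = [name for name in labels if name.startswith("Priority ")]
    if len(priorities) > 1:
        problems.append("more than one Priority label")
    if len(EXCLUSIVE_TYPES.intersection(labels)) > 1:
        problems.append("more than one mutually exclusive Type label")
    if "State: Scheduled" in labels and not milestone:
        problems.append("State: Scheduled without a milestone")
    # Priority 0 and Priority 1 issues are expected to be Work In Progress.
    urgent = any(name.startswith(("Priority 0", "Priority 1")) for name in priorities)
    if urgent and "State: Work In Progress" not in labels:
        problems.append("Priority 0/1 issue is not Work In Progress")
    return problems
```

A check like this could be run over an export of open issues during triage to surface the cases where "alarms should sound".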
These labels indicate that an issue or pull request is stuck because the owner needs someone to respond - they'll add comments to the issue saying what exactly they need.
Label | Description |
---|---|
Needs: Triage | This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] |
Needs: Breakdown | This big issue needs a checklist or subissues to describe a breakdown of work. [managed] |
Needs: Community Discussion | This issue is to be brought up in the next community call. [managed] |
Needs: Detail | Submitter needs to provide more detail for this issue to be assessed (see comments). [managed] |
Needs: Feedback | A proposed feature or bug resolution needs community feedback prior to forging ahead. [managed] |
Needs: Help | Issues, typically substantial ones, that need a dedicated developer to take them on. [managed] |
Needs: Review | This issue/PR needs to be reviewed in order to be closed or merged (see comments). [managed] |
Needs: Special Deploy | PR needs a non-standard deploy. [managed] |
Needs: Submitter Input | This issue/PR needs a response from the submitter. [managed] |
Needs: Investigation | This issue/PR needs a root-cause analysis to determine a solution. [managed] |
If you see one or more of these labels on an issue, assume we are not making progress on it.
If you are the owner of an issue and add this label, always add a comment that indicates, as best you can, what you need to get unstuck. If you think you need the help of another team member, make sure to mention them by their handle in the comment.
Remember to remove this label once the need is met and the issue is unstuck.
Issues typically lead to pull requests to modify the repo in order to resolve the bug.
It is considered good form, immediately prior to closing a bug, to add a label indicating if it was closed for any of the following reasons.
Label | Description |
---|---|
Close: Duplicate | This issue or pull request already exists (see comments for pointer to it). [managed] |
Close: Not Reproducible | Closed because we cannot reproduce the issue. [managed] |
Close: Not an Issue | Questions and discussions resolved or moved to Gitter or Slack. [managed] |
Close: Will Not Fix | Closed because we have decided not to address this (e.g. out of scope). [managed] |
Some observations:
- You'll almost always want to add additional detail in the comments as to how the decision to close was arrived at. Liberally mention other people you consulted with (for example, "I spoke to @jeff and we agreed this affects very few users.") and other issues (super-common: "this is a duplicate of issue #XXXX").
- One of the best ways to attract attention to an issue that you feel is unfairly ignored is to close it prematurely.
- If there is (or should be) another `State: Scheduled` issue to fix a big problem, and fixing the big problem would also resolve this issue, we have a choice: we could address this issue (e.g. with a stop-gap solution, right now, while we wait for the big fix), or close this issue (i.e. decide to leave it broken). In the comments, reference the big picture issue (e.g. "This issue will be properly resolved when we address issue #XXXX."), and explain your intentions. For example, you might comment "We'll leave this broken until then," and label `Close: Will Not Fix`, or comment "In the meantime, we'll deploy the following stop-gap" and label this issue, say, `Priority 1: Urgent`.
There are some issues that affect multiple modules, or are related to a user story or workflow that touches multiple systems, and we use "theme" labels to identify them. This list is expected to grow.
Label | Description |
---|---|
Theme: Accessibility | Work related to disability accessibility. [managed] |
Theme: Book Sponsorship | Issues related to the workflow for book sponsorship. [managed] |
Theme: Backup/Restore | Issues related to disaster recovery, backup/restore, data dumps. [managed] |
Theme: Design | Issues related to UI design, branding, etc. [managed] |
Theme: Development | Issues related to the developer experience and the dev environment. [managed] |
Theme: Identifiers | Issues related to ISBN's or other identifiers in metadata. [managed] |
Theme: Internationalization | Making OpenLibrary work for both foreign-language users and books. [managed] |
Theme: Performance | Issues related to UI or Server performance. [managed] |
Theme: Reading Log | Related to workflows for creating, modifying, displaying a user's reading log. [managed] |
Theme: Search | Issues related to search UI and backend. [managed] |
Theme: Upgrade to Python 3 | Issues relating to the systemwide upgrade from Python 2 to Python 3. [managed] |
Theme: Testing | Work related to tests that need to be written or fixed. [managed] |
Theme: Translation | Work related to language accessibility. [managed] |
Theme: Public-APIs | Issues related to APIs accessible to external parties. [managed] |
Theme: Editing | Issues related to APIs accessible to external parties. [managed] |
The unifying characteristic of Themes is that they involve issues that touch many parts of the repo (UI, Server, Configuration, Documentation, Data).
It is expected that an issue will have at most one `Theme:` label.
The broad area this issue is related to, often suggesting who first should consider it.
Label | Description |
---|---|
Affects: Admin/Maintenance | Issues relating to support scripts, bots, cron jobs and admin web pages. [managed] |
Affects: Configuration | Issues related to system configuration (production, staging, or development). [managed] |
Affects: Data | Issues that affect book/author metadata or user/account data. [managed] |
Affects: Documentation | Issues related to developer or ops or data documentation. [managed] |
Affects: Librarians | Issues related to features that librarians particularly need. [managed] |
Affects: Mobile/Responsive | Affects the responsive UI on mobile devices. [managed] |
Affects: Server | Issues with the server or its plugins. [managed] |
Affects: UI | The issue is focused on the web user interface and user experience. [managed] |
It is preferred that an issue only have one `Affects:` label, but we're not religious about it. If you notice that an issue affects multiple areas in the above list, you may want to split it into multiple issues, one per area. If it makes sense to resolve them independently, that's enough. If they all need to be resolved in a coordinated fashion, create a `Type: Epic` issue which can remain open until all the subissues are closed.
These labels identify the specific module or service that the issue relates to. Often this corresponds to a particular directory or file or interface or class present in the repo hierarchy. This list is expected to grow.
Label | Description |
---|---|
Module: Accounts | Issues related to authentication, account maintenance, etc. [managed] |
Module: Docker | Issues related to the configuration or use of Docker. [managed] |
Module: Git | Issues related to the git repo, branches, commit messages, etc. [managed] |
Module: Infogami | Issues related to the configuration or use of the Infogami subsystem. [managed] |
Module: JavaScript | Issues related to the JavaScript functionality. [managed] |
Module: Memcache | Issues related to the configuration or use of the Memcache subsystem. [managed] |
Module: Solr | Issues related to the configuration or use of the Solr subsystem. [managed] |
Module: CSS | Issues related to CSS stylesheets. [managed] |
A common search might be something like `label:"Affects: Server" label:"Module: Solr" label:"State: Work In Progress"` to see who is actively working on calls to solr in the server. If you wanted to pick up issues in that area, you could see who else is doing so.
A few remaining labels that are not in any group, because of github conventions, or for other reasons.
Label | Description |
---|---|
Good First Issue | Easy issue. Good for newcomers. [managed] |
A `good first issue` should be clear, should not require a lot of context, should be low risk and easy to review. Issues that involve high priority or global changes to the production system code are not good candidates.
If you have a large number of issues assigned to you, there may be times when you want to divide them into groups by your own criteria. You can do this by creating your own labels. They all go into the repository label set, so we ask that you:
- Pick your own unique (!) color that you apply to all your personal labels, and
- Prefix the label name with your initials.
For example, Charles Horn creates his own labels with prefix `CH:` and color #1d76db.
Labels that are grey and/or start with a tilde `~` are deprecated. They typically are not used much, and shouldn't be added to issues going forward.
- The default issues page presents several search fields. There is one at the very top, for searching the repository or all of github. There is another one under the `issues` tab that searches through the issues. To the right of the issues filter is the labels button, which gives you a "labels list" to browse and drill down to issues.
- You can type a label search term directly into the issues filter search field: type `label:<labelname>` to include issues containing the label, and `-label:<labelname>` to exclude issues containing that label. Because all the managed label names contain spaces and colons, you need to quote them if you are typing them directly into the issue search bar. For example, `label:"affects: documentation"` (case doesn't matter).
- If the issues filter field contains multiple label terms, you will see the issues that match all those label terms. There is no way to see a list of issues that contain label A or label B.
- If you're not sure exactly what the label is, just use the labels pull down, and in its search bar type any word that is in the label name or description, and the matches will be displayed. Click on the labels you want to add to your issues filter.
- All the managed labels have `[managed]` in their description, so an easy way to browse just the managed labels is to search labels on that string.
- It is easy (and sometimes dangerous) to do batch updates of labels (or milestone, or assign, or close). Just filter the issues to the ones you want to change, then select the issues you want to label. When you select one or more issues, the pulldown menu changes from filter mode to update mode. Pull down the label menu, and you can select which labels you want to add or remove (it's a toggle).
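The quoting rules for label filters are easy to get wrong by hand, so here is a small sketch that assembles a filter string from lists of labels. `issue_query` is a hypothetical helper for illustration; it only builds the text you would paste into the issues filter field and does not call GitHub.

```python
def issue_query(include=(), exclude=(), assignee=None) -> str:
    """Build an issues-filter string; every term must match (logical AND)."""
    terms = []
    if assignee:
        terms.append(f"assignee:{assignee}")
    # Managed label names contain spaces and colons, so they are always quoted.
    terms += [f'label:"{name}"' for name in include]
    # A leading minus excludes issues carrying that label.
    terms += [f'-label:"{name}"' for name in exclude]
    return " ".join(terms)


print(issue_query(include=["Affects: Server", "Module: Solr"],
                  exclude=["State: Blocked"]))
```

The printed string is `label:"Affects: Server" label:"Module: Solr" -label:"State: Blocked"`, which follows the same pattern as the example searches earlier in this page.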
Leads have the challenging job of monitoring and keeping up with progress on their issues and pull requests. Staff has several bots that we hope can make the process easier.
- Keeping up with new comments on issues. Every day, we run the Issue New Comments Bot to be pinged about the issues for which we are the lead.
For each of the following, use the `labels` facet to add your label to see issues and PRs under your leadership:
- Issues which need to be triaged. Use the `labels` facet to add your label to see issues you need to triage.
- Issues which require some sort of project management action such as "needs breakdown" or "needs design". Use the `labels` facet to add your label to see issues that need such action.
- Issues that are waiting on the submitter. In the near future, we'll have a bot that automatically removes "Needs: Submitter Input" when the author pushes new updates for review.
If you don't have merge permissions and a PR looks ready to go, please mark it as "Needs: Staff / Internal".
Don't see what you're looking for? Check questions asked by contributors on Github or submit your own question.
- How do I set up the Open Library app locally?
- What process should I follow if I encounter a problem when building with docker?
- How do I find, claim, and work on a good first issue?
- How can I debug when things go wrong?
- How do I import production book & author data into my local environment?
- How can I login as a user in my local environment?
- How do I add a new route to Open Library? (tutorial)
- How do I add new Javascript functionality to a template?
- How do I find the right CSS file to add style rules?
- How do I rebuild css & js assets after I make changes?
I18n pages allow for the translation of content to various languages, enabling users to access localized versions of a webpage based on their locale preferences. For instance, when a user accesses https://openlibrary.org/subjects, they are redirected to https://openlibrary.org/subjects.en or https://openlibrary.org/subjects.es, depending on their selected language. Any text that is visible to the patron should be internationalized. The basics of web.py's `templator` I18N support are described here: http://webpy.org/cookbook/i18n_support_in_template_file
To get started:
- Please kindly reach out to us via the volunteer page https://openlibrary.org/volunteer#translator
- Watch this overview:
Open Library i18n is handled via the python Babel library, GNU `gettext`, and the message lists located at https://github.com/internetarchive/openlibrary/tree/master/openlibrary/i18n
The messages file format used by the `gettext` toolset is described here, and in the gettext manual.
To get started, follow one of the two options below:
Option 1. Locate the right target language within the project (e.g. `es` for Spanish) and then click on the `po` file (the raw file where translation strings are contributed), e.g. this one for Spanish. Click on the pencil (edit) option, which will bring you to an editable page like this where you can add or edit translations. When you're satisfied with your translations, scroll down to the bottom of the page where it shows Commit Changes, leave a description of your changes, and make sure to select the radio button of Create a new branch. You can call it "translations-es", replacing "es" with the code for whatever language you're working with. Then, click the `Propose Changes` button and you're done! We can follow up if there are any validation issues which may need to be addressed.
Option 2. If you prefer working with `git`, you may instead fork / clone the repository from Github. Install git and follow the instructions on our Git Cheat Sheet to get set up.
If you are starting a new language translation, copy the template to the correct place in the directory hierarchy, add the plural forms info at the top and replace the English version of the `msgstr` text values with the translated versions for your language. A new directory containing a translation template file must be created for each new language. These can be automatically generated if your Docker environment is set up (see our Docker README), or created manually.
Before creating the new directory, you will need to know your language's two-letter ISO 639-1 code. Make a note of the code once you have found it here: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
If you find something which can't be translated correctly (perhaps because the text is being concatenated in the code rather than in the message formatting), please create an issue describing the location and what the problem is.
-
Run
docker compose up -d
-
Run
docker compose exec -uroot web ./scripts/i18n-messages add [CODE]
, replacing [CODE]
with your two-letter ISO 639-1 code.
-
Create a new folder for your language. Create a new folder in
/openlibrary/i18n/
, using your two-letter ISO 639-1 code as the folder's name. -
Make a copy of the latest messages to translate. The messages template file,
/openlibrary/i18n/messages.pot
should be copied as messages.po
(note the difference in extension: the t
for template is dropped for the copy) to your newly created folder.
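The manual steps above can be sketched in Python as follows. This is a minimal sketch rather than a project script: the helper name `create_language_folder` and its `repo_root` parameter are hypothetical, and it assumes a checkout in which openlibrary/i18n/messages.pot already exists.

```python
import shutil
from pathlib import Path


def create_language_folder(repo_root, code):
    """Copy messages.pot to <code>/messages.po (hypothetical helper)."""
    i18n_dir = Path(repo_root) / "openlibrary" / "i18n"
    target_dir = i18n_dir / code
    # Create the language folder named after the ISO 639-1 code.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "messages.po"
    if target.exists():
        # Never clobber translations that someone has already started.
        raise FileExistsError(f"{target} already exists; not overwriting")
    # The template's .pot extension becomes .po for the copy.
    shutil.copyfile(i18n_dir / "messages.pot", target)
    return target
```

For example, `create_language_folder(".", "te")` run from the repository root would produce openlibrary/i18n/te/messages.po.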
In order for a new language option to be available in our language drop-down and footer, the language_list.html template must be updated to include the new language. An Open Library staff member can do this if you are unfamiliar with HTML.
You can edit the messages.po
file using your favourite editor, or a .po specific tool such as poedit, and send in a Pull Request. Pull Request Guidelines can be found on our CONTRIBUTING guide and our Git Cheat Sheet.
To open your language version of the website in the browser, you will need to set up your Docker environment (see our Docker README). After having run docker compose up -d
, run docker compose run --rm -uroot home make i18n
to build the translation files; then e.g. http://localhost:8080/?lang=fr should work.
To view production Open Library in a preferred language, you will need to adjust your browser language preferences. You can also use the lang=
parameter on the URL with a two character language code, e.g. https://openlibrary.org/?lang=fr
If changes have been made to the .pot
file, you need to merge the two files to carry those changes over to a given language. After setting up your Docker environment (see our Docker README), run the following, replacing [CODE]
with your two-letter language code:
docker compose run --rm -uroot home ./scripts/i18n-messages update [CODE]
If you don't have a Docker set-up, you can also use Poedit to merge PO and POT via 'Catalog -> Update from POT file' and use it to review and translate changed and missing strings.
See our i18n guideline in the wiki for important and useful tips.
If you are updating an existing translation, run scripts/i18n-messages update
to merge the new msgids from the i18n/messages.pot
message templates into your i18n/<lang>/messages.po
message catalog. Review all fuzzy
matches and either remove the fuzzy
keyword, if a correct match, or update and remove the fuzzy
keyword. Make sure it's an exact match before removing the fuzzy
label. Sometimes there are minor, but important changes like datatype changes, e.g. %(count)s
to %(count)d
. Also review all entries with an empty msgstr
and add correct translations for them.
If the text was revised and the update/matching algorithm didn't think it was a close enough match to even label a fuzzy
match, you may find it at the bottom of the file in the section commented out with tildes (~). Any text in that section which is not useful to reuse or save, can be deleted.
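To get a quick list of the entries that still need attention after an update, something like the following could help. It is a simplified sketch, not part of the project's tooling: the function name is hypothetical, it only understands entries whose msgid and msgstr each fit on one line, and it ignores plural entries (msgstr[0], msgstr[1], ...) and obsolete entries.

```python
def find_entries_needing_review(po_text):
    """Return (fuzzy_msgids, untranslated_msgids) from .po file text.

    Simplified: only single-line msgid/msgstr pairs are recognized.
    """
    fuzzy, untranslated = [], []
    # Entries in a .po file are separated by blank lines.
    for block in po_text.strip().split("\n\n"):
        lines = block.splitlines()
        msgid = next(
            (l[len('msgid "'):-1] for l in lines if l.startswith('msgid "')), None
        )
        msgstr = next(
            (l[len('msgstr "'):-1] for l in lines if l.startswith('msgstr "')), None
        )
        if not msgid:
            # Skips the header entry (empty msgid) and unrecognized blocks.
            continue
        if any(l.startswith("#,") and "fuzzy" in l for l in lines):
            fuzzy.append(msgid)
        elif msgstr == "":
            untranslated.append(msgid)
    return fuzzy, untranslated
```

A dedicated tool such as Poedit gives a more complete view; this only illustrates what "fuzzy" and "empty msgstr" look like in the file.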
Remember:
- keep the substitution variable names and data types unchanged in your translated text (e.g.
%(count)s
) - don't translate embedded HTML markup (e.g. <a href=>, etc.)
- do escape any embedded quotes (e.g.
\"
)
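The first reminder can be checked mechanically. The sketch below is illustrative only and is not the project's validate script: the names `PLACEHOLDER` and `placeholder_mismatches` are hypothetical, and only the %(name)s and %(name)d forms mentioned in this guide are recognized.

```python
import re

# Matches named placeholders such as %(count)d or %(author)s.
PLACEHOLDER = re.compile(r"%\((\w+)\)([sd])")


def placeholder_mismatches(msgid, msgstr):
    """Return (name, type) pairs present in only one of the two strings."""
    expected = set(PLACEHOLDER.findall(msgid))
    actual = set(PLACEHOLDER.findall(msgstr))
    # Symmetric difference: anything dropped, renamed, or retyped.
    return sorted(expected ^ actual)
```

An empty list means the translation kept every variable name and data type; a non-empty list points at the placeholders to fix before running the validation step described next.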
Before submitting a PR with your translations, we recommend correcting any validation errors identified by the following script (replace [CODE]
with your language code):
docker compose exec -uroot web ./scripts/i18n-messages validate [CODE]
To add i18n support to Open Library, templates and macros are modified to use gettext function calls. For brevity, the gettext function is abbreviated as _, for example:
<a href="..">$_("More search options")</a>
The messages in the templates and macros are extracted and a .pot
file is created. After setting up your Docker environment (see our Docker README), run:
docker compose run --rm -uroot home ./scripts/i18n-messages extract
The .pot
file contains a msgid
and a msgstr
for each translation used. The msgstr
field for each entry is filled with the translation of the required language and that file is placed at openlibrary/i18n/$locale/messages.po
:
mkdir openlibrary/i18n/te
cp openlibrary/i18n/messages.pot openlibrary/i18n/te/messages.po
# edit openlibrary/i18n/te/messages.po and fill the translations
The .po
files are compiled to .mo
files to be able to use them by gettext system. This is done by make i18n
automatically when the code is deployed, but needs to be done manually by a maintainer when deploying to dev.openlibrary.org .
-
.po
- Portable object file: This is the file where translators add translations. -
.pot
- Portable object template file: This is the file that lists all the strings in Open Library before translation. -
.mo
- Machine object file: This file is generated by make i18n
, and is what is used by the actual site.
The codebase has deprecated translations in the /openlibrary/i18n directory. In directories of older translations, the messages.po
file will be replaced with a legacy-strings.{ISO 639-1 Code}.yml
file.
Languages with deprecated translations:
- hi Hindi
- kn Kannada
- mr Marathi
- nl Dutch
@cdrini has a script to help automate the creation of i18n versions of openlibrary.org infogami pages (e.g. https://openlibrary.org/about v. https://openlibrary.org/about.es):
https://gist.github.com/cdrini/615d75653e1e47115930fa394e83ab17
Any text that will be visible to the user should be internationalized. The basics of web.py's templetor
i18n support are described here.
The two primary i18n message functions are:
-
gettext()
which is bound to '_' as a convenience since it's commonly used -
ngettext()
(or ungettext()
as we've historically used) which currently needs to be spelled out, but is commonly bound to N_
, so that's a convention we may adopt.
Text Type | Example Text | Example Syntax | Notes |
---|---|---|---|
Plain inner HTML | <p>About this book:</p> | <p>$_("About this book:")</p> | Wrap it in $_(" ") |
title or alt attributes | <button title="Submit" alt="Submit button"> | <button title="$_('Submit')" alt="$_('Submit button')"> | Wrap it in $_(' ') - note the use of single quotes |
value attributes | <button value="Submit" /> or <input value="Enter your name" | <button value="$_('Submit')" /> & <input value="$_('Enter your name...')" | Note: You only need to use the i18n syntax if the value is being used to set visible placeholder text -- values like this one can be left as is: <button value="add">$_("Add")</button> |
Text that contains ' or " | title="That's strange!" or <p>Click "Add"</p> | title="$_('That\'s strange!')" or <p>$_("Click \"Add\"")</p> | If there's a conflict between the outer and inner quotes, i.e. ' or " inside "$_(' ')", or " inside $_(" "), escape the inner quote/apostrophe with a \ |
Text that contains a variable | title="Photo of $author_name" | title="$_('Photo of %(author)s', author=author_name)" | Wrap a stand-in for the variable in %( )s if the variable is a string, or %( )d if the variable is a number, and then assign your stand-in to the initial variable. Note: It's important to use a meaningful stand-in name here, like "count" or "author", to give translators context |
Text that contains nested HTML tags | <p>Would you like to <strong>return</strong>?</p> | <p>$:_('Would you like to <strong>return</strong>?')</p> | Wrap each full sentence and punctuation in $:_(' ') -- note the : and single quotes* |
Text that contains a plural | ("There is one person waiting for this book.", "There are {wlsize} people waiting for this book.") | $ungettext("There is %(count)d person waiting for this book.", "There are %(count)d people waiting for this book.", wlsize, count=wlsize) | Write out ungettext instead of the traditional _ and sub in the variable (see variable instructions)** |
Everything enclosed within any version of the $_(' ')
syntax will be extracted into the messages.pot file. Review the messages.pot file to get an idea of what the extracted messages look like.
When marking up messages that are split up by nested HTML tags, consider whether the message can be translated in chunks, or whether the meaning of the chunks depends on the sentence as a whole. If it can be broken into chunks, doing so saves translators the effort (and risk) of copying and pasting HTML. However, translation accuracy is the priority.
Note:
- There's no need to do nested
i18n
, i.e. the title
in $:_('Click <a href="$link" title="External Link">Here</a>')
would already be translated with the rest of the string. - Be sure to include punctuation marks inside parentheses, i.e. the
.
at the end of $:_('Please read our <a href="/help/faq">FAQ</a>.')
- Exclude HTML syntax where possible to avoid unnecessary copy and pasting for translators, i.e. you can leave out the
here like so: $_("Add some subjects")
-
But if you include an HTML closing tag, be sure to include the opening tag as well, i.e.
$:_('<strong>Download the DAISY.zip</strong>.')
- While working, avoid moving extra spaces to the beginning or end of
<a></a>
tags, which will cause weird links that look like this - You may run across some untranslated HTML in the JavaScript test files; this is fine, and should be left as is
* You should try to avoid this where possible because it requires the translator to copy the HTML exactly—but sometimes you can't avoid it. Note you should not split up the sentence; it might not make sense in other languages.
These sentences however can be represented without i18n-ing the HTML by using python template strings:
$def cc0_link():
<a href="https://creativecommons.org/publicdomain/zero/1.0/" target="_blank" title="$_('This link to Creative Commons will open in a new window')">$_('CC0')</a>
$:(_('By saving a change to this wiki, you agree that your contribution is given freely to the world under %s. Yippee!') % str(cc0_link()).strip())
In this way, only the text is presented to the translators.
** In the translation file, this would look like:
#: borrow.html:114
#, python-format
msgid "There is %(count)d person waiting for this book."
msgid_plural "There are %(count)d people waiting for this book."
msgstr[0] "%(count)d personne attend ce livre."
msgstr[1] "%(count)d personnes attendent ce livre."
The top of the file declares the number of different plural forms for the language since this varies widely among languages. There is more information on plural forms support here.
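The call pattern can be tried outside the templates with Python's standard gettext module. This is a sketch under stated assumptions: it uses the stdlib `NullTranslations` fallback (which applies the English rule, singular only when the number is 1) rather than Open Library's own ungettext, and `waiting_message` is a hypothetical helper. Unlike the template call shown in the table, the stdlib function does not take substitution values as keyword arguments, so the placeholder is filled in a separate step.

```python
import gettext

# With no catalog loaded, NullTranslations returns the singular form
# only when n == 1 and the plural form otherwise.
fallback = gettext.NullTranslations()


def waiting_message(wlsize):
    """Pick the singular or plural template, then fill in the count."""
    template = fallback.ngettext(
        "There is %(count)d person waiting for this book.",
        "There are %(count)d people waiting for this book.",
        wlsize,
    )
    return template % {"count": wlsize}


print(waiting_message(1))
print(waiting_message(3))
```

With a real message catalog loaded, the choice between msgstr[0], msgstr[1], and any further forms is driven by the plural forms declaration at the top of the language's .po file.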
- Follow the HTML instructions above, but omit the
$
. E.g.
from openlibrary.i18n import gettext as _, ungettext
_('My translated string!')
Note:
When adding Python translations, you may encounter an AttributeError
like 'ThreadedDict' object has no attribute 'lang'
.
This can happen if translated text is at the top of the file rather than nested in a function, as the gettext
function _
requires the language context (web.ctx.lang
) to be ready before it runs.
A simple workaround for this problem is nesting the desired text inside a function, either directly where it will be used or as a helper function near the top of the file.
For instance, a dict like this:
LOGIN_ERRORS = {
"invalid_email": _("The email address you entered is invalid"),
"account_blocked": _("This account has been blocked"),
"account_locked": _("This account has been locked"),
...
}
can be made translation-safe if nested in a function like so:
def get_login_error(error_key):
LOGIN_ERRORS = {
"invalid_email": _("The email address you entered is invalid"),
"account_blocked": _("This account has been blocked"),
"account_locked": _("This account has been locked"),
...
}
return LOGIN_ERRORS[error_key]
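The difference between the two versions is only a matter of when _ runs. The self-contained sketch below imitates that timing; `ctx`, `CATALOG`, and this `_` are stand-ins invented for illustration and are not the real web.ctx or openlibrary.i18n objects.

```python
# Stand-ins for illustration only: `ctx` mimics a per-request context and
# `_` mimics a gettext function that needs ctx["lang"] to be set.
ctx = {}
CATALOG = {"es": {"This account has been blocked": "Esta cuenta ha sido bloqueada"}}


def _(message):
    lang = ctx["lang"]  # raises KeyError when no language context is ready
    return CATALOG.get(lang, {}).get(message, message)


def get_login_error(error_key):
    # The dict is built on each call, so _() only runs once a language
    # is available, and it picks up the language of the current request.
    login_errors = {
        "account_blocked": _("This account has been blocked"),
    }
    return login_errors[error_key]


ctx["lang"] = "es"  # in a web app this would be set per request
print(get_login_error("account_blocked"))
```

Had the dict been built at module level, `_` would have been called at import time, before `ctx["lang"]` existed, which is the analogue of the AttributeError described above.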
Unlike our Python code and HTML templates, there is currently no way to extract strings from our JavaScript code for localization. To work around this limitation, we have been including localized strings in a data-i18n
attribute.
As an example, the following code will display the localized Hello World
message in the .greeting-example
span when the greeting.html
template is rendered on a page.
In the greeting.html
template, we first create a dict
that will contain all of the localized UI strings that will be used by the client-side code. In this case, we only have a single string.
This dict
is then added to the root element of the template in a data-i18n
attribute. The json_encode
call simply converts a Python dict
to a JSON string.
/openlibrary/templates/greeting.html
:
$def with ()
$ i18n_strings = {
$ 'greeting': _('Hello World')
$ }
<span class="greeting-example" data-i18n="$json_encode(i18n_strings)"></span>
In the main index.js
file, we add code that initializes the .greeting-example
element if it
exists on the page.
js/index.js
:
const greetingElement = document.querySelector('.greeting-example')
if (greetingElement) {
import('greeting')
.then((module) => module.initGreeting(greetingElement))
}
In initGreeting
, we parse the value of data-i18n
and set the textContent
of the greeting
span to the localized "Hello World" string.
js/greeting.js
:
/**
* Sets text content of the given element to the localized greeting.
*
* @param {HTMLElement} greetingElement
*/
export function initGreeting(greetingElement) {
const i18nStrings = JSON.parse(greetingElement.dataset.i18n)
greetingElement.textContent = i18nStrings.greeting
}
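To see what travels through the data-i18n attribute in the example above, the round trip can be reproduced in plain Python. This is a sketch built on two assumptions that are not confirmed here: that json_encode behaves like `json.dumps`, and that the template engine HTML-escapes the value before it is written into the attribute.

```python
import html
import json

# A string with embedded quotes shows why escaping matters.
i18n_strings = {"greeting": 'Hello "World"'}

# Assumption: json_encode ~ json.dumps, followed by HTML attribute escaping.
attribute_value = html.escape(json.dumps(i18n_strings), quote=True)
markup = f'<span class="greeting-example" data-i18n="{attribute_value}"></span>'

# The browser un-escapes the attribute before JSON.parse sees it;
# html.unescape plus json.loads plays that role here.
decoded = json.loads(html.unescape(attribute_value))
print(markup)
print(decoded["greeting"])
```

The escaped value contains no raw double quotes, so it cannot terminate the attribute early, and the client-side code recovers the original dict unchanged.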
Must DOs:
- Internationalize all user visible strings, including HTML
title
and alt
which are used - Use consistent terminology and phrasing throughout the UI to reduce the amount of text which needs to be translated
- Use meaningful mnemonic parameter names to help the translators understand the context. e.g. "%(editioncount)d editions"
- Double check that the format string types match the types of parameters being passed. Mismatches will cause errors at runtime.
- Be sure to escape embedded quotes and apostrophes, if necessary, after wrapping strings in single/double quotes. e.g. "$_('Mustn\'t forget to escape')"
- Be sure to remove extra dollar signs ($) when wrapping expressions with $_()
- Generate a message catalog template (messages.pot) when user visible strings are added or change
- Make changes incrementally in small batches. Multiple preprocessors (templetor + babel) can make error messages obscure, so it's easier to debug if you know what changed.
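The point about matching format string types to parameter types can be demonstrated with ordinary Python string formatting, which is what the %(name)s and %(name)d placeholders rely on. The `render` helper below is hypothetical and exists only for this illustration.

```python
def render(template, **params):
    """Apply named parameters to a %-style format string."""
    return template % params


print(render("%(count)d editions", count=3))    # number into %d: fine
print(render("%(count)s editions", count="3"))  # string into %s: fine

try:
    # A string passed to a %d placeholder fails at runtime.
    render("%(count)d editions", count="3")
except TypeError as error:
    print(f"Mismatch: {error}")
```

This is also why a translation that silently changes %(count)d to %(count)s (or the reverse) deserves a second look during review.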
DON'Ts:
- Don't display internal status / keyword values from the code directly to the user. These can't be internationalized.
- Don't do pluralization or string concatenation in code or templates. This mandates ordering in ways that can't be translated. Give the translators completed sentences or phrases, with embedded replacements, to work with so they can create natural translations.
- Don't use inline styling or links in text, if at all possible.
- Don't update the translated message catalogs. Because the merging process is inexact, it's better for the translators to handle this so that they can validate the results. Do update the message templates though (ie
messages.pot
) - Don't hard code in
1
for singular nouns (e.g. 1 edition
) because in some languages, 0 editions
is singular and translated as 0 edition
. Instead, substitute a variable, as you would with the plural. E.g. $ungettext("There is %(count)d person waiting for this book.", "There are %(count)d people waiting for this book.", wlsize, count=wlsize)
and not $ungettext("There is 1 person waiting for this book.", "There are %(count)d people waiting for this book.", wlsize, count=wlsize)
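The rule against string concatenation is easiest to see with a language whose word order differs from English. In the sketch below, `photo_caption` is a hypothetical helper and the Japanese template is an illustrative translation, not a string taken from the Open Library catalogs.

```python
def photo_caption(translated_template, author_name):
    """Fill a complete, translated phrase with the author's name."""
    return translated_template % {"author": author_name}


english = "Photo of %(author)s"
# Illustrative translation: the name comes before the word for "photo".
japanese = "%(author)sの写真"

print(photo_caption(english, "Octavia Butler"))
print(photo_caption(japanese, "Octavia Butler"))

# Concatenation fixes the order in code, so no translation of the
# fragment "Photo of " could ever place the name first.
concatenated = "Photo of " + "Octavia Butler"
```

Because the translator receives the whole phrase with a named placeholder, they are free to move the placeholder wherever their language needs it.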
Internationalization (i18n) pages are specialized pages within the Infogami platform that enable users to contribute translations. This guide outlines the process of converting a standard page into an i18n page, using the example of converting https://openlibrary.org/librarians to its English (en
) i18n version.
Follow these steps to convert a standard page into an i18n page. You will need:
- Administrator permissions on the Infogami platform.
-
Create the English (
en
) Version of the Page- Access the page: https://openlibrary.org/librarians.en.yml?m=edit
- Copy the content from the unsuffixed page: https://openlibrary.org/librarians.yml?m=edit
- Update the
key
field to include the .en
suffix.
-
Edit the Un-Suffixed Page
- Access the unsuffixed page: https://openlibrary.org/librarians.yml?m=edit
- Remove the existing
body
content. - Change the
type
field to/type/i18n_page
.
-
Test
- Access https://openlibrary.org/librarians to ensure it displays content in English (or your desired locale).
- Access https://openlibrary.org/librarians?lang=es to verify that it displays content in Spanish (or the corresponding locale).
This concludes the process of converting a standard page into an i18n page, making it accessible in multiple languages. Ensure that you have made the necessary updates to the key
and type
fields as specified above.
Topics which could use recipes
- Setting a Cookie
- Currently Logged-in Patron
- Fetching patron's lists
- Fetching books from Open Library with Availability
- Getting a patron's S3 Keys
- Caching/memoizing a function
Need to test Open Library functionality from the command line in python? You can use this magic incantation to load the minimal Open Library app to test models, use web.ctx.site
to fetch data, and more.
First, you need to exec into the docker container and launch python:
docker compose exec -it web python
Next, use the following incantation to load Open Library and launch a minimal headless app:
import web
import infogami
from openlibrary.config import load_config
# load_config('/olsystem/etc/openlibrary.yml') # if production
load_config('config/openlibrary.yml') # if local
infogami._setup()
from infogami import config
# Recipe: currently logged-in patron
from openlibrary import accounts
logged_in_user = accounts.get_current_user()
# Recipe: getting a patron's S3 keys
from openlibrary import accounts
from openlibrary.accounts.model import OpenLibraryAccount
logged_in_user = accounts.get_current_user()
username = logged_in_user and logged_in_user.key.split('/')[-1]
account = username and OpenLibraryAccount.get(username=username)
s3_keys = web.ctx.site.store.get(account._key).get('s3_keys')
# Recipe: look up a product through the Amazon API (loads the production config)
import web
import infogami
from openlibrary.config import load_config
load_config('/olsystem/etc/openlibrary.yml')
infogami._setup()
from infogami import config
from openlibrary.core.vendors import AmazonAPI
web.amazon_api = AmazonAPI(*[config.amazon_api.get('key'), config.amazon_api.get('secret'),config.amazon_api.get('id')], throttling=0.9)
book = web.amazon_api.get_products(["1666568287"], serialize=True)
- Log into the host
ol-home0
docker exec -it openlibrary-affiliate-server-1 bash
curl localhost:31337/status
Or to monitor repeatedly during debugging:
% cat > ~/affiliate_status.sh
#!/bin/bash
while true
do
curl --silent localhost:31337/status | cut -c 1-90
sleep 1
done
The Internet Archive’s Open Library Fellowship is a flexible, self-designed independent study which pairs volunteers with mentors to lead development of a high impact feature for OpenLibrary.org.
Most fellowship programs last one or more months and are flexible, according to the preferences of contributors and availability of mentors. We typically choose fellows based on their exemplary and active participation, conduct, and performance within the Open Library community. The Open Library staff typically only accepts 1 or 2 fellows at a time to ensure participants receive plenty of support and mentor time.
List of Fellowship Opportunities
Occasionally, funding for fellowships is made possible through Google Summer of Code or Internet Archive Summer of Code & Design.
If you’re interested in contributing as an Open Library Fellow and receiving mentorship, you can apply using this form or email [email protected] for more information.