Developer's Starter Guide - internetarchive/openlibrary GitHub Wiki

Audience

This guide is intended for first-time contributors who are trying to understand the Open Library codebase.

Prerequisites

Before starting this guide, it is recommended that you have:

  1. cloned or forked and pulled an up-to-date version of the Open Library repository
  2. successfully launched your local environment using the docker guide
  3. completed the git cheat sheet tutorial
  4. found a good first issue to work on

Now you're ready to understand how the code is structured within Open Library.

Approach

In this guide, we're going to work backwards from what visitors see on the Open Library website and then explore the sequence of actions (the "lifecycle") that takes place in order for the html to be served.

The Anatomy of the Site

HTML Templates

When a patron navigates through the Open Library website and requests pages (like the screenshot pictured below), the patron will be served "rendered" html content from the openlibrary/templates directory.

Screenshot 2024-09-04 at 6 12 47 AM

Here are a few examples of common templates:

Note

Some pages, like the /books page, are first-class "registered" Infogami types and are powered through the Infogami wiki framework that runs Open Library. This is true for books, authors, lists, and a few other types. Because these types are handled by Infogami, you won't find plugin controllers for them in Open Library. Their html templates can be found in the special templates/type directory. When Open Library notices a url containing a key for one of these special types, Infogami fetches the corresponding object from the database and passes it as page straight into the template. So for the books page, which is templates/type/edition/view.html, the page value declared in its $def with (page, ...) header is actually a book edition or work object whose properties are defined here.
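As a rough illustration of the lookup described in this note, here is a minimal sketch in python. The TYPE_BY_PREFIX table, the template_for_key function, and the example key are all hypothetical; the real resolution happens inside Infogami and is more involved. Only the templates/type/edition/view.html path comes from this guide.

```python
# Hypothetical lookup table: url prefix -> registered type name.
# The real mapping lives inside Infogami; this is only a sketch.
TYPE_BY_PREFIX = {
    "/books/": "edition",
    "/works/": "work",
    "/authors/": "author",
}


def template_for_key(key, mode="view"):
    """Return the templates/type/... path for a key, or None if no type matches."""
    for prefix, type_name in TYPE_BY_PREFIX.items():
        if key.startswith(prefix):
            return f"templates/type/{type_name}/{mode}.html"
    return None


# "/books/OL1M" is a made-up key used purely for illustration.
print(template_for_key("/books/OL1M"))
print(template_for_key("/search"))
```

A url that matches no registered type returns None here, which stands in for "handled by a plugin controller instead".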

You may notice that most of the html pages within the templates folder don't include the <html>, <head>, and <footer> sections of the website, just the contents of the body. This is because Open Library is set up so that nearly all page templates are automatically wrapped by the site template.

The Site Wrapper

While the body of each Open Library page may present differently, most pages share similar scaffolding, look, and feel. For instance, nearly every page will have:

  1. A top black bar topNotice with an Internet Archive logo:
    Screenshot 2024-09-04 at 6 26 30 AM
  2. A header header#header-bar section with the Open Library logo, a search box, a hamburger menu, and an account dropdown for logged in patrons:
    Screenshot 2024-09-04 at 6 27 23 AM
  3. A footer menu:
    Screenshot 2024-09-04 at 6 27 53 AM

The top-level html page that defines and renders the overall structure of every Open Library page is templates/site.html. In its definition, this main site template calls out to other modular, specific "sub-templates" (such as the 3 described above) defined within the templates/site and templates/lib directories, which combine to form the "site". You can read more about this "layout template" philosophy in the official documentation for web.py, the micro web framework used by Open Library.

The variable section of the site that changes depending on the requested page is the body. This page is an html template that gets passed in to the site. You may wish to review examples in the Page Templates section.
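The layout idea can be sketched without web.py at all. The example below uses python's standard string.Template as a stand-in for the real templating engine, and the HEADER, FOOTER, and SITE markup is invented for illustration rather than copied from Open Library.

```python
from string import Template

# Stand-ins for sub-templates; the markup is illustrative only.
HEADER = '<header id="header-bar">Open Library</header>'
FOOTER = "<footer>Footer menu</footer>"

# Stand-in for templates/site.html: shared scaffolding with a slot for the body.
SITE = Template(
    "<html><head><title>$title</title></head>"
    "<body>$header<main>$body</main>$footer</body></html>"
)


def wrap_in_site(body, title="Open Library"):
    """Inject a page-specific body into the shared site scaffolding."""
    return SITE.substitute(title=title, header=HEADER, body=body, footer=FOOTER)


page = wrap_in_site("<h1>The Hobbit</h1>")
print(page)
```

Every page produced this way shares the same header and footer, and only the body argument changes from one request to the next.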

How a Page Gets Rendered

When we want to serve a rendered template to a visitor, we typically use the render_template() function. The first argument to render_template is the name of the page-specific html template we want rendered and injected into the site body. The remaining arguments pass variables from python into the corresponding html template file, as declared in the template's header.

For instance, imagine we have a simple template called templates/book.html which requires a book_title and a bookcover_url, and optionally accepts an author. Such a template might look like:

$def with (book_title, bookcover_url, author=None)

<div>
  <h1>$book_title</h1>
  <img src="$bookcover_url"/>
  $if author:
    <h2>$author</h2>
</div>

The $def with (...) line is how the template declares which variables must be passed in when it is called and rendered. Elsewhere in the template, the $ symbol indicates that a variable should be replaced by its value at render time.
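One way to think about the $def with (...) header is as a function signature: arguments go in, an html string comes out. The sketch below is a plain python analogue of the book template, not how web.py compiles templates, and unlike the real templator it does no escaping of its inputs.

```python
def book_template(book_title, bookcover_url, author=None):
    """Plain-python analogue of templates/book.html: arguments in, html string out."""
    lines = [
        "<div>",
        f"  <h1>{book_title}</h1>",
        f'  <img src="{bookcover_url}"/>',
    ]
    if author:  # mirrors the "$if author:" branch in the template
        lines.append(f"  <h2>{author}</h2>")
    lines.append("</div>")
    return "\n".join(lines)


print(book_template("The Hobbit", "https://covers.openlibrary.org/b/id/14624642-L.jpg"))
```

Calling it without author simply omits the h2 element, just as the template's $if branch would.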

Check the official web.py templator documentation if you'd like a deep dive into all the capabilities of Open Library's python-powered templating system.

Caution

The $: combination instructs the template to render a variable as html, as opposed to plaintext. This is sometimes necessary when rendering one html template from another, but it should be done with care when the variable's value may be user-specified, because it can enable dangerous XSS attacks: a bad actor may intentionally save javascript code into a variable, hoping that the website will render and execute it as html rather than display it as safe plaintext.
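To make the risk concrete, the sketch below contrasts escaped and unescaped output using python's standard html.escape. It only approximates the difference between $value and $:value; the variable names are invented for illustration.

```python
from html import escape

# A value a bad actor might save into a user-editable field.
user_value = '<script>alert("xss")</script>'

# Roughly what plain $value does: special characters become harmless entities.
as_text = f"<p>{escape(user_value)}</p>"

# Roughly what $:value does: the markup is emitted as-is and the browser would run it.
as_html = f"<p>{user_value}</p>"

print(as_text)
print(as_html)
```

In the first line of output the browser would display the script as visible text; in the second it would execute it.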

Note

Notice that author in this example is provided as a keyword argument with a default value of None and is thus optional when the template is called. As when defining a python function, once a keyword argument is declared, every argument after it must also be a keyword argument.
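The python rule this note refers to can be checked directly. The sketch below compiles two function definitions, one with the optional argument last and one with a required argument after it; the compiles helper is hypothetical and exists only for this demonstration.

```python
# Optional argument last: allowed.
valid = "def template(book_title, bookcover_url, author=None): pass"

# Required argument after an optional one: rejected by python.
invalid = "def template(book_title, author=None, bookcover_url): pass"


def compiles(source):
    """Return True if python accepts the given source code."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(valid), compiles(invalid))
```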

From python, the corresponding code to render this templates/book.html template would be:

# render_template is provided by Infogami
from infogami.utils.view import render_template

render_template(
  'book',  # name of the template in templates/ directory without .html
  book_title="The Hobbit",
  bookcover_url="https://covers.openlibrary.org/b/id/14624642-L.jpg",
  author="J.R.R. Tolkien"
)

Tip

When we call render_template, it knows to look in the templates/ directory and so we don't need to specify this. Also, we don't need to include the .html extension.

When we call render_template('book', ...), we fetch the templates/book.html template and pass the values book_title, bookcover_url, and author from python into the template's $def with (...) header, where they can be substituted as needed into the $variables in the template. render_template then takes this prepared book template and passes it into site.html, where it is rendered as the body.
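The whole flow (resolve the name, fill in the page, wrap it in the site) can be sketched in a few lines. This is a toy model, not Open Library's implementation: toy_render_template and the in-memory TEMPLATES dictionary are hypothetical, and string.Template stands in for the web.py templator.

```python
from string import Template

# Hypothetical in-memory stand-in for the templates/ directory.
TEMPLATES = {
    "templates/book.html": Template(
        '<div><h1>$book_title</h1><img src="$bookcover_url"/></div>'
    ),
    "templates/site.html": Template("<html><body>$body</body></html>"),
}


def toy_render_template(name, **variables):
    """Toy version: resolve the name, fill the page template, wrap it in the site."""
    path = f"templates/{name}.html"  # caller supplies neither directory nor extension
    body = TEMPLATES[path].substitute(**variables)
    return TEMPLATES["templates/site.html"].substitute(body=body)


html = toy_render_template(
    "book",
    book_title="The Hobbit",
    bookcover_url="https://covers.openlibrary.org/b/id/14624642-L.jpg",
)
print(html)
```

The caller never mentions site.html; wrapping is the renderer's job, which is why page templates only need to contain the body.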

Managing Large Templates

When a single template becomes too large and unmanageable, or when the programmer recognizes an opportunity to clean up a template by factoring out common or shared pieces of logic, they may choose to break the template into smaller logical components and move them into their own separate micro-templates. These micro-templates can live beside other templates in the templates/ directory. If a micro-template is very self-contained and makes sense to render as its own widget, such as the QueryCarousel, you might consider saving it in the macros directory, which is a folder of templates that have a few additional special properties.

When refactoring, a programmer may decide it makes sense to move the code to the macros directory. A macro is simply a special type of template that can be accessed using {{}} syntax by the page editor in Open Library. You can read more about macros in our very stale/outdated infogami documentation.

Here's what the infogami edit UI looks like for a collection: Screenshot 2024-09-04 at 8 31 02 AM

Here's what the page looks like when it's rendered with a macro: Screenshot 2024-09-04 at 8 31 28 AM
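To give a feel for what "accessed using {{}} syntax" means, here is a toy macro expander. It assumes a call-like form such as {{Name("argument")}} with a single quoted argument; that form, the MACROS registry, and the html the macro returns are all assumptions made for illustration and do not reflect how Infogami actually parses or renders macros.

```python
import re

# Hypothetical macro registry; real macros are templates in the macros directory.
MACROS = {
    "QueryCarousel": lambda arg: f'<div class="carousel" data-query="{arg}"></div>',
}

# Matches {{Name("argument")}} with exactly one double-quoted argument (an assumed form).
MACRO_CALL = re.compile(r'\{\{(\w+)\("([^"]*)"\)\}\}')


def expand_macros(text):
    """Replace each known macro call in text with the html its macro produces."""
    def replace(match):
        name, arg = match.groups()
        macro = MACROS.get(name)
        # Unknown macros are left untouched rather than raising an error.
        return macro(arg) if macro else match.group(0)

    return MACRO_CALL.sub(replace, text)


source = 'Featured books: {{QueryCarousel("subject:fantasy")}}'
print(expand_macros(source))
```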

Next: Understanding controllers & routers

So far, by extrapolating from what's written in this guide, we hope you've learned how to:

  1. Find the corresponding html template file for an Open Library webpage somewhere in the templates/ directory
  2. Make a change to the site wrapper, should you need to e.g. change the header or footer
  3. In theory, add a new template and render it... But from where?

The next step is learning where to add our render_template(...) code and how to connect a url that a patron visits with logic to respond to this request.

To answer this question, you'll want to proceed to the Plugins Guide.
