Search Architecture - ccnmtl/footprints GitHub Wiki

Infrastructure

The Footprints online presence is built using Django, the popular, open-source Python web framework, with a Postgres backend data store. We've incorporated a Solr engine to facilitate fast and robust search capabilities, with particular emphasis on full-text search and facets. Data is composed using the Haystack library and indexed asynchronously through a queued infrastructure.
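To make the moving parts concrete, here is a minimal sketch of a Django settings fragment that wires Haystack to Solr and routes index updates through a queue. It is not the project's actual configuration: the Solr URL and broker URL are placeholders, and the use of the celery-haystack package's signal processor is an assumption, since the text above only says that indexing is queued.

```python
# Hypothetical settings fragment; every value here is a placeholder,
# not the project's real configuration.

# Haystack connection pointing at a Solr core (URL is an assumption).
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        "URL": "http://localhost:8983/solr/footprints",
    },
}

# Assumes the celery-haystack package, whose signal processor enqueues
# index updates instead of running them inside the request.
HAYSTACK_SIGNAL_PROCESSOR = "celery_haystack.signals.CelerySignalProcessor"

# RabbitMQ speaks AMQP; the credentials and host are placeholders.
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"
```

With a setup along these lines, swapping the search engine would mostly mean changing the ENGINE and URL values.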

Workflow

As application editors create, modify and remove data, changes are stored in the primary Postgres database. A Haystack save trigger queues update requests through RabbitMQ, a high-performance, lightweight message broker. Queuing the update request removes the index operation from the user-interface thread, protecting the editor from slow operations or possible errors. RabbitMQ delivers the request to a Celery task worker, which completes the update to the Solr platform.
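The decoupling described above can be sketched with the standard library alone. In this illustration a dict stands in for the Solr index, queue.Queue stands in for RabbitMQ, and a thread stands in for the Celery worker; all function names are hypothetical and none of this is the project's code.

```python
import queue
import threading

# Stand-ins (assumptions): a dict plays the Solr index, queue.Queue plays
# RabbitMQ, and a background thread plays the Celery task worker.
search_index = {}
update_queue = queue.Queue()


def on_save(record):
    """Runs in the editor's request path: enqueue only, never index directly."""
    update_queue.put(("update", record["id"], record))


def on_delete(record_id):
    """Runs in the editor's request path: enqueue a removal request."""
    update_queue.put(("delete", record_id, None))


def worker():
    """Consume queued requests in order and apply them to the index."""
    while True:
        action, record_id, record = update_queue.get()
        try:
            if action == "stop":
                return
            if action == "update":
                search_index[record_id] = record
            elif action == "delete":
                search_index.pop(record_id, None)
        finally:
            # Mark the request as handled, even when stopping.
            update_queue.task_done()


def run_demo():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    on_save({"id": 31, "title": "Cuzary"})
    on_save({"id": 32, "title": "Placeholder"})
    on_delete(32)
    update_queue.put(("stop", None, None))
    update_queue.join()  # wait until every queued request is processed
    thread.join()
    return dict(search_index)
```

The point of the pattern is that on_save and on_delete return immediately, so a slow or failing index update never blocks the editor.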

Data

The Haystack library abstracts the Django application away from the details of a particular search engine. Haystack supports pluggable backends, including Solr, Elasticsearch, Whoosh and Xapian, so that application code stays portable across search engines.

Searchable application models are each backed by a Haystack SearchIndex class. Each SearchIndex class specifies a text field that serves as the primary field to search within. Haystack uses Django templates to build the document the search engine will index. For example, the Footprint model is backed by the FootprintSearchIndex class, which uses a template to compose the aggregate text field.
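The real document is rendered from a Django template that is not reproduced here. As a rough stand-in, the sketch below joins a few fields into one newline-separated string, which is the general shape of an aggregate text field. The function name and dictionary keys are hypothetical; the sample values come from the example footprint on this page.

```python
def compose_text(footprint):
    """Join selected fields into one newline-separated document for indexing."""
    parts = [
        footprint.get("title", ""),
        footprint.get("work_title", ""),
        footprint.get("footprint_location", ""),
        footprint.get("imprint_location", ""),
    ]
    # Each associated actor is rendered as "Name (Role)".
    parts.extend(f"{name} ({role})" for name, role in footprint.get("actors", []))
    parts.append(footprint.get("creator", ""))
    # Drop empty values so missing fields do not leave blank lines.
    return "\n".join(part for part in parts if part)


example = {
    "title": "Cuzary",
    "work_title": "Kuzari",
    "footprint_location": "The Hague Netherlands",
    "imprint_location": "Amsterdam Netherlands",
    "actors": [("Saruco, Salomon", "Owner"), ("Judah, ha-Levi", "Author")],
    "creator": "Adam Shear",
}
text = compose_text(example)
```

Because every searchable value ends up in one field, a single keyword query can match on a title, a place or a person's name.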

Example

The Cuzary footprint for the written work Kuzari looks (very roughly) like this in the Postgres database.

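| Id | Footprint | Footprint Date | Footprint Location | Owners | Imprint | Imprint Date | Imprint Location |
|----|-----------|----------------|--------------------|--------|---------|--------------|------------------|
| 31 | Cuzary | 1752 - 1784 | The Hague, Netherlands | Saruco, Salomon | Kuzari as Cuzary | 1663 | Amsterdam, Netherlands |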

In Solr, the same footprint looks like this, ready to be searched. The text field includes elements from this instance's title notes, written work title, associated actor names, the footprint creator, imprint date and location, etc. (See the template for a full list of fields.)

{
        "id":"main.footprint.31",
...
        "object_type":"Footprint",
        "text":"Cuzary\n\nKuzari\nThe Hague Netherlands\nAmsterdam Netherlands\n
Historical Copy\nNone\n\nSaruco, Salomon (Owner)\n\n\nJudah, ha-Levi (Author)\n\n
Judah ha-Levi as Judah Halevi (Author)\n\n    
Judah ha-Levi as Yehudah ha-levi (Author)\n\nAdam Shear\n",
        "title":"Cuzary",
        "sort_by":"cuzary",
        "work_id":"12",
        "imprint_id":"26",
        "book_copy_id":"24",
        "book_copy_identifier":"12-26-24",
        "imprint_location":["9"],
        "imprint_location_exact":["9"],
        "imprint_location_title":["Amsterdam, Netherlands"],
        "imprint_location_title_exact":["Amsterdam, Netherlands"],
        "pub_start_date":"1663-01-01T00:00:00Z",
        "pub_end_date":"1663-12-31T00:00:00Z",
        "footprint_location":["14"],
        "footprint_location_exact":["14"],
        "footprint_location_title":["The Hague, Netherlands"],
        "footprint_location_title_exact":["The Hague, Netherlands"],
        "footprint_start_date":"1752-01-01T00:00:00Z",
        "footprint_end_date":"1784-12-31T00:00:00Z",
        "actor":["40",
          "100",
          "9407",
          "10270"],
        "actor_exact":["40",
          "100",
          "9407",
          "10270"],
        "actor_title":["Saruco, Salomon (Owner)",
          "Judah, ha-Levi (Author)",
          "Judah ha-Levi as Judah Halevi (Author)",
          "Judah ha-Levi as Yehudah ha-levi (Author)"],
        "actor_title_exact":["Saruco, Salomon (Owner)",
          "Judah, ha-Levi (Author)",
          "Judah ha-Levi as Judah Halevi (Author)",
          "Judah ha-Levi as Yehudah ha-levi (Author)"],
...
}
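Two patterns in the document above are worth spelling out. The id follows the app_label.model_name.pk form that Haystack uses for document identifiers, and the year-based dates appear to be expanded to the first and last day of the year. The sketch below reproduces both; the date expansion rule is inferred from the example values rather than taken from the project's code, and the function names are hypothetical.

```python
def solr_document_id(app_label, model_name, pk):
    """Build an identifier of the form app_label.model_name.pk."""
    return f"{app_label}.{model_name}.{pk}"


def year_range_to_solr(start_year, end_year=None):
    """Expand a year or year range into start and end timestamps.

    Assumption inferred from the example: the start is January 1 of the
    first year and the end is December 31 of the last year, both at
    midnight UTC.
    """
    if end_year is None:
        end_year = start_year
    return (
        f"{start_year:04d}-01-01T00:00:00Z",
        f"{end_year:04d}-12-31T00:00:00Z",
    )
```

Storing a range as two date fields lets a date search be expressed as a range query against the start and end values.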

UI

Our primary search interface lets users begin a search for footprints with full-text keywords, a footprint date or an imprint date.

Using our example above, you can see that searching for keywords that appear in the text field, such as "Hague" or "Judah", will return the Cuzary footprint among others.

Once a user completes the initial search, a set of filters is presented to narrow the result set. Reviewing the FootprintSearchIndex class, you'll see that imprint location, footprint location and actors are faceted, which allows Solr to aggregate those fields and provide matching counts.
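Conceptually, a facet is a count of matching documents per distinct field value. The sketch below computes that in plain Python over a few made-up documents; Solr does the equivalent work inside the engine, and the function name and sample data here are hypothetical.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how many documents carry each distinct value of a field."""
    counts = Counter()
    for doc in documents:
        # A multi-valued field contributes each distinct value once per document.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    # Most frequent values first, the usual order for a filter list.
    return counts.most_common()


# Made-up result set, for illustration only.
docs = [
    {"id": "a", "imprint_location_title": ["Venice, Italy"]},
    {"id": "b", "imprint_location_title": ["Venice, Italy"]},
    {"id": "c", "imprint_location_title": ["Basel, Switzerland"]},
]
counts = facet_counts(docs, "imprint_location_title")
```

Each (value, count) pair corresponds to one filter option shown next to the search results.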

An example search for Kuzari returns 57 footprints. The Imprint Location facet then shows that 25 of the footprints were created in Venice. Checking the Venice, Italy option will narrow the result set to just those 25 matches.

 Venice, Italy (25)
 Basel, Switzerland (13)
 Fano, Italy (5)
 London, United Kingdom (4)
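At the Solr level, a faceted search with one filter checked could be expressed with the standard q, facet, facet.field and fq request parameters. The sketch below only builds the query string. In the application Haystack generates the request, so the helper name is hypothetical and the choice of the _exact fields is an assumption based on the example document above.

```python
from urllib.parse import urlencode


def build_solr_params(keywords, facet_fields, selected=None):
    """Build a Solr query string with facets and optional filter queries.

    Simplification: values are wrapped in double quotes without escaping,
    so a value containing a double quote would need extra handling.
    """
    params = [("q", keywords), ("facet", "true")]
    # Ask Solr to return counts for each faceted field.
    params += [("facet.field", field) for field in facet_fields]
    # Each checked filter becomes a filter query restricting the results.
    for field, value in (selected or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


query = build_solr_params(
    "Kuzari",
    [
        "imprint_location_title_exact",
        "footprint_location_title_exact",
        "actor_title_exact",
    ],
    {"imprint_location_title_exact": "Venice, Italy"},
)
```

Filter queries narrow the result set without changing the keyword query itself, which matches the behavior of checking a facet option.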

Diagram

The diagram below (with cats!) demonstrates the workflow.

Diagram of the Footprints search flow