Search Architecture - ccnmtl/footprints GitHub Wiki
Infrastructure
The Footprints online presence is built with Django, the popular open-source Python web framework, backed by a Postgres database. We've incorporated a Solr engine to provide fast, robust search, with particular emphasis on full-text search and facets. Search documents are composed with the Haystack library and indexed asynchronously through a queued infrastructure.
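To make the stack concrete, here is a sketch of how these pieces are commonly wired together in a Django settings module. Every value (database name, hosts, ports, credentials, Solr core name) is hypothetical. Routing Haystack updates through Celery via django-celery-haystack's signal processor is one common approach and an assumption here, not something confirmed by the Footprints source.

```python
# settings.py (sketch): Postgres for data, Solr via Haystack for search,
# RabbitMQ as the Celery broker. All values below are hypothetical.

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "footprints",  # hypothetical database name
        "HOST": "localhost",
        "PORT": "5432",
    }
}

HAYSTACK_CONNECTIONS = {
    "default": {
        # Haystack's bundled Solr backend
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        "URL": "http://127.0.0.1:8983/solr/footprints",  # hypothetical core URL
    }
}

# Assumption: django-celery-haystack hands save/delete index updates to Celery
HAYSTACK_SIGNAL_PROCESSOR = "celery_haystack.signals.CelerySignalProcessor"

# Old-style uppercase Celery setting pointing at a local RabbitMQ broker;
# Celery 4+ configured with a CELERY namespace reads CELERY_BROKER_URL instead.
BROKER_URL = "amqp://guest:guest@localhost:5672//"
```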
Workflow
As application editors create, modify and remove data, changes are stored in the primary Postgres database. A Haystack save trigger then queues an update request through RabbitMQ, a high-performance, lightweight message broker. Queuing the request moves the index operation off the user-interface thread, protecting the editor from slow operations and possible errors. RabbitMQ delivers the request to a Celery task worker, which applies the update to the Solr index.
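The value of the queue is that the editor's request only enqueues work and returns; a separate worker does the indexing. The stdlib-only model below illustrates that pattern. It is not the Footprints code: a `queue.Queue` stands in for RabbitMQ, a thread stands in for the Celery worker, a dict stands in for Solr, and the handler names are hypothetical.

```python
import queue
import threading

# Hypothetical stand-ins: the dict plays Solr, the Queue plays RabbitMQ.
search_index = {}
update_queue = queue.Queue()


def on_footprint_saved(pk, fields):
    """Runs on the request thread: enqueue the update and return immediately."""
    update_queue.put(("update", pk, fields))


def on_footprint_deleted(pk):
    """Deletions are queued the same way."""
    update_queue.put(("remove", pk, None))


def index_worker():
    """Plays the Celery worker: take requests off the queue in order and apply them."""
    while True:
        action, pk, fields = update_queue.get()
        try:
            doc_id = f"main.footprint.{pk}"
            if action == "update":
                search_index[doc_id] = dict(fields)
            elif action == "remove":
                search_index.pop(doc_id, None)
        except Exception as exc:
            # Failures are handled here, in the worker, and never reach the editor.
            print(f"index update for {pk} failed: {exc}")
        finally:
            update_queue.task_done()


threading.Thread(target=index_worker, daemon=True).start()

on_footprint_saved(31, {"title": "Cuzary"})
on_footprint_saved(32, {"title": "Kuzari"})
on_footprint_deleted(32)
update_queue.join()  # only so we can inspect the index; a web request would not wait
print(sorted(search_index))  # ['main.footprint.31']
```

Because a single worker consumes the queue in FIFO order, the update and the later removal of footprint 32 are applied in the order the editor made them.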
Data
The Haystack library abstracts the Django application away from the details of a particular search engine. Haystack supports pluggable backends, including Solr, Elasticsearch, Whoosh and Xapian, so the same application code can run against different search engines.
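In Haystack, the backend is chosen by the `ENGINE` entry in the `HAYSTACK_CONNECTIONS` setting rather than in application code. The toy below models that idea with an abstract base class. `SearchBackend`, `DictBackend`, `TokenBackend` and `index_and_find` are hypothetical names for illustration, not Haystack classes.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical interface the application codes against."""

    @abstractmethod
    def update(self, doc_id, text): ...

    @abstractmethod
    def search(self, term): ...


class DictBackend(SearchBackend):
    """Toy engine: case-insensitive substring match over stored text."""

    def __init__(self):
        self.docs = {}

    def update(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        return sorted(d for d, t in self.docs.items() if term.lower() in t.lower())


class TokenBackend(SearchBackend):
    """Toy engine: inverted index of lowercase words (stale postings are not removed)."""

    def __init__(self):
        self.postings = {}

    def update(self, doc_id, text):
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def index_and_find(backend: SearchBackend):
    # Application code: identical whichever backend is plugged in.
    backend.update("main.footprint.31", "Cuzary The Hague Netherlands")
    return backend.search("hague")


print(index_and_find(DictBackend()))   # ['main.footprint.31']
print(index_and_find(TokenBackend()))  # ['main.footprint.31']
```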
Searchable application models are each backed by a Haystack SearchIndex class. Each SearchIndex class specifies a text field that serves as the primary field for searching. Haystack leverages Django templates to build the document the search engine will index. For example, the Footprint model is backed by the FootprintSearchIndex class, which uses a template to compose the aggregate text field.
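In Haystack, this is typically declared as `text = indexes.CharField(document=True, use_template=True)`, which renders a template found by default at `search/indexes/<app_label>/<model_name>_text.txt`. The stdlib sketch below mimics that rendering step with `string.Template`. The template string and field names are a hypothetical, shortened stand-in for the real Footprints template, which includes more fields.

```python
from string import Template

# Hypothetical, shortened stand-in for the Django template that builds "text".
TEXT_TEMPLATE = Template(
    "$title\n\n$work_title\n$footprint_location\n$imprint_location\n"
)


def render_text(footprint):
    """Compose the aggregate document text from several model fields."""
    parts = [TEXT_TEMPLATE.substitute(footprint)]
    # Multi-valued relations (actors) become one line each, as a template loop would.
    parts.extend(f"{name} ({role})\n" for name, role in footprint["actors"])
    return "".join(parts)


footprint = {
    "title": "Cuzary",
    "work_title": "Kuzari",
    "footprint_location": "The Hague Netherlands",
    "imprint_location": "Amsterdam Netherlands",
    "actors": [("Saruco, Salomon", "Owner"), ("Judah, ha-Levi", "Author")],
}
print(render_text(footprint))
```

Concatenating fields into one document is what lets a single keyword query hit titles, places and people at once.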
Example
The Cuzari footprint for the written work Kuzari looks (very roughly) like this in the Postgres database.
Id | Footprint | Footprint Date | Footprint Location | Owners | Imprint | Imprint Date | Imprint Location |
---|---|---|---|---|---|---|---|
31 | Cuzary | 1752 - 1784 | The Hague, Netherlands | Saruco, Salomon | Kuzari as Cuzary | 1663 | Amsterdam, Netherlands |
In Solr, the same footprint looks like this, ready to be searched. The text field includes elements from the footprint's title and notes, the written work title, associated actor names, the footprint creator, the imprint date and location, etc. (See the template for a full list of fields.)
```
{
  "id":"main.footprint.31",
  ...
  "object_type":"Footprint",
  "text":"Cuzary\n\nKuzari\nThe Hague Netherlands\nAmsterdam Netherlands\nHistorical Copy\nNone\n\nSaruco, Salomon (Owner)\n\n\nJudah, ha-Levi (Author)\n\nJudah ha-Levi as Judah Halevi (Author)\n\nJudah ha-Levi as Yehudah ha-levi (Author)\n\nAdam Shear\n",
  "title":"Cuzary",
  "sort_by":"cuzary",
  "work_id":"12",
  "imprint_id":"26",
  "book_copy_id":"24",
  "book_copy_identifier":"12-26-24",
  "imprint_location":["9"],
  "imprint_location_exact":["9"],
  "imprint_location_title":["Amsterdam, Netherlands"],
  "imprint_location_title_exact":["Amsterdam, Netherlands"],
  "pub_start_date":"1663-01-01T00:00:00Z",
  "pub_end_date":"1663-12-31T00:00:00Z",
  "footprint_location":["14"],
  "footprint_location_exact":["14"],
  "footprint_location_title":["The Hague, Netherlands"],
  "footprint_location_title_exact":["The Hague, Netherlands"],
  "footprint_start_date":"1752-01-01T00:00:00Z",
  "footprint_end_date":"1784-12-31T00:00:00Z",
  "actor":["40", "100", "9407", "10270"],
  "actor_exact":["40", "100", "9407", "10270"],
  "actor_title":["Saruco, Salomon (Owner)",
                 "Judah, ha-Levi (Author)",
                 "Judah ha-Levi as Judah Halevi (Author)",
                 "Judah ha-Levi as Yehudah ha-levi (Author)"],
  "actor_title_exact":["Saruco, Salomon (Owner)",
                       "Judah, ha-Levi (Author)",
                       "Judah ha-Levi as Judah Halevi (Author)",
                       "Judah ha-Levi as Yehudah ha-levi (Author)"],
  ...
}
```
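Several structured fields in this document appear to follow simple rules that can be read off the example: the `id` is `app_label.model_name.pk` (Haystack's usual document id), `sort_by` is the lowercased title, `book_copy_identifier` joins work, imprint and copy ids, and year ranges are widened to their first and last day. The sketch below reproduces those rules. It is inferred from this one example, not taken from FootprintSearchIndex; the `row` keys and `build_document` are hypothetical. The `_exact` copies correspond to Haystack's companion fields for `faceted=True`, which are indexed as whole strings so facet counts group complete values.

```python
from datetime import date


def solr_date(d):
    """Format a date as a Solr date string (ISO 8601, UTC 'Z' suffix)."""
    return f"{d.isoformat()}T00:00:00Z"


def year_bounds(start_year, end_year=None):
    """Widen a single year or a year range to its first and last day."""
    end_year = end_year or start_year
    return solr_date(date(start_year, 1, 1)), solr_date(date(end_year, 12, 31))


def build_document(app_label, model_name, pk, row):
    fp_start, fp_end = year_bounds(*row["footprint_years"])
    pub_start, pub_end = year_bounds(*row["imprint_years"])
    return {
        "id": f"{app_label}.{model_name}.{pk}",
        "title": row["title"],
        "sort_by": row["title"].lower(),
        "book_copy_identifier": f"{row['work_id']}-{row['imprint_id']}-{row['book_copy_id']}",
        "footprint_start_date": fp_start,
        "footprint_end_date": fp_end,
        "pub_start_date": pub_start,
        "pub_end_date": pub_end,
        "imprint_location_title": [row["imprint_location"]],
        # Untokenized duplicate used for faceting and exact filtering
        "imprint_location_title_exact": [row["imprint_location"]],
    }


# Hypothetical flattened row for the Cuzari footprint
row = {
    "title": "Cuzary", "work_id": 12, "imprint_id": 26, "book_copy_id": 24,
    "footprint_years": (1752, 1784), "imprint_years": (1663,),
    "imprint_location": "Amsterdam, Netherlands",
}
doc = build_document("main", "footprint", 31, row)
print(doc["id"], doc["footprint_end_date"])  # main.footprint.31 1784-12-31T00:00:00Z
```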
UI
Our primary search interface lets users start a Footprints search with a full-text query, a footprint date, or an imprint date.
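Since footprints and imprints are stored as date ranges, a date search is naturally a range-overlap test: a document matches when its start is on or before the requested end and its end is on or after the requested start. The sketch below assumes those overlap semantics (the actual Footprints query may differ) and builds the matching clause in Solr's standard `field:[low TO high]` range syntax. `date_range_clause` and `overlaps` are hypothetical helpers; the `footprint_` and `pub_` prefixes come from the Solr document shown above.

```python
def year_to_solr(year, end_of_year=False):
    """First or last day of a year, in the format the index stores."""
    return f"{year:04d}-12-31T00:00:00Z" if end_of_year else f"{year:04d}-01-01T00:00:00Z"


def date_range_clause(prefix, start_year, end_year=None):
    """Solr clause matching documents whose [start, end] overlaps the requested years."""
    end_year = end_year or start_year
    return (f"{prefix}_start_date:[* TO {year_to_solr(end_year, True)}] AND "
            f"{prefix}_end_date:[{year_to_solr(start_year)} TO *]")


def overlaps(doc, prefix, start_year, end_year=None):
    """Pure-Python version of the same test, for checking the logic."""
    end_year = end_year or start_year
    doc_start = int(doc[f"{prefix}_start_date"][:4])
    doc_end = int(doc[f"{prefix}_end_date"][:4])
    return doc_start <= end_year and doc_end >= start_year


cuzari = {
    "footprint_start_date": "1752-01-01T00:00:00Z",
    "footprint_end_date": "1784-12-31T00:00:00Z",
    "pub_start_date": "1663-01-01T00:00:00Z",
    "pub_end_date": "1663-12-31T00:00:00Z",
}
print(overlaps(cuzari, "footprint", 1760))  # True
print(date_range_clause("pub", 1650, 1670))
```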
Using our example above, you can see that searching for keywords that appear in the text field, such as "Hague" or "Judah", will return the Cuzari footprint among others.
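Full-text search works off an inverted index: each word of the text field points back to the documents containing it. The toy below shows why "Hague" finds only the Cuzari footprint while "Judah" also finds others. The tokenizer is a crude stand-in for Solr's text analysis, AND-ing the terms is a choice made here (Solr's default operator is configurable), and the second document (`main.footprint.999`) is hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Rough stand-in for Solr's analysis chain: lowercase word tokens only.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


docs = {
    "main.footprint.31": "Cuzary\n\nKuzari\nThe Hague Netherlands\n"
                         "Amsterdam Netherlands\nJudah, ha-Levi (Author)\n",
    "main.footprint.999": "Kuzari\nVenice Italy\nJudah, ha-Levi (Author)\n",  # hypothetical
}
index = build_inverted_index(docs)
print(keyword_search(index, "Hague"))  # ['main.footprint.31']
print(keyword_search(index, "Judah"))  # ['main.footprint.31', 'main.footprint.999']
```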
Once a user completes the initial search, a set of filters is presented to narrow the result set. In the FootprintSearchIndex class, imprint location, footprint location and actors are faceted, which lets Solr aggregate those fields and return the number of matching documents for each value.
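Faceting is essentially a group-and-count over the current result set. In Haystack this is requested with `SearchQuerySet().facet("imprint_location")` and read back with `.facet_counts()`; Solr does the counting. The stdlib sketch below shows the same computation on a hypothetical toy result set (not real Footprints data), assuming the facet is computed on `imprint_location_title_exact`. Results are ordered by count, as with Solr's default `facet.sort=count`, with ties broken alphabetically.

```python
from collections import Counter


def facet_counts(results, field, mincount=1):
    """Count how many result documents carry each value of a multi-valued field."""
    counts = Counter()
    for doc in results:
        counts.update(set(doc.get(field, [])))  # each document counts once per value
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(value, n) for value, n in ranked if n >= mincount]


# Hypothetical toy results
results = [
    {"imprint_location_title_exact": ["Venice, Italy"]},
    {"imprint_location_title_exact": ["Venice, Italy"]},
    {"imprint_location_title_exact": ["Basel, Switzerland"]},
    {"imprint_location_title_exact": ["Venice, Italy"]},
    {"imprint_location_title_exact": ["Basel, Switzerland"]},
    {"imprint_location_title_exact": ["Fano, Italy"]},
]
for value, n in facet_counts(results, "imprint_location_title_exact"):
    print(f"{value} ({n})")
```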
An example search for Kuzari returns 57 footprints. The Imprint Location facet then shows that 25 of those footprints have Venice, Italy as their imprint location. Checking the Venice, Italy option narrows the result set to just those 25 matches.
Venice, Italy (25)
Basel, Switzerland (13)
Fano, Italy (5)
London, United Kingdom (4)
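Checking a facet option adds a filter on top of the original query. In Haystack that is `SearchQuerySet.narrow(...)`, which Solr receives as an `fq` (filter query) parameter alongside `q` and the `facet.field` list. The sketch below shows both halves: filtering a toy result set in Python, and building Solr `/select` parameters. The result set is hypothetical, and filtering on `imprint_location_title_exact` is an assumption; the real index may filter on the location id instead.

```python
from urllib.parse import urlencode


def narrow(results, field, value):
    """Keep only documents whose multi-valued field contains the chosen value."""
    return [doc for doc in results if value in doc.get(field, [])]


def solr_params(query, facet_fields, filters):
    """Solr /select parameters: main query, facets, and one fq per checked option.

    Values are quoted but not escaped; a value containing quotes would need escaping.
    """
    params = [("q", query), ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", f) for f in facet_fields]
    params += [("fq", f'{field}:"{value}"') for field, value in filters]
    return urlencode(params)


# Hypothetical toy results
results = [
    {"id": "a", "imprint_location_title_exact": ["Venice, Italy"]},
    {"id": "b", "imprint_location_title_exact": ["Basel, Switzerland"]},
    {"id": "c", "imprint_location_title_exact": ["Venice, Italy"]},
]
venice = narrow(results, "imprint_location_title_exact", "Venice, Italy")
print([doc["id"] for doc in venice])  # ['a', 'c']
print(solr_params("Kuzari", ["imprint_location_title_exact"],
                  [("imprint_location_title_exact", "Venice, Italy")]))
```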
Diagram
The diagram below (with cats!) demonstrates the workflow.