Data and index architecture and implications - AtlasOfLivingAustralia/data-management GitHub Wiki

Background key points:

  • The Atlas database and the search index are separate data stores.
  • All Atlas searching, record list retrieval, downloads and facets rely on the index. Only when viewing a record's details is the user looking at data from the database.
  • One index serves the production Atlas. Other indexes (older or newer) may also be present as a result of other re-indexing processes.
  • Re-indexing generates a new index from what is currently in the database; the new index must then be manually allocated to production.
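To make the separation concrete, here is a minimal Python sketch of the architecture described above. Everything in it (the `Atlas` class, its method names, and representing an index as a set of record ids) is a hypothetical toy model, not the Atlas's actual code or API. It only illustrates that a re-index builds a new index alongside the production one, and that nothing changes for users until that index is allocated to production.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical toy model of the Atlas architecture; not the real Atlas code.
@dataclass
class Atlas:
    database: dict[str, dict] = field(default_factory=dict)     # record id -> record
    indexes: dict[str, set[str]] = field(default_factory=dict)  # index name -> indexed record ids
    production: str | None = None                               # index currently serving the Atlas

    def reindex(self, name: str) -> None:
        # Build a new index from whatever is in the database right now.
        # Production is not changed; the current index keeps serving searches.
        self.indexes[name] = set(self.database)

    def swap_to_production(self, name: str) -> None:
        # The manual step: allocate the newly generated index to production.
        self.production = name

    def search(self) -> set[str]:
        # Searches, record lists, facets and downloads read the production index.
        return set(self.indexes.get(self.production, set()))

    def view_record(self, record_id: str) -> dict | None:
        # Only the record details view reads from the database.
        return self.database.get(record_id)


atlas = Atlas()
atlas.database["r1"] = {"species": "Phascolarctos cinereus"}
atlas.reindex("index-new")
print(atlas.search())                 # set()  (new index not yet in production)
atlas.swap_to_production("index-new")
print(atlas.search())                 # {'r1'}
```

Keeping `reindex` and `swap_to_production` as separate steps mirrors the manual allocation: the old index continues to serve searches while the new one is being built.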

The two can get out of sync:

For example:

  • A record can be found in search results, but nothing comes back when the user views its details.
  • Record counts don't reflect a recent data load, and the newly loaded records cannot be found.
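The two symptoms above correspond to the two directions of a set difference between the record ids held in each store. A minimal sketch, assuming you can obtain the full set of record ids from both the database and the production index (how to do that is not covered on this page, and `find_out_of_sync` is a hypothetical helper):

```python
from __future__ import annotations


def find_out_of_sync(db_ids: set[str], index_ids: set[str]) -> dict[str, set[str]]:
    """Hypothetical helper: compare the record ids held by the database and the index."""
    return {
        # Returned by searches, but the record details view comes back empty.
        "index_only": index_ids - db_ids,
        # Present in the database, but not searchable (e.g. loaded since the last re-index).
        "database_only": db_ids - index_ids,
    }


report = find_out_of_sync(db_ids={"a", "b", "c"}, index_ids={"b", "c", "d"})
print(report["index_only"])     # {'d'}
print(report["database_only"])  # {'a'}
```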

Implications for processing and exporting data:

  • Data resource load, sample and process steps add content to the database but not to the index.
  • Delete removes records from both the database and the index in a single process.
  • Exports download data from the index.
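The effect of each operation on the two stores can be sketched as a toy model. This is hypothetical: the database is a dict, the production index is a set of record ids, and the function names are illustrative rather than real Atlas commands.

```python
from __future__ import annotations

# Hypothetical toy model: record id -> record, and the ids the production index can return.
database: dict[str, dict] = {}
production_index: set[str] = set()


def load_sample_process(records: dict[str, dict]) -> None:
    # Load, sample and process write to the database only.
    database.update(records)


def delete(record_ids: set[str]) -> None:
    # Delete removes records from the database AND the index in one step.
    for rid in record_ids:
        database.pop(rid, None)
    production_index.difference_update(record_ids)


def export() -> list[str]:
    # Exports read from the index, never from the database.
    return sorted(production_index)


database["r1"] = {"species": "Dacelo novaeguineae"}
production_index.add("r1")          # r1 was indexed by an earlier re-index

load_sample_process({"r2": {"species": "Phascolarctos cinereus"}})
print(export())                     # ['r1']  (r2 is in the database only)

delete({"r1"})
print(export(), sorted(database))   # [] ['r2']
```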

Timing:

  • Deletes must be run against the most recent production index. Running a delete while a re-index is in progress, or before the newly generated index has been swapped to production, will leave records that have been deleted from the database but still appear in the index.
  • Live indexing (not recommended) must also be run against the most recent production index. Running a live index while a re-index is in progress, or before the newly generated index has been swapped to production, will leave data that is in the database but not in the index.
  • Load, sample and process steps can be run at any time to add data to the database; the records will not appear in the index until the next re-index has completed and the new index has been allocated to production.
  • Downloads should be run following a re-index, since exports read from the index and will not include data loaded since the last re-index.
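The delete timing problem can be reproduced in a few lines. This sketch assumes, for simplicity, that a re-index builds the new index from a snapshot of the database taken when it starts; all names are hypothetical and the model is not the Atlas's actual re-indexing code. A record deleted while the re-index is running is removed from the database and the old index, but it survives in the new index once that index is swapped to production.

```python
from __future__ import annotations

# Hypothetical toy model. Assumption: a re-index reads a snapshot of the database
# when it starts, so changes made while it runs do not reach the new index.
database = {"r1", "r2"}           # record ids in the database
indexes = {"old": {"r1", "r2"}}   # named indexes
production = "old"                # index currently serving the Atlas


def start_reindex() -> set[str]:
    return set(database)          # copy of the database taken at the start of the re-index


def finish_reindex(name: str, snapshot: set[str]) -> None:
    indexes[name] = snapshot


def delete(record_id: str) -> None:
    # Removes the record from the database and from whichever index is in production now.
    database.discard(record_id)
    indexes[production].discard(record_id)


def swap_to_production(name: str) -> None:
    global production
    production = name


snapshot = start_reindex()        # re-index in progress...
delete("r2")                      # ...delete runs against the OLD production index
finish_reindex("new", snapshot)
swap_to_production("new")

# r2 is gone from the database but reappears in the production index.
print(indexes[production] - database)   # {'r2'}
```

Running the delete after `swap_to_production("new")` instead would remove `r2` from the index that is actually serving the Atlas, leaving the two stores consistent.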