Data and index architecture and implications - AtlasOfLivingAustralia/data-management GitHub Wiki

Background key points:

The Atlas database and the search index are separate data stores
All Atlas searching, record list retrieval, downloads and facets rely on the index. Only when viewing record details is a user looking at the data in the database.
There is an index that serves the production Atlas (other, older or newer, indexes may also be present as a result of other re-indexing processes).
Re-indexing generates a new index from what is currently in the database. The new index does not go live automatically; it must be manually allocated to production.
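
The key points above can be sketched as a toy model. This is only an illustration of the separation between the two stores, not Atlas code: the class name, method names and index names are all hypothetical.

```python
# Toy model only: every name here is hypothetical, not part of the Atlas codebase.
class ToyAtlas:
    def __init__(self):
        self.database = {}                   # record id -> record details
        self.indexes = {"index-1": set()}    # index name -> set of indexed record ids
        self.production = "index-1"          # name of the index serving production

    def load(self, record_id, details):
        """Load/sample/process: writes to the database only."""
        self.database[record_id] = details

    def search(self):
        """Searches, record lists, facets and downloads read the production index."""
        return sorted(self.indexes[self.production])

    def view_details(self, record_id):
        """The record details view reads the database."""
        return self.database.get(record_id)

    def reindex(self, new_name):
        """Build a new index from the current database contents; it is not live yet."""
        self.indexes[new_name] = set(self.database)

    def allocate_to_production(self, name):
        """Manual step that makes a generated index the live one."""
        self.production = name


atlas = ToyAtlas()
atlas.load("r1", {"name": "example record"})
before_reindex = atlas.search()          # the record is in the database but not indexed
atlas.reindex("index-2")
before_swap = atlas.search()             # a new index exists but is not in production
atlas.allocate_to_production("index-2")
after_swap = atlas.search()              # the record is now searchable
```

In this sketch the record only becomes searchable after both the re-index and the manual swap, although its details can be read from the database as soon as it is loaded.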

The two can get out of sync:

e.g.

A record can be found in a search, but when the user goes to view its details nothing comes back.
Record counts do not reflect a recent data load, and the newly loaded records cannot be found in a search.
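
Both symptoms come down to a difference between the set of record ids in the database and the set in the production index. A minimal sketch of such a comparison, assuming the two lists of ids have already been exported by some means (the function name and the sample ids are hypothetical):

```python
def compare_stores(database_ids, index_ids):
    """Report record ids that are present in only one of the two stores."""
    database_ids, index_ids = set(database_ids), set(index_ids)
    return {
        # Found by search, but the details view returns nothing.
        "in_index_only": sorted(index_ids - database_ids),
        # Loaded into the database, but not yet searchable.
        "in_database_only": sorted(database_ids - index_ids),
    }


report = compare_stores(
    database_ids=["r1", "r2", "r4"],
    index_ids=["r1", "r2", "r3"],
)
```

Here `report["in_index_only"]` corresponds to the first symptom and `report["in_database_only"]` to the second.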

Implications for processing and exporting data:

The data resource load, sample and process steps add content to the database but not to the index.
Delete removes records from both the database and the index in a single process.
Exports download data from the index, not from the database.

Timing:

Deletes must be run against the most recent production index. Running a delete while a re-index is in progress, or before the newly generated index is swapped to production, leaves records that are deleted from the database but still appear in the index.
Live indexing (not recommended) must be run against the most recent production index. Running a live index while a re-index is in progress, or before the newly generated index is swapped to production, leaves data that is in the database but not in the index.
Load, sample and process steps can be run at any time to add data to the database. The records will not appear in the index until the next re-index is completed and the new index is allocated to production.
Downloads should be run following a re-index, so that the exported data reflects what is currently in the database.
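
The timing problems above can be reproduced in a small simulation. It assumes that a re-index captures the database contents at the moment it starts, and that deletes and live indexing write to the database and to the index currently in production; the function name and record ids are hypothetical.

```python
def run_during_reindex(operation):
    """Simulate running an operation after a re-index starts but before the swap."""
    database = {"r1", "r2"}
    production_index = {"r1", "r2"}

    # The re-index starts: the new index is generated from the database as it is now.
    new_index = set(database)

    if operation == "delete":
        # Delete removes the record from the database and the current production index.
        database.discard("r2")
        production_index.discard("r2")
    elif operation == "live_index":
        # Live indexing adds the record to the database and the current production index.
        database.add("r3")
        production_index.add("r3")
    else:
        raise ValueError(f"unknown operation: {operation}")

    # The newly generated index is then allocated to production.
    production_index = new_index

    return {
        "in_index_only": sorted(production_index - database),
        "in_database_only": sorted(database - production_index),
    }


after_delete = run_during_reindex("delete")
after_live_index = run_during_reindex("live_index")
```

In this sketch the delete case ends with `r2` still in the index although it has been removed from the database, and the live indexing case ends with `r3` in the database but missing from the index.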