Search - AtlasOfLivingAustralia/profile-hub GitHub Wiki

Profiles uses Elasticsearch (ES) as the underlying search engine.

Every time a domain entity is added/updated/deleted in MongoDB, the ES index is updated automatically by the Grails ElasticSearch plugin.

Domain classes are mapped to the ES index via the static searchable = {} closure in each relevant domain class. Some notes/gotchas about this:

  • Profiles uses only one index, but the default for the Grails ES plugin is to map each domain class to a separate index. We avoid this by specifying root = false in all searchable domain classes except Profile (which is the root for the index).
  • All associated and embedded objects need to be mapped as components so they end up as nested objects in the ES index (otherwise they result in reference objects, which we don't want).
  • There is a bug in the plugin when saving associated/embedded objects: if the object is not the index root then the parent object (Profile) is not updated. This has been fixed in a pull request, and a work-around has been put into Profile Service (see the AuditEventListener class in profile-service).
  • We need to support case-insensitive 'exact' match queries against name fields. "Exact match" in ES usually means a term query against a not_analyzed field, whereas "case insensitive" usually means a match query against an analysed field, but match queries split search terms into individual words. The best solution is to define a custom analyser that uses the keyword tokenizer (which treats the whole input as a single token rather than one token per word) and the lowercase filter. This is a bit difficult when using the Grails plugin. The fallback is to add transient methods to the domain class that return the lower-case version of each field where we want a 'case insensitive exact match', and to map them as not_analyzed.
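
As a rough illustration of the custom analyser described in the last point, the sketch below builds the index settings as a plain Python dict and emulates locally what the analyser would emit compared with word-by-word splitting. The analyser name lowercase_keyword and the helper functions are hypothetical, and the word-splitting emulation is only a crude stand-in for the default ES analyser; none of this is taken from the Profiles code base.

```python
# Hypothetical analyser name; the Profiles code base may not define one.
ANALYZER_NAME = "lowercase_keyword"


def index_settings():
    """Build ES index settings for a keyword-tokenised, lower-cased analyser."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    ANALYZER_NAME: {
                        "type": "custom",
                        "tokenizer": "keyword",   # whole input becomes one token
                        "filter": ["lowercase"],  # makes comparison case-insensitive
                    }
                }
            }
        }
    }


def emulate_keyword_lowercase(text):
    """Approximate the custom analyser's output: one lower-cased token."""
    return [text.lower()]


def emulate_word_split(text):
    """Crude approximation of an analyser that emits one token per word."""
    return text.lower().split()


if __name__ == "__main__":
    name = "Acacia Dealbata"
    print(emulate_keyword_lowercase(name))  # ['acacia dealbata']
    print(emulate_word_split(name))         # ['acacia', 'dealbata']
```

With one token per word, a search for "Acacia" alone would match this name; with the single lower-cased token, only the full name matches, regardless of case.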

There are two levels of search in Profiles: Free-text and Name.

Free-text search

Searches for the provided text in the following fields:

  • scientificName (aka profile name)
  • matchedName
  • archivedWithName (only if includeArchived = true). This is the original name the profile had at the time it was archived, at which point the profile name is changed to [scientificName] Archived on...
  • all user-entered attributes

All fields are analysed with the default ES analyser.

Result ranks are boosted for matches against the scientificName, matchedName or name-related attributes. Otherwise, the default ES relevance scoring is used.
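
One way such a boosted free-text query could be expressed is sketched below as an ES multi_match request body built in Python. The boost value of 4, the attribute field path attributes.text, and the function name are assumptions made for illustration; the sketch also assumes attributes are indexed as inner objects (a nested mapping would need a nested query instead) and it does not show the boost for name-related attributes.

```python
# Hypothetical path to user-entered attribute text in the index.
ATTRIBUTE_TEXT_FIELD = "attributes.text"


def free_text_query(text, include_archived=False, name_boost=4):
    """Build a multi_match body that ranks name matches above attribute matches.

    name_boost is an assumed value; the real boosts are not documented here.
    """
    # The caret syntax applies a per-field boost inside multi_match.
    fields = [
        f"scientificName^{name_boost}",
        f"matchedName^{name_boost}",
        ATTRIBUTE_TEXT_FIELD,
    ]
    if include_archived:
        # Only searched when archived profiles are requested.
        fields.append("archivedWithName")
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    import json
    print(json.dumps(free_text_query("acacia", include_archived=True), indent=2))
```

Fields without a caret suffix keep the default relevance scoring.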

Name search

Searches for the provided text in the following fields:

  • scientificName (aka profile name)
  • matchedName
  • archivedWithName (only if includeArchived = true). This is the original name the profile had at the time it was archived, at which point the profile name is changed to [scientificName] Archived on...
  • all user-entered attributes that have been flagged as being 'Name' attributes via the Collection Administration page's Attribute Vocabulary section.

scientificName and matchedName are matched using a term query, where the only modification to the search term is to make the comparison case-insensitive. The ES analysers used in the free-text search attempt to cater for spelling errors and punctuation differences; the name search does NOT do this. However, these cases are expected to be covered by the name matching steps: the ALA steps definitely cover this requirement, but I'm not sure if the NSL matching does.
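
A minimal sketch of a case-insensitive term query of this kind, written as an ES request body in Python, is shown below. It assumes the lower-cased copies of the name fields are indexed under the hypothetical names scientificNameLower and matchedNameLower and mapped as not_analyzed; the function name is also hypothetical, and matching against flagged 'Name' attributes is left out.

```python
# Hypothetical names for the lower-cased, not_analyzed copies of the name fields.
LOWER_CASE_NAME_FIELDS = ["scientificNameLower", "matchedNameLower"]


def name_search_query(name):
    """Build a bool query that exactly matches the lower-cased name on either field."""
    # Term queries are not analysed, so the search term is lower-cased here
    # to line up with the lower-cased values stored in the index.
    term = name.lower()
    should = [{"term": {field: term}} for field in LOWER_CASE_NAME_FIELDS]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    import json
    print(json.dumps(name_search_query("Acacia Dealbata"), indent=2))
```

Because the term is only lower-cased, a misspelt or differently punctuated name will not match, which is why the name matching steps are expected to handle those cases.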

NOTE: As of 20/6/16, the two NSL steps in the diagram above are NOT fully implemented. A new version of the NSL search service is under development and will be used as soon as it is available. Additional future work will be required to support multiple taxonomic trees from the NSL.