Things to check for a new bie index - AtlasOfLivingAustralia/bie-index GitHub Wiki

Preparation

This is assuming that you are doing a full switch-over.

Mostly obsolete: we can now do a reprocess and switch inline.

  • Check that there is enough space available. You will need (as of 2016) the following:
    • 700GB on the system that runs cassandra for a new cassandra snapshot
    • 1GB on the system that runs the BIE for a new BIE
    • 2GB on any system that has the lucene name index
    • Plenty of swap space on the system that runs biocache-cli; at least 32GB
    • 500GB on the system that runs biocache-cli for a newly generated solr index
    • 20GB on the system doing sampling (probably biocache-cli)
    • 300GB on the system that runs solr
  • Make a cassandra snapshot on the production system using nodetool snapshot occ
  • Get ssh-agent running on the production cassandra system using eval `ssh-agent` and then ssh-add
  • Copy all of the cassandra keyspace snapshots across to the new cassandra system (not just occ)
    • This will take about 8 hours
  • Clear the snapshot with nodetool clearsnapshot
  • Make a copy of the solr index on the production system using tar cvzf solr-03-07-2016-17-01.tgz ./03-07-2016-17-01/ (substituting the correct timestamp for 03-07-2016-17-01)
  • Copy the tgz over to the new solr server
    • This will take about 2 hours
  • Stop the cassandra server, move the new cassandra database over to the cassandra data area and restart cassandra
  • Untar the solr index, add it to the solr core list and swap it with the existing index
  • Edit the preferred image list on the lists server, if you have anything to add
  • Rebuild the BIE.
    • Import from the collectory. Do this first to ensure that attribution can be found
    • Import the assorted DwCAs from the import directory. Clear the first one.
    • Import the layers and regions
    • Import the wordpress pages
    • Run the link identifier matching
    • Run the image import
    • Run the occurrence count import
  • Swap the online and offline bie cores in solr
  • Run the rematch on the list server - via https://lists.ala.org.au/admin/speciesLists
  • Run the rematch on the SDS server
  • Run biocache update-conservation-data and biocache update-habitat-data to update conservation list status
  • Clear the caches on the biocache web services
  • Re-process the entire biocache
  • Re-index the entire biocache
  • Swap the biocache indexes
  • ???
  • Profit!
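
The space requirements at the top of the preparation list can be checked before anything is copied. The following is a minimal sketch, not part of the original procedure: it uses Python's standard shutil.disk_usage, and every path in REQUIREMENTS_GB is a hypothetical placeholder that needs to be replaced with the real data area on the relevant host.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes; use 1024 ** 3 if you prefer GiB

# Figures from the checklist (as of 2016). The paths are hypothetical
# placeholders: each one lives on a different system in practice.
REQUIREMENTS_GB = {
    "/data/cassandra": 700,   # new cassandra snapshot
    "/data/bie": 1,           # new BIE
    "/data/lucene": 2,        # lucene name index
    "/data/solr-build": 500,  # newly generated solr index (biocache-cli)
    "/data/sampling": 20,     # sampling
    "/data/solr": 300,        # solr server
}


def free_gb(path: str) -> float:
    """Return the free space, in GB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / GB


def has_space(path: str, required_gb: float) -> bool:
    """True if the filesystem holding path has at least required_gb free."""
    return free_gb(path) >= required_gb
```

On each host, call has_space with the entry that applies to that host, for example has_space("/data/solr", REQUIREMENTS_GB["/data/solr"]). Swap space is not covered by this check.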

Checks

Biocache

  • Ensure a blank simple search on the biocache shows the same number of records between the old and new systems
  • On the advanced search page, do a search on the following
    • Taxon Macropus (needs correct capitalisation at the moment) should show all records of genus Macropus
      • And not frogs, plants or any other riff-raff
    • Text search Macropus should show all records of genus Macropus plus a scattering of other species with macropus in their names, spread across the kingdoms. There should be Plantae and Fungi amongst them
    • Search for Raw/Provided Scientific Name Osphranter rufus should turn up assorted red kangaroos
    • Search for Raw/Provided Scientific Name Acacia dealbata should turn up assorted silver wattles
      • An occurrence of Acacia dealbata subsp. dealbata should be present and linked to the correct taxon ID
    • Search for species group Mammals should not have any weird kingdoms, phyla or classes floating about
    • Search for the Australian Museum Entomology Collection should mostly show class Insecta, kingdom Animalia etc. Expect some errors, since names like Genus nov. get mismatched, along with some molluscs and other oddities.
    • Search for country Afghanistan shows some specimens in collections
      • Switching to the map view shows a scatter of occurrences in Afghanistan
    • Search for state/territory Australian Capital Territory shows a list of records from the ACT
      • Switching to the map view shows occurrences in the ACT (and Jervis Bay)
    • Search for type status holotype shows a number of holotypes
    • Search for record type fossil specimen shows assorted fossils
    • Search for dataset name ACT BioBlitz should pop up two possible dataset names (one ACT BioBlitz and one ACT Bioblitz Moth Survey). A search for ACT Bioblitz Moth Survey should produce a list of moths
    • Search for catalog number K.446132 should produce an occurrence for Agrotis infusa
    • Search for record number 6653 should produce a number of records with collecting number 6653
    • Search with begin date 2016-05-01 or similar should produce a list of records with record date on or after that date
    • Search with end date 1900-01-01 should produce a list of mouldy old records
  • On the batch taxon search page, a search for Acacia and Macropus should produce a mixture of records.
    • Filtering the records by kingdom should narrow down to the appropriate genus. More or less, since the search is on raw/provided name.
  • On the catalog number search, searching for K.446132 and K.446133 should return two records.
  • On the spatial search, searching for a circular area should produce a suitable range of species and the map view should show suitable results. The record images and charts should also show something for a decent area.
  • Search for taxon Macropus
    • Filter by scientific name
    • Filter by species
    • Filter by subspecies
    • Filter by rank to subspecies
    • Filter by parent-child synonym and look for Macropus robustus robustus
    • Filter by common name
    • Filter by specimen type
    • Filter by country of Papua New Guinea and check map view
    • Filter by National Dynamic Land Cover and check map view
    • Filter by Local Government Area and check map view
    • Filter by Sensitive generalised and check for Macropus irma
    • Filter by month
    • Filter by State conservation vulnerable and check for Macropus parma
    • Filter by record type Image and check for image gallery
    • Filter by multimedia Image and check for image gallery
    • Filter by institution
  • Search for taxon Climacteris picumnus
    • Ensure that individual occurrence records have a conservation status of endangered for the ACT
  • Search for State ACT
    • Filter by State Conservation: Vulnerable and ensure that there are approximately 2 species listed
    • Filter by State Conservation: Endangered and ensure that there are approximately 30 species listed (there should be 36 but some may not have occurrences)
    • The state conservation list for ACT is at http://lists.ala.org.au/speciesListItem/list/dr649
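
The first biocache check, comparing the blank-search record count between the old and new systems, can also be done against the web services rather than by eye. This is a sketch under assumptions: the two base URLs are hypothetical, and it assumes the search endpoint accepts q and pageSize parameters and returns a JSON object with a totalRecords field. Confirm both against your own biocache service before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URLs; substitute the real old and new web services.
OLD_WS = "https://old-biocache.example.org/ws"
NEW_WS = "https://new-biocache.example.org/ws"


def total_records(payload: str) -> int:
    """Read the record count from a search response body.

    Assumes a JSON object with a numeric 'totalRecords' field.
    """
    return int(json.loads(payload)["totalRecords"])


def fetch_count(base_url: str, query: str = "*:*") -> int:
    """Run a search that asks for zero rows and return only the total."""
    params = urlencode({"q": query, "pageSize": 0})
    with urlopen(f"{base_url}/occurrences/search?{params}", timeout=60) as resp:
        return total_records(resp.read().decode("utf-8"))


def compare_counts(old: int, new: int) -> str:
    """Summarise whether the two systems agree on the record count."""
    if old == new:
        return f"OK: both systems report {old} records"
    return f"MISMATCH: old={old} new={new} (difference {new - old:+d})"

# Example use (needs network access to both systems):
#   print(compare_counts(fetch_count(OLD_WS), fetch_count(NEW_WS)))
```

The same fetch_count call can be reused with a narrower query to compare counts for any of the individual searches above.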

BIE

  • Search for Acacia dealbata and see if the autofill first shows Acacia when you type Acacia, and then shows Acacia dealbata in the pick list.
    • Acacia dealbata Link should be the first search result to show
  • On the Acacia dealbata Link page
    • The map should show concentrations of occurrences
    • The records list should be populated
    • The gallery should have images
    • The names should link to APC and NZOR. There should be multiple synonyms and common names.
    • The classification should show higher and lower classification
    • The records graphs should be populated
    • The literature and sequences pages should produce results
      • Several results should appear for the BHL and Trove
    • The data partners record should be populated
    • Pick a record. Navigate to the record, then follow the species name link back. You should end up where you started.
  • Search for Lophochroa leadbeateri
    • Expert distribution should show on overview page
    • No results should show for BHL
  • Search for Megaptera novaeangliae
    • Conservation status should show as AUS Vulnerable, NT Least Concern, QLD Vulnerable, etc.
  • Search for Bilby
    • The link should go to Macrotis lagotis (Reid, 1837)
    • Conservation status should show for multiple records
  • Search for Blue Gum
    • Multiple common and place names should appear
  • Search for Platypus
    • The common name for Ornithorhynchus anatinus should show first, with an iconic species marker, followed by weird beetles and other stuff.
  • Search for Kangaroo
    • Common names for Osphranter rufus, Dendrolagus lumholtzi, Macropus giganteus should appear first.
  • Search for Grey Kangaroo
    • Macropus giganteus and Macropus fuliginosus should be in the first few results
  • Search for Bungendore
    • Multiple locations should appear
    • Clicking on a location should show a map and species list for Bungendore in Explore your Area
  • Search for Biodiversity
    • Multiple data resources, data providers, institutions and spatial layers should appear
    • Filter on Data Resource and choose the first result
      • Should redirect to collectory resource
    • Filter on Institution and choose the first result
      • Should redirect to collectory resource
    • Filter on Layer and choose the first result
      • Should redirect to spatial toolbox
  • Search for Salix
    • The first two results should be from Animalia and Plantae
    • Filter on taxonomic rank
    • Filter on image available
    • Filter on taxonomic status
  • Search for Macropus rufus
    • The first result should show an accepted name of Osphranter rufus
    • Clicking on the Synonym entry should show the synonym page with a link to Osphranter rufus
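
Several of the BIE checks boil down to "these names should appear near the top of the results". A table of expectations keeps those checks repeatable. The sketch below is illustrative only: EXPECTED and missing_from_top are hypothetical names, and how the result names are collected (copied from the page or pulled from a search web service) is deliberately left out.

```python
from typing import Iterable, List

# Query -> names that should appear near the top, taken from the checklist.
EXPECTED = {
    "Acacia dealbata": ["Acacia dealbata"],
    "Kangaroo": ["Osphranter rufus", "Dendrolagus lumholtzi", "Macropus giganteus"],
    "Grey Kangaroo": ["Macropus giganteus", "Macropus fuliginosus"],
    "Macropus rufus": ["Osphranter rufus"],
}


def missing_from_top(names: Iterable[str], expected: List[str], top_n: int = 5) -> List[str]:
    """Return the expected names that do not occur in the first top_n results.

    Matching is a case-insensitive substring test, so a result that carries
    an author string still matches the bare scientific name.
    """
    top = [name.lower() for name in list(names)[:top_n]]
    return [e for e in expected if not any(e.lower() in name for name in top)]
```

An empty list means the check passed; anything returned is a name to investigate by hand.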