outlier_layer_count - AtlasOfLivingAustralia/ala-dataquality GitHub Wiki

Superseded - outlierLayerCount

Short description

The number of environmental spatial layers for which this record has been identified as an outlier by the Reverse Jackknife method.

Description

The location is an outlier, detected using the reverse jackknife algorithm against one or more environmental spatial layers. The reverse jackknife algorithm creates a curve of the expected environmental conditions for a species based on the currently known occurrence points. New records are checked against this curve to see whether they fall within the expected environmental range for the species. Five environmental layers are used in the ALA, and this field is the count of layers for which the record is an outlier.
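As a minimal sketch of how the count relates to the per-layer results, assume the reverse jackknife test has already been run for each layer and its outcome is available as a boolean. The function name and the layer names below are hypothetical placeholders, not the ALA's actual identifiers, and the per-layer test itself is not reproduced here.

```python
from typing import Mapping


def outlier_layer_count(layer_flags: Mapping[str, bool]) -> int:
    """Count the layers for which a record was flagged as an outlier.

    layer_flags maps a layer name to True when the record was flagged
    as an outlier against that layer. The per-layer flags are assumed
    to come from a separate outlier test that is not shown here.
    """
    return sum(1 for flagged in layer_flags.values() if flagged)


# Hypothetical per-layer results for one record (five layers, as in the ALA).
example_flags = {
    "layer_a": True,
    "layer_b": False,
    "layer_c": False,
    "layer_d": True,
    "layer_e": False,
}

print(outlier_layer_count(example_flags))  # 2
```

With five layers the value ranges from 0 (not an outlier against any layer) to 5 (an outlier against every layer).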

The Reverse Jackknife method is described by Chapman, A.D. (2005), Principles and Methods of Data Cleaning - Primary Species and Species-Occurrence Data (http://www.gbif.org/orc/?doc_id=1262). Chapman reports that it has proved extremely reliable in automatically identifying suspect records, with a high proportion (around 90%) of those identified as suspect proving to be true errors. The algorithm is summarised on page 51 of the report and is based on a method described by Barnett and Lewis (1978).

See https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/DETECTED_OUTLIER_JACKKNIFE for more information

Relevant Standards

Expert vocabulary

ALA usage

Technical description, provenance, code

https://github.com/AtlasOfLivingAustralia/biocache-store/blob/261291dffe2ea1694ae0ef29ca63f2a4e3461ae2/src/main/scala/au/org/ala/biocache/outliers/ReverseJacknifeProcessor.scala