Help - GlobalNamesArchitecture/gnlist-resolver-gui GitHub Wiki
Scientific Names List Resolver
Introduction
gnlist-resolver-gui, or Scientific Names List Resolver, is an app that lets you upload a file containing scientific names and match them against scientific names from a data set (for example Catalogue of Life, IPNI, ZooBank, etc.).
Features
- It compares large lists of names (up to 100,000 names) in a single run and returns the result in either CSV or Excel-compatible XLSX format
- It returns important statistics about the match: whether it was exact or fuzzy, the edit distance for fuzzy matches, a confidence score for the match, and the classification and ID from another resource (if they are available)
- For fuzzy matches in XLSX format, it highlights the differences between the matched names

Usage
- Prepare your file by saving it in CSV format with UTF-8 encoding. Several file layouts are supported, and there is a good chance that you will only need to change the headers to match the list of supported terms.
- When names are given as a single string, adjust the capitalization of the words to follow nomenclatural rules (for example, convert ECHINISCOIDES Sigismundi Groenlandicus KRISTENSEN and HALLAS, 1980 to Echiniscoides sigismundi groenlandicus KRISTENSEN and HALLAS, 1980). Note that the authors' names may stay capitalized.
- Upload the file.
- Check the headers. Headers recognized by the app appear with a green background. All other headers will be ignored by the matching process. You can delete erroneous matches or add a new match. Note that there are two possible workflows:
  - The name is given as a single string (the scientificName term is present)
  - The name is split into parts (the genus and specificEpithet terms are present)
- Pick the source that you want to use for name matching and select other settings, if available.
- Take a break and watch the statistics of your match update dynamically.
- When all is done (or after pushing the Cancel button), download the results of the match in CSV or Excel format.
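The capitalization fix from the preparation step can be scripted before upload. The following is a minimal sketch, not part of the app: the function name `normalize_capitalization` and its `name_word_count` parameter are hypothetical, and the caller is assumed to know how many leading words belong to the name itself (everything after them is treated as authorship and left untouched).

```python
def normalize_capitalization(name: str, name_word_count: int) -> str:
    """Capitalize the first word, lowercase the remaining name words.

    name_word_count is an assumption supplied by the caller: the number of
    leading words that form the name (genus plus epithets). Words after
    that are treated as authorship and kept exactly as they are.
    """
    fixed = []
    for index, word in enumerate(name.split()):
        if index == 0:
            fixed.append(word.capitalize())   # genus: "ECHINISCOIDES" -> "Echiniscoides"
        elif index < name_word_count:
            fixed.append(word.lower())        # epithets: "Sigismundi" -> "sigismundi"
        else:
            fixed.append(word)                # authorship: unchanged
    return " ".join(fixed)


raw = "ECHINISCOIDES Sigismundi Groenlandicus KRISTENSEN and HALLAS, 1980"
cleaned = normalize_capitalization(raw, name_word_count=3)
print(cleaned)
```

Names that contain rank markers such as subsp. or f. would need a different word count, so review the cleaned list before uploading it.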
Supported Terms for the Headers
subKingdom
subPhylum
superClass
subClass
cohort
superOrder
subOrder
infraOrder
superFamily
subFamily
tribe
subTribe
subGenus
section
subSpecificEpithet
variety
form
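To check a file's headers before uploading, a small script can split them into recognized and ignored ones and guess which of the two workflows applies. This is only a sketch: `SUPPORTED_TERMS` collects the terms mentioned on this page and may not match the app's actual list, the function names are hypothetical, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
# Terms mentioned on this page; the app's real list may differ (assumption).
SUPPORTED_TERMS = [
    "taxonID", "kingdom", "subKingdom", "phylum", "subPhylum", "superClass",
    "class", "subClass", "cohort", "superOrder", "order", "subOrder",
    "infraOrder", "superFamily", "family", "subFamily", "tribe", "subTribe",
    "genus", "subGenus", "section", "species", "specificEpithet",
    "subspecies", "subSpecificEpithet", "variety", "form",
    "scientificNameAuthorship", "scientificName", "taxonRank",
]


def classify_headers(headers):
    """Split headers into (recognized, ignored), comparing case-insensitively."""
    known = {term.lower() for term in SUPPORTED_TERMS}
    recognized = [h for h in headers if h.strip().lower() in known]
    ignored = [h for h in headers if h.strip().lower() not in known]
    return recognized, ignored


def detect_workflow(headers):
    """Guess the workflow from the headers that are present."""
    names = {h.strip().lower() for h in headers}
    if "scientificname" in names:
        return "single string"
    if {"genus", "specificepithet"} <= names:
        return "split parts"
    return "unknown"


recognized, ignored = classify_headers(["taxonID", "scientificName", "notes"])
print(recognized, ignored, detect_workflow(["taxonID", "scientificName", "notes"]))
```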
Input file format
- A comma-separated file (CSV) with the field names in the first row.
- Columns can be separated by tabs, commas, or semicolons.
- At least some columns should have recognizable field names; unused fields won't hurt the process.
- Comma- or semicolon-separated values must be enclosed in double quotes if the value itself contains commas or semicolons.
taxonID kingdom phylum class order family genus species subspecies variety form scientificNameAuthorship scientificName taxonRank
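The quoting rule above does not have to be applied by hand. As a minimal sketch, Python's standard `csv` module adds double quotes only around values that contain the delimiter; the helper name `write_checklist` is hypothetical, and the output is built in memory here so that it is easy to inspect.

```python
import csv
import io


def write_checklist(rows, delimiter=","):
    """Return checklist text with a header row, quoting values only when needed."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,   # quote only fields containing the delimiter or quotes
        lineterminator="\n",
    )
    writer.writerow(["taxonID", "scientificName"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = [(1, "Macrobiotus echinogenitus subsp. areolatus Murray, 1907")]
comma_text = write_checklist(rows, delimiter=",")      # name contains a comma, so it is quoted
semicolon_text = write_checklist(rows, delimiter=";")  # no semicolon in the name, so no quotes
print(comma_text)
print(semicolon_text)
```

To produce a file for upload, write the same text with `open(path, "w", encoding="utf-8", newline="")` so that the result is UTF-8 encoded.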
Simplest Example -- only scientificName
| scientificName |
|---|
| Animalia |
| Macrobiotus echinogenitus subsp. areolatus Murray, 1907 |
taxonID and scientificName Example
taxonID;scientificName
1;Macrobiotus echinogenitus subsp. areolatus Murray, 1907
...
| taxonID | scientificName |
|---|---|
| 1 | Animalia |
| 2 | Macrobiotus echinogenitus subsp. areolatus Murray, 1907 |
Rank Example
taxonID;scientificName;taxonRank
1;Macrobiotus echinogenitus f. areolatus Murray, 1907;form
...
| taxonID | scientificName | taxonRank |
|---|---|---|
| 1 | Animalia | kingdom |
| 2 | Macrobiotus echinogenitus subsp. areolatus Murray, 1907 | subspecies |
Family and Authorship Example
taxonID;family;scientificName;scientificNameAuthorship
1;Macrobiotidae;Macrobiotus echinogenitus subsp. areolatus;Murray, 1907
...
| taxonID | family | scientificName | scientificNameAuthorship |
|---|---|---|---|
| 1 | | Animalia | |
| 2 | Macrobiotidae | Macrobiotus echinogenitus | Murray |
Fine-grained Example
TaxonId;kingdom;subkingdom;phylum;subphylum;superclass;class;subclass;cohort;superorder;order;suborder;infraorder;superfamily;family;subfamily;tribe;subtribe;genus;subgenus;section;species;subspecies;variety;form;ScientificNameAuthorship
1;Animalia;;Tardigrada;;;Eutardigrada;;;;Parachela;;;Macrobiotoidea;Macrobiotidae;;;;Macrobiotus;;;harmsworthi;obscurus;;;Dastych, 1985
| TaxonId | kingdom | subkingdom | phylum | subphylum | superclass | class | subclass | cohort | superorder | order | suborder | infraorder | superfamily | family | subfamily | tribe | subtribe | genus | subgenus | section | species | subspecies | variety | form | ScientificNameAuthorship |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 136021 | Animalia | | Pogonophora | | | | | | | | | | | | | | | | | | | | | | |
| 136022 | Animalia | Pogonophora | Frenulata | Webb, 1969 | |||||||||||||||||||||
| 565443 | Animalia | | Tardigrada | | | Eutardigrada | | | | Parachela | | | Macrobiotoidea | Macrobiotidae | | | | Macrobiotus | | | harmsworthi | obscurus | | | Dastych, 1985 |
You can take the example files and modify them to suit your needs.
Output file format
Output includes the following fields:
| Field | Description |
|---|---|
| taxonID | original ID attached to a name in the checklist |
| scientificName | name from the checklist |
| matchedScientificName | name matched from the GN Resolver data source |
| inputCanonicalForm | canonical form of the input name |
| matchedCanonicalForm | canonical form of the matched name |
| editDistance | for fuzzy-matching -- how many characters differ between checklist and data source name |
| rank | rank from the source (if it was given/inferred) |
| matchedRank | corresponding rank from the data source |
| matchType | what kind of match it is |
| score | heuristic score from 0 to 1, where 1 is a good match and a match scored 0.5 requires further human investigation |
| matchTaxonID | the ID of matched name |
| classification | a hierarchy path for the matched name |
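Once the results are downloaded, the score field can be used to pull out the rows that deserve a manual check. The sketch below rests on several assumptions: the CSV output is comma-separated with the field names above as headers, the cutoff `threshold=0.75` is an arbitrary illustration rather than a value recommended by the app, and the two sample rows are invented for the demonstration.

```python
import csv
import io


def rows_needing_review(csv_text, threshold=0.75):
    """Return the rows whose score is missing, unparsable, or below the threshold."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            score = float(row.get("score"))
        except (TypeError, ValueError):
            flagged.append(row)      # no usable score: review by hand
            continue
        if score < threshold:
            flagged.append(row)      # low score: review by hand
    return flagged


# Invented sample output, for illustration only.
sample = (
    "taxonID,scientificName,matchedScientificName,editDistance,score\n"
    "1,Macrobiotus echinogenitus,Macrobiotus echinogenitus,0,1\n"
    "2,Macrobiotus harmswrthi,Macrobiotus harmsworthi,1,0.5\n"
)
flagged = rows_needing_review(sample)
print([row["taxonID"] for row in flagged])
```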