4. Input files - nrsalinas/ackbar GitHub Wiki

User-provided files and other input data

The user must assemble all the biological and geographical data needed for the analysis into several files. Some files are tables in csv format; others are ESRI shapefiles. Bear in mind that all geographic information (locality coordinates, polygon shapes, etc.) must be referenced to the EPSG 4326 coordinate reference system (WGS 84, longitude and latitude in decimal degrees). Failure to do so will produce spurious results.

Examples of all types of input files can be found in the GitHub repository of the project.

An important note about csv files. Information standards exist to ease communication, so you are strongly encouraged to follow the format recommended for input files; otherwise there is no guarantee that the analysis will run as intended. Most of the input files are data tables in csv format. Under that format, columns are delimited by commas; using any other character as delimiter does not conform to the standard. The standard also allows double quotes to enclose complex text strings within a cell (e.g., a string that includes commas or line breaks). However, I strongly recommend avoiding commas or line breaks within a single cell, because they sometimes break the execution of the program, particularly when the analytic pipeline spans multiple operating systems.
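As an illustration of the recommendation above, the following minimal sketch scans a csv table for cells that contain commas or line breaks. It is not part of ackbar; the function name `find_risky_cells` and the sample data are invented for this example, and it relies only on Python's standard `csv` module.

```python
import csv
import io


def find_risky_cells(csv_text):
    """Return (record, column, value) for cells holding commas or line breaks.

    Hypothetical helper, not part of ackbar. Record and column numbers
    start at 1; the header counts as record 1.
    """
    risky = []
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for record_number, row in enumerate(reader, start=1):
        for column_number, cell in enumerate(row, start=1):
            if "," in cell or "\n" in cell or "\r" in cell:
                risky.append((record_number, column_number, cell))
    return risky


# Invented sample: the first data record has a quoted cell with a comma.
sample = (
    "Taxon,Longitude,Latitude\n"
    '"Genus alpha, sensu lato",-74.0,4.6\n'
    "Genus beta,-73.9,4.7\n"
)
print(find_risky_cells(sample))
```

Cells reported by such a check are still valid csv when double-quoted, but they are the ones most likely to cause trouble when files move between operating systems.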

Distribution file

All data points corresponding to natural occurrences of the species should be provided in a single csv file. It should contain three columns:

  • Taxon: Species name. Should not contain more than 105 characters. This should be the first column in the file.
  • Longitude: Longitudinal coordinate of the locality, in decimal format.
  • Latitude: Latitudinal coordinate of the locality, in decimal format.
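The requirements listed above can be checked before running an analysis. The sketch below is only an assumption-laden illustration, not ackbar's own validation: the function `check_distribution` is hypothetical, it assumes the header row spells the column names exactly as listed here, and it treats coordinates outside the valid longitude and latitude bounds as a sign of swapped columns or a wrong coordinate reference system.

```python
import csv
import io

MAX_TAXON_LENGTH = 105  # character limit stated for the Taxon column


def check_distribution(csv_text):
    """Return a list of problems found in a distribution table (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader, None)
    if not header:
        return ["file is empty or has no header"]

    problems = []
    if header[0] != "Taxon":
        problems.append("first column must be Taxon")
    missing = {"Taxon", "Longitude", "Latitude"} - set(header)
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
        return problems

    idx = {name: header.index(name) for name in ("Taxon", "Longitude", "Latitude")}
    for line, row in enumerate(reader, start=2):
        if len(row) < len(header):
            problems.append(f"row {line}: too few cells")
            continue
        if len(row[idx["Taxon"]]) > MAX_TAXON_LENGTH:
            problems.append(f"row {line}: taxon name longer than {MAX_TAXON_LENGTH} characters")
        try:
            lon = float(row[idx["Longitude"]])
            lat = float(row[idx["Latitude"]])
        except ValueError:
            problems.append(f"row {line}: coordinates are not decimal numbers")
            continue
        # Decimal degrees in EPSG 4326 must fall inside these bounds.
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"row {line}: coordinates outside valid longitude/latitude bounds")
    return problems


# Invented sample: the second data row has longitude and latitude swapped.
table = (
    "Taxon,Longitude,Latitude\n"
    "Genus alpha,-73.9,4.7\n"
    "Genus beta,4.7,-173.9\n"
)
for problem in check_distribution(table):
    print(problem)
```

A bounds check cannot prove that the data use EPSG 4326, but it catches projected coordinates in meters and many cases of swapped columns.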

IUCN categories file

A .csv file that contains the IUCN assessment information of the species to be analyzed. All the information for a given species should be presented in a single row. Species not included in this file will be assumed to be Least Concern (LC).

The file should have three columns:

  • Taxon: Name of the taxon.
  • Category: Standardized IUCN code of the category: CR, EN, VU, NT, LC. Some categories are not allowed (EX, EW, DD and NE) because taxa under such categories should not be used to delimit KBAs. See the Introduction page for a detailed explanation.
  • Criteria: This field should contain the standardized code of the assessment criteria (e.g., A2c, B2(i,ii,iii), A2ae;B1+2ab(iii,v);C2a(i)). Be aware that the only letters that should be capitalized are the criteria; sub-criteria, thresholds, and parameter types should all be lowercase, if applicable.
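The rules above (allowed categories, one row per species, LC as the default) can be sketched as follows. This is a hypothetical reader written for illustration, not the code ackbar uses; `load_categories` and `category_of` are invented names, and the sample rows are made up. Note that criteria codes such as B2(i,ii,iii) contain commas, so in a csv file they have to be enclosed in double quotes.

```python
import csv
import io

ALLOWED_CATEGORIES = {"CR", "EN", "VU", "NT", "LC"}  # EX, EW, DD and NE are rejected


def load_categories(csv_text):
    """Map each taxon to its IUCN category (hypothetical helper)."""
    categories = {}
    for record in csv.DictReader(io.StringIO(csv_text, newline="")):
        taxon = record["Taxon"].strip()
        category = record["Category"].strip()
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"{taxon}: category {category!r} is not allowed")
        if taxon in categories:
            raise ValueError(f"{taxon}: more than one row for the same species")
        categories[taxon] = category
    return categories


def category_of(taxon, categories):
    """Species absent from the file are assumed to be Least Concern."""
    return categories.get(taxon, "LC")


# Invented sample; the criteria string with commas is double-quoted.
sample = (
    "Taxon,Category,Criteria\n"
    'Genus alpha,EN,"A2ae;B1+2ab(iii,v);C2a(i)"\n'
    "Genus beta,VU,A2c\n"
)
categories = load_categories(sample)
print(category_of("Genus alpha", categories))
print(category_of("Genus gamma", categories))
```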

Taxonomic groups file

Application of criterion B2 requires the selection of a taxonomic rank above species, and the specification of range sizes for each species (the latter can be relaxed though, see below). Choosing a taxonomic rank also implies selecting a taxonomic classification scheme. Depending on the group of study, the latter can be an ambiguous choice because there may be several competing classification systems supported by the taxonomic community. To avoid any assumption that could mislead the analysis, the user has to assign each taxon to a taxonomic group above species, and provide the estimated global diversity of each group (number of species). This information can be provided through two csv files, one listing the species and their corresponding taxonomic group, and another with the group diversity estimates.

The species-to-group file should contain three columns:

  • Taxon: Name of the species.
  • Group: Name of the taxonomic group (genus, family, class, etc.) under which the species is classified.
  • Range_size: Size of the distribution range of the species, in square kilometers. This field is optional. If missing, the range will be estimated from the Extent of Occurrence.

The file with group diversity estimates should have three columns:

  • Group: Name of the taxonomic group.
  • Global_species: Diversity in number of accepted species.
  • Range_threshold: If geographic ranges of all species of the group are known, the user should select a threshold such that 25% of all the species of the group have smaller ranges. If range information is incompletely known for the group in question, this field should be left blank.
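When the ranges of all species in a group are known, the Range_threshold described above corresponds to the first quartile of the group's range sizes. The sketch below computes it with Python's `statistics.quantiles`; the function name `range_threshold` and the values are invented, and the interpolation method ("inclusive") is an assumption, since this page does not say which quantile convention should be used.

```python
import statistics


def range_threshold(range_sizes_km2):
    """First quartile of a group's range sizes, in square kilometers.

    Hypothetical helper. The "inclusive" method treats the list as the
    whole population, which matches the case where all ranges are known.
    """
    if len(range_sizes_km2) < 2:
        raise ValueError("at least two range sizes are needed")
    return statistics.quantiles(range_sizes_km2, n=4, method="inclusive")[0]


# Invented range sizes (km^2) for a group of five species.
ranges = [500.0, 100.0, 300.0, 200.0, 400.0]
threshold = range_threshold(ranges)
print(threshold)
```

With small groups, different quantile conventions can give noticeably different thresholds, so it is worth recording which one was used.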

KBA shapefile directory

Path to a directory where the user has saved all the ESRI shapefiles with the polygons of previously delimited KBAs. All polygons in these files should share a unique, indexed field. The name of that field should be specified in the configuration file, option "kba_index".

KBA trigger species table

A .csv file with the list of species supporting each previously delimited KBA. Each row should bear the information of a single species-KBA pair. There are two mandatory fields: a column with species names (scientificName) and another with a key that associates the species with a KBA. The name of the latter should be the same token employed as the indexing field in the KBA shapefile(s), and specified in the configuration file through the option "kba_index".
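The link between this table and the KBA shapefiles can be checked with a short script. The sketch below is hypothetical and not part of ackbar: it assumes the index values have already been extracted from the shapefiles into a set of strings, and the field name `kba_id`, the function `triggers_by_kba`, and the sample rows are all invented for the example.

```python
import csv
import io
from collections import defaultdict


def triggers_by_kba(csv_text, kba_index, known_kba_ids):
    """Group trigger species by KBA and report keys absent from the shapefiles.

    Hypothetical helper. `kba_index` is the column name shared with the
    shapefiles; `known_kba_ids` holds the index values found in them.
    """
    species = defaultdict(list)
    unknown = set()
    for record in csv.DictReader(io.StringIO(csv_text, newline="")):
        kba_id = record[kba_index].strip()
        if kba_id in known_kba_ids:
            species[kba_id].append(record["scientificName"].strip())
        else:
            unknown.add(kba_id)
    return dict(species), sorted(unknown)


# Invented sample: KBA 99 does not exist among the known index values.
table = (
    "scientificName,kba_id\n"
    "Genus alpha,17\n"
    "Genus beta,17\n"
    "Genus alpha,99\n"
)
grouped, unknown = triggers_by_kba(table, "kba_id", {"17", "23"})
print(grouped)
print(unknown)
```

Keys are compared as text here, so an index stored as a number in the shapefile would need to be converted to a string first.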
