1.1 SGP data collection - ufarrell/sgp_phase2 GitHub Wiki
SGP collaborators provide sample, locality and geological context details (using MS Excel templates), as well as published data tables and unpublished data from their own archives.
In addition, some data has been extracted from relevant published studies whose authors are not directly involved. In these cases, context information is coded by the SGP team using information provided in the paper.
Context
Where relevant, data standards are enforced using drop-down lists (from dictionaries/controlled vocabularies). Each sheet undergoes quality control before upload to the database. QC includes checking data types, matching to dictionaries, cleaning whitespace and special characters, and checking against existing entries for duplicates.
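As a rough illustration only, the QC checks listed above (data types, dictionary matching, whitespace and special-character cleaning, duplicate detection) could be sketched as follows. This is not the SGP team's actual code: the column names (`section_name`, `original_num`, `height_m`, `lithology`), the dictionary contents and the function names are all assumptions made for the example.

```python
import re

# Hypothetical controlled vocabulary; the real SGP dictionaries are not reproduced here.
LITHOLOGY_DICT = {"shale", "limestone", "dolostone", "sandstone"}


def clean_text(value):
    """Replace non-printable characters (tabs, non-breaking spaces, ...) and collapse whitespace."""
    printable = "".join(ch if ch.isprintable() else " " for ch in value)
    return re.sub(r"\s+", " ", printable).strip()


def qc_rows(rows, existing_keys):
    """Return (accepted_rows, problems) for a list of dict rows with string values."""
    accepted, problems = [], []
    seen = set(existing_keys)
    for i, row in enumerate(rows):
        row = {k: clean_text(v) for k, v in row.items()}
        issues = []
        # Data type check: height must parse as a number when it is filled in.
        if row["height_m"]:
            try:
                float(row["height_m"])
            except ValueError:
                issues.append("height_m is not numeric")
        # Dictionary match (case-insensitive).
        if row["lithology"].lower() not in LITHOLOGY_DICT:
            issues.append("lithology not in dictionary")
        # Duplicate check against existing entries and earlier rows in the same sheet.
        key = (row["section_name"], row["original_num"])
        if key in seen:
            issues.append("duplicate")
        if issues:
            problems.append((i, issues))
        else:
            seen.add(key)
            accepted.append(row)
    return accepted, problems


sheet = [
    {"section_name": " Section\u00a0A ", "original_num": "S-1", "height_m": "12.5", "lithology": "Shale "},
    {"section_name": "Section A", "original_num": "S-1", "height_m": "13.0", "lithology": "shale"},
    {"section_name": "Section A", "original_num": "S-2", "height_m": "abc", "lithology": "mudrock"},
]
accepted, problems = qc_rows(sheet, existing_keys=set())
```

In this sketch, rows with problems are reported rather than silently corrected, so that they can be referred back to the contributor.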
Contributors are encouraged to fill context sheets as completely as possible, and the user guide highlights key fields:
- section name
- section type
- latitude/longitude
- metamorphic bin
- original sample number
- height/depth in section or core
- geological unit name
- lithology
- depositional environment bin
- interpreted age
For a variety of reasons, contributors do not always fill out every column, which has implications for searching and parsing the data. We recommend that users start with liberal search terms before applying more stringent filters.
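One way to picture the effect of blank columns on a search is the toy example below. It is a sketch under stated assumptions, not the SGP search interface: the records, field names and the `search` function are invented for illustration. A broad first pass keeps rows where a field was left blank, and a stricter second pass drops them.

```python
# Hypothetical records; None stands for a column the contributor left blank.
samples = [
    {"id": 1, "lithology": "shale", "environment_bin": "outer shelf"},
    {"id": 2, "lithology": "shale", "environment_bin": None},
    {"id": 3, "lithology": None, "environment_bin": "basinal"},
    {"id": 4, "lithology": "limestone", "environment_bin": "inner shelf"},
]


def search(rows, criteria, keep_missing=True):
    """Return rows matching every criterion; optionally keep rows where a field is blank."""
    hits = []
    for row in rows:
        ok = True
        for field, wanted in criteria.items():
            value = row.get(field)
            if value is None:
                # A blank field neither confirms nor contradicts the criterion.
                if not keep_missing:
                    ok = False
            elif value != wanted:
                ok = False
        if ok:
            hits.append(row)
    return hits


# Broad first pass: one criterion, blanks retained.
broad = search(samples, {"lithology": "shale"})
# Stricter second pass: more criteria, blanks excluded.
strict = search(
    samples,
    {"lithology": "shale", "environment_bin": "outer shelf"},
    keep_missing=False,
)
```

Comparing the sizes of `broad` and `strict` gives a quick sense of how many samples a stringent filter excludes simply because a column was left empty.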
Verbatim information is entered alongside dictionary-matched information in several cases, in particular for lithostratigraphic names, geological ages, depositional environment and lithology.
Data is cleaned before import, to make sure formatting is correct and dictionary terms are properly matched. However, note that we do not generally make any changes to the information itself. In some cases, therefore, the database may include apparently contradictory information - there may be legitimate geological disagreement about basin type from one study to another, for example, or different workers may apply different age models to a given formation (especially in the Precambrian, where absolute ages may be scarce and age models rely on indirect methods such as carbon isotope chemostratigraphy).
In Phase 2, some minor modifications were made to our context template to accommodate new carbonate data. These included a supplemental depositional environment dictionary ("dic_env_detail"), a dictionary of sample types (e.g. bulk, matrix (micrite), skeletal (brachiopod)), and a parent-child sample relationship.
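A parent-child sample relationship of this kind can be pictured with a small data structure. The sketch below is illustrative only: the class, the field names (`sample_id`, `sample_type`, `parent_id`), the sample identifiers and the choice of a bulk sample as the parent are assumptions, not the SGP schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    sample_id: str
    sample_type: str                 # a term from the sample-type dictionary
    parent_id: Optional[str] = None  # None for a sample with no parent


def children_of(samples, parent_id):
    """Return the samples whose parent is the given sample, in input order."""
    return [s for s in samples if s.parent_id == parent_id]


# Hypothetical example: two sub-samples taken from one bulk carbonate sample.
samples = [
    Sample("HS-01", "bulk"),
    Sample("HS-01-m", "matrix (micrite)", parent_id="HS-01"),
    Sample("HS-01-b", "skeletal (brachiopod)", parent_id="HS-01"),
]
sub_samples = children_of(samples, "HS-01")
```

Storing only a reference to the parent keeps each sub-sample a full record in its own right while still letting its measurements be traced back to the original sample.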
Analytical data and methods
Analytical data is received or extracted from published studies in multiple formats, although we encourage contributors to send CSV files. Data sheets are vetted and cross-checked against published tables. If there are obvious inconsistencies, the contributors are asked for clarification and we make any necessary corrections. QC for data tables on entry includes a comparison against data already in the database to identify obviously incongruent values. For example, data outside the existing min-max values might indicate an error in units, a cut-and-paste error or a column offset.
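The min-max comparison described above can be sketched in a few lines. This is a minimal illustration, not the SGP implementation: the function name, the analyte and all values are made up for the example.

```python
def flag_out_of_range(existing, incoming):
    """Return (index, value) pairs for incoming values outside the existing min-max range."""
    lo, hi = min(existing), max(existing)
    return [(i, v) for i, v in enumerate(incoming) if v < lo or v > hi]


# Hypothetical values for one analyte already in the database (wt%).
existing_values = [0.1, 0.8, 2.5, 12.0]
# Hypothetical new submission; the third value looks like it was reported in different units.
incoming_values = [1.2, 0.4, 25000.0, 3.1]

flags = flag_out_of_range(existing_values, incoming_values)
```

A flagged value is only a prompt to check units or column alignment with the contributor; a value outside the existing range may still be correct.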
In the absence of obvious errors, however, the SGP view is that contributors will know their own samples and data best, and we accept data as it is submitted.
Methodological information is coded by SGP from the published papers and data tables.