Home - ufarrell/sgp_phase2 GitHub Wiki

Data collection for Phase 2 of the Sedimentary Geochemistry and Paleoenvironments Project (SGP) ended in February 2025. A static version of the database was archived and made available to collaborators through the SGP search website.

The Phase 2 SGP data freeze includes 126006 samples with 4132705 publicly available results.

The breakdown of samples in the SGP database at the end of Phase 2 is as follows:

  • 129820 samples total.
  • 126006 samples are publicly available in the Phase 2 data freeze; a subset has been hidden in keeping with CARE (Collective benefit, Authority to control, Responsibility, and Ethics) principles.
  • 121047 of the publicly available samples have data (4222468 results); some samples have been entered into the database but are not yet linked to data.
  • 120499 samples have data from analytes that are visible on the website (4132705 results); data for some unusual or less useful analytes are stored in the database but not presented on the website.
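
The sizes of the gaps between these tiers follow directly from the counts above. A minimal Python sketch of that arithmetic is given here; the variable names are illustrative only and are not taken from the SGP database.

```python
# Counts quoted in the breakdown above.
total_samples = 129820
public_samples = 126006
public_with_data = 121047
public_with_visible_data = 120499
results_all_analytes = 4222468
results_visible_analytes = 4132705

# Samples withheld from the public data freeze.
hidden_samples = total_samples - public_samples
# Public samples entered in the database but not yet linked to data.
public_without_data = public_samples - public_with_data
# Public samples whose only data come from analytes not shown on the website.
only_hidden_analytes = public_with_data - public_with_visible_data
# Results stored in the database but not presented on the website.
results_not_shown = results_all_analytes - results_visible_analytes

print(hidden_samples)        # 3814
print(public_without_data)   # 4959
print(only_hidden_analytes)  # 548
print(results_not_shown)     # 89763
```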

The SGP database design was inspired by several existing data models in the geological and natural history museum communities. Tables and relationships for analytical geochemistry are taken from the British Geological Survey (BGS) geochemistry data model, with minor modifications (Watson et al. 2014). Tables for geological, geographical and sample details are modeled on the established collection management databases Specify6 and Arctos, and on the Observations Data Model 2, an information model for earth observations. See A. Database description for more details.

There are six primary sources of data:

  1. SGP: curated published and unpublished data from the SGP Collaborative Team.
  2. USGS-NGDB: a subset of data from the USGS National Geochemical Database: Rock.
  3. USGS-CMIBS: a subset of data from the USGS Critical Metals in Black Shale project.
  4. OZCHEM: a subset of data from the Geoscience Australia OZCHEM National Whole Rock Geochemistry Dataset.
  5. AGS: six datasets from Alberta Geological Survey Digital Data.
  6. DM-SED: a subset of data compiled for the Deep-Time Marine Sedimentary Element Database (Lai et al. 2025, https://doi.org/10.5194/essd-17-1613-2025).

Samples are linked to data sources through batches of data. In theory, the same sample can be linked to multiple data sources; so far this is the case for just two USGS standards, SGR-1 and SDO-1.
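
The sample, batch and data source relationship described above can be sketched as follows. This is an illustration under stated assumptions, not the actual SGP schema: the `Batch` class, its field names, the sample IDs `sample-A` and `sample-B`, and the choice of which sources each batch belongs to are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Batch:
    """Hypothetical batch record: one data source, many samples."""
    batch_id: int
    data_source: str
    sample_ids: tuple


def sources_per_sample(batches):
    """Map each sample ID to the set of data sources reached via its batches."""
    sources = defaultdict(set)
    for batch in batches:
        for sample_id in batch.sample_ids:
            sources[sample_id].add(batch.data_source)
    return dict(sources)


# Made-up batches. Only the fact that SGR-1 and SDO-1 are linked to more
# than one data source comes from the text; the specific sources are assumed.
batches = [
    Batch(1, "SGP", ("sample-A", "SGR-1")),
    Batch(2, "USGS-NGDB", ("SGR-1", "SDO-1")),
    Batch(3, "USGS-CMIBS", ("SDO-1", "sample-B")),
]

linked = sources_per_sample(batches)
multi_source = sorted(s for s, srcs in linked.items() if len(srcs) > 1)
print(multi_source)  # ['SDO-1', 'SGR-1']
```

Because a sample reached through batches from two sources is counted once under each, per-source sample counts can add up to slightly more than the number of distinct samples with data.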

Geography

19734 sites from 84 countries/oceans (for 126006 public samples)

Map of all sites by data source

The majority of sites are outcrop sites. However, core sites tend to have larger numbers of associated samples: 49% of samples are from outcrop and 43% are from core (8% unknown/modern/other).

Site type - grouped by number of sites and by number of samples

Age

Samples through time, by phase (age based on interpreted age in Ma)

Samples through time, by data source (age based on interpreted age in Ma)

(114687 samples shown. Of the 126006 public samples, 11318 have no interpreted age, although 4654 of those are associated with a geological age at some level; see the table below.)

Sample counts by geological age. Samples are binned by interpreted age in Ma where available, and otherwise by geological age bin.

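| Geological age | All | SGP | USGS-NGDB | USGS-CMIBS | OZCHEM | AGS | DM-SED |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cenozoic | 17137 | 667 | 14420 | 591 | 4 | 9 | 1219 |
| Mesozoic | 23011 | 4174 | 9683 | 2558 | 6 | 2895 | 2783 |
| Paleozoic | 46488 | 17257 | 17069 | 6299 | 456 | 1015 | 2679 |
| Phanerozoic (indet) | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| Neoproterozoic | 14389 | 12588 | 85 | 349 | 366 | 0 | 778 |
| Mesoproterozoic | 7221 | 4460 | 1170 | 482 | 157 | 0 | 201 |
| Paleoproterozoic | 8050 | 4208 | 149 | 515 | 2996 | 0 | 119 |
| Proterozoic (indet) | 233 | 0 | 0 | 159 | 74 | 0 | 0 |
| Neoarchean | 1818 | 1040 | 9 | 432 | 146 | 0 | 189 |
| Mesoarchean | 389 | 96 | 0 | 154 | 100 | 0 | 34 |
| Paleoarchean | 123 | 29 | 0 | 52 | 13 | 0 | 27 |
| Eoarchean | 20 | 0 | 0 | 20 | 0 | 0 | 0 |
| Archean (indet) | 461 | 0 | 0 | 461 | 0 | 0 | 0 |
| NULL | 6664 | 3633 | 0 | 204 | 1495 | 273 | 0 |
| TOTAL | 126006 | 48154 | 42585 | 12276 | 5813 | 4192 | 8029 |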

(Note that samples are associated with data sources through batches of data. The "all" column includes every public sample in the database, with or without data, so the data source columns sum to less than "all".)
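
The counting rule used for the table (interpreted age first, geological age bin as a fallback) can be sketched in Python as below. This is a minimal illustration, not the project's own code: the function name `age_bin`, the input layout, and the sample values are hypothetical, and the era boundaries are those of the International Chronostratigraphic Chart, which is an assumption since the boundaries used by SGP are not stated here.

```python
from collections import Counter

# Base (older bound) of each era in Ma, youngest first.
# Assumed values from the International Chronostratigraphic Chart.
ERA_BASES_MA = [
    ("Cenozoic", 66.0),
    ("Mesozoic", 251.9),
    ("Paleozoic", 538.8),
    ("Neoproterozoic", 1000.0),
    ("Mesoproterozoic", 1600.0),
    ("Paleoproterozoic", 2500.0),
    ("Neoarchean", 2800.0),
    ("Mesoarchean", 3200.0),
    ("Paleoarchean", 3600.0),
    ("Eoarchean", 4031.0),
]


def age_bin(interpreted_age_ma=None, geological_age=None):
    """Return the age bin for one sample.

    Uses the interpreted age (Ma) when present; otherwise falls back to
    the recorded geological age bin, and to "NULL" when neither exists.
    An age exactly on a boundary is placed in the younger era (assumed).
    """
    if interpreted_age_ma is not None:
        for name, base_ma in ERA_BASES_MA:
            if interpreted_age_ma <= base_ma:
                return name
    return geological_age or "NULL"


# Hypothetical samples: (interpreted age in Ma, geological age bin).
samples = [
    (550.0, None),
    (90.0, "Mesozoic"),
    (None, "Proterozoic (indet)"),
    (None, None),
]
counts = Counter(age_bin(age, geol) for age, geol in samples)
print(counts["Neoproterozoic"], counts["Proterozoic (indet)"], counts["NULL"])
```

Coarse bins such as "Proterozoic (indet)" can only arise through the fallback path, since a numeric interpreted age always resolves to a single era.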

Lithology

Lithology for all samples

Lithology by data source

Data

The diagram below summarizes the count of individual results for each category of analytes. Categories are based on those used on our search website (http://sgp-search.io/). See C. Analyses for details.