2. USGS‐NGDB - ufarrell/sgp_phase2 GitHub Wiki

Details of the USGS National Geochemical Database (NGDB): Rock are available at https://mrdata.usgs.gov/ngdb/rock/

The USGS-NGDB portion of the data includes 42585 samples with 1616032 results from 10715 sites in the United States.

Geography

Age

Of these samples, 40% are from the Paleozoic, 34% from the Cenozoic, 23% from the Mesozoic and 3% from the Proterozoic.

Lithology

19% are sandstone, and 14% are shale. 28% do not have a specific lithology (although lithological details may be available in verbatim fields).

Data

Data were entered in batches defined by USGS-NGDB table name, units and method (e.g. USGS table xtbIcpaesChem, total digestion, percent values, whole, i.e. none below or above detection). A summary of batches (sample counts, result counts, analytes) is available here.
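The batching rule can be pictured as grouping result rows by a composite key and then summarizing each group. The sketch below is illustrative only: the field names (`table`, `method`, `unit`, `detection`, `lab_id`, `analyte`) and the value `"whole"` for results with no below/above-detection qualifier are hypothetical stand-ins, not the actual SGP loader schema.

```python
from collections import defaultdict


def batch_results(results):
    """Group result rows into batches keyed by (table, method, unit, detection status).

    All field names are hypothetical placeholders for the USGS-NGDB attributes
    described above (table name, digestion method, units, detection qualifier).
    """
    batches = defaultdict(list)
    for r in results:
        key = (r["table"], r["method"], r["unit"], r["detection"])
        batches[key].append(r)
    return dict(batches)


def summarize(batches):
    """Per batch: count of distinct samples, count of results, sorted analyte list."""
    summary = {}
    for key, rows in batches.items():
        summary[key] = {
            "samples": len({r["lab_id"] for r in rows}),
            "results": len(rows),
            "analytes": sorted({r["analyte"] for r in rows}),
        }
    return summary
```

For example, three ICP-AES total-digestion percent results from two samples would form one batch with 2 samples, 3 results and analytes `["Al", "Fe"]`.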

Categories are based on those used on our search website (http://sgp-search.io/).

Completeness

Data Collection/Processing

Samples were filtered to include only sedimentary samples, i.e. samples with the xndryclass 'sedimentary', plus a small number of samples with the xndryclass 'unidentified' but a sedimentary rock type in the spec_name column. Ultimately only 12% of the original samples were incorporated into the SGP database, because most samples in the USGS-NGDB database are igneous or metamorphic.
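The filter can be sketched as a predicate over tblRockGeoData rows. This is a minimal sketch, assuming rows are dicts keyed by the USGS column names; the `SEDIMENTARY_TYPES` keyword list is hypothetical (the actual list of sedimentary rock types used for SGP is not given here). It also applies the stratigraphic-name requirement noted for the stratgrphy column in the mapping table.

```python
# Hypothetical keyword list; the real set of sedimentary rock types used by SGP is not documented here.
SEDIMENTARY_TYPES = {
    "sandstone", "shale", "limestone", "dolomite",
    "mudstone", "siltstone", "conglomerate", "chert",
}


def is_sedimentary(sample: dict) -> bool:
    """True for xndryclass 'sedimentary', or 'unidentified' with a sedimentary spec_name."""
    cls = (sample.get("xndryclass") or "").strip().lower()
    if cls == "sedimentary":
        return True
    if cls == "unidentified":
        spec = (sample.get("spec_name") or "").lower()
        # Substring test so that e.g. "oil shale" counts as shale.
        return any(rock in spec for rock in SEDIMENTARY_TYPES)
    return False


def keep_for_sgp(sample: dict) -> bool:
    """Sedimentary filter plus the requirement of a non-empty stratigraphic unit name."""
    has_strat = bool((sample.get("stratgrphy") or "").strip())
    return is_sedimentary(sample) and has_strat
```

Samples that pass this stage were still subject to the later requirement of an interpreted age.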

All data associated with the filtered samples were included in SGP. This included data with good methodological information, as well as data from the table "xtbUnknownChem", where no methods are available and data quality may be lower.

Interpreted Age

In Phase 1, samples were matched, using a combination of stratigraphy and location, to Jon Husson's Macrostrat continuous-time age model in order to assign interpreted ages. Specifically, the minimum and maximum age estimates from the Macrostrat model were entered, and the interpreted age was entered as the average of these two values. Only samples with interpreted ages were entered into SGP.
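The interpreted age is simply the midpoint of the Macrostrat minimum and maximum estimates. A minimal sketch, assuming ages are given in Ma (the function name is illustrative, not part of SGP):

```python
def interpreted_age(min_age_ma: float, max_age_ma: float) -> float:
    """Return the midpoint of the Macrostrat min/max age estimates (assumed to be in Ma)."""
    if min_age_ma > max_age_ma:
        raise ValueError("min_age_ma must not exceed max_age_ma")
    return (min_age_ma + max_age_ma) / 2
```

For a unit with Macrostrat estimates of 300 and 320 Ma, the interpreted age would be 310 Ma.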

Phase 2 Updates

In Phase 2, interpreted ages were updated by Daven Quinn (Macrostrat) using the following process:

  • Samples are linked to the Macrostrat stratigraphic column whose footprint contains them
  • The search window is expanded to adjacent columns, because Macrostrat's column footprints are fuzzy
  • Priority is given to units within the column directly underlying the sample
  • Adjacent-column matching can be turned off with a flag
  • The units within the matched column(s) are used to establish a semantic window for linking
  • All stratigraphic names are extracted, and Macrostrat's lexicon is traversed to find their parent and child units (e.g. members and groups)
  • Macrostrat concepts and synonyms, which group synonymous stratigraphic names, are also linked
  • Strat_name_footprints, which are computed from map units as well as stratigraphic column units, are also used for matching. These serve only as a fallback, because their ages are usually less well established
  • Matches are attempted first by exact matching and then by substring matching
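The cascade above can be approximated as follows. This is a simplified sketch, not Macrostrat's implementation: it assumes the column units, adjacent-column units and strat_name_footprint names have already been retrieved as lists of names, and that a hypothetical `lexicon` dict maps each unit name to its related names (parents, children, synonyms). Within each priority tier it tries an exact match before a substring match.

```python
from typing import Dict, Iterable, List, Optional


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences do not block a match."""
    return " ".join(name.lower().split())


def match_name(sample_name: str, candidates: Iterable[str]) -> Optional[str]:
    """Exact match first, then substring match, over candidates in priority order."""
    target = normalize(sample_name)
    if not target:
        return None
    cands = [c for c in candidates if normalize(c)]  # skip empty names
    for c in cands:
        if normalize(c) == target:
            return c
    for c in cands:
        n = normalize(c)
        if n in target or target in n:
            return c
    return None


def expand(units: Iterable[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Append lexicon relatives (hypothetical mapping) after each unit name."""
    out: List[str] = []
    for u in units:
        out.append(u)
        out.extend(lexicon.get(u, []))
    return out


def link_sample(sample_name, column_units, adjacent_units, lexicon,
                footprint_names, use_adjacent=True):
    """Return the first matching name, trying the containing column, then adjacent columns."""
    tiers = [column_units]
    if use_adjacent:  # mirrors the flag that turns off adjacent-column matching
        tiers.append(adjacent_units)
    for units in tiers:
        hit = match_name(sample_name, expand(units, lexicon))
        if hit is not None:
            return hit
    # Fallback: strat_name_footprints, whose ages are usually less well established.
    return match_name(sample_name, footprint_names)
```

Ordering tiers this way reflects the stated priority for the directly underlying column; whether the real process interleaves exact and substring matching across tiers differently is not specified here.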

In keeping with the CARE Principles for Indigenous Data Governance (Collective benefit, Authority to control, Responsibility, and Ethics), we removed 3438 USGS-NGDB samples (125201 results) from 696 sites whose decimal latitude and longitude intersect with Native-held land. Native-held land was identified using the public TIGER/Line shapefiles provided by the U.S. Census Bureau, accessed through QGIS via https://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/AIANNHA/MapServer.
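The intersection was done in QGIS; the underlying idea is a point-in-polygon test. The sketch below swaps in a plain even-odd (ray casting) test over simple polygon rings given as (lon, lat) vertices. It is an illustration only: it ignores holes, multipart geometries, datum differences and points lying exactly on a boundary, all of which a GIS handles properly. The site dict keys `lon`/`lat` are hypothetical.

```python
from typing import List, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]  # (lon, lat) vertices of one closed polygon ring


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Even-odd rule: count how many ring edges a ray cast east from the point crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def split_sites(sites, rings: List[Ring]):
    """Return (kept, removed); a site is removed if it falls inside any ring."""
    kept, removed = [], []
    for site in sites:
        if any(point_in_ring(site["lon"], site["lat"], r) for r in rings):
            removed.append(site)
        else:
            kept.append(site)
    return kept, removed
```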

The decimal latitude/longitude of six sites (below) was updated where the coordinates contradicted other geographical or geological information. In most cases the error was clearly a typo (e.g. a longitude of -116 vs. -106), identified by comparison with similar sites with near-identical coordinates.
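The corrections were made by comparing sites by hand. As a hypothetical screening aid (not part of the SGP workflow), a helper like the one below could flag coordinate pairs that differ in exactly one digit when printed to the same precision, which is the pattern of the -116 vs. -106 example. It only detects single-character substitutions, not transpositions or dropped digits.

```python
def differs_by_one_digit(a: float, b: float, decimals: int = 4) -> bool:
    """True if a and b, formatted to the same precision, differ in exactly one character."""
    sa = f"{a:.{decimals}f}"
    sb = f"{b:.{decimals}f}"
    if len(sa) != len(sb):
        return False
    return sum(ca != cb for ca, cb in zip(sa, sb)) == 1
```

For instance, `differs_by_one_digit(-116.25, -106.25)` is `True`, whereas identical values or values differing in two digits return `False`.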

site_id section_name
4189 USGS-3519
7208 USGS-6539
7211 USGS-6542
7383 USGS-6714
11626 USGS-10957

Data Entry - NGDB vs SGP

An effort was made to match most USGS-NGDB Rock columns to SGP columns (see table below), but in some cases compromises were required, e.g. concatenating data from several USGS columns into one SGP column. In most cases, omitted columns did not contain any values (all NULL). Where information was particularly important (e.g. stratigraphic names), the data were cleaned so that they could be matched to the existing dictionaries, and the verbatim values were also retained. Note that USGS sites were given generic site names in SGP, consisting of the prefix "USGS-" and an autogenerated number.
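Site naming can be sketched by combining the "USGS-" prefix with the grouping rule given in the table for sample_src and methcollct (sections grouped by each unique combination of sample_src, methcollct and lat-long). This is a minimal sketch, assuming rows are dicts keyed by USGS column names; the sequential numbering from `start` is an assumption, since the actual autogenerated numbers (e.g. USGS-3519 for site_id 4189) come from the SGP database.

```python
from itertools import count


def section_key(sample: dict) -> tuple:
    """Grouping key: unique combination of sample_src, methcollct and lat-long."""
    return (sample.get("sample_src"), sample.get("methcollct"),
            sample.get("latitude"), sample.get("longitude"))


def assign_section_names(samples, start: int = 1) -> dict:
    """Map each unique grouping key to a generic 'USGS-<n>' name, numbered in order of first appearance."""
    numbers = count(start)
    names = {}
    for s in samples:
        key = section_key(s)
        if key not in names:
            names[key] = f"USGS-{next(numbers)}"
    return names
```

A sample's section name is then `names[section_key(sample)]`.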

USGS-NGDB table_name.column_name USGS-NGDB description SGP table_name.column_name Notes
tblRockGeoData.lab_id Unique Sample ID Number (Sample identification). Unique sample identification number assigned by the laboratory. sample.original_num
tblRockGeoData.job_id Analytical Job Number (Processing information). Identifier assigned by the laboratory to each batch or job of samples to be analyzed. sample.usgs_job_id
tblRockGeoData.submitter Submitter Name (Processing information). Name of the sample submitter or submitters. NOT IMPORTED
tblRockGeoData.date_sub Date Submitted (Processing information). Date when the sample was submitted to laboratory for analysis. sample.submitted_date
tblRockGeoData.field_id Field Number (Sample identification). Sample number or field number assigned to the sample by the sample submitter. NOT IMPORTED
tblRockGeoData.state State (Geographic location). State, when noted, from where the sample was collected. site.state_province State abbreviations translated to full state names (e.g. CO = Colorado)
tblRockGeoData.country Country (Geographic location). Country (or water body for international waters), when noted, from where the sample was collected. site.country
tblRockGeoData.datum Datum (Geographic location). Reference datum, when recorded, for the latitude and longitude coordinates of the sample site. Field introduced into database about 1999. site.datum_original
tblRockGeoData.spheroid Spheroid (Geographic location). Reference spheroid or ellipsoid, when recorded, for the latitude and longitude coordinates of the sample site. Field introduced into database about 1999. NOT IMPORTED Mostly NULL, often the same as datum
tblRockGeoData.latitude Latitude (Geographic location). Latitude of the sample site in decimal degrees. site.lat_original
tblRockGeoData.longitude Longitude (Geographic location). Longitude of the sample site in decimal degrees. site.long_original
tblRockGeoData.depth Depth (Site characteristics). Depth from the surface at which the sample was collected. Units are specified by the submitter. site.height_depth_m Converted into meters
tblRockGeoData.locat_desc Location Description (Geographic location). Location description as provided by the sample submitter. site.site_desc
tblRockGeoData.datecollct Date of Collection (Processing information). Date the sample was collected, when recorded. collecting_event.start_date
tblRockGeoData.sample_src Source of Sample (Site characteristics). Physical setting or environment from which the sample was collected. site.site_notes Included in site_notes along with methcollct, separated by semicolons, e.g. "sample_src: mine or quarry; methcollct: composite;". USGS site sections were grouped by each unique combination of sample_src, methcollct and lat-long
tblRockGeoData.methcollct Collection Method (Sample characteristics). Sample collection method: Single grab, composite, or channel. site.site_notes Included in site_notes along with sample_src, separated by semicolons, e.g. "sample_src: mine or quarry; methcollct: composite;". USGS site sections were grouped by each unique combination of sample_src, methcollct and lat-long
tblRockGeoData.primeclass Primary Classification (Sample characteristics). Primary classification of sample media. All samples in this database have a primary classification of 'rock'. NOT IMPORTED
tblRockGeoData.xndryclass Secondary Classification (Sample characteristics). Secondary classification or subclass of sample media. For rock database this consists of igneous, sedimentary, metamorphic, unspecified, unidentified, other, or NULL. NOT IMPORTED Used to filter data. Sedimentary samples and some 'unidentified' samples with sedimentary rock types in spec_name were imported into SGP.
tblRockGeoData.spec_name Specific Name (Sample characteristics). A specific name for the sample media collected, as provided by the sample submitter. sample.verbatim_lith In verbatim_lith along with addl_attr, separated by a semicolon, e.g. "oil shale;OIL SHALE, USGS CORE C291. 1055.1-1056.2 ft; dolomitic; laminated;"
tblRockGeoData.addl_attr Additional Attributes and Comments (Sample characteristics). Additional attributes and sample submitter supplied comments. sample.verbatim_lith In verbatim_lith along with spec_name, separated by a semicolon, e.g. "oil shale;OIL SHALE, USGS CORE C291. 1055.1-1056.2 ft; dolomitic; laminated;"
tblRockGeoData.geol_age Geologic Age or Range (Sample characteristics). Age or range of ages from the Geological Time Scale for the collected sample. geol_age.verbatim_age geol_age.ics_id was determined based on this age
tblRockGeoData.stratgrphy Stratigraphic Unit Name (Sample characteristics). Name of the stratigraphic unit from which the sample was collected. When present, values are as given by the sample submitter and may represent either a formal name, an informal name, or geologic map unit abbreviation. lithostrat.verbatim_strat lithostrat.strat_id was determined based on this name. Also used to filter data: samples without a stratigraphic unit name were not included.
tblRockGeoData.mineralztn Mineralization Type (Sample characteristics). An indication of mineralization or mineralization types as provided by the sample submitter. NOT IMPORTED Mostly NULL
tblRockGeoData.alteration Alteration (Sample characteristics). An indication of the presence or type of alteration noted in the sample by the submitter. NOT IMPORTED Mostly NULL
tblRockGeoData.struct_src Igneous Structural Setting (Site characteristics). An indication of the igneous setting from which the sample was collected. NOT IMPORTED All NULL for included samples
tblRockGeoData.dep_envirn Environment of Deposition (Site characteristics). Original environment of deposition for sedimentary rocks. NOT IMPORTED
tblRockGeoData.source_rk Original Rock Source (Sample characteristics). Used in the rock database to identify the precursor rock, igneous or sedimentary, for metamorphic rocks. Very sparsely populated. NOT IMPORTED Not applicable
tblRockGeoData.metamrphsm Type of Metamorphism (Site characteristics). An indication of the type of metamorphic setting from which the rock was collected. NOT IMPORTED All NULL for included samples
tblRockGeoData.facies_grd Metamorphic Facies or Grade (Site characteristics). Metamorphic facies or grade as provided by the sample submitter. NOT IMPORTED All NULL for included samples
tblRockGeoData.prep Sample Preparation Description (Sample characteristics). Description of the sample preparation methods used. Field introduced into database about 1999. NOT IMPORTED Mostly NULL, and data could not easily be matched to SGP fields. Did not specify the crushing material (tungsten etc.), which is what we are most concerned with at SGP.
tblRockGeoData.mesh_size Sieve or Filter Size (Sample characteristics). Sieve or filter size used in field sampling or laboratory preparation to fractionate the sample. Not commonly used for rock samples. NOT IMPORTED All NULL for included samples
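Several rows in the table involve small per-field transformations: state abbreviations expanded to full names, depth converted to meters, and pairs of USGS columns concatenated into site_notes and verbatim_lith. The sketch below reproduces the concatenation formats shown in the table's examples. It is a minimal sketch under stated assumptions: `STATE_NAMES` is a hypothetical partial excerpt, and the depth unit labels and conversion map are hypothetical, because submitter-specified depth units are not enumerated here.

```python
# Hypothetical partial excerpt of a state abbreviation map.
STATE_NAMES = {"CO": "Colorado", "UT": "Utah", "WY": "Wyoming", "NV": "Nevada", "MT": "Montana"}

# Hypothetical unit labels; multiply by the factor to get meters.
DEPTH_TO_M = {"m": 1.0, "ft": 0.3048, "cm": 0.01}


def state_name(abbr: str) -> str:
    """Expand a state abbreviation; unknown values are passed through unchanged."""
    return STATE_NAMES.get(abbr.strip().upper(), abbr)


def depth_m(value: float, unit: str) -> float:
    """Convert a depth in a submitter-specified unit to meters (KeyError if unit is unknown)."""
    return value * DEPTH_TO_M[unit]


def to_site_notes(sample_src, methcollct) -> str:
    """Build site_notes as 'sample_src: ...; methcollct: ...;', skipping empty fields."""
    parts = (("sample_src", sample_src), ("methcollct", methcollct))
    return " ".join(f"{label}: {value};" for label, value in parts if value)


def to_verbatim_lith(spec_name, addl_attr) -> str:
    """Build verbatim_lith as spec_name and addl_attr joined by a semicolon."""
    return ";".join(p for p in (spec_name, addl_attr) if p)
```

With the table's examples, `to_site_notes("mine or quarry", "composite")` yields `sample_src: mine or quarry; methcollct: composite;`.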