2. USGS‐NGDB - ufarrell/sgp_phase2 GitHub Wiki
Details of USGS National Geochemical Database: Rock available here: https://mrdata.usgs.gov/ngdb/rock/
The USGS-NGDB portion of the data includes 42585 samples with 1616032 results from 10715 sites in the United States.
Geography
Age
40% are from the Paleozoic, 34% from the Cenozoic, 23% from the Mesozoic and 3% from the Proterozoic.
Lithology
19% are sandstone, and 14% are shale. 28% do not have a specific lithology (although lithological details may be available in verbatim fields).
Data
Data were entered in batches based on USGS-NGDB table names, units and methods (e.g. USGS table xtbIcpaesChem, total digestion, percent values, whole (none below or above detection)). Summary of batches (sample counts, results counts, analytes) here.
Categories are based on those used on our search website (http://sgp-search.io/).
Completeness
Data Collection/Processing
Samples were filtered to include only sedimentary samples (i.e. samples with the xndryclass ‘sedimentary’, plus a small number of samples with the xndryclass ‘unidentified’ but a sedimentary rock type in the spec_name column). Ultimately, only 12% of the original samples were incorporated into the SGP database, because most samples in the USGS-NGDB database are igneous or metamorphic.
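The filter described above can be sketched roughly as follows. This is a minimal illustration, not the script used for SGP: the `SEDIMENTARY_TERMS` set, the `is_sedimentary` helper and the example rows are all hypothetical, and the real list of sedimentary rock types recognised in the specific-name column is not given here.

```python
# Hypothetical list of rock-type keywords; the real list is not documented here.
SEDIMENTARY_TERMS = {"shale", "sandstone", "limestone", "dolomite", "siltstone", "mudstone"}


def is_sedimentary(row):
    """Return True if an NGDB row would pass the sedimentary filter."""
    xndry = (row.get("xndryclass") or "").strip().lower()
    if xndry == "sedimentary":
        return True
    if xndry == "unidentified":
        # Fall back to the submitter-supplied specific name.
        spec = (row.get("spec_name") or "").lower()
        return any(term in spec for term in SEDIMENTARY_TERMS)
    return False


# Made-up rows for illustration only.
rows = [
    {"lab_id": "A1", "xndryclass": "sedimentary", "spec_name": "shale"},
    {"lab_id": "A2", "xndryclass": "igneous", "spec_name": "basalt"},
    {"lab_id": "A3", "xndryclass": "unidentified", "spec_name": "oil shale"},
    {"lab_id": "A4", "xndryclass": "unidentified", "spec_name": None},
]

kept = [r["lab_id"] for r in rows if is_sedimentary(r)]
print(kept)  # ['A1', 'A3']
```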
All data associated with the filtered samples were included in SGP. This included data with good methodological information, as well as data from the table “xtbUnknownChem”, where no methods are available and data quality may not be as good.
Interpreted Age
In Phase 1, samples were matched, using a combination of stratigraphy and location, to the Macrostrat continuous time age model by Jon Husson in order to assign interpreted ages. Specifically, the minimum and maximum age estimates from the Macrostrat model were entered, and the interpreted age was entered as the average of these values. Only samples with interpreted ages were entered into SGP.
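As a small worked example of the averaging step, the sketch below computes an interpreted age from a minimum and maximum age. The function name and the example ages (in Ma) are hypothetical and only illustrate the arithmetic.

```python
def interpreted_age(min_age_ma, max_age_ma):
    """Return the midpoint of the minimum and maximum age estimates (in Ma)."""
    if min_age_ma > max_age_ma:
        raise ValueError("min_age_ma must not exceed max_age_ma")
    return (min_age_ma + max_age_ma) / 2.0


# Hypothetical age bounds for one sample.
age = interpreted_age(300.0, 400.0)
print(age)  # 350.0
```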
Phase 2 Updates
In Phase 2, interpreted ages were updated by Daven Quinn (Macrostrat), using the following process:
- Samples are linked to a Macrostrat stratigraphic column footprint that contains them
- The search window is expanded to adjacent columns, recognizing that Macrostrat's notion of "column footprint" is fuzzy
- Priority is given to units within the column directly underlying the sample
- Adjacent column matching can be turned off with a flag
- The units within the matched column(s) are used to establish a semantic window for linking
- All stratigraphic names are extracted, and Macrostrat's lexicon is traversed to extract parent and child units (e.g., members and groups) established in Macrostrat's lexicon
- Concepts and synonyms encompassing synonymous stratigraphic names are also linked
- Strat_name_footprints, which are computed from map units as well as stratigraphic column units, are also used to match. These are taken as a fallback as ages are usually less well-established
- Matches are attempted first by exact matching and then by substring matching
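The final step (exact matching first, then substring matching) might look something like the sketch below. It is not Macrostrat's implementation: the `normalize` and `match_strat_name` helpers, the case and whitespace normalisation, and the choice to test substrings in both directions are all assumptions made for illustration.

```python
def normalize(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_strat_name(verbatim, candidates):
    """Return (candidate, method), trying exact matches before substring matches."""
    target = normalize(verbatim)
    if not target:
        return None, None
    # Pass 1: exact match on the normalized name.
    for candidate in candidates:
        if normalize(candidate) == target:
            return candidate, "exact"
    # Pass 2: substring match in either direction (an assumption).
    for candidate in candidates:
        norm = normalize(candidate)
        if norm and (norm in target or target in norm):
            return candidate, "substring"
    return None, None


# Hypothetical candidate names drawn from the matched column(s).
candidates = ["Green River Formation", "Wasatch Formation", "Mancos Shale"]

print(match_strat_name("green river  formation", candidates))
print(match_strat_name("Mancos Shale, upper part", candidates))
print(match_strat_name("Morrison", candidates))
```

Running the exact pass over all candidates before any substring test means a loose match can never pre-empt a precise one further down the candidate list.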
In keeping with CARE (Collective benefit, Authority to control, Responsibility, and Ethics) principles we have removed 3438 USGS NGDB samples from 696 sites with 125201 results, where the decimal latitude and longitude intersect with Native-held land (identified using public TIGER/Line shape files provided by the U.S. Census Bureau - accessed through QGIS via https://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/AIANNHA/MapServer).
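The intersection itself was done in QGIS against the TIGERweb layer. Purely to illustrate the kind of point-in-polygon test involved, here is a minimal ray-casting sketch; the rectangle, the site coordinates and the function name are hypothetical, and real boundaries are far more complex than a single simple polygon.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular boundary and two made-up sites.
boundary = [(-110.0, 36.0), (-109.0, 36.0), (-109.0, 37.0), (-110.0, 37.0)]
sites = [
    {"site_id": 1, "lon": -109.5, "lat": 36.5},
    {"site_id": 2, "lon": -105.0, "lat": 40.0},
]

kept = [s["site_id"] for s in sites if not point_in_polygon(s["lon"], s["lat"], boundary)]
print(kept)  # [2]
```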
The decimal latitude/longitude of six sites (below) were updated, where lat-long contradicted other geographical or geological information. In most cases the error was clearly the result of a typo (e.g. longitude of -116 vs. -106), based on comparison to similar sites with near-identical coordinates.
site_id | section_name |
---|---|
4189 | USGS-3519 |
7208 | USGS-6539 |
7211 | USGS-6542 |
7383 | USGS-6714 |
11626 | USGS-10957 |
Data Entry - NGDB vs SGP
An effort was made to match most USGS NGDB Rock columns to SGP columns (see table below), but in some cases compromises were required, e.g. concatenating data from several USGS columns into one SGP column. In most cases, if a column was omitted it did not contain any values (all NULL). Where information was particularly important (e.g. stratigraphic names), the data were cleaned so that they could be matched to the existing dictionaries, although the verbatim values were also included. Note that USGS sites were given generic site names in SGP, with a prefix "USGS-" and an autogenerated number.
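As a rough sketch of the concatenation and naming conventions, the snippet below builds a site_notes string from the sample_src and methcollct columns and formats a generic site name. The helper names are hypothetical, and how the autogenerated numbers were actually assigned is not documented here.

```python
def build_site_notes(sample_src, methcollct):
    """Join the two USGS fields into one labelled, semicolon-separated string."""
    parts = []
    if sample_src:
        parts.append(f"sample_src: {sample_src}")
    if methcollct:
        parts.append(f"methcollct: {methcollct}")
    return "; ".join(parts)


def generic_site_name(number):
    """Format a generic SGP site name from an autogenerated number."""
    return f"USGS-{number}"


notes = build_site_notes("mine or quarry", "composite")
print(notes)                    # sample_src: mine or quarry; methcollct: composite
print(generic_site_name(3519))  # USGS-3519
```

Keeping the original column name as a label inside the concatenated string means the two values can still be told apart after they share one SGP column.
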
USGS-NGDB table_name.column_name | USGS-NGDB description | SGP table_name.column_name | Notes |
---|---|---|---|
tblRockGeoData.lab_id | Unique Sample ID Number (Sample identification). Unique sample identification number assigned by the laboratory. | sample.original_num | |
tblRockGeoData.job_id | Analytical Job Number (Processing information). Identifier assigned by the laboratory to each batch or job of samples to be analyzed. | sample.usgs_job_id | |
tblRockGeoData.submitter | Submitter Name (Processing information). Name of the sample submitter or submitters. | NOT IMPORTED | |
tblRockGeoData.date_sub | Date Submitted (Processing information). Date when the sample was submitted to laboratory for analysis. | sample.submitted_date | |
tblRockGeoData.field_id | Field Number (Sample identification). Sample number or field number assigned to the sample by the sample submitter. | NOT IMPORTED | |
tblRockGeoData.state | State (Geographic location). State, when noted, from where the sample was collected. | site.state_province | State abbreviations translated to full state name (e.g. CO = Colorado) |
tblRockGeoData.country | Country (Geographic location). Country (or water body for international waters), when noted, from where the sample was collected. | site.country | |
tblRockGeoData.datum | Datum (Geographic location). Reference datum, when recorded, for the latitude and longitude coordinates of the sample site. Field introduced into database about 1999. | site.datum_original | |
tblRockGeoData.spheroid | Spheroid (Geographic location). Reference spheroid or ellipsoid, when recorded, for the latitude and longitude coordinates of the sample site. Field introduced into database about 1999. | NOT IMPORTED | Mostly NULL, often the same as datum |
tblRockGeoData.latitude | Latitude (Geographic location). Latitude of the sample site in decimal degrees. | site.lat_original | |
tblRockGeoData.longitude | Longitude (Geographic location). Longitude of the sample site in decimal degrees. | site.long_original | |
tblRockGeoData.depth | Depth (Site characteristics). Depth from the surface at which the sample was collected. Units are specified by the submitter | site.height_depth_m | Converted into meters |
tblRockGeoData.locat_desc | Location Description (Geographic location). Location description as provided by the sample submitter. | site.site_desc | |
tblRockGeoData.datecollct | Date of Collection (Processing information). Date the sample was collected, when recorded. | collecting_event.start_date | |
tblRockGeoData.sample_src | Source of Sample (Site characteristics). Physical setting or environment from which the sample was collected. | site.site_notes | Included in site_notes along with methcollct, with semi-colon separator, e.g. sample_src: mine or quarry; methcollct: composite. USGS sites and sections were grouped based on the unique combination of sample_src, methcollct and lat-long |
tblRockGeoData.methcollct | Collection Method (Sample characteristics). Sample collection method: Single grab, composite, or channel. | site.site_notes | Included in site_notes along with sample_src, with semi-colon separator, e.g. sample_src: mine or quarry; methcollct: composite. USGS sites and sections were grouped based on the unique combination of sample_src, methcollct and lat-long |
tblRockGeoData.primeclass | Primary Classification (Sample characteristics). Primary classification of sample media. All samples in this database have a primary classification of 'rock'. | NOT IMPORTED | |
tblRockGeoData.xndryclass | Secondary Classification (Sample characteristics). Secondary classification or subclass of sample media. For rock database this consists of igneous, sedimentary, metamorphic, unspecified, unidentified, other, or NULL. | NOT IMPORTED | Used to filter data. Sedimentary samples and some ‘unidentified’ samples with sedimentary rock types in the ‘spec_name’ column were imported to SGP. |
tblRockGeoData.spec_name | Specific Name (Sample characteristics). A specific name for the sample media collected, as provided by the sample submitter. | sample.verbatim_lith | In verbatim_lith along with addl_attr, with semi-colon separator, e.g. oil shale;OIL SHALE, USGS CORE C291. 1055.1-1056.2 ft; dolomitic; laminated; |
tblRockGeoData.addl_attr | Additional Attributes and Comments (Sample characteristics). Additional attributes and sample submitter supplied comments. | sample.verbatim_lith | In verbatim_lith along with spec_name, with semi-colon separator, e.g. oil shale;OIL SHALE, USGS CORE C291. 1055.1-1056.2 ft; dolomitic; laminated; |
tblRockGeoData.geol_age | Geologic Age or Range (Sample characteristics). Age or range of ages from the Geological Time Scale for the collected sample. | geol_age.verbatim_age | geol_age.ics_id was determined based on this age |
tblRockGeoData.stratgrphy | Stratigraphic Unit Name (Sample characteristics). Name of the stratigraphic unit from which the sample was collected. When present, values are as given by the sample submitter and may represent either a formal name, an informal name, or geologic map unit abbreviation. | lithostrat.verbatim_strat | lithostrat.strat_id was determined based on this name. Also used to filter data: samples without a stratigraphic unit name were not included. |
tblRockGeoData.mineralztn | Mineralization Type (Sample characteristics). An indication of mineralization or mineralization types as provided by the sample submitter. | NOT IMPORTED | Mostly NULL |
tblRockGeoData.alteration | Alteration (Sample characteristics). An indication of the presence or type of alteration noted in the sample by the submitter. | NOT IMPORTED | Mostly NULL |
tblRockGeoData.struct_src | Igneous Structural Setting (Site characteristics). An indication of the igneous setting from which the sample was collected. | NOT IMPORTED | All NULL for included samples |
tblRockGeoData.dep_envirn | Environment of Deposition (Site characteristics). Original environment of deposition for sedimentary rocks. | NOT IMPORTED | |
tblRockGeoData.source_rk | Original Rock Source (Sample characteristics). Used in the rock database to identify the precursor rock, igneous or sedimentary, for metamorphic rocks. Very sparsely populated. | NOT IMPORTED | Not applicable |
tblRockGeoData.metamrphsm | Type of Metamorphism (Site characteristics). An indication of the type of metamorphic setting from which the rock was collected. | NOT IMPORTED | All NULL for included samples |
tblRockGeoData.facies_grd | Metamorphic Facies or Grade (Site characteristics). Metamorphic facies or grade as provided by the sample submitter. | NOT IMPORTED | All NULL for included samples |
tblRockGeoData.prep | Sample Preparation Description (Sample characteristics). Description of the sample preparation methods used. Field introduced into database about 1999. | NOT IMPORTED | Mostly NULL, and the data could not be easily matched to SGP fields. The field did not specify the crushing material (tungsten etc.), which is what we are most concerned with at SGP. |
tblRockGeoData.mesh_size | Sieve or Filter Size (Sample characteristics). Sieve or filter size used in field sampling or laboratory preparation to fractionate the sample. Not commonly used for rock samples. | NOT IMPORTED | All NULL for included samples |