lake_id, site_id, and sample_depth formats - USEPA/SuRGE GitHub Wiki

lake_id

Each lake is uniquely identified by a lake_id value. These values originally took the format ch4-xxx, where xxx is a numeric code. In practice they have been entered in several different formats, causing merge issues. Furthermore, lakes 69 and 70 were split into sections, so their lake ids must carry a suffix (lacustrine, transitional, or riverine). We will standardize lake_id as the numeric code from the original lake_id (the xxx in ch4-xxx, excluding leading zeroes), followed by _lacustrine, _transitional, or _riverine if needed. The field will be of class character. For example, 69_lacustrine. After the lacustrine, riverine, and transitional pieces of lakes 69 and 70 have been aggregated to a whole-lake estimate, the text components of the lake_id values can be omitted and the lake_id field converted to numeric.
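The standardization described above could be sketched in base R roughly as follows. This is only an illustration: the function name standardize_lake_id and the example inputs are hypothetical, and the real data may contain formats this does not cover. Note that the ch4 prefix is removed first, so that its "4" is not mistaken for part of the lake code.

```r
# Hypothetical helper: not part of the SuRGE code base.
standardize_lake_id <- function(lake_id) {
  x <- tolower(trimws(as.character(lake_id)))

  # Drop an optional "ch4" prefix plus any separator (e.g. "-" or "_").
  x <- sub("^ch4[^0-9]*", "", x)

  # Leading run of digits; as.integer() removes leading zeroes.
  # Entries without leading digits become NA (with a coercion warning).
  num <- as.integer(sub("^([0-9]+).*$", "\\1", x))

  # Optional section suffix, used for the split lakes (69 and 70).
  sections <- "lacustrine|transitional|riverine"
  section <- ifelse(grepl(sections, x),
                    sub(paste0(".*(", sections, ").*"), "\\1", x),
                    NA_character_)

  # Result is of class character, e.g. "69_lacustrine" or "7".
  ifelse(is.na(section), as.character(num), paste0(num, "_", section))
}

standardize_lake_id(c("ch4-069", "CH4_070_riverine", "69_lacustrine", "7"))
```

Once lakes 69 and 70 have been aggregated and no suffixes remain, as.numeric() on the resulting field would complete the conversion.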

site_id

Unique sampling locations within a lake are identified by a site_id value. Originally these were formatted as character strings: one or two leading letters, followed by a dash, followed by a number (e.g. S-07, SU-10). In practice, these values have been entered in several different formats (e.g. S-07, S-7, 7, S 7, ...), causing merge issues. The text components of the site_id will be stripped and the numeric components converted to class numeric. This code will do the trick: site_id = as.numeric(gsub(".*?([0-9]+).*", "\\1", site_id))
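As a sketch of the same idea, the helper below uses a slightly different pattern from the one-liner above: it skips leading non-digits explicitly instead of relying on a lazy quantifier, and it returns NA without a warning for entries that contain no digits. The function name parse_site_id and the example values are hypothetical.

```r
# Hypothetical helper: a defensive variant of the one-liner above.
parse_site_id <- function(site_id) {
  x <- as.character(site_id)

  # Only entries containing at least one digit can be converted.
  has_digits <- grepl("[0-9]", x)

  out <- rep(NA_real_, length(x))
  # Skip any leading non-digits, keep the first run of digits, drop the rest.
  out[has_digits] <- as.numeric(
    sub("^[^0-9]*([0-9]+).*$", "\\1", x[has_digits])
  )
  out
}

parse_site_id(c("S-07", "S-7", "7", "S 7", "SU-10", "unknown"))
# 7 7 7 7 10 NA
```

Returning NA for unparseable entries makes them easy to find with is.na() before merging.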

sample_depth

Samples are collected from "shallow" and "deep" depths. Field blanks are not collected from a specific depth, so they will be assigned a sample_depth value of "blank".
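A minimal sketch of this rule in base R is shown below. It assumes field blanks can be recognized from a column named sample_type with the value "blank"; that column, its values, and the example data frame are assumptions made for illustration, not part of the documented data format.

```r
# Hypothetical example data; the sample_type column is an assumption.
samples <- data.frame(
  sample_type  = c("unknown", "unknown", "blank"),
  sample_depth = c("shallow", "deep", NA),
  stringsAsFactors = FALSE
)

# Field blanks have no collection depth, so label them "blank".
samples$sample_depth[samples$sample_type == "blank"] <- "blank"

# Check that only the three expected values remain.
stopifnot(all(samples$sample_depth %in% c("shallow", "deep", "blank")))
```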