Formatting for AWBERC nutrient data - USEPA/SuRGE GitHub Wiki
Brian Morris completed the chain-of-custody forms for the 2021 AWBERC nutrient data. The formatting conventions were inconsistent among the forms and were sometimes at odds with those used in the lab. In some cases, the lab overrode the formatting used in the chain of custody. I noted most of these instances in the chain-of-custody forms. Below are the conventions used in the lab.
Unknowns and field duplicates are both assigned TYPE == UKN in the nutrient data file. A field duplicate can be distinguished from its corresponding unknown by the value in the REP field: the unknown is assigned REP == 1|A (that is, 1 or A) and the field duplicate REP == 2|B (2 or B); all other identifiers are identical. The use of numbers or letters to discriminate unknowns from field duplicates is intentional.
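As a quick check on this convention, the sketch below flags field duplicates by their REP value and counts unknowns and field duplicates at each site. The toy records and the lower-case column names (`lake_id`, `site_id`, `sample_type`, `rep`) are assumptions for illustration, not the contents of the actual nutrient file.

```r
library(dplyr)

# Hypothetical toy records. rep is stored as character because both
# numbers (1, 2) and letters (A, B) are used.
toy <- tibble(
  lake_id     = c("1", "1", "2", "2", "3"),
  site_id     = c("A1", "A1", "B3", "B3", "C2"),
  sample_type = "unknown",
  rep         = c("1", "2", "A", "B", "1")
)

# A rep value of 2 or B marks the field duplicate.
flagged <- toy %>%
  mutate(is_field_dup = rep %in% c("2", "B"))

# Count unknowns and field duplicates for each lake and site.
pair_check <- flagged %>%
  group_by(lake_id, site_id) %>%
  summarize(
    n_unknown   = sum(!is_field_dup),
    n_field_dup = sum(is_field_dup),
    .groups = "drop"
  )
```

In this toy data, `pair_check` has one row per lake and site, and sites without a field duplicate show `n_field_dup` of 0.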
Laboratory duplicates are occasionally run. These samples are assigned TYPE == DUP, but all other identifiers are identical to those of the original sample.
To aggregate laboratory duplicates, we first need to convert their sample type to unknown:
data <- data %>% mutate(sample_type = case_when(sample_type == "duplicate" ~ "unknown", TRUE ~ sample_type))
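Now we can group by lake_id, site_id, sample_depth, sample_type, rep, and analyte to aggregate the lab duplicate with its original sample, while preserving the field duplicate as a separate row. The grouping is dropped afterwards because dplyr does not allow mutate() to modify a grouping variable, and sample_type is recoded in the next step.
data <- data %>% group_by(lake_id, site_id, sample_depth, sample_type, rep, analyte) %>% summarize(finalConc = mean(finalConc, na.rm = TRUE), .groups = "drop")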
Now we can recode the sample type to identify the field duplicate, which has a rep value of 2 or B:
data <- data %>% mutate(sample_type = case_when(rep %in% c(2, "B") ~ "duplicate", TRUE ~ sample_type)) %>% select(-rep)
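The three steps can be traced on a small example. The records below are hypothetical (one lake, site, depth, and analyte, with made-up concentrations), and the sketch assumes `rep` is stored as character. The grouping is dropped after `summarize()` so that `sample_type` can be modified in the last step.

```r
library(dplyr)

# Hypothetical records: an unknown (rep 1), its lab duplicate (rep 1),
# and a field duplicate (rep 2).
toy <- tibble(
  lake_id      = "1",
  site_id      = "A1",
  sample_depth = "shallow",
  analyte      = "tp",
  sample_type  = c("unknown", "duplicate", "unknown"),
  rep          = c("1", "1", "2"),
  finalConc    = c(10, 12, 20)
)

result <- toy %>%
  # Step 1: treat the lab duplicate as an unknown so it groups with its original.
  mutate(sample_type = case_when(sample_type == "duplicate" ~ "unknown",
                                 TRUE ~ sample_type)) %>%
  # Step 2: average within each rep; the field duplicate stays separate.
  group_by(lake_id, site_id, sample_depth, sample_type, rep, analyte) %>%
  summarize(finalConc = mean(finalConc, na.rm = TRUE), .groups = "drop") %>%
  # Step 3: relabel the field duplicate (rep 2 or B) and drop rep.
  mutate(sample_type = case_when(rep %in% c("2", "B") ~ "duplicate",
                                 TRUE ~ sample_type)) %>%
  select(-rep)
```

Here `result` has two rows: the unknown, whose concentration is the mean of the original and its lab duplicate (11), and the field duplicate, which keeps its own value (20).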