04. Data collection: 09. Data quality control - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

2. Input: what kind of data does the method require?

  1. Ongoing data collection process

3. Algorithm: how does the method work?

Model mechanics

  1. Data quality control is a core principle of data collection. By checking data while they are being collected, it helps ensure that subsequent analyses reflect the phenomenon being measured rather than errors introduced during collection.
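The Python function below is a minimal sketch of checking data during collection. It applies required-field and range rules to each incoming record. The field names and limits in `RULES` are hypothetical and chosen only for illustration. Electronic data capture tools such as REDCap provide comparable field validation through their own interfaces, which this code does not reproduce.

```python
from __future__ import annotations

from typing import Any, Optional

# Hypothetical entry rules, for illustration only:
# field name -> (required?, lower bound, upper bound); None means no bound.
RULES: dict[str, tuple[bool, Optional[float], Optional[float]]] = {
    "age": (True, 0, 120),
    "systolic_bp": (True, 50, 250),
    "weight_kg": (False, 1, 400),
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one incoming record (empty list if none)."""
    problems: list[str] = []
    for field, (required, low, high) in RULES.items():
        value = record.get(field)
        if value is None:
            # A missing value is a problem only if the field is required.
            if required:
                problems.append(f"{field}: missing required value")
            continue
        # Range checks catch typos such as 1300 entered instead of 130.
        if low is not None and value < low:
            problems.append(f"{field}: {value} below {low}")
        if high is not None and value > high:
            problems.append(f"{field}: {value} above {high}")
    return problems


if __name__ == "__main__":
    print(validate_record({"age": 130, "weight_kg": 70}))
    # ['age: 130 above 120', 'systolic_bp: missing required value']
```

Running these checks at entry time lets the person entering the data fix the error immediately. Finding the same error during analysis is usually harder to correct.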

Reporting guidelines for Methods

Typical metrics for data quality include:

  1. Consistency - measured by the number of contradictions between two or more variables (e.g., a discharge date that precedes the admission date).
  2. Accuracy - the recorded values agree with the true values of whatever was measured. In other words, the measure is valid.
  3. Completeness - usually quantified by the missing rate, i.e., the proportion of expected values that are missing.
  4. Auditability - there is an audit trail, defined as a chronological record of the values entered for each variable and of any later changes to them. REDCap provides such a system.
  5. Orderliness and uniqueness - the data set follows a format that facilitates analysis and prevents data entry errors, and each record appears only once (no duplicates). Tidy data, in which each variable is a column and each observation is a row, is an example.
  6. Timeliness - the data reflect reality with as small a lag as possible.
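Several of these metrics can be computed directly from the collected records. The Python sketch below covers completeness, uniqueness, consistency, and timeliness. It assumes that records are dictionaries, and it uses hypothetical field names (`record_id`, `age`, `admitted`, `discharged`, `entered`) with made-up toy data. The date-order rule is only one example of a cross-variable consistency check.

```python
from __future__ import annotations

from collections import Counter
from datetime import date
from statistics import median
from typing import Any


def completeness(records: list[dict[str, Any]], field: str) -> float:
    """Completeness: share of records with a non-missing value (1 - missing rate)."""
    if not records:
        return float("nan")
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)


def duplicate_ids(records: list[dict[str, Any]], id_field: str) -> list[Any]:
    """Uniqueness: identifiers that appear in more than one record."""
    counts = Counter(r.get(id_field) for r in records)
    return [key for key, n in counts.items() if n > 1]


def date_contradictions(records: list[dict[str, Any]], start: str, end: str) -> int:
    """Consistency: number of records whose end date precedes their start date."""
    return sum(
        1
        for r in records
        if r.get(start) is not None and r.get(end) is not None and r[end] < r[start]
    )


def median_entry_lag_days(records: list[dict[str, Any]], event: str, entered: str) -> float:
    """Timeliness: median number of days between an event and its data entry."""
    lags = [
        (r[entered] - r[event]).days
        for r in records
        if r.get(event) is not None and r.get(entered) is not None
    ]
    return median(lags) if lags else float("nan")


# Made-up toy records; all field names are hypothetical.
records = [
    {"record_id": 1, "age": 54, "admitted": date(2024, 1, 3),
     "discharged": date(2024, 1, 7), "entered": date(2024, 1, 8)},
    {"record_id": 2, "age": None, "admitted": date(2024, 1, 5),
     "discharged": date(2024, 1, 4), "entered": date(2024, 1, 6)},
    {"record_id": 2, "age": 61, "admitted": date(2024, 1, 5),
     "discharged": date(2024, 1, 9), "entered": date(2024, 1, 10)},
    {"record_id": 3, "age": 47, "admitted": None,
     "discharged": date(2024, 1, 8), "entered": date(2024, 1, 9)},
]

if __name__ == "__main__":
    print(completeness(records, "age"))                            # 0.75
    print(duplicate_ids(records, "record_id"))                     # [2]
    print(date_contradictions(records, "admitted", "discharged"))  # 1
    print(median_entry_lag_days(records, "admitted", "entered"))   # 5
```

Each function returns a single number or list. This makes it easy to recompute the metrics on every new batch and track them over time.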

Reporting guidelines include:

  1. Transparent reporting of data quality in distributed data networks [1].
  2. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data [2].

Results

Data science packages

  • qcc - R package for Shewhart quality control charts for continuous, attribute, and count data; CUSUM and EWMA charts; operating characteristic curves; process capability analysis; Pareto and cause-and-effect charts; and multivariate control charts.
  • accrued - R package for visualizing the data quality of partially accruing data.
  • ggQC - quality control charts for 'ggplot' [3].
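The packages above are written for R. The Python sketch below illustrates the logic behind one of the Shewhart charts listed for qcc. It computes a p-chart (a chart for attribute data) of the daily proportion of records with a data quality problem, such as a missing required field. It implements the textbook 3-sigma limits p̄ ± 3√(p̄(1 − p̄)/n) and does not reproduce the qcc API. The daily counts are made up.

```python
from __future__ import annotations

import math


def p_chart(nonconforming: list[int], sizes: list[int]) -> list[dict]:
    """Shewhart p-chart with 3-sigma limits and a pooled centre line.

    nonconforming[i] is the number of flawed records in batch i (e.g. records
    missing a required field); sizes[i] is the number of records in batch i.
    """
    if not sizes or len(nonconforming) != len(sizes):
        raise ValueError("need one count and one positive size per batch")
    p_bar = sum(nonconforming) / sum(sizes)  # centre line
    points = []
    for d, n in zip(nonconforming, sizes):
        p = d / n
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)  # binomial standard error
        lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot be negative
        ucl = min(1.0, p_bar + 3 * sigma)
        points.append({"p": p, "lcl": lcl, "ucl": ucl,
                       "out_of_control": not (lcl <= p <= ucl)})
    return points


if __name__ == "__main__":
    # Hypothetical daily batches of 50 records each.
    daily = p_chart([2, 3, 1, 2, 15, 2], [50] * 6)
    print([pt["out_of_control"] for pt in daily])
    # [False, False, False, False, True, False]
```

With these toy counts, day 5 (15 of 50 records, or 30%) exceeds the upper limit of about 0.20. The outlying batch also inflates p̄, so the limits are usually estimated from an in-control baseline period and then applied to new batches.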

Suggested companion methods

Learning materials

  1. Books
    • Statistical Quality Control [4].
  2. Articles
  3. Videos

4. Output: how do I interpret this method's results?

Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.

Tables, plots, and their interpretation

5. SporeData-specific

Templates

Data science functions

References

[1] Kahn MG, Brown JS, Chun AT, Davidson BN, Meeker D, Ryan PB, Schilling LM, Weiskopf NG, Williams AE, Zozus MN. Transparent reporting of data quality in distributed data networks. eGEMs. 2015;3(1).

[2] Benchimol EI, Manuel DG, To T, Griffiths AM, Rabeneck L, Guttmann A. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol. 2011;64(8):821-9.

[3] Grey K. ggQC: Quality Control Charts for 'ggplot' [R package].

[4] Montgomery DC. Statistical quality control. Wiley Global Education; 2012.
