04. Data collection / 11. Sampling - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

Use a sampling strategy whenever the collected data are meant to represent the population of interest.

2. Input: what kind of data does the method require?

Budget to collect data on a large scale
Contact information for the participants to be sampled

3. Algorithm: how does the method work?

Model mechanics

Sampling is a fundamental operation for the auditing and statistical analysis of large databases. It lets data scientists select, manipulate, and analyze a small representative portion of a large dataset to identify patterns and trends in the whole dataset, while reducing computation time.
- Typically, surveys draw a sample from the target population based on strata, clusters, and primary sampling units (PSUs). The idea is to weight the sample results so that they mirror the target population. See examples of sampling designs at https://twitter.com/beaconcancer22. As with inverse probability weighting (IPW), the principle is to apply a weight to each observation.
- For instance, if one is validating a database and wants to check whether its values are accurate, several sampling methods can be used (e.g., random, systematic, convenience, cluster, and stratified). Accuracy and precision are two statistical indicators that are crucial in sampling procedures, and they are often used to evaluate how well a sampling operation performed. The target precision determines the sample size that must be drawn from the population. If you are interested in specific population strata and sample within each of them, this is called stratified sampling. The general principle is that the closer your sample size is to the size of the stratum, the more precise your estimate will be, with less estimation error.
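The weighting principle described above can be illustrated with a minimal sketch in Python, using only the standard library. It draws a simple random sample inside each stratum and gives every sampled unit a design weight equal to the inverse of its inclusion probability (N_h / n_h). The population, the `region` and `income` fields, and the function names are hypothetical and chosen only for illustration; a real survey with clusters and PSUs would need a dedicated survey package.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample within each stratum.

    Returns a list of (unit, weight) pairs, where weight = N_h / n_h,
    the inverse of the unit's inclusion probability in its stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in population:
        by_stratum[stratum_of(unit)].append(unit)

    sample = []
    for stratum, units in by_stratum.items():
        n_h = min(n_per_stratum.get(stratum, 0), len(units))
        if n_h == 0:
            continue  # stratum not sampled; it cannot be represented
        weight = len(units) / n_h
        for unit in rng.sample(units, n_h):
            sample.append((unit, weight))
    return sample

def weighted_mean(sample, value_of):
    """Weighted estimate of the population mean."""
    total_weight = sum(w for _, w in sample)
    return sum(w * value_of(u) for u, w in sample) / total_weight

# Hypothetical population: 900 urban and 100 rural records.
population = [{"region": "urban", "income": 50 + (i % 10)} for i in range(900)]
population += [{"region": "rural", "income": 20 + (i % 10)} for i in range(100)]

# Rural records are deliberately oversampled (50 of 100 vs. 50 of 900).
sample = stratified_sample(
    population,
    stratum_of=lambda u: u["region"],
    n_per_stratum={"urban": 50, "rural": 50},
)

estimate = weighted_mean(sample, lambda u: u["income"])
unweighted = sum(u["income"] for u, _ in sample) / len(sample)
print(f"weighted: {estimate:.2f}  unweighted: {unweighted:.2f}")
```

Because the rural stratum is oversampled, the unweighted sample mean is pulled toward rural incomes, while the weighted estimate restores each stratum to its share of the population.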

Reporting guidelines

1. Guideline for sampling, measuring and reporting ionized magnesium in plasma

2. Strengthening the Reporting of Observational Studies in Epidemiology for respondent-driven sampling studies: "STROBE-RDS" statement

Data science packages

Packages in R for sample size calculation based on precision for stratified sampling:

SamplingStrata cheat sheet.
PracTools package (function strAlloc)
optimStrat package (function optiallo)
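To give a sense of what a precision-based sample size calculation for stratified sampling involves, here is a minimal Python sketch of the textbook Neyman allocation. It does not call or reproduce the R packages listed above. It assumes the stratum sizes N_h and standard deviations S_h are known (in practice they are estimated from prior data), that at least one S_h is positive, and that precision is expressed as a margin of error for the estimated mean. The function name and the example strata are hypothetical.

```python
import math

def neyman_allocation(strata, margin_of_error, z=1.96):
    """Sample size and Neyman allocation for estimating a population mean.

    strata: dict mapping stratum name -> (N_h, S_h), i.e. stratum size and
            standard deviation of the study variable (assumed known).
    margin_of_error: desired half-width of the confidence interval.
    z: normal quantile (1.96 for roughly 95% confidence).
    Returns (n_total, {stratum name: n_h}).
    """
    N = sum(N_h for N_h, _ in strata.values())
    target_var = (margin_of_error / z) ** 2

    # W_h = N_h / N is the population share of stratum h.
    sum_ws = sum(N_h / N * S_h for N_h, S_h in strata.values())
    sum_ws2 = sum(N_h / N * S_h ** 2 for N_h, S_h in strata.values())

    # Total n so that Var(stratified mean) <= target_var under Neyman
    # allocation, including the finite population correction.
    n_total = math.ceil(sum_ws ** 2 / (target_var + sum_ws2 / N))

    allocation = {}
    for name, (N_h, S_h) in strata.items():
        share = (N_h / N * S_h) / sum_ws  # proportional to N_h * S_h
        # Round up, take at least one unit, never more than the stratum holds.
        allocation[name] = min(N_h, max(1, math.ceil(n_total * share)))
    return n_total, allocation

# Hypothetical strata: a large homogeneous one and a small variable one.
strata = {"urban": (9000, 10.0), "rural": (1000, 30.0)}
n_total, alloc = neyman_allocation(strata, margin_of_error=1.0)
print(n_total, alloc)
```

Rounding each stratum up means the allocated sizes can add up to slightly more than `n_total`. Note also that the more variable stratum receives a higher sampling fraction than its population share alone would suggest.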

Learning materials

Books
- R Companion for Sampling: Design and Analysis
Articles *
