05. Data linkage / 03. Statistical matching - sporedata/researchdesigneR GitHub Wiki
- Statistical matching is used to link two or more datasets that share no unique identifier but do share a set of common variables and are drawn from the same population. For an example, see Accounting for the burden and redistribution of health care costs: Who uses care and who pays for it.
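A minimal sketch of one common statistical matching technique, the nearest-neighbour distance hot deck described in [1] (StatMatch [4] offers it as NND.hotdeck; this is not a port of that function). It assumes file A (recipients) holds the common variables plus a variable Y, file B (donors) holds the same common variables plus a variable Z, and that Y and Z are conditionally independent given the common variables, the conditional independence assumption usually invoked in statistical matching. The function names and the variables age, income, visits and oop are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Record = Dict[str, float]


def donor_scales(donors: List[Record], match_vars: Sequence[str]) -> Dict[str, float]:
    """Standard deviation of each matching variable in the donor file
    (1.0 for a constant variable), used to put variables on a common scale."""
    scales = {}
    for k in match_vars:
        vals = [d[k] for d in donors]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        scales[k] = sd or 1.0
    return scales


def nnd_hotdeck(
    recipients: List[Record],
    donors: List[Record],
    match_vars: Sequence[str],
    donor_vars: Sequence[str],
) -> List[Record]:
    """Nearest-neighbour distance hot deck: each recipient receives donor_vars
    from the donor with the smallest scaled Euclidean distance on match_vars.
    Donors may be reused (unconstrained matching); ties go to the first donor."""
    scales = donor_scales(donors, match_vars)

    def distance(r: Record, d: Record) -> float:
        return math.sqrt(sum(((r[k] - d[k]) / scales[k]) ** 2 for k in match_vars))

    fused = []
    for r in recipients:
        best = min(donors, key=lambda d: distance(r, d))
        # Keep the recipient's own values and add the donated variables.
        fused.append({**r, **{v: best[v] for v in donor_vars}})
    return fused


# Hypothetical files: A observes (age, income, visits), B observes (age, income, oop).
file_a = [
    {"age": 30, "income": 40.0, "visits": 2},
    {"age": 65, "income": 25.0, "visits": 7},
]
file_b = [
    {"age": 28, "income": 42.0, "oop": 300.0},
    {"age": 70, "income": 22.0, "oop": 1500.0},
    {"age": 50, "income": 60.0, "oop": 800.0},
]

fused = nnd_hotdeck(file_a, file_b, ["age", "income"], ["oop"])
for rec in fused:
    print(rec)
```

Scaling each variable by its donor-file standard deviation keeps variables measured in different units from dominating the distance. StatMatch also provides other distance functions and constrained matching (each donor used at most once), which this sketch omits.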
- Two datasets without a common unique identifier but belonging to the same population, or at least two samples that have been made comparable, for example with inverse probability weighting (IPW) [2], [3].
- A common set of variables that allows individuals with similar characteristics to be matched between any two datasets; the values of these variables must overlap across the datasets [2], [3].
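The two requirements above can be checked together. This is a minimal sketch, assuming a single categorical common variable (hypothetical age groups): each record of the source file is weighted by p_target(s) / p_source(s), which is proportional to the inverse odds of belonging to the source file under a saturated model of file membership, so the weighted source file reproduces the target file's distribution. This is in the spirit of the inverse probability of sampling weights discussed in [3], not a reproduction of that paper's method. Target categories that never appear in the source are reported, because no weight can make up for missing overlap. The function name inverse_odds_weights is hypothetical; with several or continuous common variables, membership probabilities would instead be estimated with a model such as logistic regression.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def inverse_odds_weights(
    target: Sequence[Hashable], source: Sequence[Hashable]
) -> Tuple[List[float], List[Hashable]]:
    """Weight each source record so that the weighted source distribution of a
    categorical common variable matches the target distribution.

    The weight for category c is p_target(c) / p_source(c), proportional to the
    inverse odds of source membership under a saturated model. Source categories
    absent from the target get weight 0. Target categories absent from the source
    violate overlap; they are returned for reporting, and the weights then sum to
    less than len(source).
    """
    n_t, n_s = len(target), len(source)
    count_t, count_s = Counter(target), Counter(source)

    # Overlap check: target categories with no source records cannot be reweighted.
    no_overlap = sorted((c for c in count_t if c not in count_s), key=str)

    weights = []
    for c in source:
        p_t = count_t[c] / n_t  # share of category c in the target file (0 if absent)
        p_s = count_s[c] / n_s  # share of category c in the source file (always > 0)
        weights.append(p_t / p_s)
    return weights, no_overlap


# Hypothetical age groups observed in both files.
target = ["<40", "<40", "40-64", "65+"]
source = ["<40", "40-64", "40-64", "40-64", "65+", "65+"]

weights, gaps = inverse_odds_weights(target, source)
print(weights)  # one weight per source record
print(gaps)     # target categories with no overlap in the source
```

Very large weights flag categories that are rare in the source relative to the target, a practical warning sign that overlap is weak even when it is not entirely missing.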
Guidelines for Statistical Reporting in Articles for Medical Journals
- Sdglinkage: Synthetic Data Generation for Linkage Methods Development.
- Statistical Matching or Data Fusion (StatMatch).
Books
Articles combining theory and scripts
[1] D'Orazio M, Di Zio M, Scanu M. Statistical matching: Theory and practice. John Wiley & Sons; 2006 Mar 30.
[2] Degtiar I, Rose S. A review of generalizability and transportability.
[3] Buchanan AL, Hudgens MG, Cole SR, Mollan KR, Sax PE, Daar ES, Adimora AA, Eron JJ, Mugavero MJ. Generalizing evidence from randomized trials using inverse probability of sampling weights. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2018 Oct;181(4):1193-209.
[4] D'Orazio M. Statistical Matching or Data Fusion (StatMatch).
[5] Nowok B. synthpop: An R package for generating synthetic versions of sensitive microdata for statistical disclosure control. Technical report, Administrative Data Research Centre, Univ. of Edinburgh; 2016.