05. Data linkage / 03. Statistical matching

1. Use cases: in which situations should I use this method?

Statistical matching is used to link two or more datasets that lack a shared unique identifier but share a set of common variables and are drawn from the same population. For an example, see "Accounting for the burden and redistribution of health care costs: Who uses care and who pays for it".
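To make the idea concrete, here is a minimal sketch of one common approach, nearest-neighbor matching on the shared variables (often called distance hot deck). It is an illustration under stated assumptions, not the method of any particular package: the function names, the variable names (`age`, `income`, `cost`, `visits`), and the toy records are all hypothetical. Each record in the recipient dataset receives the unobserved variable from the donor record that is closest on the standardized common variables.

```python
import math
from statistics import mean, pstdev


def make_scaler(rows_a, rows_b, keys):
    """Build a function that standardizes the common variables.

    Means and standard deviations are pooled over both datasets so that
    no single variable dominates the distance because of its units.
    """
    pooled = rows_a + rows_b
    stats = {}
    for k in keys:
        values = [r[k] for r in pooled]
        sd = pstdev(values)
        stats[k] = (mean(values), sd if sd > 0 else 1.0)

    def scale(row):
        return [(row[k] - stats[k][0]) / stats[k][1] for k in keys]

    return scale


def nearest_neighbor_match(recipients, donors, keys, donate):
    """Copy `donate` from the closest donor into each recipient record."""
    scale = make_scaler(recipients, donors, keys)
    donor_points = [scale(d) for d in donors]
    fused = []
    for rec in recipients:
        point = scale(rec)
        # Euclidean distance on the standardized common variables.
        distances = [math.dist(point, dp) for dp in donor_points]
        best = distances.index(min(distances))
        fused.append({**rec, donate: donors[best][donate]})
    return fused


# Hypothetical toy data: survey A observes cost, survey B observes visits.
survey_a = [
    {"age": 25, "income": 30, "cost": 120},
    {"age": 60, "income": 55, "cost": 900},
]
survey_b = [
    {"age": 27, "income": 32, "visits": 1},
    {"age": 58, "income": 50, "visits": 6},
    {"age": 40, "income": 45, "visits": 3},
]

fused = nearest_neighbor_match(
    survey_a, survey_b, keys=["age", "income"], donate="visits"
)
for row in fused:
    print(row)
```

In this sketch a donor can be reused for several recipients (unconstrained matching). Because the link runs only through the common variables, any joint analysis of `cost` and `visits` in the fused file implicitly treats them as conditionally independent given those variables.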

2. Input: what kind of data does the method require?

Two datasets without a common unique identifier that belong to the same population. If the two samples differ, reweight them to be similar, for example with inverse probability weighting (IPW) [2], [3].
A set of variables common to both datasets that allows you to match individuals with similar characteristics. The distributions of these variables must overlap between the two datasets [2], [3].
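The two input requirements above can be checked before matching. The sketch below is a rough illustration, not the estimator from [3]: it compares one common variable across datasets using a standardized mean difference and the shared range, then computes stratum-level weights (target share divided by sample share) as a simple discrete stand-in for model-based inverse probability weights. All function names and toy values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def standardized_mean_difference(a_values, b_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((pstdev(a_values) ** 2 + pstdev(b_values) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0
    return (mean(a_values) - mean(b_values)) / pooled_sd


def range_overlap(a_values, b_values):
    """Return the interval covered by both samples, or None if disjoint."""
    low = max(min(a_values), min(b_values))
    high = min(max(a_values), max(b_values))
    return (low, high) if low <= high else None


def stratum_weights(sample_strata, target_strata):
    """Weight each sample stratum by target share / sample share.

    Raises if the target contains a stratum the sample never observes,
    because no reweighting can repair a complete lack of overlap.
    """
    s_counts, t_counts = Counter(sample_strata), Counter(target_strata)
    missing = set(t_counts) - set(s_counts)
    if missing:
        raise ValueError(f"no overlap for strata: {sorted(missing)}")
    n_s, n_t = len(sample_strata), len(target_strata)
    return {s: (t_counts[s] / n_t) / (s_counts[s] / n_s) for s in s_counts}


# Hypothetical ages observed in each dataset.
ages_a = [25, 34, 47, 60]
ages_b = [30, 41, 52, 66]
smd = standardized_mean_difference(ages_a, ages_b)
shared = range_overlap(ages_a, ages_b)
print("standardized mean difference:", round(smd, 3))
print("shared age range:", shared)

# Hypothetical age groups: dataset B is reweighted to resemble dataset A.
groups_b = ["young", "young", "young", "old"]
groups_a = ["young", "young", "old", "old"]
weights = stratum_weights(groups_b, groups_a)
print("weights for dataset B:", weights)
```

A standardized mean difference near zero and a wide shared range suggest the samples are comparable on that variable. Large weights indicate strata that are rare in the sample relative to the target, which is a sign of weak overlap.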

3. Algorithm: how does the method work?

Model mechanics

Reporting guidelines

Guidelines for Statistical Reporting in Articles for Medical Journals

Data science packages

Suggested companion methods

Learning materials

Books
- Statistical Matching: Theory and Practice [1].
- Statistical Matching or Data Fusion (StatMatch) [4].

Articles combining theory and scripts

4. Output: how do I interpret this method's results?

Mock conclusions, or most frequent format for conclusions reached at the end of a typical analysis.

Tables, plots, and their interpretation

5. SporeData-specific

Templates

Statistical matching

Data science functions

References

[1] D'Orazio M, Di Zio M, Scanu M. Statistical matching: Theory and practice. John Wiley & Sons; 2006 Mar 30.

[2] Degtiar I, Rose S. A review of generalizability and transportability.

[3] Buchanan AL, Hudgens MG, Cole SR, Mollan KR, Sax PE, Daar ES, Adimora AA, Eron JJ, Mugavero MJ. Generalizing evidence from randomized trials using inverse probability of sampling weights. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2018 Oct;181(4):1193-209.

[4] D'Orazio M. Statistical Matching or Data Fusion (StatMatch).

[5] Nowok B. synthpop: An R package for generating synthetic versions of sensitive microdata for statistical disclosure control. Technical report, Administrative Data Research Centre, Univ. of Edinburgh; 2016.
