05. Data linkage: 04. Enrichment by machine learning - sporedata/researchdesigneR GitHub Wiki
- Enrichment by machine learning is used when two datasets come from the same population but do not share a common identifier, so their records cannot be linked directly.
- It requires two datasets that lack a common unique identifier but belong to the same population.
In contrast to statistical matching, a machine learning model is trained to predict the variable that needs to be transported (imputed) to the new dataset, using a set of common predictors present in both datasets. The fitted model is then applied to the target dataset to predict the value of the variable of interest for each of its records.
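The workflow described above can be sketched in a few lines. This is a minimal illustration, not the wiki's own implementation: the dataset contents, the predictor names `age` and `bmi`, the transported variable `sbp`, and the choice of ordinary least squares as the model are all assumptions made for the example. The toy donor values are generated from a known linear rule so the result is easy to check. In practice any supervised learner could take the place of the linear model.

```python
# Hypothetical toy data: every name and value here is an illustrative assumption.
# Donor dataset: holds the common predictors AND the variable to transport (sbp).
donor = [
    {"age": 30, "bmi": 22.0, "sbp": 116.0},
    {"age": 40, "bmi": 27.0, "sbp": 123.5},
    {"age": 50, "bmi": 24.0, "sbp": 127.0},
    {"age": 60, "bmi": 30.0, "sbp": 135.0},
    {"age": 70, "bmi": 26.0, "sbp": 138.0},
]
# Target dataset: holds only the common predictors.
target = [{"age": 35, "bmi": 24.0}, {"age": 60, "bmi": 31.0}]

PREDICTORS = ["age", "bmi"]  # variables present in both datasets
OUTCOME = "sbp"              # variable present only in the donor dataset


def design_row(record):
    """Return [1, x1, x2, ...]: an intercept term plus the common predictors."""
    return [1.0] + [float(record[p]) for p in PREDICTORS]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(records):
    """Fit ordinary least squares on the donor data via the normal equations."""
    X = [design_row(r) for r in records]
    y = [float(r[OUTCOME]) for r in records]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, record):
    """Predict the outcome for one record from its common predictors."""
    return sum(c * v for c, v in zip(coef, design_row(record)))


# Step 1: train on the donor dataset. Step 2: predict into the target dataset.
coef = fit(donor)
enriched = [{**r, OUTCOME + "_pred": predict(coef, r)} for r in target]

for row in enriched:
    print(row)
```

Each record in `enriched` keeps its original predictors and gains a predicted `sbp_pred` column. Note that the predicted values carry model error, so downstream analyses should treat them as imputations rather than observed measurements.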
- Articles
[1] van Walraven C. Improved correction of misclassification bias with bootstrap imputation. Medical care. 2018 Jul 1;56(7):e39-45.
[2] Nowok B. synthpop: An R package for generating synthetic versions of sensitive microdata for statistical disclosure control. Technical report, Administrative Data Research Centre, Univ. of Edinburgh; 2016.