05. Data linkage / 01. Probabilistic - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

2. Input: what kind of data does the method require?

  • Two datasets that share a common unique identifier, but where linkage on that identifier is imperfect or needs to be quality-checked

3. Algorithm: how does the method work?
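
One common formulation, covered in [1], scores each candidate record pair with Fellegi-Sunter match weights: for every compared field, agreement adds log2(m/u) and disagreement adds log2((1-m)/(1-u)), where m is the probability that the field agrees among true matches and u is the probability that it agrees among non-matches. The sketch below is a minimal illustration of that scoring step only. The field names, the m and u values, the thresholds, and the helper functions are all hypothetical and chosen for illustration; in practice m and u are usually estimated from the data.

```python
import math

# Hypothetical m- and u-probabilities, for illustration only.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PROBS = {
    "surname": {"m": 0.95, "u": 0.01},
    "birth_year": {"m": 0.98, "u": 0.05},
    "sex": {"m": 0.99, "u": 0.50},
}


def field_weight(agrees, m, u):
    """Return the log2 agreement or disagreement weight for one field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum the field weights over all compared fields for one record pair."""
    return sum(
        field_weight(rec_a[name] == rec_b[name], p["m"], p["u"])
        for name, p in probs.items()
    )


def classify(weight, lower=0.0, upper=8.0):
    """Map a total weight to a decision using two assumed thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"


# Toy records: one from dataset A, three candidates from dataset B.
a = {"surname": "silva", "birth_year": 1980, "sex": "F"}
b_exact = {"surname": "silva", "birth_year": 1980, "sex": "F"}
b_typo = {"surname": "silvia", "birth_year": 1980, "sex": "F"}
b_other = {"surname": "costa", "birth_year": 1975, "sex": "M"}

for label, b in [("exact", b_exact), ("typo", b_typo), ("other", b_other)]:
    w = match_weight(a, b)
    print(label, round(w, 2), classify(w))
```

With these assumed values, a pair that agrees on every field is linked, a pair that differs only by a surname typo falls into the manual-review band, and a pair that disagrees on every field is rejected. Exact string equality is used here for simplicity; approximate string comparators are a common refinement.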

Data science packages

Learning materials

  1. Books
  2. Articles
    • Probabilistic Record Linkage [1].
    • Probabilistic Linkage to Enhance Deterministic Algorithms and Reduce Data Linkage Errors in Hospital Administrative Data [2].
    • synthpop: An R package for generating synthetic versions of sensitive microdata for statistical disclosure control [3].


[1] Sayers A, Ben-Shlomo Y, Blom AW, Steele F. Probabilistic record linkage. International journal of epidemiology. 2016 Jun 1;45(3):954-64.

[2] Hagger-Johnson G, Harron K, Goldstein H, Aldridge R, Gilbert R. Probabilistic linkage to enhance deterministic algorithms and reduce data linkage errors in hospital administrative data. Journal of innovation in health informatics. 2017 Jun 30;24(2):891.

[3] Nowok B. synthpop: An R package for generating synthetic versions of sensitive microdata for statistical disclosure control. Technical report, Administrative Data Research Centre, Univ. of Edinburgh; 2016.
