Merge - Cghlewis/data-wrangling-functions GitHub Wiki

Joining data, also called merging, combines two datasets by matching rows on one or more shared key variables. It can be used in a variety of scenarios. A few examples include:

  • Linking data across instruments (a student survey + a student assessment)
  • Linking data across time (a student survey in the fall + a student survey in the spring)
  • Linking data across participants (a student assessment + a teacher survey)
  • Linking for de-identification purposes (a student survey with name + a student roster with study ID)
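
As a rough illustration of the de-identification scenario, the sketch below joins a survey that contains names to a roster that contains study IDs, then drops the name. The datasets and variable names (`survey`, `roster`, `stu_name`, `study_id`, `q1`) are invented for this example and do not come from the linked blog post.

```r
library(dplyr)

# Hypothetical student survey that still contains names
survey <- data.frame(
  stu_name = c("Ana Ruiz", "Ben Cole", "Cy Dunn"),
  q1 = c(3, 4, 2)
)

# Hypothetical roster linking each name to a study ID
roster <- data.frame(
  stu_name = c("Ana Ruiz", "Ben Cole", "Cy Dunn"),
  study_id = c(101, 102, 103)
)

# Attach the study ID by name, then keep only the de-identified columns
survey_deid <- survey %>%
  left_join(roster, by = "stu_name") %>%
  select(study_id, q1)

survey_deid
```

In practice names are often a fragile key (spelling differences, duplicates), so this only works cleanly when the name values match exactly in both datasets.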

There are several different types of joins you can perform. Here we review:

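  • Left join (keeps every row from the first dataset and adds matching values from the second)
  • Right join (keeps every row from the second dataset and adds matching values from the first)
  • Full join (keeps every row from both datasets)
  • Inner join (keeps only rows with a match in both datasets)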
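
A minimal sketch of how the four joins differ, using two small made-up datasets that share a `stu_id` key. The dataset and variable names here are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical data: students 1-3 took the survey, students 2-4 took the assessment
survey <- data.frame(stu_id = c(1, 2, 3), survey_score = c(10, 12, 9))
assessment <- data.frame(stu_id = c(2, 3, 4), test_score = c(85, 90, 78))

# Left join: all survey rows (ids 1, 2, 3); test_score is NA for id 1
left <- left_join(survey, assessment, by = "stu_id")

# Right join: all assessment rows (ids 2, 3, 4); survey_score is NA for id 4
right <- right_join(survey, assessment, by = "stu_id")

# Full join: every id that appears in either dataset (ids 1, 2, 3, 4)
full <- full_join(survey, assessment, by = "stu_id")

# Inner join: only ids present in both datasets (ids 2, 3)
inner <- inner_join(survey, assessment, by = "stu_id")
```

Comparing `nrow()` across the four results is a quick way to see which cases were kept or dropped by each join.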

There are two important rules when merging data.

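  1. Variable names cannot repeat (within or across datasets), except for the key variable(s) you join on, which must appear in both datasets.
  2. Each dataset must contain a key (one or more variables whose values uniquely identify rows in a dataset).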
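
One way to check both rules before joining is sketched below, assuming two hypothetical waves of a survey (`fall`, `spring`) that share a `stu_id` key and an identically named `score` variable. It uses `duplicated()` to check the key and `rename_with()` with `paste0()` to make the non-key variable names unique.

```r
library(dplyr)

# Hypothetical fall and spring data with the same variable names
fall <- data.frame(stu_id = c(1, 2, 3), score = c(10, 12, 9))
spring <- data.frame(stu_id = c(1, 2, 3), score = c(11, 15, 10))

# Rule 2: the key should uniquely identify rows in each dataset
fall_has_dupes <- any(duplicated(fall$stu_id))
spring_has_dupes <- any(duplicated(spring$stu_id))

# Rule 1: add a time suffix to every variable except the key
fall <- fall %>% rename_with(~ paste0(.x, "_fall"), .cols = -stu_id)
spring <- spring %>% rename_with(~ paste0(.x, "_spring"), .cols = -stu_id)

# With unique names and a unique key, the join is unambiguous
both <- full_join(fall, spring, by = "stu_id")
names(both)
```

If either duplicate check returns `TRUE`, it is usually worth resolving the duplicate keys before joining, since each duplicate multiplies the matched rows in the result.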

Merging data

  • Merge (This link will take you to examples in an external blog post)

Main functions used in examples

Package | Functions
dplyr | left_join(); right_join(); full_join(); inner_join()

Other functions used in examples

Package | Functions
base | paste0()
dplyr | select(); rename_with()