Merge - Cghlewis/data-wrangling-functions GitHub Wiki
Joining data, also called merging, can be used in a variety of scenarios. A few examples include:
- Linking data across instruments (a student survey + a student assessment)
- Linking data across time (a student survey in the fall + a student survey in the spring)
- Linking data across participants (a student assessment + a teacher survey)
- Linking for de-identification purposes (a student survey with name + a student roster with study ID)
There are several different types of joins you can perform. Here we review:
- Left join
- Right join
- Full join
- Inner join
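As a minimal sketch of how the four join types differ, the example below joins two small, made-up data frames on a shared key. The data frame names (`survey`, `assess`) and the variables (`stu_id`, `svy_q1`, `test_score`) are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical data: a student survey and a student assessment.
# Students 2 and 3 appear in both; student 1 only in the survey,
# student 4 only in the assessment.
survey <- data.frame(stu_id = c(1, 2, 3), svy_q1 = c(4, 5, 3))
assess <- data.frame(stu_id = c(2, 3, 4), test_score = c(88, 92, 75))

# Left join: keep every row of the first (left) dataset
left <- left_join(survey, assess, by = "stu_id")

# Right join: keep every row of the second (right) dataset
right <- right_join(survey, assess, by = "stu_id")

# Full join: keep every row from both datasets
full <- full_join(survey, assess, by = "stu_id")

# Inner join: keep only rows whose key appears in both datasets
inner <- inner_join(survey, assess, by = "stu_id")

# Compare how many rows each join returns
c(left = nrow(left), right = nrow(right), full = nrow(full), inner = nrow(inner))
```

Rows without a match in the other dataset are kept by the left, right, and full joins with `NA` in the columns that come from the other dataset.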
There are two important rules when merging data:
- Variable names, other than the key variable(s) you join on, cannot repeat (within or across datasets).
- Each dataset must contain a key (one or more variables whose values uniquely identify rows in a dataset).
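One way to check both rules before joining is sketched below, assuming two hypothetical waves of a student survey (`fall` and `spring`) that share the key `stu_id` and an identically named item `q1`. The suffixes `_fall` and `_spring` are illustrative choices, not a required convention.

```r
library(dplyr)

# Hypothetical data: the same survey item collected at two time points
fall <- data.frame(stu_id = c(1, 2, 3), q1 = c(4, 5, 3))
spring <- data.frame(stu_id = c(1, 2, 3), q1 = c(5, 5, 4))

# Key rule: confirm the key uniquely identifies rows in each dataset
# (anyDuplicated() returns 0 when there are no duplicate values)
stopifnot(anyDuplicated(fall$stu_id) == 0)
stopifnot(anyDuplicated(spring$stu_id) == 0)

# Naming rule: add a suffix to every variable except the key so that
# no variable name repeats across the two datasets
fall_renamed <- rename_with(fall, ~ paste0(.x, "_fall"), .cols = -stu_id)
spring_renamed <- rename_with(spring, ~ paste0(.x, "_spring"), .cols = -stu_id)

# Join on the shared key
both <- left_join(fall_renamed, spring_renamed, by = "stu_id")
names(both)
```

If repeated names are not renamed first, dplyr joins fall back to their default suffixes (`.x` and `.y`), which are harder to interpret than names chosen deliberately.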
Merging data
- Merge (This link will take you to examples in an external blog post)
Main functions used in examples
| Package | Functions |
|---|---|
| dplyr | left_join(); right_join(); full_join(); inner_join() |
Other functions used in examples
| Package | Functions |
|---|---|
| base | paste0() |
| dplyr | select(); rename_with() |