Complete - Cghlewis/data-wrangling-functions GitHub Wiki
These are functions to complete your data using values available elsewhere in your data or from external data sources.
Long Data
This includes functions to complete long data, such as filling in the missing school and year values in the table below from existing values in the same columns. This layout is common in Excel spreadsheets, where a value is entered once and the cells beneath it are left blank.
| school | year | grade | num_students |
|---|---|---|---|
| School A | 2010 | 5 | 100 |
| NA | NA | 6 | 150 |
| NA | NA | 7 | 160 |
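As a minimal sketch of this case, the table above can be rebuilt and completed with `tidyr::fill()`. The object names `enroll` and `enroll_filled` are made up for illustration, and the sketch assumes each blank cell should take the value of the nearest non-missing cell above it.

```r
library(tibble)
library(tidyr)

# Recreate the example table (hypothetical object name); NA marks blank cells
enroll <- tribble(
  ~school,    ~year, ~grade, ~num_students,
  "School A", 2010,  5,      100,
  NA,         NA,    6,      150,
  NA,         NA,    7,      160
)

# Carry the last observed school and year downward into the NA cells
enroll_filled <- fill(enroll, school, year, .direction = "down")

enroll_filled
```

If the data contained several schools, the rows would need to be in the intended order before filling, since `fill()` only looks at neighboring rows.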
It also includes functions to turn implicit missing values into explicit missing values. In the data below, every student had the opportunity to have data collected in both fall and spring, but we were not able to collect data at both time points for every student (student 234, for example, has no spring row). We would still like every student to have both a "fall" and a "spring" row in time, with test_score set to NA where no score was collected.
| stu_id | time | test_score |
|---|---|---|
| 123 | fall | 505 |
| 123 | spring | 515 |
| 234 | fall | 580 |
| 345 | fall | 600 |
| 345 | spring | 590 |
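One way to make the missing row explicit is `tidyr::complete()`, sketched below on the table above. The object names `scores` and `scores_complete` are illustrative, and the sketch assumes that every combination of `stu_id` and `time` found in the data should appear in the result.

```r
library(tibble)
library(tidyr)

# Recreate the example table (hypothetical object name)
scores <- tribble(
  ~stu_id, ~time,    ~test_score,
  123,     "fall",   505,
  123,     "spring", 515,
  234,     "fall",   580,
  345,     "fall",   600,
  345,     "spring", 590
)

# Expand to every stu_id x time combination; combinations that were
# absent from the data get a new row with test_score = NA
scores_complete <- complete(scores, stu_id, time)

scores_complete
```

The result has six rows, including a spring row for student 234 whose test_score is NA.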
Wide Data
This includes functions to complete wide data, such as filling in missing gender for the 19-20 school year with gender reported in previous years or in external datasets.
| stu_id | gender_1718 | gender_1819 | gender_1920 |
|---|---|---|---|
| 150 | male | male | NA |
| 160 | NA | female | female |
| 170 | non-binary | NA | NA |
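A minimal sketch of the within-dataset case uses `dplyr::coalesce()`, which returns the first non-missing value across its arguments for each row. The object names `gender` and `gender_complete` are illustrative, and the priority order (the most recent prior year first) is an assumption rather than something the table dictates.

```r
library(tibble)
library(dplyr)

# Recreate the example table (hypothetical object name)
gender <- tribble(
  ~stu_id, ~gender_1718, ~gender_1819, ~gender_1920,
  150,     "male",       "male",       NA,
  160,     NA,           "female",     "female",
  170,     "non-binary", NA,           NA
)

# Keep gender_1920 where it exists; otherwise fall back to 18-19,
# then to 17-18 (assumed priority order)
gender_complete <- mutate(
  gender,
  gender_1920 = coalesce(gender_1920, gender_1819, gender_1718)
)

gender_complete
```

Only `gender_1920` is changed here; the earlier columns keep their original missing values.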
Complete long data
Complete wide data
Complete data by recoding
Main functions used in examples
| Package | Functions |
|---|---|
| tidyr | fill(); complete() |
| dplyr | coalesce(); rows_update() |
Other functions used in examples
| Package | Functions |
|---|---|
| dplyr | group_by(); ungroup(); arrange(); mutate(); na_if(); across(); rows_patch() |
| tidyselect | contains() |
Resources