Complete - Cghlewis/data-wrangling-functions GitHub Wiki

These are functions to complete your data using values available elsewhere in your data or from external data sources.

Long Data

This includes functions to complete long data, such as filling in missing years and schools in the columns below using existing values. You see this kind of formatting a lot in Excel spreadsheets.

school    year  grade  num_students
School A  2010  5      100
NA        NA    6      150
NA        NA    7      160
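
As a rough sketch of the idea, the missing school and year values in a table like the one above could be filled downward with tidyr's fill(). The data frame `d` below is a hypothetical recreation of that table, not data from the examples themselves.

```r
library(tibble)
library(tidyr)

# Hypothetical data mirroring the table above
d <- tribble(
  ~school,    ~year, ~grade, ~num_students,
  "School A", 2010,  5,      100,
  NA,         NA,    6,      150,
  NA,         NA,    7,      160
)

# Carry the last non-missing school and year down into the NA rows
d_filled <- fill(d, school, year, .direction = "down")

print(d_filled)
```

If the data contained several schools stacked on top of each other, the same call would fill each block of NA values from the row that starts it, so the row order matters.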

It also includes functions to turn implicit missing values into explicit missing values. In the data below, every student had the opportunity to have data collected in both fall and spring, but we were not able to collect both for every student (student 234 has no spring row). We would like every student to have a row for both "fall" and "spring" in time, with test_score set to NA where no score was collected.

stu_id time test_score
123 fall 505
123 spring 515
234 fall 580
345 fall 600
345 spring 590
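
One way this could look, as a minimal sketch, is with tidyr's complete(), which adds a row for every combination of the listed columns that appears in the data. The data frame `d` is a hypothetical recreation of the table above.

```r
library(tibble)
library(tidyr)

# Hypothetical data mirroring the table above; student 234 has no spring row
d <- tribble(
  ~stu_id, ~time,    ~test_score,
  123,     "fall",   505,
  123,     "spring", 515,
  234,     "fall",   580,
  345,     "fall",   600,
  345,     "spring", 590
)

# Add a row for every stu_id/time combination;
# test_score is NA for rows that did not exist before
d_complete <- complete(d, stu_id, time)

print(d_complete)
```

Here complete() only builds combinations from values already present in the data, so a time point that no student has would not be added unless it is supplied explicitly.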

Wide Data

This includes functions to complete wide data, such as filling in missing gender for the 19-20 school year with gender reported in previous years or in external datasets.

stu_id gender_1718 gender_1819 gender_1920
150 male male NA
160 NA female female
170 non-binary NA NA
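
A minimal sketch of both approaches, assuming the table above is stored in a hypothetical data frame `d`: dplyr's coalesce() takes the first non-missing value across the gender columns, and rows_patch() fills remaining NA values from an external source. The `roster` data frame and its contents are invented purely for illustration.

```r
library(dplyr)

# Hypothetical data mirroring the table above
d <- tribble(
  ~stu_id, ~gender_1718, ~gender_1819, ~gender_1920,
  150,     "male",       "male",       NA,
  160,     NA,           "female",     "female",
  170,     "non-binary", NA,           NA
)

# Fill gender_1920 with the most recent non-missing value from earlier years
d_filled <- d %>%
  mutate(gender_1920 = coalesce(gender_1920, gender_1819, gender_1718))

# Hypothetical external source with a value for one student
roster <- tribble(
  ~stu_id, ~gender_1819,
  170,     "non-binary"
)

# rows_patch() only overwrites values that are NA in the first data frame
d_patched <- rows_patch(d_filled, roster, by = "stu_id")

print(d_patched)
```

The order of the arguments to coalesce() sets the priority, so listing gender_1819 before gender_1718 prefers the more recent report.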

Complete long data

Complete wide data

Complete data by recoding

  • Fill NA with chosen values (See Recode)
  • Fill missing dates with midpoint value (See Recode)

Main functions used in examples

Package Functions
tidyr fill(); complete()
dplyr coalesce(); rows_update()

Other functions used in examples

Package Functions
dplyr group_by(); ungroup(); arrange(); mutate(); na_if(); across(); rows_patch()
tidyselect contains()

Resources