Restructure - Cghlewis/data-wrangling-functions GitHub Wiki

In longitudinal education research, we often need to structure our data into either wide or long format depending on the analysis.

Say, for example, a study collects data on one cohort of teachers over two waves of data collection (a fall wave and a spring wave) in one school year.

These data could be structured in wide format, where the wave is added as a prefix to each variable name (w1_ for wave 1, w2_ for wave 2). All data collected on a given participant are in one row:

tch_id intervention w1_q1 w2_q1
1234 0 5 4
2345 1 4 4
3456 1 2 5

Or the data could be structured in long format, where the wave is stored in its own "wave" variable. Each participant repeats in the dataset once for every wave of data collected on them:

tch_id intervention wave q1
1234 0 1 5
1234 0 2 4
2345 1 1 4
2345 1 2 4
3456 1 1 2
3456 1 2 5

Often we don't need to plan far ahead for how we want our final data to look. We can pick one format to start with and, if we change our minds, restructure the data into the other format fairly easily.
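As a minimal sketch of moving between the two formats with tidyr, assume the teacher example above is stored in a data frame called tch_wide (the object names here are illustrative, not from the original). pivot_longer() splits each w1_/w2_ column name into a wave number and the item name, and pivot_wider() rebuilds the prefixed columns from the long data.

```r
library(tidyr)

# Wide teacher data from the example above (object names are illustrative)
tch_wide <- data.frame(
  tch_id = c(1234, 2345, 3456),
  intervention = c(0, 1, 1),
  w1_q1 = c(5, 4, 2),
  w2_q1 = c(4, 4, 5)
)

# Wide -> long: "w1_q1" is split into wave = 1 and a column named q1
tch_long <- pivot_longer(
  tch_wide,
  cols = matches("^w\\d+_"),
  names_to = c("wave", ".value"),
  names_pattern = "^w(\\d+)_(.*)$",
  names_transform = list(wave = as.integer)
)

# Long -> wide: rebuild the w{wave}_ prefixes from the wave variable
tch_wide_again <- pivot_wider(
  tch_long,
  names_from = wave,
  values_from = q1,
  names_glue = "w{wave}_{.value}"
)
```

Using ".value" in names_to keeps the item name (q1) as a column rather than as a value, so the same call would also handle additional items such as w1_q2 without changes.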


We may also need to restructure data for specific statistical tests, such as the intraclass correlation coefficient (ICC). If we collect an observation measure where, for instance, two raters observe the same classroom, we may want to assess how reliable the ratings are. If we enter the ratings so that each rating has its own row, we may need to restructure the data so that each rater has their own column before running functions such as irr::icc().

Before:

tch_id rater_id score
1234 16 23
1234 22 27
2345 16 18
2345 22 20

After:

rater16 rater22
23 27
18 20
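A minimal sketch of the Before/After restructure above, assuming the ratings sit in a data frame called ratings_long (an illustrative name). pivot_wider() creates one column per rater, names_prefix turns rater ID 16 into the column name rater16, and select() with matches() drops tch_id so only the rating columns remain. The irr::icc() call is left as a comment; its model, type, and unit arguments are an illustrative choice, not a recommendation for any particular design.

```r
library(tidyr)
library(dplyr)

# Rater data from the "Before" table: one row per rating
ratings_long <- data.frame(
  tch_id = c(1234, 1234, 2345, 2345),
  rater_id = c(16, 22, 16, 22),
  score = c(23, 27, 18, 20)
)

# One column per rater; names_prefix turns 16 into "rater16"
ratings_wide <- pivot_wider(
  ratings_long,
  names_from = rater_id,
  values_from = score,
  names_prefix = "rater"
)

# Keep only the rating columns (drops tch_id), matching the "After" table
ratings_icc <- select(ratings_wide, matches("^rater"))

# If the irr package is installed, the ICC could then be computed with, e.g.:
# irr::icc(ratings_icc, model = "twoway", type = "agreement", unit = "single")
```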

Finally, one more reason for restructuring (there are many more) is to put data into a "tidy" format, where each variable is a column and each observation is a row. Tidy data make it easier to calculate descriptive statistics and create visualizations, because we can use tools such as dplyr::group_by().

Before (not tidy):

school enroll_6 enroll_7 enroll_8
schoolx 50 40 70
schooly 75 64 68

After (tidy):

school grade enroll
schoolx 6 50
schoolx 7 40
schoolx 8 70
schooly 6 75
schooly 7 64
schooly 8 68
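A minimal sketch of the enrollment example, assuming the "not tidy" table is stored in a data frame called enroll_wide (an illustrative name). names_prefix strips "enroll_" so only the grade number is left, and once grade is its own variable, group_by() and summarise() can produce per-school or per-grade summaries directly.

```r
library(tidyr)
library(dplyr)

# "Not tidy" enrollment data from the Before table
enroll_wide <- data.frame(
  school = c("schoolx", "schooly"),
  enroll_6 = c(50, 75),
  enroll_7 = c(40, 64),
  enroll_8 = c(70, 68)
)

# Wide -> tidy: one row per school and grade
enroll_long <- pivot_longer(
  enroll_wide,
  cols = starts_with("enroll_"),
  names_to = "grade",
  names_prefix = "enroll_",
  names_transform = list(grade = as.integer),
  values_to = "enroll"
)

# Total enrollment per school
enroll_by_school <- summarise(
  group_by(enroll_long, school),
  total_enroll = sum(enroll)
)

# Mean enrollment per grade across schools
enroll_by_grade <- summarise(
  group_by(enroll_long, grade),
  mean_enroll = mean(enroll)
)
```

With the original wide layout, the per-grade mean would need a separate calculation for each enroll_ column; in tidy format, it is a single grouped summary.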

Into wide format

Into long format

External restructure script


Main functions used in examples

Package Functions
tidyr pivot_wider(); pivot_longer()

Other functions used in examples

Package Functions
tidyselect matches()
dplyr select()

Resources