Restructure - Cghlewis/data-wrangling-functions GitHub Wiki

In longitudinal education research, we often need to structure our data into either wide or long format depending on the analysis.

Say, for example, that a study collects data on one cohort of teachers over two waves of data collection (a fall wave and a spring wave) in one school year.

This data could be structured in wide format, where the wave is added to each variable name as a prefix (w1_ and w2_ in this case). All data collected on a unique participant will be in one row:

tch_id	intervention	w1_q1	w2_q1
1234	0	5	4
2345	1	4	4
3456	1	2	5

Or the data could be structured in long format, where wave 1 and wave 2 are recorded in a "wave" variable. A unique participant will repeat in the dataset once for each wave of data collected on them:

tch_id	intervention	wave	q1
1234	0	1	5
1234	0	2	4
2345	1	1	4
2345	1	2	4
3456	1	1	2
3456	1	2	5
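As a minimal sketch, the long table above can be turned into the wide table with tidyr::pivot_wider(). The data is recreated by hand with tibble(), and the object names (long_df, wide_df) are hypothetical. The names_glue template is one way to rebuild the w1_/w2_ prefixes; other naming approaches would work too.

```r
library(tibble)
library(tidyr)  # also re-exports the %>% pipe

# Long-format teacher data, mirroring the long table above (hypothetical object name)
long_df <- tibble(
  tch_id       = c(1234, 1234, 2345, 2345, 3456, 3456),
  intervention = c(0, 0, 1, 1, 1, 1),
  wave         = c(1, 2, 1, 2, 1, 2),
  q1           = c(5, 4, 4, 4, 2, 5)
)

# One row per teacher: each wave value becomes a "w{wave}_" prefix on q1.
# tch_id and intervention are kept as identifying columns by default.
wide_df <- long_df %>%
  pivot_wider(
    names_from  = wave,
    values_from = q1,
    names_glue  = "w{wave}_{.value}"
  )

wide_df
```

This returns the columns tch_id, intervention, w1_q1, and w2_q1, with one row per teacher.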

Often we don't have to plan far ahead for how we want our final data to look. We can pick one format to start with and, if we change our minds, it is fairly simple to restructure the data into the other format.
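Going in the other direction, here is a minimal sketch of restructuring the wide table into long format with tidyr::pivot_longer(). It assumes every wave-specific variable follows a w{number}_{name} naming pattern. The regular expression and object names are illustrative assumptions, not a fixed convention.

```r
library(tibble)
library(tidyr)  # re-exports %>% and tidyselect helpers such as matches()

# Wide-format teacher data, mirroring the wide table above (hypothetical object name)
wide_df <- tibble(
  tch_id       = c(1234, 2345, 3456),
  intervention = c(0, 1, 1),
  w1_q1        = c(5, 4, 2),
  w2_q1        = c(4, 4, 5)
)

# Split names like "w1_q1" into a wave number ("1") and a variable name ("q1").
# ".value" tells pivot_longer() to use the second part as the output column name.
long_df <- wide_df %>%
  pivot_longer(
    cols            = matches("^w\\d+_"),
    names_to        = c("wave", ".value"),
    names_pattern   = "^w(\\d+)_(.*)$",
    names_transform = list(wave = as.integer)
  )

long_df
```

Because ".value" is used, adding more items (w1_q2, w2_q2, and so on) would produce additional columns (q2, ...) without changing the call.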

We may also need to restructure data for specific statistical tests such as the intraclass correlation coefficient (ICC). If we collect an observation measure where, for instance, two raters observe the same classroom, we may want to see how reliable the ratings are. If we enter the ratings in a format where each rater has their own row, we may need to restructure the data so that each rater is their own column in order to run functions such as irr::icc().

Before:

tch_id	rater_id	score
1234	16	23
1234	22	27
2345	16	18
2345	22	20

After:

rater16	rater22
23	27
18	20
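A minimal sketch of the Before/After step above, using pivot_wider() with names_prefix to build the rater16/rater22 columns and dplyr::select() to drop tch_id. The object names are hypothetical. The irr::icc() call is left as a comment, and its model, type, and unit arguments are placeholders that should be chosen to fit the study design.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Ratings entered with one row per rater per classroom (hypothetical object name)
ratings_long <- tibble(
  tch_id   = c(1234, 1234, 2345, 2345),
  rater_id = c(16, 22, 16, 22),
  score    = c(23, 27, 18, 20)
)

# One column per rater ("rater16", "rater22"), then drop the ID column
# so only the rating columns remain.
ratings_wide <- ratings_long %>%
  pivot_wider(
    names_from   = rater_id,
    values_from  = score,
    names_prefix = "rater"
  ) %>%
  select(-tch_id)

ratings_wide

# The wide ratings could then be passed to irr::icc(), for example
# (argument values are placeholders, not a recommendation):
# irr::icc(ratings_wide, model = "twoway", type = "agreement", unit = "single")
```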

Lastly, another reason for restructuring data (there are MANY more) is to put it into a "tidy" format, which makes it easier to calculate descriptive statistics and create visualizations. Having data in tidy format allows us to use tools such as dplyr::group_by().

Before (not tidy):

school	enroll_6	enroll_7	enroll_8
schoolx	50	40	70
schooly	75	64	68

After (tidy):

school	grade	enroll
schoolx	6	50
schoolx	7	40
schoolx	8	70
schooly	6	75
schooly	7	64
schooly	8	68
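Here is a minimal sketch of moving the enrollment table into tidy format with pivot_longer(), then using dplyr::group_by() to total enrollment by grade. The object names and the choice of summary are illustrative only; names_prefix strips the "enroll_" part so that only the grade number remains.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Enrollment with one column per grade, mirroring the "not tidy" table (hypothetical object name)
enroll_wide <- tibble(
  school   = c("schoolx", "schooly"),
  enroll_6 = c(50, 75),
  enroll_7 = c(40, 64),
  enroll_8 = c(70, 68)
)

# One row per school-grade combination; "enroll_6" becomes grade 6, etc.
enroll_tidy <- enroll_wide %>%
  pivot_longer(
    cols            = matches("^enroll_"),
    names_to        = "grade",
    names_prefix    = "enroll_",
    values_to       = "enroll",
    names_transform = list(grade = as.integer)
  )

# With grade stored in its own column, grouped summaries are straightforward
grade_totals <- enroll_tidy %>%
  group_by(grade) %>%
  summarise(total_enroll = sum(enroll))

grade_totals
```

With the example values, grade_totals holds totals of 125, 104, and 138 for grades 6, 7, and 8.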

Into wide format

Wide format

Into long format

Long format

External restructure script

Longitudinal restructure syntax shared on OSF

Main functions used in examples

Package	Functions
tidyr	pivot_wider(); pivot_longer()

Other functions used in examples

Package	Functions
tidyselect	matches()
dplyr	select()

Resources