# Label Data - Cghlewis/data-wrangling-functions GitHub Wiki

These are functions for adding metadata, such as variable and value labels, to your data. There are several reasons you may want to work with labelled data.

- It improves interpretation if you are importing/exporting to programs that allow for this kind of embedded metadata (SPSS, SAS, or Stata datasets).
- It can also aid in interpretation of information while working in R. You can see variable labels when you view the data.
- It improves readability of outputs such as graphs, tables, and codebooks. Check out Shannon Pileggi's blog post and Posit Conf slides, as well as my R-Ladies NYC presentation for examples of how labelled data is used in various outputs.

Almost all functions I cover here come from the `haven` or `labelled` packages (with a brief dip into the `rio` and `sjPlot` packages). I use `labelled` mainly because it fits my workflow, where I typically import/export data using the `haven` package, and because it works well with the `%>%` operator.

However, I do not cover the `labelled::labelled_spss()` function in my examples because I find it has compatibility issues with other functions in the `labelled` package. You can read about it here for more information.

The examples below can apply to SPSS, SAS, or Stata datasets. However, the missing value functions I cover are SPSS-specific. Functions for working with SAS and Stata missing values (such as tagged NAs) are not covered here, but information on those functions can be found here.

Several Notes:

- When you add value labels using the `labelled` package, the class for those variables will become *haven_labelled*, unless you add value labels using `labelled::labelled_spss()`, in which case the class will be *haven_labelled_spss*.
- When you add missing value labels to a variable using any function in `labelled`, the class for that variable will become *haven_labelled_spss*.
- When you add variable labels to a dataset, those variables will not change class to *haven_labelled* or *haven_labelled_spss* unless you also add value labels or missing value labels using any `labelled` function, or add variable labels using `labelled::labelled_spss()`.
- When you import data from SPSS, SAS, or Stata with labels using `haven`, the same rules as above will apply. Any variable with simply a variable label will not change class (ex: numeric). However, any variable with a value label will be *haven_labelled*. Also, if you import an SPSS file with `haven` using the *user_na=TRUE* option and you have missing value labels in your data, then the class for those variables will be *haven_labelled_spss*.
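The class rules in the notes above can be checked with a small sketch. The data frame, the variable name `q1`, the label text, and the `-99` refusal code are all made up for illustration; only the `labelled` functions themselves come from the package.

```r
library(labelled)

# Hypothetical example data: q1 is a survey item, -99 is a made-up refusal code
df <- data.frame(q1 = c(1, 2, 3, -99))

# Variable label only: the column keeps its original class
df_var <- set_variable_labels(df, q1 = "Satisfaction with program")
class(df_var$q1)

# Value labels added: the column becomes haven_labelled
df_val <- set_value_labels(
  df_var,
  q1 = c(Low = 1, Medium = 2, High = 3, Refused = -99)
)
inherits(df_val$q1, "haven_labelled")

# SPSS-style missing values added: the column becomes haven_labelled_spss
df_na <- set_na_values(df_val, q1 = -99)
inherits(df_na$q1, "haven_labelled_spss")
```

Using `inherits()` rather than comparing `class()` directly is deliberate here, since labelled vectors carry more than one class.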

There is another package, `sjlabelled`, which has similar label-adding functions that do not update the variable class. The `sjlabelled` package can be a great one for adding labels for the purposes of plotting, when you don't necessarily want to change your variable classes. More information on `sjlabelled` can be found here.
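As a rough sketch of that difference, the example below adds value labels to a plain numeric vector with `sjlabelled::set_labels()`. The vector and the label text are invented for illustration, and this is only one of several ways to pass labels to that function.

```r
library(sjlabelled)

# Hypothetical numeric responses coded 1 to 3
x <- c(1, 2, 3, 2)

# Attach value labels; labels are matched to the sorted unique values of x
x_lab <- sjlabelled::set_labels(x, labels = c("Low", "Medium", "High"))

# The labels are stored as an attribute, so the class should stay as it was
class(x_lab)
sjlabelled::get_labels(x_lab)
```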

**A word of warning.** The order in which you apply labels can matter. Every once in a while I have labels disappear (for example, if I apply variable labels first and value labels later, my variable labels may disappear; I'm not sure why). If you have issues with labels disappearing, consider applying them in this order to preserve information:

- value labels
- assign missing values
- variable labels
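A minimal sketch of that ordering in a single pipeline is shown below. The data, the variable names `q1` and `q2`, the label text, and the `-99` code are hypothetical; the point is only the sequence of the three `labelled` calls.

```r
library(labelled)
library(dplyr)

# Hypothetical survey data; -99 is a made-up refusal code
survey <- data.frame(q1 = c(1, 2, -99), q2 = c(2, 2, 1))

# One shared set of value labels for both items
agree_labels <- c(Disagree = 1, Agree = 2, Refused = -99)

survey_labelled <- survey %>%
  # 1. value labels
  set_value_labels(q1 = agree_labels, q2 = agree_labels) %>%
  # 2. assign missing values
  set_na_values(q1 = -99, q2 = -99) %>%
  # 3. variable labels
  set_variable_labels(
    q1 = "I enjoy the program",
    q2 = "I would recommend the program"
  )

# Review what ended up on one variable
var_label(survey_labelled$q1)
val_labels(survey_labelled$q1)
na_values(survey_labelled$q1)
```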

## Add value labels

## Assign missing values

## Add variable labels

## Review labelled data

## Copy labels

## Convert numeric values to labels

## Import/Export labelled data

## Calculating variables with labelled NA

- Calculate row sums or means with labelled NA values (see Calculate Row Values)

**Main functions used in examples**

| Package | Functions |
|---|---|
| haven | read_sav(); write_sav() |
| labelled | set_value_labels(); val_labels(); add_value_labels(); labelled(); set_na_values(); na_values(); set_variable_labels(); var_label(); look_for(); copy_labels_from() |
| sjPlot | view_df() |
| rio | characterize() |

**Other functions used in examples**

| Package | Functions |
|---|---|
| dplyr | across(); mutate(); filter(); select(); all_of() |
| snakecase | to_sentence_case() |
| tidyselect | starts_with(); everything() |
| knitr | kable() |
| base | as.list() |
| openxlsx | write.xlsx() |
| purrr | map() |
| stringr | str_replace_all() |
| tibble | deframe() |

**Resources**

- https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/
- https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
- https://cran.r-project.org/web/packages/labelled/labelled.pdf
- https://www.rdocumentation.org/packages/labelled/versions/2.7.0
- https://martinctc.github.io/blog/working-with-spss-labels-in-r/
- https://joseph.larmarange.net/intro_labelled.html
- http://larmarange.github.io/labelled/reference/var_label.html
- https://raw.githubusercontent.com/rstudio/cheatsheets/main/labelled.pdf
- https://wlm.userweb.mwn.de/SPSS/wlmsmiss.htm
- https://stackoverflow.com/questions/43529972/set-missing-values-for-multiple-labelled-variables