Information & Resources - Starryz/fivethirtyeight GitHub Wiki

"Taming" data Principles - summary

1. naming

  • no more than 20 characters long
  • lowercase characters; replace spaces with underscores (snake_case instead of camelCase)
  • variable names: underscores instead of spaces
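The naming rules above can be sketched as a small helper. This is only an illustration in Python using the standard library; `to_snake_case`, `check_name`, and `MAX_NAME_LENGTH` are hypothetical names, not part of any package mentioned here.

```python
import re

MAX_NAME_LENGTH = 20  # limit taken from the naming rule above


def to_snake_case(name: str) -> str:
    """Convert a column name to lowercase snake_case."""
    # Split camelCase boundaries: "birthYear" -> "birth_Year"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Replace each run of spaces or other non-alphanumeric characters with one underscore
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


def check_name(name: str) -> bool:
    """Return True if the name is already snake_case and within the length limit."""
    return name == to_snake_case(name) and len(name) <= MAX_NAME_LENGTH
```

For example, `to_snake_case("Birth Year")` and `to_snake_case("birthYear")` both give `birth_year`. A name that is still longer than 20 characters after conversion needs shortening by hand, which `check_name` only flags.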

2. observational units

  • variables that identify the observational unit go in the left-hand columns
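Reading this rule as "columns that identify the observational unit come first", a minimal sketch of the reordering could look like the following. The function name `ids_first` and the column names in the usage note are made up for illustration.

```python
def ids_first(columns, id_columns):
    """Return the column names with identifier columns moved to the left.

    The relative order of the remaining columns is preserved.
    """
    ids = [c for c in id_columns if c in columns]
    rest = [c for c in columns if c not in ids]
    return ids + rest
```

With the hypothetical columns `["votes", "state", "share", "year"]` and identifiers `["state", "year"]`, the result is `["state", "year", "votes", "share"]`.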

3. dates

  • if only year: numeric
  • if year and month: convert to Date (year-month-01), so that all observations from the same month share day 01
  • if year, month, day: convert to Date (year-month-day)
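The three date cases can be sketched with Python's standard `datetime.date`, standing in for the Date class named above. The helper name `tame_date` is hypothetical.

```python
from datetime import date


def tame_date(year, month=None, day=None):
    """Apply the date rules above.

    year only         -> plain integer
    year and month    -> date on the first of that month
    year, month, day  -> full date
    """
    if month is None:
        return int(year)
    if day is None:
        return date(int(year), int(month), 1)
    return date(int(year), int(month), int(day))
```

Under this sketch, `tame_date(2016, 11)` yields the date 2016-11-01, so every observation from November 2016 shares the same value.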

4. other class variables

  • ordinal: ordered factor
  • categorical with a fixed and known set of levels: factor
  • categorical with an unknown or very large number of possible levels: character
  • "yes/no" character or other binary variables: TRUE/FALSE logical
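Two of these conversions are easy to sketch in plain Python: mapping "yes/no" strings to booleans, and giving an ordinal variable an explicit level order. The function names and the `EDUCATION_LEVELS` list are hypothetical examples, and Python's `True`/`False` stand in for TRUE/FALSE.

```python
def yes_no_to_bool(value):
    """Map "yes"/"no" strings (any case) to True/False; keep missing values as None."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text == "yes":
        return True
    if text == "no":
        return False
    raise ValueError(f"expected 'yes' or 'no', got {value!r}")


# Hypothetical ordinal levels, listed from lowest to highest
EDUCATION_LEVELS = ["high school", "bachelor", "master", "doctorate"]


def ordinal_key(value, levels=EDUCATION_LEVELS):
    """Sort key that ranks a value by its position in the ordered levels."""
    return levels.index(value)
```

Sorting with `key=ordinal_key` then follows the declared order rather than alphabetical order, which is the point of storing ordinal variables as ordered.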

5. tidy format

  • save all data frames in tidy format
  1. each variable forms a column
  2. each observation forms a row
  3. each type of observational unit forms a table
  • if converting alters the dataset too much, make the conversion code easily accessible (i.e. in the Examples section of the documentation)
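As an illustration of the conversion code this last point asks for, here is a minimal wide-to-long reshape in plain Python. The function `to_long`, its parameters, and the sample rows are all hypothetical; real conversion code would normally use the reshaping tools of whatever data library the dataset ships with.

```python
def to_long(rows, id_columns, names_to, values_to, name_type=str):
    """Reshape wide rows (a list of dicts) into tidy long form.

    Each non-identifier column becomes its own row, so every output row
    holds one observation: the identifiers, the former column name, and its value.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            tidy_row = {key: row[key] for key in id_columns}
            tidy_row[names_to] = name_type(column)  # e.g. int for year columns
            tidy_row[values_to] = value
            long_rows.append(tidy_row)
    return long_rows


# Made-up wide data: one column per year
wide = [
    {"state": "AL", "2015": 10, "2016": 12},
    {"state": "AK", "2015": 3, "2016": 4},
]
tidy = to_long(wide, id_columns=["state"], names_to="year",
               values_to="count", name_type=int)
```

Here `tidy` has four rows, one per state and year, with `year` stored as a number in line with the date rules above.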