R Resources - selmling/Analytics-and-Data-Exploration GitHub Wiki

  1. dplyr
  2. R techniques
  3. Links for learning R

dplyr:

  • Five main verbs for data manipulation:

    • filter(): pick observations based on their values

    • arrange(): reorder rows

    • select(): pick variables by their names

    • mutate(): create new variables as a function of existing variables

    • summarize(): collapse many values down to a single summary

  • These can all operate on the entire data frame / tibble OR group-wise:

    • group_by(): changes the scope of the functions that follow from operating on the entire dataset to operating on it group-by-group.

  • Each of these functions works similarly:

    • First argument of the function is always the data frame.

    • Subsequent arguments describe what to do with the data frame, using the variable names without quotes

    • The result is a new data frame

  • Useful creation and summary functions (inside mutate() and summarize()):

    • x / sum(x): proportion of the total

    • y - mean(y): difference from the mean

    • n(): count rows (per group, if grouped); sum(!is.na(x)): count non-missing values of x

    • quantile(x, 0.25): find the value of x that is greater than 25% of the values and less than the remaining 75%

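  • Useful wrangling arguments:

    • na.rm = TRUE: passed to summary functions such as mean(), sum(), and quantile() so they ignore missing (NA) values when computing the result (without it, mean() and sum() return NA). It does not drop rows from the data frame; use filter(!is.na(x)) for that.

  • Useful operators: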

    • %>%: pipe operator (RStudio shortcut: Cmd+Shift+M on macOS, Ctrl+Shift+M on Windows/Linux). Use it instead of creating intermediate-step variables; it helps to read the symbol as “then”.
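The five verbs and group_by() can be sketched on R's built-in mtcars data frame. This is a minimal illustration assuming the dplyr package is installed; the intermediate variable names (four_cyl, by_mpg, slim, and so on) are hypothetical and exist only so each step can be inspected on its own. The last step chains group_by() and summarize() with %>% instead of saving an intermediate variable.

```r
library(dplyr)

# filter(): keep the rows (observations) where cyl equals 4
four_cyl <- filter(mtcars, cyl == 4)

# arrange(): reorder rows; desc() puts the highest mpg first
by_mpg <- arrange(mtcars, desc(mpg))

# select(): keep only the named columns (variables), written without quotes
slim <- select(mtcars, mpg, cyl, wt)

# mutate(): add a column computed from existing columns
with_ratio <- mutate(slim, mpg_per_wt = mpg / wt)

# summarize(): collapse the whole data frame to a single row
overall <- summarize(mtcars, mean_mpg = mean(mpg))

# group_by() + summarize(), chained with %>% (read it as "then"):
# one summary row per number of cylinders
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg))
```

Each call takes the data frame as its first argument and returns a new data frame; mtcars itself is never modified.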
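The creation and summary functions, and the effect of na.rm = TRUE, are easiest to see on a tiny data frame. In this minimal sketch (assuming dplyr is installed), the data frame df and its columns group and x are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical example data: two groups, one missing value in x
df <- data.frame(
  group = c("a", "a", "b", "b"),
  x     = c(1, 3, 2, NA)
)

# Creation functions inside mutate() (the NA row is removed first with filter())
shares <- df %>%
  filter(!is.na(x)) %>%
  mutate(
    prop     = x / sum(x),   # proportion of the total
    centered = x - mean(x)   # difference from the mean
  )

# Summary functions, computed per group
stats <- df %>%
  group_by(group) %>%
  summarize(
    n_rows    = n(),                    # rows in the group, NA included
    n_nonmiss = sum(!is.na(x)),         # non-missing values of x
    mean_raw  = mean(x),                # NA if any x in the group is NA
    mean_x    = mean(x, na.rm = TRUE),  # ignores the NA; no rows are dropped
    q25_x     = quantile(x, 0.25, na.rm = TRUE)
  )
```

In stats, group b has one missing x: n_rows counts 2 rows but n_nonmiss counts 1, and mean_raw is NA while mean_x is 2.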

R techniques:

RStudio themes and fonts:

Links for learning R: