R Resources - selmling/Analytics-and-Data-Exploration GitHub Wiki
dplyr:
- Five main verbs for data manipulation:
  - `filter()`: pick observations based on their values
  - `arrange()`: reorder rows
  - `select()`: pick variables by their names
  - `mutate()`: create new variables as functions of existing variables
  - `summarize()`: collapse many values down to a single summary
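A minimal sketch of these verbs on the built-in `mtcars` dataset. The dataset, the thresholds, and the object and column names (`efficient`, `hp_per_wt`, and so on) are illustrative choices, not part of the original notes.

```r
library(dplyr)

# mtcars ships with R; mpg, cyl, hp and wt are columns of that dataset.
efficient  <- filter(mtcars, mpg > 25)                 # keep rows where mpg exceeds 25
by_power   <- arrange(mtcars, desc(hp))                # reorder rows, highest horsepower first
narrow     <- select(mtcars, mpg, cyl, hp)             # select(): keep only the named columns
with_ratio <- mutate(mtcars, hp_per_wt = hp / wt)      # add a column computed from existing ones
overall    <- summarize(mtcars, mean_mpg = mean(mpg))  # collapse all rows into one summary row
```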
- These can all be applied to the entire data frame / tibble OR group-wise:
  - `group_by()`: changes the scope of the functions that follow from operating on the entire dataset to operating on it group by group.
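To make the change of scope concrete, here is a small sketch that runs the same `summarize()` call with and without grouping. It assumes the built-in `mtcars` data; `mean_mpg` and `cars` are made-up column names.

```r
library(dplyr)

# Whole-dataset summary: a single row.
overall <- summarize(mtcars, mean_mpg = mean(mpg))

# Group-wise summary: one row per distinct value of cyl.
by_cyl     <- group_by(mtcars, cyl)
mpg_by_cyl <- summarize(by_cyl, mean_mpg = mean(mpg), cars = n())
```

The only difference between the two calls is the data frame passed in: once it has been grouped, `summarize()` works within each group.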
- Each of these functions works similarly:
  - The first argument is always the data frame.
  - Subsequent arguments describe what to do with the data frame, using the variable names without quotes.
  - The result is a new data frame.
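The shared calling pattern can be sketched as follows, again assuming the built-in `mtcars` data; `heavy` is just an illustrative name.

```r
library(dplyr)

# Data frame first, then a condition that refers to the column wt without quotes.
heavy <- filter(mtcars, wt > 4)

nrow(mtcars)  # the original data frame is left unchanged
nrow(heavy)   # the result is a separate, smaller data frame
```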
- Useful creation functions:
  - `x / sum(x)`: proportion of the total
  - `y - mean(y)`: difference from the mean
  - `n()`: to count, or `sum(!is.na(x))`: to count nonmissing values
  - `quantile(x, 0.25)`: find the value of x that is greater than 25% of the values (and less than the remaining 75%)
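A short sketch of these expressions inside `mutate()` and `summarize()`. The `scores` data frame and all column names are invented for illustration, and `na.rm = TRUE` is added because the toy data contains a missing value.

```r
library(dplyr)

# Toy data with one missing value in x.
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  x     = c(10, 30, NA, 60)
)

# Per-row creation with mutate(): proportion of the total and difference from the mean.
scores_plus <- mutate(
  scores,
  prop_of_total  = x / sum(x, na.rm = TRUE),
  diff_from_mean = x - mean(x, na.rm = TRUE)
)

# Counting and quantiles per group with summarize().
counts <- summarize(
  group_by(scores, group),
  n_rows       = n(),              # rows in the group, missing or not
  n_nonmissing = sum(!is.na(x)),   # rows where x is present
  q25          = quantile(x, 0.25, na.rm = TRUE)
)
```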
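- Useful wrangling arguments:
  - `na.rm = TRUE`: argument to summary functions such as `mean()` and `sum()` that ignores missing (`NA`) values in the calculation; it does not drop rows from the data frame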
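A small illustration of `na.rm = TRUE` on a made-up vector: the missing value is skipped for the calculation, while the data itself stays as it was.

```r
x <- c(2, 4, NA, 6)

mean_with_na    <- mean(x)                # NA: one missing value makes the result missing
mean_without_na <- mean(x, na.rm = TRUE)  # 4: the NA is ignored for this calculation only
length(x)                                 # 4: x itself still contains the NA
```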
- Useful operators:
  - `%>%`: pipe operator (RStudio hotkey: Command+Shift+M on macOS, Ctrl+Shift+M on Windows/Linux). Use it instead of creating intermediate-step variables; it helps to read the symbol as “then”.
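As a rough comparison, the sketch below writes the same computation twice, once with an intermediate variable and once with the pipe. It assumes the built-in `mtcars` data; the object names are illustrative.

```r
library(dplyr)

# With an intermediate-step variable.
grouped       <- group_by(mtcars, cyl)
summary_steps <- summarize(grouped, mean_mpg = mean(mpg))

# With the pipe: take mtcars, then group by cyl, then summarize.
summary_piped <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))
```

Both versions produce the same table; the piped form reads top to bottom and leaves no throwaway objects behind.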
R techniques:
- Get Reliability Metrics in R (Studio):
- Measure duration of every `.wav` file in a directory
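One possible way to measure `.wav` durations, as a sketch rather than the method the original link described. It assumes the `tuneR` package is installed and that the header's `samples` field is the per-channel sample count; `wav_durations()` and the `"recordings"` path are hypothetical names.

```r
# Sketch only: wav_durations() is a hypothetical helper and needs the tuneR package.
wav_durations <- function(dir) {
  files <- list.files(dir, pattern = "\\.wav$", full.names = TRUE, ignore.case = TRUE)
  seconds <- vapply(files, function(path) {
    header <- tuneR::readWave(path, header = TRUE)  # read the header only, not the audio data
    header$samples / header$sample.rate             # samples divided by samples per second
  }, numeric(1))
  data.frame(file = basename(files), seconds = unname(seconds))
}

# Example call on a hypothetical folder:
# durations <- wav_durations("recordings")
```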
RStudio themes and fonts:
Links for learning R:
- Sanity check your R code results with Microsoft Excel: