R Resources - selmling/Analytics-and-Data-Exploration GitHub Wiki
dplyr:
- Five main verbs for data manipulation:
  - filter(): pick observations based on their values
  - arrange(): reorder rows
  - select(): pick variables (columns) by their names
  - mutate(): create new variables as functions of existing variables
  - summarize(): collapse many values down to a single summary
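
A minimal sketch of the core dplyr verbs (filter(), arrange(), select(), mutate(), summarize()) applied to a small tibble. The data and the column names (student, section, points, max_pts) are made up for illustration.

```r
library(dplyr)

# Hypothetical example data: test scores for five students
scores <- tibble(
  student = c("Ana", "Ben", "Cal", "Dee", "Eli"),
  section = c("A", "A", "B", "B", "B"),
  points  = c(42, 35, 48, 29, 40),
  max_pts = c(50, 50, 50, 50, 50)
)

# filter(): keep only rows where points is at least 40
passed <- filter(scores, points >= 40)

# arrange(): sort rows from highest to lowest points
ranked <- arrange(scores, desc(points))

# select(): keep only the named columns
slim <- select(scores, student, points)

# mutate(): add a column computed from existing columns
with_pct <- mutate(scores, pct = points / max_pts * 100)

# summarize(): collapse all rows into a single summary row
overall <- summarize(scores, mean_points = mean(points))
```

Each call leaves scores unchanged and returns a new tibble, which is why every result is assigned to its own name here.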
- These can all be applied to the entire data frame / tibble OR group-wise:
  - group_by(): changes the scope of the functions that follow from operating on the entire dataset to operating on it group by group.
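
To illustrate the difference in scope, the sketch below runs the same summary once on the whole tibble and once per group. The data and column names are hypothetical.

```r
library(dplyr)

# Hypothetical example data
scores <- tibble(
  section = c("A", "A", "B", "B", "B"),
  points  = c(42, 35, 48, 29, 40)
)

# Whole-data summary: a single row
overall <- summarize(scores, mean_points = mean(points))

# Group-wise summary: one row per section
grouped <- group_by(scores, section)
by_section <- summarize(grouped, mean_points = mean(points))

# Group-wise mutate: each row is compared with the mean of its own section
centered <- ungroup(mutate(grouped, diff = points - mean(points)))
```

Calling ungroup() at the end is a common habit so that later steps do not silently keep operating per group.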
- Each of these functions works similarly:
  - The first argument of the function is always the data frame.
  - Subsequent arguments describe what to do with the data frame, using the variable names without quotes.
  - The result is a new data frame.
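
A short sketch of these conventions, using a made-up tibble: the data frame goes first, column names appear unquoted, and the input is left untouched unless the result is assigned back.

```r
library(dplyr)

# Hypothetical example data
scores <- tibble(
  student = c("Ana", "Ben", "Cal"),
  points  = c(42, 35, 48)
)

# Data frame first, then an expression using the unquoted column name
high <- filter(scores, points > 40)

# The verb returned a new data frame; the input still has all of its rows
rows_before <- nrow(scores)
rows_high <- nrow(high)

# To keep a change, assign the result (here, back to the same name)
scores <- mutate(scores, bonus = points + 5)
```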
- Useful creation functions:
  - x / sum(x): proportion of the total
  - y - mean(y): difference from the mean
  - n(): to count rows, or sum(!is.na(x)): to count non-missing values
  - quantile(x, 0.25): find the value of x that is greater than 25% of the values (and less than the remaining 75%)
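
The expressions above might be used inside mutate() and summarize() roughly as follows. This is a sketch on made-up data; the columns x and y are hypothetical, and quantile() is called with its default interpolation method.

```r
library(dplyr)

# Hypothetical example data; y contains one missing value
d <- tibble(
  x = c(10, 20, 30, 40),
  y = c(1, NA, 3, 4)
)

# Per-row values built from whole-column aggregates
d2 <- mutate(
  d,
  prop = x / sum(x),   # proportion of the total
  diff = x - mean(x)   # difference from the mean
)

# Counts and a quantile, collapsed into a single row
stats <- summarize(
  d,
  n_rows       = n(),               # number of rows
  n_nonmissing = sum(!is.na(y)),    # rows where y is not NA
  q25          = quantile(x, 0.25)  # 25th percentile of x
)
```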
- Useful wrangling functions:
  - na.rm = TRUE: argument to aggregation functions such as mean() and sum() that removes NA values before the result is computed
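
A short sketch of what na.rm = TRUE does inside an aggregation: the NA is skipped for that one calculation, while the data frame keeps all of its rows. Removing the rows themselves is a separate step, for which filter(!is.na(x)) is one option. The data is made up.

```r
library(dplyr)

# Hypothetical example data with one missing value
d <- tibble(x = c(2, 4, NA, 6))

result <- summarize(
  d,
  plain   = mean(x),               # NA, because one input is missing
  cleaned = mean(x, na.rm = TRUE)  # mean of the three non-missing values
)

# Dropping the rows that contain NA is done explicitly
kept <- filter(d, !is.na(x))
```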
- Useful operators:
  - %>% (pipe operator, hotkey: command+shift+M): use this instead of creating intermediate-step variables; it is useful to read this symbol as "then"
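
To show what the pipe replaces, the sketch below computes the same result twice on made-up data: once with intermediate variables and once as a pipeline that reads "take scores, then filter, then group, then summarize".

```r
library(dplyr)

# Hypothetical example data
scores <- tibble(
  section = c("A", "A", "B", "B", "B"),
  points  = c(42, 35, 48, 29, 40)
)

# Version 1: one intermediate variable per step
kept <- filter(scores, points >= 35)
grouped <- group_by(kept, section)
step_result <- summarize(grouped, mean_points = mean(points))

# Version 2: the same steps chained with the pipe
pipe_result <- scores %>%
  filter(points >= 35) %>%
  group_by(section) %>%
  summarize(mean_points = mean(points))
```

The pipe passes the left-hand result in as the first argument of the next call, which works because every verb takes the data frame first.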
R techniques:
- Get Reliability Metrics in R (Studio):
- Measure duration of every .wav file in a directory
RStudio themes and fonts:
Links for learning R:
- Sanity check your R code results with Microsoft Excel: