7.2.2. Learning about R packages - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

Packages include:

  • Reusable R functions
  • Documentation about the functions
  • Sample datasets
  • Tests for checking your code

CRAN (Comprehensive R Archive Network)

  • An online archive with R packages, source code, manuals, and documentation

Available R packages

To make the most of R for your data analysis, you will need to install packages. Packages are units of reproducible R code that you can use to add more functionality to R. The best part is that the R community creates and shares packages so that other users can access them! In this reading, you will learn more about widely used packages and where to find them.

Packages can be found in repositories, which are collections of useful packages that are ready to install. Repositories include Bioconductor, R-Forge, rOpenSci, and GitHub, but the most commonly used one is the Comprehensive R Archive Network, or CRAN. CRAN stores code and documentation so that you can install packages into your own RStudio space.
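
As a minimal sketch of installing from CRAN, the helper below installs a package only when it is not already available. The name install_if_missing is hypothetical (it is not part of R), and the mirror URL https://cloud.r-project.org is an assumption; any CRAN mirror should work.

```r
# Hypothetical helper: install a package from CRAN only if it is missing.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package can be loaded after the attempt.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call returns TRUE without downloading anything.
install_if_missing("stats")
```

In everyday use you would typically run install.packages("tidyverse") once and then load the package in each session with library(tidyverse).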

Package documentation

Packages include not only the code itself but also documentation that identifies the package’s author, describes what the package does, and lists any other packages that you will need to download. For a package on CRAN, you can find this information in the DESCRIPTION file.
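
The contents of a DESCRIPTION file can also be read from inside R. The sketch below uses packageDescription() from the utils package on the stats package, chosen only because it ships with every R installation.

```r
# Read the DESCRIPTION metadata of an installed package.
desc <- utils::packageDescription("stats")

desc$Package   # package name
desc$Version   # installed version
desc$Title     # one-line summary of what the package does
desc$Imports   # other packages it relies on (NULL when the field is absent)
```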

Check out Karl Broman's R Package Primer to learn more.

Choosing the right packages

With so many packages out there, it can be hard to know which ones will be most useful for your library, the directory where your installed packages are stored. Luckily, there are some great resources:

  • Tidyverse: the tidyverse is a collection of R packages specifically designed for working with data. It’s a standard library for most data analysts, but you can also download the packages individually.
  • Quick list of useful R packages: this is RStudio Support’s list of useful packages with installation instructions and functionality descriptions.
  • CRAN Task Views: this is an index of CRAN packages sorted by task. You can search for the type of task you need to perform and it will pull up a page with packages related to that task for you to explore.

You will discover more packages throughout this course and as you use R more often, but this is a great starting point for building your own library.

Tidyverse (R)

  • A system of packages in R with a common design philosophy for data manipulation, exploration, and visualization

8 core tidyverse packages

  • ggplot2
  • tibble
  • tidyr
  • readr
  • purrr
  • dplyr
  • stringr
  • forcats
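
As a rough sketch, you can check which of these eight packages are available on your machine and then attach the whole collection with one library() call. The variable names core_pkgs and core_available are illustrative, and the code assumes nothing is installed beyond base R.

```r
core_pkgs <- c("ggplot2", "tibble", "tidyr", "readr",
               "purrr", "dplyr", "stringr", "forcats")

# TRUE for each core package that is installed and can be loaded.
core_available <- vapply(core_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(core_available)

# Attach the whole collection in one step, but only if it is installed.
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
}
```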

Running update.packages() updates all of your installed packages that have newer versions available.

Read tidyverse vignettes

A vignette is documentation that acts as a guide to an R package. A vignette shares details about the problem that the package is designed to solve and how the included functions can help you solve it. The browseVignettes() function allows you to read through the vignettes of an installed package.

To check out vignettes for one specific package, type browseVignettes("packagename") and press Enter (Windows) or Return (Mac). Remember that function names are case-sensitive in R, so "Vignettes" must have a capital V.

For example, executing browseVignettes("ggplot2") lists the vignettes available for ggplot2.

If you are using RStudio Cloud, running this function will open a new browser tab with links to the vignettes.
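
If you prefer to stay in the console, a related function, vignette(), returns the same listing as a table instead of opening a browser. This sketch uses the grid package only because it is bundled with R; substitute any installed package name.

```r
# List the vignettes of an installed package without opening a browser.
vig <- utils::vignette(package = "grid")

# Each row is one vignette: its short name (Item) and its title.
vig$results[, c("Item", "Title"), drop = FALSE]
```

Calling vignette() with one of the listed Item names opens that vignette.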

Test your knowledge on R packages

Question 1

When using RStudio, what does the installed.packages() function do?

  • Creates code for analysts to use to edit their packages
  • Selects the best packages to use based on an analyst’s current needs
  • Installs all available packages for use in an RStudio session
  • Presents a list of packages currently installed in an RStudio session

Correct. The installed.packages() function shows a list of packages currently installed in an RStudio session. You can then locate the names of the packages and what’s needed to use functions from the package.
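
A quick way to see this for yourself is sketched below. installed.packages() returns a matrix with one row per package; the variable name pkgs is illustrative.

```r
# One row per installed package, with columns such as Package and Version.
pkgs <- installed.packages()

nrow(pkgs)                              # how many packages are installed
head(pkgs[, c("Package", "Version")])   # first few names and versions
```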

Question 2

In data analytics, what is CRAN?

  • A collection of packages that function together to make analysis in R more efficient
  • A function for finding packages to use for analysis in RStudio
  • An R interface that has many of the same functions as RStudio
  • A commonly used online archive with R packages and other R resources

Correct. CRAN is a commonly used online archive with R packages and other R resources. CRAN makes sure that the R resources it shares follow the required quality standards and are authentic and valid.

Question 3

What are ggplot2, tidyr, dplyr, and forcats all a part of?

  • A list of variables for use in programming in RStudio
  • A list of functions that clean data efficiently
  • A collection of core tidyverse packages
  • A collection of commonly used, CRAN-based data sets

Correct. The packages ggplot2, tidyr, dplyr, and forcats are part of a collection of eight core tidyverse packages. The other core packages are: tibble, readr, purrr, and stringr.


Explore the tidyverse

Four packages are an essential part of the workflow for data analysts:

  • ggplot2
    • Creates a variety of data visualizations by applying different visual properties to the data variables in R
  • tidyr
    • A package used for data cleaning to make tidy data
  • readr
    • Used for importing data
  • dplyr
    • Offers a consistent set of functions that help you complete some common data manipulation tasks
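
To give a feel for a common data manipulation task, the sketch below groups and summarizes a dataset with dplyr. It assumes dplyr is installed (the code skips itself otherwise) and uses the built-in mtcars dataset, an illustrative choice that does not come from the course.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Group the cars by number of cylinders.
  by_cyl <- dplyr::group_by(mtcars, cyl)

  # Compute one summary row per group.
  mpg_summary <- dplyr::summarise(
    by_cyl,
    avg_mpg = mean(mpg),   # average fuel economy in the group
    cars    = dplyr::n()   # number of cars in the group
  )
  print(mpg_summary)
}
```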

The remaining core packages

  • tibble
    • Works with data frames
  • purrr
    • Works with functions and vectors, helping make code easier to write and more expressive
  • stringr
    • Includes functions that make it easier to work with strings
  • forcats
    • Provides tools that solve common problems with factors
      • factors (R): Store categorical data in R where the data values are limited and usually based on a finite group like country or year
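
Factors are part of base R, so a minimal illustration needs no extra packages. The country values below are made up for the example.

```r
# Hypothetical categorical data; the values are illustrative only.
countries <- factor(c("Japan", "Brazil", "Japan", "Kenya"))

levels(countries)   # the finite set of categories the factor can take
table(countries)    # how many times each category occurs
```

The forcats package builds on this structure with helpers for reordering and recoding levels.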

Using the tidyverse and its packages will help you fine-tune the analysis.

Working with pipes

Nested

  • In programming, describes code that performs a particular function and is contained within code that performs a broader function

When using pipes:

  • Add the pipe operator at the end of each line of the piped operation except the last one
    • Pipe operator keyboard shortcut: Ctrl + Shift + M (Cmd + Shift + M on Mac)
  • Check your code after you've programmed your pipe
  • Revisit piped operations to check for parts of your code to fix
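
To make the contrast concrete, here is a minimal sketch that writes the same operation twice, once nested and once piped. It assumes the dplyr package is installed (the code skips itself otherwise) and uses the built-in mtcars dataset, which is an illustrative choice.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Nested: read from the inside out (filter first, then arrange).
  nested <- arrange(filter(mtcars, mpg > 25), wt)

  # Piped: read top to bottom; the pipe ends every line except the last.
  piped <- mtcars %>%
    filter(mpg > 25) %>%
    arrange(wt)

  # Both forms produce the same data frame.
  print(identical(nested, piped))
}
```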

R resources for more help

The R community is full of dedicated users helping each other find solutions to problems and new ways of using R. There are also a lot of great blogs where you can find tutorials and other resources. Here are a few of them:

  • RStudio: The best place to find help with R is in R itself! You can input ‘?’ or the help() command to search in R. You can also open the Help pane to find more R resources.
  • RStudio Blog: RStudio’s blog is a great place to find information about RStudio, including company news. You can read the most recent featured posts or use the search bar and the list of categories on the left side of the page to explore specific topics you might find interesting or to search for a specific post.
  • Stack Overflow: The Stack Overflow blog posts opinions and advice from other coders. This is a great place to stay in touch with conversations happening in the community.
  • R-Bloggers: The R-Bloggers blog has useful tutorials and news articles posted by other R users in the community.
  • R-Bloggers' tutorials for learning R: This blog post from R-Bloggers compiles some basic R tutorials and also links to more advanced guides.
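
The built-in help mentioned in the first item can be tried with a short sketch like the one below. The function mean() is just an example topic, and the variable names are illustrative.

```r
# help("mean") performs the same lookup as typing ?mean in the console.
mean_help <- help("mean")
class(mean_help)

# When you only remember part of a name, apropos() lists matching objects.
mean_like <- apropos("mean")
head(mean_like)
```

Printing mean_help (or simply typing ?mean) opens the documentation page in the Help pane.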

Test your knowledge on the tidyverse

Question 1

When working in R, for which part of the data analysis process do analysts use the tidyr package?

  • Data calculations
  • Data cleaning
  • Data visualization
  • Data security

Correct. Analysts use the tidyr package for data cleaning. It works with wide and long data to make sure every part of a data table or data frame is the right data type and in the right place.
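
A small sketch of reshaping wide data into long data with tidyr follows. It assumes tidyr (version 1.0.0 or later) is installed and skips itself otherwise; the wide data frame and its column names are hypothetical.

```r
# Hypothetical wide data: one column per quarter.
wide <- data.frame(
  store = c("A", "B"),
  q1 = c(10, 20),
  q2 = c(15, 25),
  stringsAsFactors = FALSE
)

if (requireNamespace("tidyr", quietly = TRUE)) {
  # Long data: one row per store and quarter.
  long <- tidyr::pivot_longer(
    wide,
    cols = c(q1, q2),
    names_to = "quarter",
    values_to = "sales"
  )
  print(long)
}
```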

Question 2

Which tidyverse package contains a set of functions, such as select(), that help with data manipulation?

  • ggplot2
  • dplyr
  • forcats
  • readr

Correct. The dplyr package is the tidyverse package which contains a set of functions, such as select(), that help with data manipulation. For example, select() selects only relevant variables based on their names.
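
As an illustration, select() can pick columns by name or with a helper such as starts_with(). The sketch assumes dplyr is installed and uses the built-in mtcars dataset, which is not part of the course material.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Keep three columns by name.
  engine_cols <- dplyr::select(mtcars, mpg, cyl, hp)
  print(names(engine_cols))

  # Keep every column whose name starts with "d".
  d_cols <- dplyr::select(mtcars, dplyr::starts_with("d"))
  print(names(d_cols))
}
```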

Question 3

An analyst is organizing a dataset in RStudio using the following code: arrange(filter(Storage_1, inventory >= 40), count). Which of the following examples is a nested function in the code?

  • filter
  • arrange
  • count
  • inventory

Correct. In the analyst's code, filter is the nested function. It is embedded in the argument of the broader arrange function.
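
To see the nesting at work, the sketch below runs the quiz code on a made-up Storage_1 data frame (the real dataset is not shown in the course, so its columns and values here are assumptions) and compares it with the same work done in two separate steps. It assumes dplyr is installed.

```r
# Hypothetical stand-in for Storage_1; all values are invented.
Storage_1 <- data.frame(
  item = c("bolts", "nuts", "screws", "washers"),
  inventory = c(55, 20, 40, 75),
  count = c(3, 9, 1, 2),
  stringsAsFactors = FALSE
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Nested form: filter() runs first, and arrange() sorts its result.
  nested_result <- arrange(filter(Storage_1, inventory >= 40), count)

  # Same work in two explicit steps.
  well_stocked <- filter(Storage_1, inventory >= 40)
  stepwise_result <- arrange(well_stocked, count)

  print(nested_result)
}
```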