7.2.3. Learning about R packages
Available R packages
To make the most of R for your data analysis, you will need to install packages. Packages are units of reproducible R code that you can use to add more functionality to R. The best part is that the R community creates and shares packages so that other users can access them! In this reading, you will learn more about widely used packages and where to find them.
Packages can be found in repositories, which are collections of useful packages that are ready to install. You can find packages on Bioconductor, R-Forge, rOpenSci, or GitHub, but the most commonly used repository is the Comprehensive R Archive Network, or CRAN. CRAN stores code and documentation so that you can install packages into your own RStudio space.
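As a minimal sketch of how R decides where to download packages from, the snippet below inspects and sets the `repos` option. The mirror URL `https://cloud.r-project.org` is an assumption chosen for illustration; any CRAN mirror would work the same way.

```r
# Show which repositories install.packages() searches by default.
getOption("repos")

# Point the current session at a CRAN mirror (assumed here: the cloud mirror).
options(repos = c(CRAN = "https://cloud.r-project.org"))

# install.packages() also accepts a repository for a single call:
# install.packages("tidyverse", repos = "https://cloud.r-project.org")
```

RStudio usually sets a CRAN mirror for you, so this step is mainly useful when running R outside RStudio.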
Package documentation
Packages include not only the code itself, but also documentation that describes the package’s author, its purpose, and any other packages that you will need to download. For packages on CRAN, you can find this information in the package's DESCRIPTION file.
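The DESCRIPTION file can also be read from inside R. The sketch below uses the `stats` package only because it ships with R and needs no installation; the same calls should work for any installed package.

```r
# Read the DESCRIPTION metadata of an installed package.
desc <- packageDescription("stats")

desc$Package      # package name
desc$Version      # installed version
desc$Author       # who wrote the package
desc$Description  # what the package is for

# Other packages this one relies on.
desc$Imports

# The same information lives in a plain-text file on disk.
desc_path <- system.file("DESCRIPTION", package = "stats")
read.dcf(desc_path, fields = c("Package", "Title"))
```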
Check out Karl Broman's R Package Primer to learn more.
Choosing the right packages
With so many packages out there, it can be hard to know which ones will be the most useful for your library, which is your directory of installed packages. Luckily, there are some great resources out there:
- Tidyverse: the tidyverse is a collection of R packages specifically designed for working with data. It’s a standard library for most data analysts, but you can also download the packages individually.
- Quick list of useful R packages: this is RStudio Support’s list of useful packages with installation instructions and functionality descriptions.
- CRAN Task Views: this is an index of CRAN packages sorted by task. You can search for the type of task you need to perform and it will pull up a page with packages related to that task for you to explore.
You will discover more packages throughout this course and as you use R more often, but this is a great starting point for building your own library.
Hands-On Activity: Installing and loading tidyverse
Activity overview
In the last activity, you explored the R sandbox and used some R packages such as the tidyverse. In this activity, you’ll explore further with the tidyverse collection of packages and learn about them using the browseVignettes function.
By the end of this activity, you will know how to easily load vignettes. Moving forward, you can use the browseVignettes function to access and review included documentation to better understand each R package you will use.
Install the tidyverse
If you have not yet installed the tidyverse, open RStudio.
Log in, navigate to the console, type install.packages("tidyverse"), and press Enter (Windows) or Return (Mac).
Then wait as RStudio installs the tidyverse packages (be patient, this can take a little bit). You’ll receive a message that the install is done.
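If you are not sure whether a package is already installed, a guarded install avoids downloading it again. This is only a sketch: `install_if_missing` is a hypothetical helper name, not part of R or the tidyverse.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    message(pkg, " is already installed")
    return(invisible(FALSE))  # nothing was installed
  }
  install.packages(pkg)       # downloads from the configured repository
  invisible(TRUE)             # an install was attempted
}

# Usage (commented out so the sketch does not trigger a download):
# install_if_missing("tidyverse")
```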
Load the tidyverse
Once the tidyverse packages have been installed, load them so that they are available in your current R session. Load the core tidyverse with the library command. The core tidyverse contains the main packages that work together to make your data analysis smooth and efficient.
To load the core tidyverse, type library(tidyverse) and press Enter (Windows) or Return (Mac).
The output in the console indicates that you have loaded the core tidyverse. Each of the core packages has a green check next to it.
The output also lists conflicts. A conflict means that objects with the same name exist in two or more places within your session. This usually happens because an object in your workspace or a package you loaded is masking an existing object of the same name.
Since you loaded the tidyverse packages most recently, their functions take precedence over same-named functions from previously loaded packages in your current session.
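Masking can be illustrated without the tidyverse. The sketch below defines a workspace function that reuses the name of `stats::sd()`; the same mechanism is at work when the tidyverse startup message reports that `dplyr::filter()` masks `stats::filter()`. The variable names `own` and `original` are chosen for illustration.

```r
# A workspace function that reuses the name of stats::sd().
sd <- function(x) "workspace version of sd"

own <- sd(c(1, 2, 3))             # calls the workspace version, which masks stats::sd()
original <- stats::sd(c(1, 2, 3)) # package::function always reaches the original

# Names that are defined in more than one place on the search path.
conflicts(detail = TRUE)

# Remove the workspace copy so that sd() refers to stats::sd() again.
rm(sd)
```

Writing `package::function()` is a simple way to state which version you mean whenever a conflict is reported.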
Read tidyverse vignettes
A vignette is documentation that acts as a guide to an R package. A vignette shares details about the problem that the package is designed to solve and how the included functions can help you solve it. The browseVignettes function allows you to read through the vignettes of an installed package.
To check out vignettes for one specific package, type browseVignettes("packagename") and press Enter (Windows) or Return (Mac). Remember that function names are case-sensitive in R, so "Vignettes" must have a capital V.
For example, if you run browseVignettes("ggplot2"), a page opens that lists the vignettes available for ggplot2.
If you are using RStudio (Posit) Cloud, running this function on the Posit Cloud server may lead to a page where the linked contents do not exist. If this is the case, download the RStudio Desktop version and run the same browseVignettes() function as above. It will open a new browser tab with the HTML, source, and R code links leading to the vignettes, and these vignette description pages will be functional.
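When the browser page is not available, the vignette index can also be printed in the console with the base `vignette()` function. A minimal sketch, assuming ggplot2 is installed; `list_vignettes` is a hypothetical helper name.

```r
# Hypothetical helper: return the names and titles of a package's vignettes.
list_vignettes <- function(pkg) {
  found <- vignette(package = pkg)
  found$results[, c("Item", "Title"), drop = FALSE]
}

# Only run the ggplot2 example if the package is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(list_vignettes("ggplot2"))

  # Open the HTML index in a browser when working interactively.
  if (interactive()) browseVignettes("ggplot2")
}
```

Once you know a vignette's name from the Item column, `vignette("name", package = "ggplot2")` opens that single vignette.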
Reflection
In this activity, you explored the tidyverse package and learned about vignettes. In the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions:
- How might the tidyverse and its packages help you as you learn how to program in R?
- What impact will the browseVignettes function have on your analysis?
Explain: Congratulations on completing this hands-on activity! A good response would include how using the tidyverse packages helps you read, manipulate, visualize, and do many other important things with data.
Tidyverse was designed to improve the overall workflow for analysts. Since the packages are all integrated with each other, your analysis will be more efficient. You can use the browseVignettes function to find out more about each package and how to use it.
Test your knowledge on R packages
Question 1
When using RStudio, what does the installed.packages() function do?
A. Creates code for analysts to use to edit their packages
B. Presents a list of packages currently installed in an RStudio session
C. Selects the best packages to use based on an analyst’s current needs
D. Installs all available packages for use in an RStudio session
The correct answer is B. Presents a list of packages currently installed in an RStudio session. Explain: The installed.packages() function shows a list of packages currently installed in an RStudio session. From that list you can find each package's name along with details such as its version and the other packages it depends on.
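A short sketch of what installed.packages() returns; the column selection and the check for "tidyverse" are illustrative choices.

```r
# installed.packages() returns a character matrix with one row per package.
pkgs <- installed.packages()

# Show a few useful columns for the first several packages.
head(pkgs[, c("Package", "Version", "Depends")])

# Check whether a particular package is installed.
"tidyverse" %in% rownames(pkgs)
```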
Question 2
In data analytics, what is CRAN?
A. A collection of packages that function together to make analysis in R more efficient
B. A commonly used online archive with R packages and other R resources
C. A function for finding packages to use for analysis in RStudio
D. An R interface that has many of the same functions as RStudio
The correct answer is B. A commonly used online archive with R packages and other R resources. Explain: CRAN is a commonly used online archive with R packages and other R resources. CRAN makes sure that the R resources it shares follow the required quality standards and are authentic and valid.
Question 3
What are ggplot2, tidyr, dplyr, and forcats all a part of?
A. A list of functions that clean data efficiently
B. A collection of commonly used, CRAN-based data sets
C. A collection of core tidyverse packages
D. A list of variables for use in programming in RStudio
The correct answer is C. A collection of core tidyverse packages. Explain: The packages ggplot2, tidyr, dplyr, and forcats are part of a collection of eight core tidyverse packages, together with tibble, readr, purrr, and stringr. Since tidyverse 2.0.0, lubridate is also loaded as a ninth core package.