7.4.1. Create data visualization in R - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

Visualization basics in R and tidyverse

Base R has its own graphics package, and there are other useful packages you can add. They'll help you do almost anything you want with your data, from making simple pie charts to creating more complex visuals like interactive graphs and maps.

General-purpose packages like Plotly let you create a wide range of visualizations. Others, like RGL, focus on specific solutions such as 3D visuals.

Some of the most popular include:

  • ggplot2
  • Plotly
  • Lattice
  • RGL
  • Dygraphs
  • Leaflet
  • Highcharter
  • Patchwork
  • gganimate
  • ggridges

Benefits of ggplot2:

  • Create different types of plots
  • Customize the look and feel of plots
  • Create high quality visuals
  • Combine data manipulation and visualization
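The last benefit is easiest to see in code. The sketch below assumes the dplyr package (part of the tidyverse) is installed alongside ggplot2 and palmerpenguins. It filters the data with dplyr and hands the result straight to ggplot(); the object name gentoo_plot is only an illustrative choice.

```r
# Assumes dplyr, ggplot2, and palmerpenguins are installed.
library(dplyr)          # filter() and the %>% pipe
library(ggplot2)
library(palmerpenguins) # provides the penguins data frame

# Data manipulation: keep only Gentoo penguins with a recorded body mass.
# Visualization: the filtered data frame is piped into ggplot() as its data.
gentoo_plot <- penguins %>%
  filter(species == "Gentoo", !is.na(body_mass_g)) %>%
  ggplot(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

gentoo_plot  # printing the object draws the plot
```

Notice that %>% connects the data steps, while + adds plot layers. Mixing the two up is one of the common errors discussed further down this page.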

Core concepts in ggplot2:

  • Aesthetics - A visual property of an object in the plot
  • Geoms - The geometric object used to represent the data
  • Facets - Display smaller groups, or subsets, of the data
  • Labels and annotations - Customize the plot
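Here is a minimal sketch that touches all four concepts on the penguins data, assuming ggplot2 and palmerpenguins are installed. Mapping species to colour and faceting by island are illustrative choices, not the only way to use these features.

```r
library(ggplot2)
library(palmerpenguins)

concepts_plot <- ggplot(data = penguins) +
  # Geom: points. Aesthetics: x, y, and colour are mapped to columns.
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  # Facets: one small panel for each island
  facet_wrap(~island) +
  # Labels: a title, axis titles, and a caption
  labs(
    title = "Flipper length vs. body mass",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    caption = "Data: palmerpenguins"
  )

concepts_plot
```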

Hands-On Activity: Visualizing data with ggplot2

The basics of ggplot2

The ggplot2 package lets you make high-quality, customizable plots of your data. As a refresher, ggplot2 is based on the grammar of graphics, which is a system for describing and building data visualizations. The essential idea behind the grammar of graphics is that you can build any plot from the same basic components, like building blocks.

These building blocks include:

  • A dataset
  • A set of geoms: A geom refers to the geometric object used to represent your data. For example, you can use points to create a scatterplot, bars to create a bar chart, lines to create a line diagram, etc.
  • A set of aesthetic attributes: An aesthetic is a visual property of an object in your plot. You can think of an aesthetic as a connection, or mapping, between a visual feature in your plot and a variable in your data. For example, in a scatterplot, aesthetics include things like the size, shape, color, or location (x-axis, y-axis) of your data points.

To create a plot with ggplot2, you first choose a dataset. Then, you determine how to visually organize your data on a coordinate system by choosing a geom to represent your data points and aesthetics to map your variables.

Prepare your data

The ggplot2 package lets you use R code to specify the dataset, geom, and aesthetics of your plot.

To do this, first choose a dataset to work with. For this activity, you will use the Palmer Penguins data that you’re already familiar with from earlier videos. However, you can also use another dataset instead.

Once you decide on your dataset, open RStudio and follow these steps:

  1. If you have not done so before, use the install.packages() function to install both ggplot2 and the Palmer Penguins dataset. Type install.packages("ggplot2") and install.packages("palmerpenguins"), then click Run.

  2. Load ggplot2 and the dataset using the library() function. Type library(ggplot2) and library(palmerpenguins).

  3. Now, examine the penguins data frame. To do this, use the data() and View() functions. Use a capital “V” for the View() function, since functions in R are case sensitive. Type data(penguins) and View(penguins), then click Run.

RStudio opens the data frame in a spreadsheet-style viewer, where you can scroll through its rows and columns.

The penguins dataset contains size measurements for three penguin species (Adelie, Chinstrap, and Gentoo) that live on the Palmer Archipelago in Antarctica. The columns include information such as body mass, flipper length, and bill length.
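The setup steps can also be written as a short script. This is a sketch: installing a package only when it is missing is a common convenience pattern rather than part of the original steps, and head(), str(), and table() stand in for View() so the output appears in the console.

```r
# Install each package only if it is not already available.
for (pkg in c("ggplot2", "palmerpenguins")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}

library(ggplot2)
library(palmerpenguins)

head(penguins, 10)      # first 10 rows, printed in the console
str(penguins)           # every column with its type
table(penguins$species) # how many rows belong to each species
```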

Create a plot in ggplot2

Suppose you want to plot the relationship between body mass and flipper length in the three penguin species. You can choose a specific geom that fits the type of data you have. Points show the relationship between two quantitative variables. A scatterplot of points would be an effective way to display the relationship between the two variables. You can put flipper length on the x-axis and body mass on the y-axis.

Type the following code to create the plot. But before you run it, review the code piece by piece:

ggplot(data = penguins) + geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

ggplot(data = penguins): In ggplot2, you begin a plot with the ggplot() function. The ggplot() function creates a coordinate system that you can add layers to. The first argument of the ggplot() function is the dataset to use in the plot. In this case, it’s “penguins.”

+: Then, you add a “+” symbol to add a new layer to your plot. You complete your plot by adding one or more layers to ggplot().

geom_point(): Next, you choose a geom by adding a geom function. The geom_point() function uses points to create scatterplots, the geom_bar() function uses bars to create bar charts, and so on. In this case, choose the geom_point() function to create a scatterplot. The ggplot2 package comes with many different geom functions. You’ll learn more about geoms later in this course.

(mapping = aes(x = flipper_length_mm, y = body_mass_g)): Each geom function in ggplot2 takes a mapping argument. This defines how variables in your dataset are mapped to visual properties. The mapping argument is always paired with the aes() function. The x and y arguments of the aes() function specify which variables to map to the x-axis and the y-axis of the coordinate system. In this case, you want to map the variable “flipper_length_mm” to the x-axis, and the variable “body_mass_g” to the y-axis.
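To see the layers separately, you can build the same plot in two steps and store each step in an object. The object names base_plot and scatter are illustrative; the calls are the same ones described above.

```r
library(ggplot2)
library(palmerpenguins)

# Step 1: ggplot() sets up the plot with a dataset but no layers yet.
base_plot <- ggplot(data = penguins)

# Step 2: "+" adds a geom_point() layer; aes() maps columns to the x and y axes.
scatter <- base_plot +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

base_plot  # draws an empty panel
scatter    # draws the scatterplot
```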

Now go ahead and run the code. When you do, RStudio draws a scatterplot of body mass against flipper length in the Plots pane.

The plot shows a positive relationship between the two variables. In other words, penguins with longer flippers tend to have a greater body mass.
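If you want a number to go with the visual impression, the correlation coefficient is one option; this is an addition to the original walkthrough, not a required step. The sketch uses base R's cor(). The argument use = "complete.obs" skips the few rows where a measurement is missing, which are the same rows ggplot2 typically warns about dropping when it draws the plot.

```r
library(palmerpenguins)

# Pearson correlation between flipper length and body mass.
# use = "complete.obs" drops rows where either value is NA.
r <- cor(penguins$flipper_length_mm, penguins$body_mass_g, use = "complete.obs")
r  # a positive value means the two measurements tend to rise together
```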

Create your own plot

To create your own plot using code, follow these three steps:

  1. Start with the ggplot() function and choose a dataset to work with.

  2. Add a geom_ function to display your data.

  3. Map the variables you want to plot in the arguments of the aes() function.

Try plotting with different datasets using different geoms and mapping arguments. Coming up in this course, you’ll learn even more about the process of creating a plot. You’ll also get a chance to work with the Penguins dataset to create lots of different plots in ggplot2.
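Here is one way to follow the three steps with a different geom and a different dataset. mtcars is a dataset that ships with base R; its wt (weight) and mpg (miles per gallon) columns are used only as an example.

```r
library(ggplot2)
library(palmerpenguins)

# Different geom: a bar chart of how many penguins belong to each species.
# geom_bar() needs only an x mapping because it counts the rows itself.
species_bar <- ggplot(data = penguins) +
  geom_bar(mapping = aes(x = species))

# Different dataset: car weight against fuel economy from mtcars.
cars_scatter <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

species_bar
cars_scatter
```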

Pro-Tip: You can write the same code using a different syntax, with the mapping argument inside the ggplot() call: ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) + geom_point()
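The two forms differ in scope: aesthetics set in ggplot() are inherited by every layer, while aesthetics set in a geom apply to that layer only. The sketch below checks that both forms produce the same point data using layer_data(), a ggplot2 helper that returns the data computed for a layer.

```r
library(ggplot2)
library(palmerpenguins)

# Mapping inside the geom: applies to this layer only.
p_layer <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

# Mapping inside ggplot(): inherited by every layer added afterwards.
p_global <- ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

# Compare the data ggplot2 computes for the first layer of each plot.
all.equal(layer_data(p_layer), layer_data(p_global))
```

The global form pays off once you stack layers: for example, adding geom_smooth() to p_global reuses the same x and y mapping without repeating aes().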

The ggplot2 cheat sheet

This is just the beginning of what you can do with ggplot2. If you want to find out more about ggplot2, RStudio has a useful reference guide called the “Data Visualization with ggplot2 Cheat Sheet.” You can use the Cheat Sheet as a quick reference while you work to learn about the main functions and features of ggplot2.


Common problems when visualizing in R

Coding errors are an inevitable part of writing code—especially when you are first beginning to learn a new programming language. In this reading, you will learn how to recognize common coding errors when creating visualizations using ggplot2. You will also find links to some resources that you can use to help address any coding problems you might encounter moving forward.

Common coding errors in ggplot2

When working with R code in ggplot2, a lot of the most common coding errors involve issues with syntax, like misplaced characters. That is why paying attention to details is such an important part of writing code. When there is an error in your code that R is able to detect, it will generate an error message. Error messages can help point you in the right direction, but they won’t always help you figure out the precise problem.

Let’s explore a few of the most common coding errors you might encounter in ggplot2.

Case sensitivity

R code is case sensitive. If you accidentally capitalize the first letter of a function name, R treats it as a different name and cannot find the function. Here is an example:

Glimpse(penguins)

The error message lets you know that R cannot find a function named “Glimpse”:

Error in Glimpse(penguins) : could not find function "Glimpse"

But you know that the function glimpse() (lowercase “g”), which is available once you load dplyr or the tidyverse, does exist. Notice that the error message doesn’t explain exactly what is wrong, but it does point you in a general direction.

Based on that, you can figure out that this is the correct code:

glimpse(penguins)
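You can reproduce this error without stopping your script by catching it with base R's tryCatch(). The sketch assumes dplyr is installed, because glimpse() becomes available once dplyr (or the whole tidyverse) is loaded.

```r
library(dplyr)          # makes glimpse() available
library(palmerpenguins)

# Catch the error and keep its message instead of halting.
msg <- tryCatch(Glimpse(penguins), error = function(e) conditionMessage(e))
msg

# R treats "glimpse" and "Glimpse" as two unrelated names.
exists("glimpse")  # TRUE once dplyr is loaded
exists("Glimpse")  # FALSE

glimpse(penguins)  # the correctly spelled call works
```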

Balancing parentheses and quotation marks

Another common R coding error involves parentheses and quotation marks. In R, you need to make sure that every opening parenthesis in your function has a closing parenthesis, and every opening quotation mark has a closing quotation mark. For example, if you run the following code, R does not create the plot. The second line of code is missing two closing parentheses, so R treats the command as unfinished and the console shows a “+” continuation prompt while it waits for more input:

ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g

RStudio does alert you to the problem. To the left of the line of code in your RStudio source editor, you might notice a red circle with a white “X” in the center. If you hover over the circle with your cursor, a message tells you that you have an unmatched opening bracket.

So, to correct the code, you need to add a closing bracket to match each opening bracket.

Here is the correct code:

ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
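One way to catch unbalanced parentheses before running anything is base R's parse(), which reads code into expressions without evaluating it. check_parse() below is a hypothetical helper written for this example, not part of any package.

```r
# Hypothetical helper: returns "parses OK" or the parser's error message.
check_parse <- function(code) {
  tryCatch({
    parse(text = code)  # reads the code but does not run it
    "parses OK"
  }, error = function(e) conditionMessage(e))
}

broken <- "ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g"

fixed <- "ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))"

check_parse(broken)  # message mentions an unexpected end of input
check_parse(fixed)   # "parses OK"
```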

Using the plus sign to add layers

In ggplot2, you need to add a plus sign (“+”) to your code when you add a new layer to your plot. Putting the plus sign in the wrong place is a common mistake. The plus sign should always be placed at the end of a line of code, and not at the beginning of a line.

Here’s an example of code that includes incorrect placement of the plus sign:

ggplot(data = penguins)
    + geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

In this case, R’s error message identifies the problem and prompts you to correct it:

Error: Cannot use +.gg() with a single argument. Did you accidentally put + on a new line?

Here is the correct code:

ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
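Why does the position matter? When a line ends right after ggplot(data = penguins), R runs it as a complete command, so the next line, starting with +, is read as a separate command: a plus sign applied to a single layer. The sketch below evaluates that second line on its own and captures the resulting error, then builds the correct version.

```r
library(ggplot2)
library(palmerpenguins)

# What R sees when "+" starts a line: a unary plus applied to one layer.
msg <- tryCatch(
  + geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)),
  error = function(e) conditionMessage(e)
)
msg  # the error message ggplot2 raises

# With "+" at the end of the first line, R keeps reading and joins both lines.
good <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```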

You also might accidentally use a pipe instead of a plus sign to add a new layer to your plot, like this:

ggplot(data = penguins) %>%
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

You then get the following error message:

Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class gg/ggplot

Here is the correct code:

ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
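The pipe fails for a different reason. %>% places the object on its left as the first argument of the next call, so the code becomes geom_point(<the ggplot object>, mapping = aes(...)). R matches mapping by name, so the unnamed ggplot object lands in geom_point()'s next argument, data, which expects a data frame; hence the message about data. This sketch assumes dplyr is loaded to provide %>%.

```r
library(dplyr)          # provides %>%
library(ggplot2)
library(palmerpenguins)

# The pipe rewrites this as geom_point(<ggplot object>, mapping = aes(...)),
# so the ggplot object is matched to geom_point()'s `data` argument.
piped <- tryCatch(
  ggplot(data = penguins) %>%
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)),
  error = function(e) conditionMessage(e)
)
piped  # an error message about the `data` argument
```

A practical rule: keep %>% for data steps that feed into ggplot(), and switch to + once you are adding layers.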

Keeping these issues in mind and paying attention to details when you write code will help you reduce errors and save time, so you can stay focused on your analysis.

Help resources

Everyone makes mistakes when writing code; it is just part of the learning process. Fortunately, there are lots of helpful resources available in RStudio and online.

R documentation

R has built-in documentation for all functions and packages. To learn more about any R function, just run the code ?function_name. For example, if you want to learn more about the geom_bar function, type:

?geom_bar

When you run the code, an entry on “geom_bar” appears in the Help viewer in the lower-right pane of your RStudio workspace. The entry begins with a “Description” section that discusses bar charts.

The RDocumentation website contains much of the same content in a slightly different format, with additional examples and links.
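A few related base R help functions, sketched here for ggplot2 (assumed installed): help() is the function form of ?, its package argument tells R which package to look in, and example() runs the code from a help page's Examples section.

```r
library(ggplot2)

?geom_bar                                 # open the help page for geom_bar
help("geom_bar", package = "ggplot2")     # same page, naming the package explicitly
example("geom_bar", package = "ggplot2")  # run the examples from that page
```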

ggplot2 documentation

The ggplot2 page, which is part of the official tidyverse documentation, is a great resource for all things related to ggplot2. It includes entries on key topics, useful examples of code, and links to other helpful resources.

Online search

Doing an online search for the error message you are encountering (and including “R” and the function or package name in your search terms) is another option. There is a good chance someone else has already encountered the same error and posted about it online.

The R community

If the other resources don’t help, you can try reaching out to the R community online. There are lots of useful online forums and websites where people ask for and receive help.

Test your knowledge on data visualizations in R

TOTAL POINTS 4

Question 1

In ggplot2, you can use the _____ function to specify the data frame to use for your plot.

  • aes()
  • labs()
  • ggplot()
  • geom_point()

Correct. In ggplot2, you can use the ggplot() function to specify the data frame to use for your plot.

Question 2

In ggplot2, you use the plus sign (+) to add a layer to your plot.

  • True
  • False

Correct. In ggplot2, you use the plus sign (+) to add a layer to your plot.

Question 3

In ggplot2, what function do you use to map variables in your data to visual features of your plot?

  • The ggplot() function
  • The aes() function
  • The geom_bar() function
  • The geom_point() function

Correct. In ggplot2, you use the aes() function to map variables in your data to visual features of your plot. These features are known as aesthetics.

Question 4

What type of plot will the following code create?

ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

  • Line diagram
  • Scatterplot
  • Boxplot
  • Bar chart

Correct. The code will create a scatterplot. The function geom_point() uses points to create a scatterplot.