7.4.2. Explore aesthetics in analysis - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

Enhancing visualizations in R

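library(ggplot2)
library(palmerpenguins)

# Basic scatter plot of flipper length against body mass
ggplot(data=penguins)+
    geom_point(mapping=aes(x=flipper_length_mm, y=body_mass_g))

# Map species to point shape
ggplot(data=penguins)+
    geom_point(mapping=aes(x=flipper_length_mm, y=body_mass_g, shape=species))

# Map species to point color
ggplot(data=penguins)+
    geom_point(mapping=aes(x=flipper_length_mm, y=body_mass_g, color=species))

# Map species to transparency (ggplot2 warns that alpha is not advised for a discrete variable)
ggplot(data=penguins)+
    geom_point(mapping=aes(x=flipper_length_mm, y=body_mass_g, alpha=species))

# Map species to point size (ggplot2 warns that size is not advised for a discrete variable)
ggplot(data=penguins)+
    geom_point(mapping=aes(x=flipper_length_mm, y=body_mass_g, size=species))

# Map species to shape, and set color="purple" outside aes() so every point is purple
ggplot(data=penguins)+
    geom_point(mapping=aes(x=flipper_length_mm, y=body_mass_g, shape=species), color="purple")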

Aesthetic attributes

In this reading, you will learn about the three basic aesthetic attributes to consider when creating ggplot2 visualizations in R: color, size, and shape. These attributes are essential tools for creating data visualizations with ggplot2 and are built directly into its code.

Aesthetics in ggplot2

The ggplot2 package lets you create different types of data visualizations right in your R workspace. In ggplot2, an aesthetic is a visual property of an object in your plot, such as its position, color, size, or shape.

Three commonly used aesthetic attributes in ggplot2 are:

  • Color: this allows you to change the color of all of the points on your plot, or the color of each data group
  • Size: this allows you to change the size of all of the points on your plot, or the size of each data group
  • Shape: this allows you to change the shape of all of the points on your plot, or the shape of each data group

Here’s an example of how aesthetic attributes are specified in ggplot2 code:

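# `data` is a placeholder for a flight-level data frame with
# distance, dep_delay, carrier, and air_time columns.
# Note: the default shape palette handles at most 6 distinct values.
ggplot(data, aes(x=distance, y=dep_delay, color=carrier, size=air_time, shape=carrier)) +
    geom_point()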

By applying these aesthetic attributes to your work with ggplot2, you can create data visualizations in R that clearly communicate trends in your data.
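To make the difference between changing all of the points and changing each data group concrete, here is a minimal sketch using the penguins data from earlier on this page. It assumes the palmerpenguins package is installed, and the object names p_set and p_map are illustrative only. A constant given outside aes() is applied to every point; a variable given inside aes() varies by data group, and ggplot2 adds a legend for it.

```r
library(ggplot2)
library(palmerpenguins)

# Setting: constants given outside aes() apply to every point,
# so all points are purple and the same size, with no legend
p_set <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
    geom_point(color = "purple", size = 2)

# Mapping: variables given inside aes() vary by data group;
# each species gets its own color, point size follows bill length,
# and ggplot2 adds a legend for both
p_map <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
    geom_point(aes(color = species, size = bill_length_mm))
```

Printing either object draws the plot. Mapping size to a continuous variable such as bill_length_mm avoids the warning ggplot2 gives when size is mapped to a discrete variable like species.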

Additional resources

For more information about aesthetic attributes, check out these resources:

  • Data visualization with ggplot2 cheat sheet: RStudio’s cheat sheet is a great reference to use while working with ggplot2. It has tons of helpful information, including explanations of how to use geoms and examples of the different visualizations that you can create.
  • Stats Education’s Introduction to R: This resource is a great way to learn the basics of ggplot2 and how to apply aesthetic attributes to your plots. You can return to this tutorial as you work more with ggplot2 and your own data.
  • RDocumentation aes function: This guide describes the syntax of the aes function and explains what each argument does.

Doing more with ggplot

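# Smooth line: a fitted trend of body mass against flipper length
ggplot(data=penguins) +
    geom_smooth(mapping=aes(x=flipper_length_mm, y=body_mass_g))

# Layer the smooth line and the raw points in the same plot
ggplot(data=penguins) +
    geom_smooth(mapping=aes(x=flipper_length_mm, y=body_mass_g)) +
    geom_point(mapping=aes(x=flipper_length_mm, y=body_mass_g))

# One smooth line per species, distinguished by line type
ggplot(data=penguins) +
    geom_smooth(mapping=aes(x=flipper_length_mm, y=body_mass_g, linetype=species))

# Line type and color both mapped to species
ggplot(data=penguins) +
    geom_smooth(mapping=aes(x=flipper_length_mm, y=body_mass_g, linetype=species, color=species))

# Jittered scatter plot: small random offsets reduce overplotting
ggplot(data=penguins) +
    geom_jitter(mapping=aes(x=flipper_length_mm, y=body_mass_g))

# Bar chart: count of diamonds in each cut category
ggplot(data=diamonds) +
    geom_bar(mapping=aes(x=cut))

# color= changes only the bar outlines
ggplot(data=diamonds) +
    geom_bar(mapping=aes(x=cut, color=cut))

# fill= changes the bar interiors
ggplot(data=diamonds) +
    geom_bar(mapping=aes(x=cut, fill=cut))

# Filling by a different variable stacks the clarity groups within each cut
ggplot(data=diamonds) +
    geom_bar(mapping=aes(x=cut, fill=clarity))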

Smoothing

In this reading, you will learn about smoothing in ggplot2 and how it can make your data visualizations in R clearer and easier to follow. Sometimes it can be hard to see trends in your data from scatter plots alone. Smoothing helps you detect a trend even when one isn't obvious from the plotted data points. The geom_smooth() function in ggplot2 adds a smoothing line as another layer on top of a plot, which helps a casual observer make sense of the data.

# Example code
ggplot(data, aes(x=distance, y=dep_delay)) +
    geom_point() +
    geom_smooth()

The example code creates a plot with a trend line similar to the blue line below.

Two types of smoothing

The smoothing functionality in ggplot2 helps make data plots more readable, so you are better able to recognize data trends and gain key insights. The first plot below is the data before smoothing, and the second plot below is the same data after smoothing.
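In ggplot2, geom_smooth() picks its smoother through the method argument. When method is left unset, ggplot2 uses loess (local regression) for fewer than 1,000 observations and a generalized additive model (GAM) with the formula y ~ s(x, bs = "cs") otherwise. The following is a minimal sketch of both on the penguins data. It assumes palmerpenguins is installed; the GAM fit also needs the mgcv package, which is included with standard R distributions as a recommended package. The object names are illustrative only.

```r
library(ggplot2)
library(palmerpenguins)

# Shared base layer: the raw points
base_plot <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
    geom_point()

# Loess smoothing: ggplot2's default when there are fewer than 1,000 observations
p_loess <- base_plot +
    geom_smooth(method = "loess", formula = y ~ x)

# GAM smoothing (fitted by mgcv): ggplot2's default for larger data;
# this formula matches the one ggplot2 uses by default
p_gam <- base_plot +
    geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
```

Printing p_loess or p_gam draws the plot; ggplot2 may warn about rows removed because a few penguins have missing measurements. Passing method and formula explicitly also silences the message geom_smooth() prints when it chooses a default.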

Additional resource

For more information about smoothing, refer to the Smoothing section in the Stats Education’s Introduction to R course. It includes detailed descriptions and examples of how to use the different types of smoothing in ggplot2. It also includes links to other lessons about ggplot2. You can explore these to get more familiar with plotting data in R.

Aesthetics and facets

Facet functions

  • facet_wrap() - facets the plot by a single variable

# One panel per species
ggplot(data=penguins, aes(x=flipper_length_mm, y=body_mass_g)) +
    geom_point(aes(color=species)) +
    facet_wrap(~species)

# One panel per cut; bars in each panel count diamonds by color grade
ggplot(data=diamonds) +
    geom_bar(mapping=aes(x=color, fill=cut)) +
    facet_wrap(~cut)

  • facet_grid() - splits the plot into facets vertically by the values of the first variable and horizontally by the values of the second variable

# Rows by sex, columns by species
ggplot(data=penguins, aes(x=flipper_length_mm, y=body_mass_g)) +
    geom_point(aes(color=species)) +
    facet_grid(sex~species)

# Columns by sex only; x-axis labels rotated 45 degrees to keep them readable
ggplot(data=penguins, aes(x=flipper_length_mm, y=body_mass_g)) +
    geom_point(aes(color=species)) +
    facet_grid(~sex) +
    theme(axis.text.x = element_text(angle = 45))
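The two facet functions described above can be compared side by side. This minimal sketch assumes palmerpenguins is installed; the plot names are illustrative. It uses facet_wrap()'s nrow argument to force all panels into one row, and the "." placeholder in facet_grid() to facet in only one direction.

```r
library(ggplot2)
library(palmerpenguins)

base_plot <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
    geom_point(aes(color = species))

# facet_wrap(): one faceting variable; nrow/ncol control how the panels wrap
p_wrap <- base_plot + facet_wrap(~species, nrow = 1)

# facet_grid(): the variable left of ~ defines rows, the one on the right defines columns;
# "." means no faceting in that direction, so this gives one column of panels by island
p_grid <- base_plot + facet_grid(island ~ .)
```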

Filtering and plots

By this point you have likely installed at least a few packages into your R library. The tools in some of these packages can be combined to become even more useful. This reading shares a few resources that show how to use the filter() function from dplyr to make the plots you create with ggplot2 easier to read.

Example of filtering data for plotting

Filtering your data before you plot it lets you focus on specific subsets of your data and gain more targeted insights. To do this, pipe your data through the dplyr filter() function and then into ggplot().

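# Example code
# `data`, `variable1`, `variable2`, and `weight` are placeholders for your own data frame and columns
library(dplyr)
library(ggplot2)

data %>%
    filter(variable1 == "DS") %>%  # keep only rows where variable1 is "DS"
    ggplot(aes(x = weight, y = variable2, colour = variable1)) +
    geom_point(alpha = 0.3, position = position_jitter()) +  # semi-transparent, jittered points
    stat_smooth(method = "lm")  # add a linear-model trend line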
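The placeholder example above can be made concrete with the penguins data. This is a minimal sketch assuming the dplyr, ggplot2, and palmerpenguins packages are installed; keeping only Adelie penguins and coloring by sex are illustrative choices, not part of the original example.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

adelie_plot <- penguins %>%
    # keep Adelie penguins with a recorded sex, so no NA group appears in the legend
    filter(species == "Adelie", !is.na(sex)) %>%
    ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = sex)) +
    geom_point() +
    # one linear trend line per sex
    geom_smooth(method = "lm", formula = y ~ x)
```

Because filter() runs before ggplot(), the plot object only ever sees the filtered rows, so every layer (points and trend lines) is computed from the same subset.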

Additional resources

To learn more details about ggplot2 and filtering with dplyr, check out these resources:

  • Putting it all together: (dplyr+ggplot): The RLadies of Sydney’s course on R uses real data to demonstrate R functions. This lesson focuses specifically on combining dplyr and ggplot to filter data before plotting it. The instructional video will guide you through every step in the process while you follow along with the data they have provided.
  • Data transformation: This resource focuses on how to use the filter() function in R, and demonstrates how to combine filter() with ggplot(). This is a useful resource if you are interested in learning more about how filter() can be used before plotting.
  • Visualizing data with ggplot2: This comprehensive guide includes everything from the most basic uses for ggplot2 to creating complicated visualizations. It includes the filter() function in most of the examples so you can learn how to implement it in R to create data visualizations.

Test your knowledge on aesthetics in analysis

TOTAL POINTS 4

Question 1

Which of the following aesthetic attributes can you map to the data in a scatterplot? Select all that apply.

  • Shape
  • Text
  • Color
  • Size

Correct. You can map the color, shape, and size aesthetics to the data in a scatterplot.

Question 2

Which of the following functions let you display smaller groups, or subsets, of your data?

  • geom_point()
  • facet_wrap()
  • ggplot()
  • geom_bar()

Correct. The facet_wrap() function lets you display smaller groups, or subsets, of your data.

Question 3

What is the role of the x argument in the following code?

ggplot(data = diamonds)+
    geom_bar(mapping = aes(x = cut))

  • A variable
  • A function
  • An aesthetic
  • A dataset

Correct. x is an aesthetic that refers to the x-axis of the plot. The x aesthetic maps the variable cut from the diamonds dataset to the x-axis of the plot.

Question 4

A data analyst creates a scatterplot with a lot of data points. It is difficult for the analyst to distinguish the individual points on the plot because they overlap. What function could the analyst use to make the points easier to find?

  • geom_jitter()
  • geom_line()
  • geom_point()
  • geom_bar()

Correct. The analyst could use the geom_jitter() function to make the points easier to find. The geom_jitter() function adds a small amount of random noise to each point in the plot, which reduces overplotting.
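As a follow-up to this answer, here is a minimal sketch of controlling the jitter on the penguins data (assuming palmerpenguins is installed). geom_jitter(width = 0.5, height = 0) applies the same kind of noise; the sketch uses geom_point() with position_jitter() instead because position_jitter() accepts a seed, which makes the random offsets reproducible. The width of 0.5 and the seed of 42 are arbitrary choices for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Nudge each point horizontally by less than 0.5 mm; height = 0 leaves body mass untouched.
# The fixed seed makes the same random offsets appear every time the plot is drawn.
p_jitter <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
    geom_point(position = position_jitter(width = 0.5, height = 0, seed = 42))
```

Keeping the jitter width below half the spacing between recorded values (flipper lengths are whole millimeters) separates overlapping points without making them look like neighboring measurements.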