7.4.2. Explore aesthetics in analysis
Aesthetic attributes
In this reading, you will learn about the three basic aesthetic attributes to consider when creating ggplot2 visualizations in R: color, size, and shape. These attributes are essential tools for creating data visualizations with ggplot2 and are built directly into its code.
Aesthetics in ggplot2
ggplot2 is an R package that lets you create different types of data visualizations right in your R workspace. In ggplot2, an aesthetic is a visual property of an object in your plot.
This reading covers three basic aesthetic attributes in ggplot2:
- Color: this allows you to change the color of all of the points on your plot, or the color of each data group
- Size: this allows you to change the size of the points on your plot by data group
- Shape: this allows you to change the shape of the points on your plot by data group
Here’s an example of how aesthetic attributes are specified in ggplot2 code:
ggplot(data, aes(x = distance, y = dep_delay, color = carrier, size = air_time, shape = carrier)) +
  geom_point()
By applying these aesthetic attributes to your work with ggplot2, you can create data visualizations in R that clearly communicate trends in your data.
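The difference between changing the color of all points and changing it by data group can be sketched with a small runnable example. This is only an illustration: it assumes the ggplot2 package is installed and swaps in the `mpg` dataset bundled with ggplot2, since the `data` object in the example above is not provided here.

```r
library(ggplot2)

# mpg ships with ggplot2, so no external file is needed.
# Mapping inside aes(): color and shape vary by drive type (drv),
# and point size varies with the number of cylinders (cyl).
mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv, size = cyl)) +
  geom_point()

# Setting outside aes(): every point gets the same fixed color.
fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "purple")

print(mapped)
print(fixed)
```

Placing an attribute inside `aes()` ties it to a column in the data, while placing it outside `aes()` applies one constant value to the whole layer.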
Additional resources
For more information about aesthetic attributes, check out these resources:
- Data visualization with ggplot2 cheat sheet: RStudio’s cheat sheet is a great reference to use while working with ggplot2. It has tons of helpful information, including explanations of how to use geoms and examples of the different visualizations that you can create.
- Stats Education’s Introduction to R: This resource is a great way to learn the basics of ggplot2 and how to apply aesthetic attributes to your plots. You can return to this tutorial as you work more with ggplot2 and your own data.
- RDocumentation aes function: This guide describes the syntax of the aes function and explains what each argument does.
Smoothing
In this reading, you will learn about smoothing in ggplot2 and how it can make your data visualizations in R clearer and easier to follow. Sometimes it can be hard to understand trends in your data from scatter plots alone. Smoothing helps you detect a trend even when it is not obvious from the plotted data points. ggplot2’s smoothing functionality adds a smoothing line as another layer on the plot, which helps even a casual observer make sense of the data.
Example code
ggplot(data, aes(x = distance, y = dep_delay)) +
  geom_point() +
  geom_smooth()
The example code creates a scatter plot with a smoothed trend line layered over the points. By default, ggplot2 draws this line in blue.
Two types of smoothing
The smoothing functionality in ggplot2 helps make data plots more readable, so you are better able to recognize data trends and make key insights. The first plot below is the data before smoothing, and the second plot below is the same data after smoothing.
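As a hedged illustration of choosing a smoothing type, `geom_smooth()` accepts a `method` argument, and `"loess"` and `"gam"` are two commonly used values. The ggplot2 documentation notes that when no method is given, loess is used for fewer than 1,000 observations and gam otherwise. The sketch below assumes the ggplot2 and mgcv packages are installed (mgcv normally ships with R) and uses the bundled `mpg` dataset as a stand-in for your own data.

```r
library(ggplot2)

# Shared scatter plot layer; mpg is bundled with ggplot2.
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Loess smoothing: a local regression, generally suited to smaller datasets.
loess_plot <- base + geom_smooth(method = "loess", formula = y ~ x)

# Gam smoothing: a generalized additive model, generally suited to larger datasets.
gam_plot <- base + geom_smooth(method = "gam", formula = y ~ s(x))

print(loess_plot)
print(gam_plot)
```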
Additional resource
For more information about smoothing, refer to the Smoothing section in the Stats Education’s Introduction to R course. It includes detailed descriptions and examples of how to use the different types of smoothing in ggplot2. It also includes links to other lessons about ggplot2. You can explore these to get more familiar with plotting data in R.
Hands-On Activity: Aesthetics and visualizations
Activity overview
In previous activities, you learned about and worked with ggplot2, an R package for data visualization. In this activity, you’ll follow through a scenario and continue to apply ggplot2 to tailor aesthetic features of visualizations.
By the end of this activity, you will be able to use R to create bar charts, update chart labels, and customize the aesthetics of a visualization by specific criteria. This will enable you to make more complex visualizations to demonstrate your findings.
Working in RStudio Cloud
To start, log into your RStudio (Posit) Cloud account. Open the project you will work on in the activity with this link, which opens in a new tab. If you haven't gone through this process already, at the top right portion of the screen you will see a "red stamp" indicating this project as a Temporary Copy. Click on the adjacent button, Save a Permanent Copy, and the project will be saved in your main dashboard for use with future lessons. Once that is completed, navigate to the file explorer in the bottom right and click on the following: Course 7 -> Week 4 -> Lesson3_Aesthetics.Rmd.
The .csv file that you will need, hotel_bookings.csv, is also located in this folder.
If you have trouble finding the correct activity, check out this step-by-step guide on how to navigate in RStudio (Posit) Cloud. Make sure to select the correct R markdown (Rmd) file. The other Rmd files will be used in different activities.
If you are using RStudio Desktop, you can download the Rmd file and the data for this activity directly here:
You can also find the Rmd file with the solutions for this activity here:
Carefully read the instructions in the comments of the Rmd file and complete each step. Some steps may be as simple as running pre-written code, while others may require you to write your own functions. After you finish the steps in the Rmd file, return here to confirm that your work is complete.
Note: In Step 6 of the .Rmd exercise file, be sure to add the variable name deposit_type inside the parentheses of the facet_wrap function. Your code lines should look like this in your R editor:
{r creating a plot}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_wrap(~deposit_type)
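The activity also covers updating chart labels and customizing aesthetics, which can be sketched without the course files. The example below is a minimal illustration only: `toy_bookings` is a hypothetical, made-up data frame standing in for hotel_bookings.csv, and the label text is invented for the example.

```r
library(ggplot2)

# Hypothetical stand-in for hotel_bookings.csv; the values are made up.
toy_bookings <- data.frame(
  distribution_channel = c("TA/TO", "TA/TO", "Direct", "Corporate", "TA/TO", "Direct"),
  deposit_type = c("No Deposit", "Non Refund", "No Deposit",
                   "No Deposit", "No Deposit", "Non Refund")
)

# fill colors each bar segment by deposit type; labs() sets the title and axis labels.
bar_plot <- ggplot(data = toy_bookings) +
  geom_bar(mapping = aes(x = distribution_channel, fill = deposit_type)) +
  labs(
    title = "Bookings by distribution channel",
    x = "Distribution channel",
    y = "Number of bookings"
  )

# table() returns the same counts that geom_bar() draws as bar heights.
channel_counts <- table(toy_bookings$distribution_channel)

print(bar_plot)
print(channel_counts)
```

Checking the counts with `table()` is a quick way to confirm which bar is tallest before reading it off the chart.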
Confirmation
Based on the bar chart you created in Step 4, which distribution type has the most bookings?
A. Direct
B. Corporate
C. GDS
D. TA/TO
The correct answer is D. TA/TO. Explain: The TA/TO distribution type has the most bookings. By using ggplot2, you were able to customize the visualization so that it plainly shows which distribution type has the most bookings. Going forward, you can change the aesthetics of your visualization to emphasize different aspects of your findings, respond to stakeholder requests, and improve your presentations.
Filtering and plots
By this point you have likely downloaded at least a few packages into your R library. The tools in some of these packages can actually be combined and used together to become even more useful. This reading will share a few resources that will teach you how to use the filter function from dplyr to make the plots you create with ggplot2 easier to read.
Example of filtering data for plotting
Filtering your data before you plot it allows you to focus on specific subsets of your data and gain more targeted insights. To do this, pipe your data through the dplyr filter() function and then pass the result to ggplot().
Example code
data %>%
  filter(variable1 == "DS") %>%
  ggplot(aes(x = weight, y = variable2, colour = variable1)) +
  geom_point(alpha = 0.3, position = position_jitter()) +
  stat_smooth(method = "lm")
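The example above uses placeholder names, so here is a runnable sketch of the same pattern. It assumes the dplyr and ggplot2 packages are installed and substitutes the `mpg` dataset bundled with ggplot2; the choice of columns is purely illustrative.

```r
library(dplyr)
library(ggplot2)

# Keep only the rows for compact cars before plotting.
compact_cars <- mpg %>%
  filter(class == "compact")

# Pipe the filtered data into ggplot(); layers are then added with +.
filtered_plot <- compact_cars %>%
  ggplot(aes(x = displ, y = hwy, colour = drv)) +
  geom_point(alpha = 0.3, position = position_jitter()) +
  stat_smooth(method = "lm", formula = y ~ x)

print(filtered_plot)
```

Note that the pipe (`%>%`) connects the data steps, while `+` connects the plot layers once `ggplot()` has been called.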
Additional resources
To learn more details about ggplot2 and filtering with dplyr, check out these resources:
- Putting it all together: (dplyr+ggplot): The RLadies of Sydney’s course on R uses real data to demonstrate R functions. This lesson focuses specifically on combining dplyr and ggplot to filter data before plotting it. The instructional video will guide you through every step in the process while you follow along with the data they have provided.
- Data transformation: This resource focuses on how to use the filter() function in R, and demonstrates how to combine filter() with ggplot(). This is a useful resource if you are interested in learning more about how filter() can be used before plotting.
- Visualizing data with ggplot2: This comprehensive guide includes everything from the most basic uses for ggplot2 to creating complicated visualizations. It includes the filter() function in most of the examples so you can learn how to implement it in R to create data visualizations.
Hands-On Activity: Filters and plots
Activity overview
So far, you have learned a lot about ggplot2 and how to create data visualizations in R. In this activity, you’ll follow through a scenario and use filters and facets with ggplot2.
By the end of this activity, you will be able to customize your visualizations by applying filters and highlighting facets. This will enable you to emphasize certain aspects of your insights to create comparisons and more nuanced insights in your presentations.
Working in RStudio Cloud
To start, log into your RStudio (Posit) Cloud account. Open the project you will work on in the activity with this link, which opens in a new tab. If you haven't gone through this process already, at the top right portion of the screen you will see a "red stamp" indicating this project as a Temporary Copy. Click on the adjacent button, Save a Permanent Copy, and the project will be saved in your main dashboard for use with future lessons. Once that is completed, navigate to the file explorer in the bottom right and click on the following: Course 7 -> Week 4 -> Lesson3_Filters.Rmd.
The .csv file that you will need, hotel_bookings.csv, is also located in this folder.
If you have trouble finding the correct activity, check out this step-by-step guide on how to navigate in RStudio (Posit) Cloud. Make sure to select the correct R markdown (Rmd) file. The other Rmd files will be used in different activities.
If you are using RStudio Desktop, you can download the Rmd file and the data for this activity directly here:
You can also find the Rmd file with the solutions for this activity here:
Carefully read the instructions in the comments of the Rmd file and complete each step. Some steps may be as simple as running pre-written code, while others may require you to write your own functions. After you finish the steps in the Rmd file, return here to confirm that your work is complete.
Confirmation
In Step 5 of this activity, you created a data frame onlineta_city_hotels_v2. What is the lead time in the first row created in this data frame?
A. 65
B. 88
C. 92
D. 100
The correct answer is B. 88. Explain: The lead time in the first row of the onlineta_city_hotels_v2 data frame is 88. By using a filter with ggplot2, you are able to select specific segments of your data and plot them using R. Going forward, you can use filters and facets to compare visualizations of different aspects of the same data to gain even deeper insights from your analyses.
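Filtering on more than one condition and then inspecting the first row can be sketched as follows. This is an illustration under stated assumptions, not the activity's solution: `toy_bookings` is a hypothetical data frame with made-up values, and the column names `hotel`, `market_segment`, and `lead_time` are assumed for the example.

```r
library(dplyr)

# Hypothetical data; column names and values are assumptions for illustration.
toy_bookings <- data.frame(
  hotel = c("Resort Hotel", "City Hotel", "City Hotel", "City Hotel"),
  market_segment = c("Online TA", "Direct", "Online TA", "Online TA"),
  lead_time = c(10, 20, 30, 40)
)

# Conditions separated by commas in filter() must all be true for a row to be kept.
online_city <- toy_bookings %>%
  filter(hotel == "City Hotel", market_segment == "Online TA")

# head(n = 1) shows the first row; indexing pulls out a single value.
print(head(online_city, n = 1))
first_lead_time <- online_city$lead_time[1]
```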
Test your knowledge on aesthetics in analysis
Question 1
Which of the following aesthetic attributes can you map to the data in a scatterplot? Select all that apply.
- Color
- Shape
- Size
- Text
Explain: You can map the color, shape, and size aesthetics to the data in a scatterplot.
Question 2
Which of the following functions let you display smaller groups, or subsets, of your data?
A. geom_bar()
B. geom_point()
C. ggplot()
D. facet_wrap()
The correct answer is D. facet_wrap(). Explain: The facet_wrap() function lets you display smaller groups, or subsets, of your data.
Question 3
What is the role of the x argument in the following code?
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
A. A variable
B. A function
C. A dataset
D. An aesthetic
The correct answer is D. An aesthetic. Explain: In this code, x is an aesthetic that refers to the x-axis of the plot. It maps the variable cut from the diamonds dataset to the x-axis.
Question 4
A data analyst creates a scatterplot with a lot of data points. It is difficult for the analyst to distinguish the individual points on the plot because they overlap. What function could the analyst use to make the points easier to find?
A. geom_point()
B. geom_jitter()
C. geom_bar()
D. geom_line()
The correct answer is B. geom_jitter(). Explain: The analyst could use the geom_jitter() function to make the points easier to find. The geom_jitter() function adds a small amount of random noise to each point in the plot, which helps deal with the overlapping of points.
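To see the effect of jittering, here is a minimal sketch that assumes the ggplot2 package is installed and uses its bundled `mpg` dataset, where many cars share the same city and highway mileage values. The `width` and `height` values are arbitrary choices for illustration.

```r
library(ggplot2)

# geom_point(): identical (cty, hwy) pairs are drawn on top of each other.
overlapping <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# geom_jitter(): each point is nudged by a small random amount,
# so stacked points spread out and become visible.
jittered <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3)

print(overlapping)
print(jittered)
```

Because the offsets are random, the jittered plot looks slightly different each time it is drawn, and the points no longer sit at their exact data values.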