7.5.5.Course wrap‐up - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki
Scenario 1, questions 1-7
As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.
Your current client is Chocolate and Tea, an up-and-coming chain of cafes.
The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.
Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.
They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.
Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.
Your supervisor asks you to write a short summary of the benefits of using R for the project. Which of the following benefits would you include in your summary? Select all that apply.
- Create high-quality data visualizations
- Quickly process lots of data
- Easily reproduce and share the analysis
- Define a problem and ask the right questions
Explain: The benefits of using R for the project include the ability to quickly process lots of data and create high-quality data visualizations. You can also easily reproduce and share your analysis.
Scenario 1, continued
Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and store it in your project folder as flavors_of_cacao.csv.
You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is bars_df and the .csv file is in the working directory. What code chunk lets you create the data frame?
A. bars_df <- read_csv("flavors_of_cacao.csv")
B. bars_df + read_csv("flavors_of_cacao.csv")
C. read_csv("flavors_of_cacao.csv") + bars_df
D. bars_df %>% read_csv("flavors_of_cacao.csv")
The correct answer is A. bars_df <- read_csv("flavors_of_cacao.csv"). Explain: The code chunk bars_df <- read_csv("flavors_of_cacao.csv") lets you create the data frame. In this code chunk:
- bars_df is the name of the data frame that will store the data.
- <- is the assignment operator to assign values to the data frame.
- read_csv() is the function that will import the data to the data frame.
- "flavors_of_cacao.csv" is the file name that the read_csv() function takes for its argument.
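As a minimal sketch of the import step, the snippet below writes a tiny stand-in CSV to a temporary folder and reads it back with read_csv(). The file name flavors_of_cacao_sample.csv and its three rows are made up for illustration; they are not the real dataset.

```r
library(readr)

# Write a tiny illustrative CSV to a temporary file (made-up rows, not the real data).
csv_path <- file.path(tempdir(), "flavors_of_cacao_sample.csv")
writeLines(
  c("Company,Rating,Cocoa.Percent",
    "Maker A,3.75,63%",
    "Maker B,2.75,70%"),
  csv_path
)

# read_csv() parses the file; <- stores the resulting tibble in bars_df.
bars_df <- read_csv(csv_path, show_col_types = FALSE)
print(bars_df)
```

Note that a value such as 63% is read as text rather than as a number, which matters for later comparisons.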
Scenario 1, continued
Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.
Assume the name of your data frame is flavors_df. What code chunk lets you review the structure of the data frame?
A. filter(flavors_df)
B. summarize(flavors_df)
C. select(flavors_df)
D. str(flavors_df)
The correct answer is D. str(flavors_df). Explain: You write the code chunk str(flavors_df). In this code chunk:
- str() is the function that will return the structure of the data frame, and give you high-level information like the column names and the type of data contained in those columns.
- flavors_df is the name of the data frame that the str() function takes for its argument.
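To see what str() reports, here is a small sketch that uses a made-up three-row data frame as a stand-in for flavors_df (the values are illustrative only).

```r
# A small made-up data frame standing in for flavors_df.
flavors_df <- data.frame(
  Rating = c(3.75, 2.75, 3.00),
  Cocoa.Percent = c("63%", "70%", "70%"),
  Company.Location = c("France", "France", "Canada"),
  stringsAsFactors = FALSE
)

# str() prints one line per column: its name, its type, and the first few values.
str(flavors_df)
```

The type shown for each column is worth checking before you filter or summarize, since text columns and numeric columns behave differently.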
Scenario 1, continued
Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Brand (without a period at the end).
Assume the first part of your code chunk is:
flavors_df %>%
What code chunk do you add to change the column name?
A. rename(Brand = Company...Maker.if.known.)
B. rename(Brand, Company...Maker.if.known.)
C. rename(Company...Maker.if.known. , Brand)
D. rename(Company...Maker.if.known. = Brand)
The correct answer is A. rename(Brand = Company...Maker.if.known.). Explain: You write the code chunk rename(Brand = Company...Maker.if.known.). In this code chunk:
- rename() is the function that will change the name of your column.
- Inside the parentheses of the function, write the new name (Brand), then an equals sign, then the name you want to change (Company...Maker.if.known.).
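The rename step can be sketched on a made-up two-row data frame as follows. The name renamed_df is introduced here only for illustration.

```r
library(dplyr)

# Made-up stand-in for flavors_df with the awkward original column name.
flavors_df <- data.frame(
  Company...Maker.if.known. = c("Maker A", "Maker B"),
  Rating = c(3.75, 2.75),
  stringsAsFactors = FALSE
)

# rename() takes new_name = old_name and returns a new data frame.
renamed_df <- flavors_df %>%
  rename(Brand = Company...Maker.if.known.)

colnames(renamed_df)
```

rename() does not modify flavors_df in place, so the result has to be assigned to a name if you want to keep it.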
After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Bean.Type. You decide to use the select() function to create a new data frame with only these three variables.
Assume the first part of your code is:
trimmed_flavors_df <- flavors_df %>%
Add the code chunk that lets you select the three variables.
select(Rating, Cocoa.Percent, Bean.Type)
Output:
# A tibble: 1,795 × 3
Rating Cocoa.Percent Bean.Type
<dbl> <chr> <chr>
1 3.75 63%
2 2.75 70%
3 3.00 70%
4 3.50 70%
5 3.50 70%
6 2.75 70% Criollo
7 3.50 70%
8 3.50 70% Criollo
9 3.75 70% Criollo
10 4.00 70%
# ... with 1,785 more rows
What bean type appears in row 6 of your tibble?
A. Forastero
B. Criollo
C. Trinitario
D. Beniano
The correct answer is B. Criollo. Explain: You add the code chunk select(Rating, Cocoa.Percent, Bean.Type) to select the three variables. The correct code is trimmed_flavors_df <- flavors_df %>% select(Rating, Cocoa.Percent, Bean.Type). In this code chunk:
- The select() function lets you select specific variables for your new data frame.
- select() takes the names of the variables you want to choose as its argument: Rating, Cocoa.Percent, Bean.Type.
The bean type Criollo appears in row 6 of your tibble.
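A minimal sketch of the same select() call, run on a made-up two-row stand-in for flavors_df:

```r
library(dplyr)

# Made-up stand-in for flavors_df with one extra column that will be dropped.
flavors_df <- data.frame(
  Company = c("Maker A", "Maker B"),
  Rating = c(3.75, 2.75),
  Cocoa.Percent = c("63%", "70%"),
  Bean.Type = c("", "Criollo"),
  stringsAsFactors = FALSE
)

# Keep only the three variables of interest, in the order listed.
trimmed_flavors_df <- flavors_df %>%
  select(Rating, Cocoa.Percent, Bean.Type)

colnames(trimmed_flavors_df)
```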
Next, you select the basic statistics that can help your team better understand the ratings system in your data.
Assume the first part of your code is:
trimmed_flavors_df %>%
You want to use the summarize() and max() functions to find the maximum rating for your data. Add the code chunk that lets you find the maximum value for the variable Rating.
summarize(max(Rating))
Output:
# A tibble: 1 × 1
`max(Rating)`
<dbl>
1 5
What is the maximum rating?
A. 6
B. 4.5
C. 5.5
D. 5
The correct answer is D. 5. Explain: You add the code chunk summarize(max(Rating)) to find the maximum value for the variable Rating. The correct code is trimmed_flavors_df %>% summarize(max(Rating)). In this code chunk:
- The summarize() function lets you display summary statistics. You can use the summarize() function in combination with other functions such as mean(), max(), and min() to calculate specific statistics.
- In this case, you use max() to calculate the maximum value for the variable Rating.
The maximum rating is 5.
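As a sketch of combining summarize() with several statistics, the snippet below names each result column instead of relying on the default `max(Rating)` header. The four ratings and the column names max_rating, min_rating, and mean_rating are made up for illustration.

```r
library(dplyr)

# Made-up ratings standing in for trimmed_flavors_df.
trimmed_flavors_df <- data.frame(Rating = c(3.75, 2.75, 5.00, 3.50))

# Each name = expression pair becomes one column in the one-row result.
rating_summary <- trimmed_flavors_df %>%
  summarize(
    max_rating = max(Rating),
    min_rating = min(Rating),
    mean_rating = mean(Rating)
  )

print(rating_summary)
```

Named columns are easier to refer to in later steps than a header that contains parentheses.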
After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.5 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar's cocoa percent is greater than or equal to 70%. You decide to create a new data frame to find out which chocolate bars meet these two conditions.
Assume the first part of your code is:
best_trimmed_flavors_df <- trimmed_flavors_df %>%
You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the data frame for chocolate bars that contain at least 70% cocoa and have a rating of at least 3.5 points.
filter(Cocoa.Percent >= 70, Rating >= 3.5)
Output:
# A tibble: 574 × 3
Rating Cocoa.Percent Company.Location
<dbl> <chr> <chr>
1 3.50 70% France
2 3.50 70% France
3 3.50 70% France
4 3.50 70% France
5 3.75 70% France
6 4.00 70% France
7 3.75 70% France
8 4.00 70% France
9 3.50 70% France
10 3.50 70% France
# ... with 564 more rows
What rating appears in row 1 of your tibble?
A. 3.50
B. 3.75
C. 4.25
D. 4.00
The correct answer is A. 3.50. Explain: The code chunk filter(Cocoa.Percent >= 70, Rating >= 3.5) lets you filter the data frame for chocolate bars that contain at least 70% cocoa and have a rating of at least 3.5 points. The correct code is best_trimmed_flavors_df <- trimmed_flavors_df %>% filter(Cocoa.Percent >= 70, Rating >= 3.5). In this code chunk:
- The filter() function lets you filter your data frame based on specific criteria. Cocoa.Percent and Rating refer to the variables you want to filter.
- The >= operator signifies “greater than or equal to.”
- The new data frame will keep only the rows where Cocoa.Percent is greater than or equal to 70 and Rating is greater than or equal to 3.5.
The rating 3.50 appears in row 1 of your tibble.
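One caveat worth sketching: the tibble output lists Cocoa.Percent as <chr>, so Cocoa.Percent >= 70 compares text rather than numbers, and a value such as "100%" sorts before "70". The snippet below is one possible workaround rather than the quiz's answer: it converts the text to a number with readr's parse_number() before filtering. The rows and the helper column cocoa_pct are made up for illustration.

```r
library(dplyr)
library(readr)

# Made-up rows; Cocoa.Percent is text, as in the tibble output.
trimmed_flavors_df <- data.frame(
  Rating = c(3.50, 4.00, 3.75, 3.00),
  Cocoa.Percent = c("70%", "100%", "65%", "85%"),
  stringsAsFactors = FALSE
)

# Text comparison: 70 is coerced to "70", and "100%" sorts before it.
"100%" >= 70

# Convert to a number first (cocoa_pct is a hypothetical helper column),
# then filter on both conditions; a row must satisfy both to be kept.
best_trimmed_flavors_df <- trimmed_flavors_df %>%
  mutate(cocoa_pct = parse_number(Cocoa.Percent)) %>%
  filter(cocoa_pct >= 70, Rating >= 3.5)

print(best_trimmed_flavors_df)
```

With the numeric column, the 100% bar is kept, while the 65% bar and the bar rated 3.00 are dropped.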
Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals.
Assume your first line of code is:
ggplot(data = best_trimmed_flavors_df) +
You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Rating on the x-axis.
geom_bar(mapping = aes(x = Rating))
Output: [bar chart with Rating on the x-axis and count on the y-axis]
How many bars does your bar chart display?
A. 3
B. 5
C. 6
D. 2
The correct answer is D. 2. Explain: You add the code chunk geom_bar(mapping = aes(x = Rating)) to create a bar chart with the variable Rating on the x-axis. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Rating)). In this code chunk:
- geom_bar() is the geom function that uses bars to create a bar chart.
- Inside the parentheses of the aes() function, the code x = Rating maps the x aesthetic to the variable Rating.
- Rating will appear on the x-axis of the plot.
- By default, R will put a count of the variable Rating on the y-axis.
Your bar chart displays 2 bars.
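To check how many bars geom_bar() will draw and what count each one gets, you can inspect the data that ggplot2 computes for the layer. This is a sketch on six made-up ratings; rating_plot and bar_data are names introduced for illustration.

```r
library(ggplot2)

# Made-up ratings standing in for best_trimmed_flavors_df.
best_trimmed_flavors_df <- data.frame(
  Rating = c(3.5, 3.5, 3.75, 4.0, 4.0, 4.0)
)

rating_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Rating))

# layer_data() returns one row per bar, including the computed count.
bar_data <- layer_data(rating_plot)
bar_data[, c("x", "count")]
```

Each distinct value of Rating becomes one bar, and the bar height is the number of rows with that value.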
Your bar chart reveals the locations that produce the highest-rated chocolate bars. To get a better idea of the specific rating for each location, you’d like to highlight each bar.
Assume that you are working with the code chunk:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Company.Location))
Add a code chunk to the second line of code to map the aesthetic alpha to the variable Rating.
NOTE: the three dots (...) indicate where to add the code chunk.
geom_bar(mapping = aes(x = Company.Location, alpha = Rating))
Output: [bar chart with Company.Location on the x-axis, bars shaded by Rating]
According to your bar chart, which two company locations produce the highest rated chocolate bars?
A. Canada and Amsterdam
B. Canada and France
C. Scotland and Amsterdam
D. U.S.A. and France
The correct answer is B. Canada and France.
Scenario 2, continued
A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.
Assume your teammate shares the following code chunk:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Cocoa.Percent)) +
What code chunk do you add to the third line to create wrap around facets of the variable Cocoa.Percent?
A. facet_wrap(%>%Cocoa.Percent)
B. facet_wrap(Cocoa.Percent~)
C. facet_wrap(~Cocoa.Percent)
D. facet(=Cocoa.Percent)
The correct answer is C. facet_wrap(~Cocoa.Percent). Explain: You write the code chunk facet_wrap(~Cocoa.Percent). In this code chunk:
- facet_wrap() is the function that lets you create wrap around facets of a variable.
- Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable (Cocoa.Percent).
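A short sketch of the faceted plot, built on four made-up rows; faceted_plot is a name introduced for illustration.

```r
library(ggplot2)

# Made-up rows standing in for best_trimmed_flavors_df.
best_trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c("70%", "70%", "75%", "80%"),
  Rating = c(3.5, 4.0, 3.75, 4.0),
  stringsAsFactors = FALSE
)

# facet_wrap(~Cocoa.Percent) draws one panel per distinct Cocoa.Percent value.
faceted_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Cocoa.Percent)) +
  facet_wrap(~Cocoa.Percent)

# Count the panels that ggplot2 will draw.
length(unique(layer_data(faceted_plot)$PANEL))
```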
Scenario 2, continued
Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.
Assume the first part of your code chunk is:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to add the title Suggested Chocolate to your plot?
A. labs(title = "Suggested Chocolate")
B. labs <- "Suggested Chocolate"
C. labs(Suggested Chocolate = title)
D. labs(Suggested Chocolate)
The correct answer is A. labs(title = "Suggested Chocolate"). Explain: You write the code chunk labs(title = "Suggested Chocolate"). In this code chunk:
- labs() is the function that lets you add a title to your plot.
- In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Suggested Chocolate").
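The title step can be sketched as below on three made-up rows. R only accepts straight quotation marks around strings, so curly quotes pasted from a web page would need to be retyped. The name titled_plot is introduced for illustration.

```r
library(ggplot2)

# Made-up rows standing in for trimmed_flavors_df.
trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c("63%", "70%", "75%"),
  Rating = c(3.75, 2.75, 4.00),
  stringsAsFactors = FALSE
)

# labs(title = ...) adds the title as one more layer of the plot definition.
titled_plot <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
  labs(title = "Suggested Chocolate")

print(titled_plot)
```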
Scenario 2, continued
Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.
Assume your first two lines of code are:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to save your plot as a jpeg file with chocolate as the file name?
A. ggsave("chocolate.png")
B. ggsave("jpeg.chocolate")
C. ggsave(chocolate.jpeg)
D. ggsave("chocolate.jpeg")
The correct answer is D. ggsave("chocolate.jpeg"). Explain: You add the code chunk ggsave("chocolate.jpeg") to save your plot as a jpeg file with "chocolate" as the file name. In this code chunk:
- Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (chocolate), then a period, then the type of file format (jpeg), then a closing quotation mark.
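Depending on your ggplot2 version, chaining ggsave() onto a plot with + may raise an error, so the sketch below takes a different route from the quiz: it stores the plot in a variable and calls ggsave() as a separate statement. The data, the name cocoa_plot, the output location in a temporary folder, and the width and height are all assumptions made for illustration.

```r
library(ggplot2)

# Made-up rows standing in for trimmed_flavors_df.
trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c(63, 70, 75),
  Rating = c(3.75, 2.75, 4.00)
)

cocoa_plot <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating))

# The file extension (.jpeg) tells ggsave() which format to write.
out_path <- file.path(tempdir(), "chocolate.jpeg")
ggsave(out_path, plot = cocoa_plot, width = 6, height = 4)

file.exists(out_path)
```

Passing plot = explicitly also avoids relying on whichever plot happened to be displayed last.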
Scenario 2, continued
As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.
You decide to create an R Markdown notebook to document your work. What are your reasons for choosing an R Markdown notebook? Select all that apply.
- It displays your data visualizations
- It lets you record and share every step of your analysis
- It automatically creates a website to show your work
- It allows users to run your code
Explain: You choose an R Markdown notebook to document your work because it lets you record and share every step of your analysis. The notebook allows users to run your code and also displays your data visualizations.
Scenario 1, questions 1-7
Your supervisor asks you to write a short summary of the benefits of using R for the project. Which of the following benefits would you include in your summary? Select all that apply.
- Easily reproduce and share the analysis
- Quickly process lots of data
- Define a problem and ask the right questions
- Create high-quality data visualizations
Scenario 1, continued
Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and store it in your project folder as flavors_of_cacao.csv.
You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is bars_df and the .csv file is in the working directory. What code chunk lets you create the data frame?
A. bars_df + read_csv("flavors_of_cacao.csv")
B. bars_df %>% read_csv("flavors_of_cacao.csv")
C. bars_df <- read_csv("flavors_of_cacao.csv")
D. read_csv("flavors_of_cacao.csv") + bars_df
The correct answer is C. bars_df <- read_csv("flavors_of_cacao.csv"). Explain: The code chunk bars_df <- read_csv("flavors_of_cacao.csv") lets you create the data frame. In this code chunk:
- bars_df is the name of the data frame that will store the data.
- <- is the assignment operator to assign values to the data frame.
- read_csv() is the function that will import the data to the data frame.
- "flavors_of_cacao.csv" is the file name that the read_csv() function takes for its argument.
Scenario 1, continued
Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.
Assume the name of your data frame is flavors_df. What code chunk lets you review the column names in the data frame?
A. arrange(flavors_df)
B. colnames(flavors_df)
C. col(flavors_df)
D. rename(flavors_df)
The correct answer is B. colnames(flavors_df). Explain: You write the code chunk colnames(flavors_df). In this code chunk:
- colnames() is the function that will let you review the column names in the data frame.
- flavors_df is the name of the data frame that the colnames() function takes for its argument.
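For comparison with str(), colnames() returns only the names, as a character vector that you can index or search. A sketch on a made-up data frame (column_names is a name introduced for illustration):

```r
# Made-up stand-in for flavors_df.
flavors_df <- data.frame(
  Company...Maker.if.known. = c("Maker A", "Maker B"),
  Rating = c(3.75, 2.75),
  Cocoa.Percent = c("63%", "70%"),
  stringsAsFactors = FALSE
)

# colnames() returns a character vector with one element per column.
column_names <- colnames(flavors_df)
print(column_names)

# Because it is a plain vector, you can test for a column by name.
"Rating" %in% column_names
```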
Scenario 1, continued
Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Company (without a period at the end).
Assume the first part of your code chunk is:
flavors_df %>%
What code chunk do you add to change the column name?
A. rename(Company...Maker.if.known. = Company)
B. rename(Company = Company...Maker.if.known.)
C. rename(Company...Maker.if.known. <- Company)
D. rename(Company <- Company...Maker.if.known.)
The correct answer is B. rename(Company = Company...Maker.if.known.). Explain: You write the code chunk rename(Company = Company...Maker.if.known.). In this code chunk:
- rename() is the function that will change the name of your column.
- Inside the parentheses of the function, write the new name (Company), then an equals sign, then the name you want to change (Company...Maker.if.known.).
After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Company.Location. You decide to use the select() function to create a new data frame with only these three variables.
Assume the first part of your code is:
trimmed_flavors_df <- flavors_df %>%
Add the code chunk that lets you select the three variables.
select(Rating, Cocoa.Percent, Company.Location)
Output:
# A tibble: 1,795 × 3
Rating Cocoa.Percent Company.Location
<dbl> <chr> <chr>
1 3.75 63% France
2 2.75 70% France
3 3.00 70% France
4 3.50 70% France
5 3.50 70% France
6 2.75 70% France
7 3.50 70% France
8 3.50 70% France
9 3.75 70% France
10 4.00 70% France
# ... with 1,785 more rows
What company location appears in row 1 of your tibble?
A. France
B. Colombia
C. Canada
D. Scotland
The correct answer is A. France. Explain: You add the code chunk select(Rating, Cocoa.Percent, Company.Location) to select the three variables. The correct code is trimmed_flavors_df <- flavors_df %>% select(Rating, Cocoa.Percent, Company.Location). In this code chunk:
- The select() function lets you select specific variables for your new data frame.
- select() takes the names of the variables you want to choose as its argument: Rating, Cocoa.Percent, Company.Location.
The company location France appears in row 1 of your tibble.
Next, you select the basic statistics that can help your team better understand the ratings system in your data.
Assume the first part of your code is:
trimmed_flavors_df %>%
You want to use the summarize() and max() functions to find the maximum rating for your data. Add the code chunk that lets you find the maximum value for the variable Rating.
summarize(max(Rating))
Output:
# A tibble: 1 × 1
`max(Rating)`
<dbl>
1 5
What is the maximum rating?
A. 4.5
B. 6
C. 5
D. 5.5
The correct answer is C. 5. Explain: You add the code chunk summarize(max(Rating)) to find the maximum value for the variable Rating. The correct code is trimmed_flavors_df %>% summarize(max(Rating)). In this code chunk:
- The summarize() function lets you display summary statistics. You can use the summarize() function in combination with other functions such as mean(), max(), and min() to calculate specific statistics.
- In this case, you use max() to calculate the maximum value for the variable Rating.
The maximum rating is 5.
After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.9 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar's cocoa percent is greater than or equal to 75%. You decide to create a new data frame to find out which chocolate bars meet these two conditions.
Assume the first part of your code is:
best_trimmed_flavors_df <- trimmed_flavors_df %>%
You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the data frame for chocolate bars that contain at least 75% cocoa and have a rating of at least 3.9 points.
filter(Cocoa.Percent >= 75, Rating >= 3.9)
Output:
# A tibble: 20 × 3
Rating Cocoa.Percent Company.Location
<dbl> <chr> <chr>
1 4 75% Italy
2 4 75% France
3 4 75% France
4 4 75% France
5 4 75% France
6 4 75% France
7 4 75% France
8 4 75% France
9 4 75% France
10 4 75% Sao Tome
11 4 75% Scotland
12 4 75% U.S.A.
13 4 75% France
14 4 75% France
15 4 80% France
16 4 75% U.S.A.
17 4 75% U.S.A.
18 4 78% U.S.A.
19 4 75% Canada
20 4 88% Canada
What value for cocoa percent appears in row 1 of your tibble?
A. 75%
B. 80%
C. 88%
D. 78%
The correct answer is A. 75%. Explain: The code chunk filter(Cocoa.Percent >= 75, Rating >= 3.9) lets you filter the data frame for chocolate bars that contain at least 75% cocoa and have a rating of at least 3.9 points. The correct code is best_trimmed_flavors_df <- trimmed_flavors_df %>% filter(Cocoa.Percent >= 75, Rating >= 3.9). In this code chunk:
- The filter() function lets you filter your data frame based on specific criteria.
- Cocoa.Percent and Rating refer to the variables you want to filter.
- The >= operator signifies “greater than or equal to.”
- The new data frame will keep only the rows where Cocoa.Percent is greater than or equal to 75 and Rating is greater than or equal to 3.9.
The value 75% for cocoa percent appears in row 1 of your tibble.
Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals.
Assume your first line of code is:
ggplot(data = best_trimmed_flavors_df) +
You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Company.Location on the x-axis.
geom_bar(mapping = aes(x = Company.Location))
Output: [bar chart with Company.Location on the x-axis and count on the y-axis]
How many bars does your bar chart display?
A. 4
B. 6
C. 5
D. 3
The correct answer is C. 5. Explain: You add the code chunk geom_bar(mapping = aes(x = Company.Location)) to create a bar chart with the variable Company.Location on the x-axis. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Company.Location)). In this code chunk:
- geom_bar() is the geom function that uses bars to create a bar chart.
- Inside the parentheses of the aes() function, the code x = Company.Location maps the x aesthetic to the variable Company.Location.
- Company.Location will appear on the x-axis of the plot.
- By default, R will put a count of the variable Company.Location on the y-axis.
Your bar chart displays 5 bars.
Your bar chart reveals the locations that produce the highest rated chocolate bars. To get a better idea of the specific rating for each location, you’d like to highlight each bar.
Assume that you are working with the following code:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Company.Location))
Add a code chunk to the second line of code to map the aesthetic fill to the variable Rating.
NOTE: the three dots (...) indicate where to add the code chunk.
geom_bar(mapping = aes(x = Company.Location, fill=Rating))
Output: [bar chart with Company.Location on the x-axis, bars colored by Rating]
According to your bar chart, which two company locations produce the highest rated chocolate bars?
A. Canada and France
B. Amsterdam and France
C. Scotland and U.S.A.
D. Scotland and Canada
The correct answer is A. Canada and France. Explain: You add the code chunk fill = Rating to the second line of code to map the aesthetic fill to the variable Rating. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Company.Location, fill = Rating)). In this code chunk:
- Inside the parentheses of the aes() function, after the comma that follows x = Company.Location, write the aesthetic (fill), then an equals sign, then the variable (Rating).
- The specific rating of each location will appear as a specific color inside each bar of your bar chart.
On your visualization, the legend titled "Rating" shows the color coding for the variable Rating. Lighter blues correspond to higher ratings and darker blues correspond to lower ratings.
According to your bar chart, the two company locations that produce the highest rated chocolate bars are Canada and France.
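Because Rating is numeric, fill = Rating is treated as a continuous color scale. If you would rather give each distinct rating its own color and legend entry, one option (an assumption on my part, not part of the quiz answer) is to wrap the variable in factor(). The rows below and the name fill_plot are made up for illustration.

```r
library(ggplot2)

# Made-up rows standing in for best_trimmed_flavors_df.
best_trimmed_flavors_df <- data.frame(
  Company.Location = c("France", "France", "Canada", "Canada", "U.S.A."),
  Rating = c(4.00, 4.25, 4.00, 4.00, 4.50),
  stringsAsFactors = FALSE
)

# factor(Rating) makes the fill discrete, so bars are stacked by rating.
fill_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Company.Location, fill = factor(Rating))) +
  labs(fill = "Rating")

# One row per location-and-rating segment, with its count.
layer_data(fill_plot)[, c("x", "count")]
```

Each bar is then split into one segment per rating found at that location, which makes it easier to read off which locations have the highest-rated bars.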
Scenario 2, continued
A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.
Assume your teammate shares the following code chunk:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Cocoa.Percent)) +
What code chunk do you add to the third line to create wrap around facets of the variable Cocoa.Percent?
A. facet_wrap(Cocoa.Percent~)
B. facet_wrap(%>%Cocoa.Percent)
C. facet(=Cocoa.Percent)
D. facet_wrap(~Cocoa.Percent)
The correct answer is D. facet_wrap(~Cocoa.Percent). Explain: You write the code chunk facet_wrap(~Cocoa.Percent). In this code chunk:
- facet_wrap() is the function that lets you create wrap around facets of a variable.
- Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable (Cocoa.Percent).
Scenario 2, continued
Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.
Assume the first part of your code chunk is:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to add the title Suggested Chocolate to your plot?
A. labs(Suggested Chocolate)
B. labs(title = "Suggested Chocolate")
C. labs(Suggested Chocolate = title)
D. labs <- "Suggested Chocolate"
The correct answer is B. labs(title = "Suggested Chocolate"). Explain: You write the code chunk labs(title = "Suggested Chocolate"). In this code chunk:
- labs() is the function that lets you add a title to your plot.
- In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Suggested Chocolate").
Scenario 2, continued
Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.
Assume your first two lines of code are:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to save your plot as a jpeg file with chocolate as the file name?
A. ggsave("chocolate.jpeg")
B. ggsave("jpeg.chocolate")
C. ggsave("chocolate.png")
D. ggsave(chocolate.jpeg)
The correct answer is A. ggsave("chocolate.jpeg"). Explain: You add the code chunk ggsave("chocolate.jpeg") to save your plot as a jpeg file with "chocolate" as the file name. In this code chunk:
- Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (chocolate), then a period, then the type of file format (jpeg), then a closing quotation mark.
Scenario 2, continued
As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.
Fill in the blank: You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. You decide to create _____ to document your work.
A. a database
B. a data frame
C. an R Markdown notebook
D. a spreadsheet
Explain: You use an R Markdown notebook to document your work. The notebook lets you record and share every step of your analysis, lets your teammates run your code, and displays your visualizations.
Congratulations on completing the seventh course in the Google Data Analytics Certificate!
To continue with the program, go to the next course: Google Data Analytics Capstone: Complete a Case Study.
Keep up the great work; you're almost at the finish line!