7.3.4. Weekly challenge 3
We’ve covered a lot of terms—some of which you may have already known, and some of which are new. To make it easy to remember what a word means, we created this glossary of terms and definitions.
To use the glossary for this course item, click the link below and select “Use Template.”
Link to glossary: Week 3 Glossary
OR
If you don’t have a Google account, you can download the glossary directly from the attachment below.
Course 7 Week 3 Glossary _ DA terms and definitions
A data analyst creates a data frame with data that has more than 50,000 observations in it. When they print their data frame, it slows down their console. To avoid this, they decide to switch to a tibble. Why would a tibble be more useful in this situation?
A. Tibbles only include a limited number of data items
B. Tibbles will automatically create row names to make the data easier to read
C. Tibbles will automatically change the names of variables to make them shorter and easier to read
D. Tibbles won’t overload the console because they automatically only print the first 10 rows of data and as many variables as will fit on the screen
The correct answer is D. Tibbles won’t overload the console because they automatically only print the first 10 rows of data and as many variables as will fit on the screen
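The printing behavior can be checked with a minimal sketch. It assumes the tibble package is installed; the data frame big_df and its columns are hypothetical and exist only for illustration.

```r
library(tibble)

# Hypothetical data: 50,000 rows, enough to flood the console as a plain data frame
big_df <- data.frame(id = 1:50000, value = rnorm(50000))

# as_tibble() converts the data frame; the data itself is unchanged
big_tbl <- as_tibble(big_df)

# Printing the tibble shows only the first 10 rows and the columns that fit on screen
print(big_tbl)
```

All 50,000 rows are still stored in big_tbl; only the printed preview is shortened.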
A data analyst wants a high-level summary of the structure of their data frame, including the column names, the number of rows and variables, and type of data within a given column. What function should they use?
A. head()
B. str()
C. colnames()
D. rename_with()
The correct answer is B. str()
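As a quick illustration, str() can be run on the built-in ToothGrowth dataset, which ships with base R, so no packages are needed for this sketch.

```r
# ToothGrowth is part of the datasets package that loads with base R
# str() prints the object class, the number of observations and variables,
# and each column's name, type, and first few values
str(ToothGrowth)
```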
You are working with the ToothGrowth dataset. You want to use the skim_without_charts() function to get a comprehensive view of the dataset. Write the code chunk that will give you this view.
skim_without_charts(ToothGrowth)
Output:
__ Data Summary _____________________________
Values
Name ToothGrowth
Number of rows 60
Number of columns 3
_______________________
Column type frequency:
factor 1
numeric 2
________________________
Group variables None
__ Variable type: factor ________________________________________________________
skim_variable n_missing complete_rate ordered n_unique top_counts
1 supp 0 1 FALSE 2 OJ: 30, VC: 30
__ Variable type: numeric _______________________________________________________
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
1 len 0 1 18.8 7.65 4.2 13.1 19.2 25.3 33.9
How many rows does the ToothGrowth dataset contain?
A. 50
B. 25
C. 60
D. 40
The correct answer is C. 60. Explanation: The code chunk skim_without_charts(ToothGrowth) gives you a comprehensive view of the dataset. Inside the parentheses of the skim_without_charts() function is the name of the dataset you want to view. The code returns a summary with the name of the dataset and the number of rows and columns. It also shows the column types and data types contained in the dataset. The ToothGrowth dataset contains 60 rows.
A data analyst is working with a data frame named sales. They write the following code:
sales %>%
The data frame contains a column named q1_sales. What code chunk does the analyst add to change the name of the column from q1_sales to quarter1_sales?
A. rename(quarter1_sales <- "q1_sales")
B. rename(q1_sales <- "quarter1_sales")
C. rename(quarter1_sales = q1_sales)
D. rename(q1_sales == quarter1_sales)
The correct answer is C. rename(quarter1_sales = q1_sales)
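The new_name = old_name order is easy to reverse by accident, so here is a small sketch. It assumes dplyr is installed; the contents of sales (the region column and the numbers) are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the sales data frame
sales <- data.frame(region = c("north", "south"), q1_sales = c(1200, 950))

# rename() takes new_name = old_name
sales_renamed <- sales %>%
  rename(quarter1_sales = q1_sales)

names(sales_renamed)
```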
A data analyst is working with the penguins data. The analyst wants to sort the data by flipper_length_mm from longest to shortest. What code chunk will allow them to sort the data in the desired order?
A. penguins %>% arrange(flipper_length_mm, desc=FALSE)
B. penguins %>% arrange(flipper_length_mm, desc=TRUE)
C. penguins %>% arrange(flipper_length_mm)
D. penguins %>% arrange(-flipper_length_mm)
The correct answer is D. penguins %>% arrange(-flipper_length_mm)
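A short sketch of descending sorts, assuming dplyr is installed. The birds data frame is a hypothetical stand-in that only borrows the column name from the penguins data. It also shows desc(), which gives the same ordering for a numeric column.

```r
library(dplyr)

# Hypothetical stand-in with the same column name as the penguins data
birds <- data.frame(id = 1:4, flipper_length_mm = c(181, 210, 195, 230))

# A minus sign sorts a numeric column from largest to smallest
by_minus <- birds %>% arrange(-flipper_length_mm)

# desc() is the equivalent helper and also works for non-numeric columns
by_desc <- birds %>% arrange(desc(flipper_length_mm))

by_minus
```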
You are working with the penguins dataset. You want to use the summarize() and min() functions to find the minimum value for the variable bill_depth_mm. At this point, the following code has already been written into the script:
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the minimum value for the variable bill_depth_mm.
(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)
summarize(min(bill_depth_mm))
Output:
# A tibble: 3 × 2
species `min(bill_depth_mm)`
<chr> <dbl>
1 Adelie 15.5
2 Chinstrap 16.4
3 Gentoo 13.1
What is the minimum bill depth in mm for the Chinstrap species?
A. 12.4
B. 15.5
C. 13.1
D. 16.4
The correct answer is D. 16.4. Explanation: The code chunk summarize(min(bill_depth_mm)) lets you find the minimum value for the variable bill_depth_mm. The correct code is penguins %>% drop_na() %>% group_by(species) %>% summarize(min(bill_depth_mm)). The summarize() function displays summary statistics. You can use the summarize() function in combination with other functions, such as mean(), max(), and min(), to calculate specific statistics. In this case, you use min() to calculate the minimum value for bill depth. The minimum bill depth for the Chinstrap species is 16.4 mm.
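The same pipeline can be tried on a tiny hand-made data frame. This is only a sketch: it assumes dplyr and tidyr are installed, the mini data frame and its values are invented, and the column name min_bill_depth_mm is an optional label that is not part of the quiz answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the penguins data (values invented for illustration)
mini <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_depth_mm = c(18.7, 17.4, 13.9, NA)
)

result <- mini %>%
  drop_na() %>%                 # remove rows that contain missing values
  group_by(species) %>%         # one group per species
  summarize(min_bill_depth_mm = min(bill_depth_mm))  # one row per group

result
```

Naming the summary column gives a cleaner header than the default `min(bill_depth_mm)`.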
A data analyst is working with a data frame called athletes. The data frame contains a column named record that represents an athlete's wins and losses separated by a hyphen (-). They want to turn this single column into individual columns for wins and losses. Which code chunk lets the analyst split the record column?
A. separate(athletes, record, into=c("wins", "losses"), sep="-")
B. separate(record, athletes, into=c("wins", "losses"), sep="-")
C. separate(record, athletes, into=c("wins", "losses"), delim="-")
D. separate(athletes, record, into=c("wins", "losses"), delim="-")
The correct answer is A. separate(athletes, record, into=c("wins", "losses"), sep="-")
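A minimal sketch of the split, assuming tidyr is installed. The athletes data below is hypothetical. Note that the new columns are character strings unless they are converted afterwards.

```r
library(tidyr)

# Hypothetical stand-in for the athletes data frame
athletes <- data.frame(name = c("Ana", "Ben"), record = c("10-2", "7-5"))

# Data frame first, then the column to split, then the new column names
athletes_split <- separate(athletes, record, into = c("wins", "losses"), sep = "-")

athletes_split
```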
A data analyst is working with a data frame named retail. It has separate columns for dollars (price_dollars) and cents (price_cents). The analyst wants to combine the two columns into a single column named price, with the dollars and cents separated by a decimal point. For example, if the value in the price_dollars column is 10, and the value in the price_cents column is 50, the value in the price column will be 10.50. What code chunk lets the analyst create the price column?
A. unite(retail, "price", price_dollars, price_cents)
B. unite(retail, price_dollars, price_cents, sep=".")
C. unite(retail, "price", price_dollars, price_cents, sep=".")
D. unite(retail, "price", price_cents, sep=".")
The correct answer is C. unite(retail, "price", price_dollars, price_cents, sep=".")
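The call can be sketched on a small example, assuming tidyr is installed. The retail values and the item column are invented for illustration.

```r
library(tidyr)

# Hypothetical stand-in for the retail data frame
retail <- data.frame(
  item = c("mug", "pen"),
  price_dollars = c(10, 2),
  price_cents = c(50, 99)
)

# Data frame, new column name, the columns to combine, then the separator
retail_price <- unite(retail, "price", price_dollars, price_cents, sep = ".")

retail_price
```

The resulting price column is text, not a number, and by default the two source columns are dropped.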
A data analyst is using statistical measures to get a better understanding of their data. What function can they use to determine how strongly two of the variables are related?
A. bias()
B. cor()
C. sd()
D. mean()
The correct answer is B. cor()
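To make the idea concrete, here is a base R sketch with invented numbers. cor() returns a value between -1 and 1: values near 1 indicate a strong positive relationship and values near -1 a strong negative one.

```r
# Hypothetical data: study hours and two sets of scores
hours <- c(1, 2, 3, 4, 5)
score_up <- c(52, 55, 61, 70, 74)   # rises with hours
score_down <- rev(score_up)         # falls with hours

cor(hours, score_up)     # close to 1
cor(hours, score_down)   # close to -1
```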
A data analyst is studying weather data. They write the following code chunk:
bias(actual_temp, predicted_temp)
What will this code chunk calculate?
A. The maximum difference between the actual and predicted values
B. The minimum difference between the actual and predicted values
C. The total average of the values
D. The average difference between the actual and predicted values
The correct answer is D. The average difference between the actual and predicted values
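bias() is not part of base R. It probably comes from an add-on package (SimDesign is a likely source, though that is an assumption here). The statistic described in the answer can be sketched by hand in base R with invented temperatures:

```r
# Hypothetical readings, invented for illustration
actual_temp <- c(68.3, 70, 72.4, 71, 67, 70)
predicted_temp <- c(67.9, 69, 71.5, 70, 67, 69)

# Average difference between actual and predicted values
avg_diff <- mean(actual_temp - predicted_temp)

avg_diff
```

A result close to zero suggests the predictions are not systematically too high or too low.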
A data analyst is working with a dataset in R that has more than 50,000 observations. Why might they choose to use a tibble instead of the standard data frame? Select all that apply.
- Tibbles automatically only preview the first 10 rows of data
- Tibbles can automatically change the names of variables
- Tibbles automatically only preview as many columns as fit on screen
- Tibbles can create row names
A data analyst wants a high-level summary of the structure of their data frame, including the column names, the number of rows and variables, and type of data within a given column. What function should they use?
A. colnames()
B. head()
C. str()
D. rename_with()
The correct answer is C. str()
You are working with the ToothGrowth dataset. You want to use the select() function to view all columns except the supp column. Write the code chunk that will give you this view.
How many columns does the resulting data frame contain?
A. 2
B. 4
C. 3
D. 1
The correct answer is A. 2
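The page does not show the code chunk for this question. One way to write it, assuming dplyr is installed, is sketched below. The variable name no_supp is only for illustration.

```r
library(dplyr)

# A minus sign inside select() drops the named column and keeps the rest
no_supp <- ToothGrowth %>%
  select(-supp)

names(no_supp)
```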
You have a data frame named employees with a column named last_name. What will the name of the employees column be in the results of the function rename_with(employees, toupper)?
A. Last_Name
B. Last_name
C. last_name
D. LAST_NAME
The correct answer is D. LAST_NAME
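A brief sketch of rename_with(), assuming dplyr is installed. The employees data and the extra dept column are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the employees data frame
employees <- data.frame(last_name = c("Ito", "Nguyen"), dept = c("ops", "sales"))

# rename_with() applies the function to every column name by default
employees_upper <- rename_with(employees, toupper)

names(employees_upper)
```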
A data analyst is working with the penguins data. The analyst wants to sort the data by flipper_length_mm from longest to shortest. What code chunk will allow them to sort the data in the desired order?
A. penguins %>% arrange(-flipper_length_mm)
B. penguins %>% arrange(flipper_length_mm)
C. penguins %>% arrange(flipper_length_mm, desc=FALSE)
D. penguins %>% arrange(flipper_length_mm, desc=TRUE)
The correct answer is A. penguins %>% arrange(-flipper_length_mm)
You are working with the penguins dataset. You want to use the summarize() and mean() functions to find the mean value for the variable body_mass_g. At this point, the following code has already been written into your script:
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the mean value for the variable body_mass_g.
(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)
summarize(mean(body_mass_g))
What is the mean body mass in g for the Adelie species?
A. 5092.437
B. 3733.088
C. 4207.433
D. 3706.164
The correct answer is D. 3706.164. Explanation: The code chunk summarize(mean(body_mass_g)) lets you find the mean value for the variable body_mass_g. The correct code is penguins %>% drop_na() %>% group_by(species) %>% summarize(mean(body_mass_g)). The summarize() function displays summary statistics. You can use the summarize() function in combination with other functions, such as mean(), max(), and min(), to calculate specific statistics. In this case, you use mean() to calculate the mean value for body mass. The mean body mass for the Adelie species is 3706.164 g.
A data analyst is working with a data frame named salary_data. They want to create a new column named wages that includes data from the rate column multiplied by 40. What code chunk lets the analyst create the wages column?
A. mutate(wages = rate * 40)
B. mutate(salary_data, rate = wages * 40)
C. mutate(salary_data, wages = rate * 40)
D. mutate(salary_data, wages = rate + 40)
The correct answer is C. mutate(salary_data, wages = rate * 40)
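A small sketch of the mutate() call, assuming dplyr is installed. The employee column and the rates are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the salary_data data frame
salary_data <- data.frame(employee = c("a", "b"), rate = c(20, 32.5))

# mutate() adds a new column; existing columns are kept
salary_data <- mutate(salary_data, wages = rate * 40)

salary_data
```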
A data analyst is working with a data frame named weather. It has separate columns for temperatures (temp) and measurement units (unit). The analyst wants to combine the two columns into a single column called display_temp, with the temperature and unit separated by the string " Degrees ". What code chunk lets the analyst create the display_temp column?
A. weather %>% unite(weather, "display_temp", weather, temp, delim = " Degrees ")
B. unite(" Degrees ", weather, temp, "display_temp")
C. unite(weather, "display_temp", temp, unit, sep = " Degrees ")
D. weather %>% unite(" Degrees ", weather, temp, "display_temp")
The correct answer is C. unite(weather, "display_temp", temp, unit, sep = " Degrees ")
A data analyst is using statistical measures to get a better understanding of their data. What function can they use to determine how strongly two of the variables are related?
A. cor()
B. sd()
C. mean()
D. bias()
The correct answer is A. cor()
A data analyst wants to check the average difference between the actual and predicted values of a model. What single function can they use to calculate this statistic?
A. sd()
B. cor()
C. bias()
D. mean()
The correct answer is C. bias()
What is an advantage of using data frames instead of tibbles?
A. Data frames never change variable names
B. Data frames allow you to create row names
C. Data frames make printing easier
D. Data frames allow you to use column names
The correct answer is B. Data frames allow you to create row names
A data analyst wants to learn more about a specific data frame. Which function will allow them to review the data types of each column in the data frame?
A. package()
B. str()
C. library()
D. colnames()
The correct answer is B. str()
You are working with the ToothGrowth dataset. You want to use the head() function to get a preview of the dataset. Write the code chunk that will give you this preview.
head(ToothGrowth)
Output:
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
What are the names of the columns in the ToothGrowth dataset?
A. VC, supp, dose
B. len, supp, VC
C. len, VC, dose
D. len, supp, dose
The correct answer is D. len, supp, dose. Explanation: The code chunk head(ToothGrowth) gives you a preview of the dataset. Inside the parentheses of the head() function is the name of the dataset you want to preview. The code returns a view of the column names and the first few rows of the dataset. The names of the columns in the ToothGrowth dataset are len, supp, dose.
You are cleaning a data frame with improperly formatted column names. To clean the data frame, you want to use the clean_names() function. Which column names will be changed using clean_names() with default parameters? Select all that apply.
- column 2
- column.1
- column_3
- column4
A data analyst is working with the penguins data. The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. The analyst wants to create a data frame that only includes the Adelie species. The analyst receives an error message when they run the following code:
penguins %>%
filter(species <- "Adelie")
How can the analyst change the second line of code to correct the error?
A. filter(Adelie == species)
B. filter(species == "Adelie")
C. filter("Adelie" <- species)
D. filter("Adelie")
The correct answer is B. filter(species == "Adelie")
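The fix can be sketched on a small data frame, assuming dplyr is installed. The mini data frame is hypothetical. The key point is that == tests for equality, while <- assigns a value.

```r
library(dplyr)

# Hypothetical miniature of the penguins data
mini <- data.frame(
  species = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
  body_mass_g = c(3750, 5000, 3800, 3700)
)

# == compares each row's species to "Adelie" and keeps the matching rows
adelie_only <- mini %>%
  filter(species == "Adelie")

adelie_only
```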
You are working with the penguins dataset and want to understand the year of data collection for all combinations of species, island, and sex. At this point, the following code has already been written into your script:
penguins %>%
drop_na() %>%
group_by(species) %>%
summarize(min = min(year), max = max(year))
When you run the code in the code box, how many separate observational rows are returned by this code chunk?
A. 2
B. 10
C. 3
D. 6
The correct answer is D. 6
A data analyst is working with a data frame called athletes. The data frame contains a column named record that represents an athlete's wins and losses separated by a hyphen (-). They want to turn this single column into individual columns for wins and losses. Which code chunk lets the analyst split the record column?
A. separate(record, athletes, into=c("wins", "losses"), delim="-")
B. separate(athletes, record, into=c("wins", "losses"), delim="-")
C. separate(athletes, record, into=c("wins", "losses"), sep="-")
D. separate(record, athletes, into=c("wins", "losses"), sep="-")
The correct answer is C. separate(athletes, record, into=c("wins", "losses"), sep="-")
A data analyst is working with a data frame named retail. It has separate columns for dollars (price_dollars) and cents (price_cents). The analyst wants to combine the two columns into a single column named price, with the dollars and cents separated by a decimal point. For example, if the value in the price_dollars column is 10, and the value in the price_cents column is 50, the value in the price column will be 10.50. What code chunk lets the analyst create the price column?
A. unite(retail, "price", price_dollars, price_cents, sep=".")
B. unite(retail, "price", price_dollars, price_cents)
C. unite(retail, price_dollars, price_cents, sep=".")
D. unite(retail, "price", price_cents, sep=".")
The correct answer is A. unite(retail, "price", price_dollars, price_cents, sep=".")
In R, which statistical measure can help you understand the spread of values in a dataset and describe how far each value is from the mean?
A. Average
B. Maximum
C. Standard deviation
D. Correlation
The correct answer is C. Standard deviation
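A base R sketch with invented numbers shows why the mean alone is not enough: both vectors below have the same mean, but sd() reports a much larger spread for the second one.

```r
# Hypothetical values, invented for illustration
tight <- c(9, 10, 10, 11)
wide <- c(1, 5, 15, 19)

mean(tight)  # 10
mean(wide)   # 10

sd(tight)    # small: values sit close to the mean
sd(wide)     # large: values sit far from the mean
```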
A data analyst creates two different predictive models for the same dataset. They use the bias() function on both models. The first model has a bias of -40. The second model has a bias of 1. Which model is less biased?
A. It can’t be determined from this information
B. The first model
C. The second model
The correct answer is C. The second model
A data analyst is working with a dataset in R that has more than 50,000 observations. Why might they choose to use a tibble instead of the standard data frame? Select all that apply.
- Tibbles automatically only preview the first 10 rows of data
- Tibbles can automatically change the names of variables
- Tibbles automatically only preview as many columns as fit on screen
- Tibbles can create row names
A data analyst is exploring their data to get more familiar with it. They want a preview of just the first six rows to get a better idea of how the data frame is laid out. What function should they use?
A. print()
B. head()
C. preview()
D. colnames()
The correct answer is B. head()
You are working with the ToothGrowth dataset. You want to use the glimpse() function to get a quick summary of the dataset. Write the code chunk that will give you this summary.
glimpse(ToothGrowth)
Output:
Observations: 60
Variables: 3
$ len <dbl> 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16.5, 16...
$ supp <fctr> VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, V...
$ dose <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1....
How many variables does the ToothGrowth dataset contain?
A. 3
B. 5
C. 4
D. 2
The correct answer is A. 3. Explanation: The code chunk glimpse(ToothGrowth) gives you a quick summary of the dataset. Inside the parentheses of the glimpse() function is the name of the dataset you want to view. The code returns a summary of the number of rows and columns in the dataset. It also shows the names of the columns and the type of data they contain. The ToothGrowth dataset contains 3 variables.
You have a data frame named employees with a column named Last_NAME. What will the name of the employees column be in the results of the function rename_with(employees, tolower)?
A. last_nAME
B. Last_NAME
C. last_name
D. lAST_nAME
The correct answer is C. last_name
A data analyst is working with the penguins dataset and wants to sort the penguins by body_mass_g from least to greatest. When they run the following code the penguin body mass data is not displayed in the correct order.
penguins %>% arrange(body_mass_g)
head(penguins)
What can the data analyst do to fix their code?
A. Save the results of arrange() to a variable that gets passed to head()
B. Add a minus sign in front of body_mass_g to reverse the order
C. Correct the capitalization of arrange() to Arrange()
D. Use the print() function instead of the head() function
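The behavior behind this question can be sketched on a small data frame, assuming dplyr is installed. The mini data frame and the name sorted are hypothetical. arrange() returns a new, sorted data frame and leaves its input untouched, so the result has to be stored (or piped onward) before head() can show it.

```r
library(dplyr)

# Hypothetical miniature of the penguins data
mini <- data.frame(id = 1:3, body_mass_g = c(4200, 3100, 5000))

# This prints a sorted copy, but mini itself is not changed
mini %>% arrange(body_mass_g)

# Storing the result keeps the sorted order for later steps
sorted <- mini %>% arrange(body_mass_g)
head(sorted)
```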
You are working with the penguins dataset. You want to use the summarize() and min() functions to find the minimum value for the variable bill_depth_mm. At this point, the following code has already been written into the script:
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the minimum value for the variable bill_depth_mm.
(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)
summarize(min(bill_depth_mm))
Output:
# A tibble: 3 × 2
species `min(bill_depth_mm)`
<chr> <dbl>
1 Adelie 15.5
2 Chinstrap 16.4
3 Gentoo 13.1
What is the minimum bill depth in mm for the Chinstrap species?
A. 16.4
B. 13.1
C. 15.5
D. 12.4
The correct answer is A. 16.4
A data analyst is working with a data frame called zoo_records. They want to create a new column named is_large_animal that signifies if an animal has a weight of more than 199 kilograms. What code chunk lets the analyst create the is_large_animal column?
A. zoo_records %>% mutate(weight > 199 <- is_large_animal)
B. zoo_records %>% mutate(weight > 199 = is_large_animal)
C. zoo_records %>% mutate(is_large_animal == weight > 199)
D. zoo_records %>% mutate(is_large_animal = weight > 199)
The correct answer is D. zoo_records %>% mutate(is_large_animal = weight > 199)
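A comparison inside mutate() produces a TRUE/FALSE column. Here is a minimal sketch, assuming dplyr is installed; the animals and weights are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the zoo_records data frame (weights in kilograms)
zoo_records <- data.frame(
  animal = c("lion", "giraffe", "otter"),
  weight = c(190, 800, 10)
)

# weight > 199 is evaluated row by row and stored as a logical column
zoo_records <- zoo_records %>%
  mutate(is_large_animal = weight > 199)

zoo_records
```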
A data analyst is working with a data frame named retail. It has separate columns for dollars (price_dollars) and cents (price_cents). The analyst wants to combine the two columns into a single column named price, with the dollars and cents separated by a decimal point. For example, if the value in the price_dollars column is 10, and the value in the price_cents column is 50, the value in the price column will be 10.50. What code chunk lets the analyst create the price column?
A. unite(retail, "price", price_cents, sep=".")
B. unite(retail, price_dollars, price_cents, sep=".")
C. unite(retail, "price", price_dollars, price_cents)
D. unite(retail, "price", price_dollars, price_cents, sep=".")
The correct answer is D. unite(retail, "price", price_dollars, price_cents, sep=".")
You are compiling an analysis of the average monthly costs for your company. What summary statistic function should you use to calculate the average?
A. mean()
B. max()
C. min()
D. cor()
The correct answer is A. mean()
A data analyst creates two different predictive models for the same dataset. They use the bias() function on both models. The first model has a bias of -40. The second model has a bias of 1. Which model is less biased?
A. The second model
B. It can’t be determined from this information
C. The first model
The correct answer is A. The second model
Passed 100%
A data analyst is considering using tibbles instead of basic data frames. What are some of the limitations of tibbles? Select all that apply.
- Tibbles can never create row names
- Tibbles can overload a console
- Tibbles won't automatically change the names of variables
- Tibbles can never change the input type of the data
A data analyst wants to learn more about a specific data frame. Which function will allow them to review the data types of each column in the data frame?
A. colnames()
B. library()
C. str()
D. package()
The correct answer is C. str()
You are working with the ToothGrowth dataset. You want to use the head() function to get a preview of the dataset. Write the code chunk that will give you this preview.
head(ToothGrowth)
Output:
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
What are the names of the columns in the ToothGrowth dataset?
A. VC, supp, dose
B. len, supp, dose
C. len, VC, dose
D. len, supp, VC
The correct answer is B. len, supp, dose. Explanation: The code chunk head(ToothGrowth) gives you a preview of the dataset. Inside the parentheses of the head() function is the name of the dataset you want to preview. The code returns a view of the column names and the first few rows of the dataset. The names of the columns in the ToothGrowth dataset are len, supp, dose.
A data analyst is working with a data frame named cars. The analyst notices that all the column names in the data frame are capitalized. What code chunk lets the analyst change all the column names to lowercase?
A. rename_with(toupper, cars)
B. rename_with(tolower, cars)
C. rename_with(cars, tolower)
D. rename_with(cars, toupper)
The correct answer is C. rename_with(cars, tolower)
A data analyst is working with the penguins dataset in R. What code chunk will allow them to sort the penguins data by the variable bill_length_mm?
A. arrange(penguins)
B. arrange(=bill_length_mm)
C. arrange(bill_length_mm, penguins)
D. arrange(penguins, bill_length_mm)
The correct answer is D. arrange(penguins, bill_length_mm)
You are working with the penguins dataset. You want to use the summarize() and mean() functions to find the mean value for the variable body_mass_g. At this point, the following code has already been written into your script:
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the mean value for the variable body_mass_g.
(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)
summarize(mean(body_mass_g))
Output:
# A tibble: 3 × 2
species `mean(body_mass_g)`
<chr> <dbl>
1 Adelie 3706.164
2 Chinstrap 3733.088
3 Gentoo 5092.437
What is the mean body mass in g for the Adelie species?
A. 4207.433
B. 3733.088
C. 5092.437
D. 3706.164
The correct answer is D. 3706.164. Explanation: The code chunk summarize(mean(body_mass_g)) lets you find the mean value for the variable body_mass_g. The correct code is penguins %>% drop_na() %>% group_by(species) %>% summarize(mean(body_mass_g)). The summarize() function displays summary statistics. You can use the summarize() function in combination with other functions, such as mean(), max(), and min(), to calculate specific statistics. In this case, you use mean() to calculate the mean value for body mass. The mean body mass for the Adelie species is 3706.164 g.
A data analyst is working with a data frame called salary_data. They want to create a new column named hourly_salary that includes data from the wages column divided by 40. What code chunk lets the analyst create the hourly_salary column?
A. mutate(hourly_salary, salary_data = wages / 40)
B. mutate(hourly_salary = wages / 40)
C. mutate(salary_data, hourly_salary = wages / 40)
D. mutate(salary_data, hourly_salary = wages * 40)
The correct answer is C. mutate(salary_data, hourly_salary = wages / 40)
A data analyst is working with a data frame named weather. It has separate columns for temperatures (temp) and measurement units (unit). The analyst wants to combine the two columns into a single column called display_temp, with the temperature and unit separated by the string " Degrees ". What code chunk lets the analyst create the display_temp column?
A. unite(" Degrees ", weather, temp, "display_temp")
B. weather %>% unite(weather, "display_temp", weather, temp, delim = " Degrees ")
C. weather %>% unite(" Degrees ", weather, temp, "display_temp")
D. unite(weather, "display_temp", temp, unit, sep = " Degrees ")
The correct answer is D. unite(weather, "display_temp", temp, unit, sep = " Degrees ")
You are compiling an analysis of the average monthly costs for your company. What summary statistic function should you use to calculate the average?
A. min()
B. cor()
C. mean()
D. max()
The correct answer is C. mean()
A data analyst creates two different predictive models for the same dataset. They use the bias() function on both models. The first model has a bias of 20. The second model has a bias of 0.1. Which model is less biased?
A. It can’t be determined from this information
B. The second model
C. The first model
The correct answer is B. The second model