4.3.5.Video quiz - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki

Understanding SQL capabilities

SQL and spreadsheets process large amounts of data at the same speed. True or False?

A. True

B. False

It is false statement. Explain: SQL can process large amounts of data much more quickly than spreadsheets. This is one of the reasons why data analysts use SQL when working with vast, complex datasets.

Spreadsheets versus SQL

A team of analysts is working on a data analytics project. How could data in a SQL database be more useful to the team than data in spreadsheets? Select all that apply.

  • They can use SQL to make working with smaller datasets easier
  • They can track changes to SQL queries across the team.
  • They can use SQL to pull information from the database at the same time.
  • They can use SQL to interact with the database program.

Explain: Data stored in a SQL database is useful to a project with multiple team members because they can access the data at the same time, use SQL to interact with the database program, and track changes to SQL queries across the team.

Widely used SQL queries

The media_types table contains the following columns: MediaTypeId and Name.

Create a query to return the Name column from this table.

SELECT Name FROM media_types;

What is the media type name of the 1st result returned? (Enter the exact name that appears in the table.)

Results:

+-----------------------------+
| Name                        |
+-----------------------------+
| MPEG audio file             |
| Protected AAC audio file    |
| Protected MPEG-4 video file |
| Purchased AAC audio file    |
| AAC audio file              |
+-----------------------------+

MPEG audio file is the media type name of the 1st result returned when making the following query: SELECT Name FROM media_types;

Heads up! If you are following along in this video using a BigQuery sandbox account, you won't be able to use INSERT INTO or UPDATE in a query, but you can continue watching the steps to understand how to perform these queries.

Optional procedure below:

If you would like to continue following along, you may set up a free trial account which supports queries to update data in tables.

Important note: If you want to continue using your BigQuery sandbox account after your free trial ends, when you set up your free trial account, use a different gmail account. Otherwise, your sandbox projects will be migrated to the billable account and you will no longer have access to the sandbox.

To set up a free trial account, follow the step-by-step instructions referenced in the Using BigQuery reading.

The free trial requires a method of payment to set up and offers a $300 credit for use in Google Cloud. After 90 days, your free trial will expire and you will need to personally select to upgrade to a paid account to keep using this BigQuery account. Your method of payment will never be automatically charged after your free trial ends.

The customers table contains the following columns: CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalCode, Phone, Fax, Email, SupportRepId.

Create a query to return the LastName and Country columns from this table for only customers in Germany. (Note: contrary to syntax typed within BigQuery from the video, the parameter value within the WHERE argument should be contained between two ticks, i.e. ' value' for the code block below)

SELECT LastName, Country FROM customers WHERE Country = 'Germany';

What is the last name of the customer for the 3rd result returned? (It is important the name is spelled correctly)

Results:

+------------+---------+
| LastName   | Country |
+------------+---------+
| Köhler     | Germany |
| Schneider  | Germany |
| Zimmermann | Germany |
| Schröder   | Germany |
+------------+---------+

Explain: Great job. Zimmermann is the last name of the customer for the 3rd result returned when making the following query:

SELECT LastName,Country FROM customers WHERE Country = 'Germany';

Cleaning string variables using SQL

If adding DISTINCT to the query didn't make sense, you may need a refresher on how to write a query. Try the following.

The tracks table contains the following columns: TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, and UnitPrice.

Create a query to return the TrackId, AlbumId, Composer and UnitPrice columns from this table.

SELECT TrackId, AlbumId, Composer, UnitPrice FROM tracks;

Results:

+---------+---------+------------------------------------------------------------------------+-----------+
| TrackId | AlbumId | Composer                                                               | UnitPrice |
+---------+---------+------------------------------------------------------------------------+-----------+
|       1 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|       2 |       2 | None                                                                   |      0.99 |
|       3 |       3 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman                    |      0.99 |
|       4 |       3 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman |      0.99 |
|       5 |       3 | Deaffy & R.A. Smith-Diesel                                             |      0.99 |
|       6 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|       7 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|       8 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|       9 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|      10 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|      11 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|      12 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|      13 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|      14 |       1 | Angus Young, Malcolm Young, Brian Johnson                              |      0.99 |
|      15 |       4 | AC/DC                                                                  |      0.99 |
|      16 |       4 | AC/DC                                                                  |      0.99 |
|      17 |       4 | AC/DC                                                                  |      0.99 |
|      18 |       4 | AC/DC                                                                  |      0.99 |
|      19 |       4 | AC/DC                                                                  |      0.99 |
|      20 |       4 | AC/DC                                                                  |      0.99 |
|      21 |       4 | AC/DC                                                                  |      0.99 |
|      22 |       4 | AC/DC                                                                  |      0.99 |
|      23 |       5 | Steven Tyler, Joe Perry, Jack Blades, Tommy Shaw                       |      0.99 |
|      24 |       5 | Steven Tyler, Joe Perry                                                |      0.99 |
|      25 |       5 | Steven Tyler, Joe Perry, Jim Vallance, Holly Knight                    |      0.99 |
+---------+---------+------------------------------------------------------------------------+-----------+
(Output limit exceeded, 25 of 3503 total rows shown)

What is the AlbumId of the 6th track? Please respond using a numeric value, i.e. enter '4' and not 'four'.

AlbumId of the 6th track is number 1. Explain: One is the AlbumId of the 6th track returned when making the following query:

SELECT TrackId, AlbumId, Composer, UnitPrice FROM tracks;

In a query, if you use the LENGTH, SUBSTR, or TRIM function in a WHERE clause, you can select data based on a string condition. If you are having trouble with these, you may need a refresher on how you created a WHERE clause with a numeric condition. The concept is the same, so try the following:

The invoices table contains the following columns: InvoiceId, CustomerId, InvoiceDate, BillingAddress, BillingCity, BillingState, BillingCountry, BillingPostalCode, Total.

Create a query to return the CustomerId, InvoiceDate and Total columns from this table for only invoice totals over $20.

SELECT CustomerId,InvoiceDate,Total FROM invoices WHERE Total > 20;

How many results are returned?

There are 4 results are returned:

+------------+---------------------+-------+
| CustomerId | InvoiceDate         | Total |
+------------+---------------------+-------+
|         45 | 2010-02-18 00:00:00 | 21.86 |
|         46 | 2011-04-28 00:00:00 | 21.86 |
|         26 | 2012-08-05 00:00:00 | 23.86 |
|          6 | 2013-11-13 00:00:00 | 25.86 |
+------------+---------------------+-------+

Four results are returned when making the following query:

SELECT CustomerId,InvoiceDate,Total FROM invoices WHERE Total > 20;

Advanced data cleaning functions, part 1

Note: In this case of the advanced data cleaning function, CAST(), it was used instead of the UPDATE() function due to the data type of the 'purchase_price' column being mislabeled.

It's also important to note that creating queries in SQL is a non-destructive way to manipulate datasets for data cleaning purposes, and it's good practice to retain the dataset document in its original form (or create back-ups). Errors can happen when manipulating original data sets, so having an original copy of the data set will save the data analyst a lot of extra work.

Advanced data-cleaning functions, part 2

Data analysts use which function to add strings together and create new text strings for unique keys?

A. LENGTH

B. TRIM

C. CONCAT

D. CAST

The correct answer is C. CONCAT. Explain: Data analysts use the CONCAT function to add strings together and create new text strings for unique keys.