5.3.2.Use JOINS to aggregate data in SQL - quanganh2001/Google-Data-Analytics-Professional-Certificate-Coursera GitHub Wiki

Optional: Upload the employee dataset to BigQuery

The next video demonstrates how to use JOINS to merge and return data from two tables based on a common attribute used in both tables.

If you would like to follow along with the instructor, you will need to log in to your BigQuery account and upload the employee data provided as two CSV files. If you have hopped around courses, Using BigQuery in the Prepare Data for Exploration course covers how to set up a BigQuery account.

Prepare for the next video

  • First, download the CSV files from the attachments below:

Employees Table - Understanding JOINS

Departments Table - Understanding JOINS

  • Next, complete the following steps in your BigQuery console to upload the employees and departments tables.

Step 1: Open your BigQuery console and click on the project you want to upload the data to.

Step 2: In the Explorer on the left, click the Actions icon (three vertical dots) next to your project name and select Create dataset.

pYj9p7XFS-uI_ae1xYvr6w_05852f3ecf20415fba255e844eff49f1_actions_icon_create_dataset

Step 3: Enter employee_data for the Dataset ID.

Step 4: Click CREATE DATASET (blue button) to add the dataset to your project.

Step 5: In the Explorer on the left, click to expand your project, and then click the employee_data dataset you just created.

Step 6: Click the Actions icon (three vertical dots) next to employee_data and select Open.

Step 7: Click the blue + icon at the top right to open the Create table window.

gwqLOvPyQo2Kizrz8tKNDA_2acc3838a6ed418596c2828d5a007df1_create_table_icon

Step 8: Under Source, for the Create table from selection, choose where the data will be coming from.

  • Select Upload.
  • Click Browse to select the Employees Table CSV file you downloaded.
  • Choose CSV from the file format drop-down.

Step 9: For Table name, enter employees if you plan to follow along with the video.

Step 10: For Schema, click the Auto detect check box.

Step 11: Click Create table (blue button). You will now see the employees table under your employee_data dataset in your project.

Step 12: Click the employee_data dataset again.

Step 13: Click the icon to open the Create table window again.

Step 14: Under Source, for the Create table from selection, choose where the data will be coming from.

  • Select Upload.
  • Click Browse to select the Departments Table CSV file you downloaded.
  • Choose CSV from the file format drop-down.

Step 15: For Table name, enter departments if you plan to follow along with the video.

Step 16: For Schema, click the Auto detect check box.

Step 17: Click Create table (blue button). You will now see the departments table under your employee_data dataset in your project.

Step 18: Click the employees table and click the Preview tab to verify that you have the data shown below.

UDs6GVPPQcK7OhlTz4HCgA_cacfe707cd0041708715da1eb56276f1_employees

Step 19: Click the departments table and click the Preview tab to verify that you have the data shown below.

UfshmGpYRMi7IZhqWLTIEg_2b5b26aa82134731b6b555b5bb73c1f1_departments

If your data previews match, you are ready to follow along with the next video.

Secret identities: The importance of aliases

In this reading, you will learn about using aliasing to simplify your SQL queries. Aliases are used in SQL queries to create temporary names for a column or table. Aliases make referencing tables and columns in your SQL queries much simpler when you have table or column names that are too long or complex to make use of in queries. Imagine a table name like special_projects_customer_negotiation_mileages. That would be difficult to retype every time you use that table. With an alias, you can create a meaningful nickname that you can use for your analysis. In this case “special_projects_customer_negotiation_mileages” can be aliased to simply “mileage.” Instead of having to write out the long table name, you can use a meaningful nickname that you decide.

Basic syntax for aliasing

Aliasing is the process of using aliases. In SQL queries, aliases are implemented by making use of the AS command. The basic syntax for the AS command can be seen in the following query for aliasing a table:

SELECT column_name(s)
FROM table_name AS alias_name;

Notice that AS is preceded by the table name and followed by the new nickname. It is a similar approach to aliasing a column:

SELECT column_name AS alias_name
FROM table_name;

In both cases, you now have a new name that you can use to refer to the column or table that was aliased.

Alternate syntax for aliases If using AS results in an error when running a query because the SQL database you are working with doesn't support it, you can leave it out. In the previous examples, the alternate syntax for aliasing a table or column would be:

  • FROM table_name alias_name
  • SELECT column_name alias_name

The key takeaway is that queries can run with or without using AS for aliasing, but using AS has the benefit of making queries more readable. It helps to make aliases stand out more clearly.

Aliasing in action

Let’s check out an example of a SQL query that uses aliasing. Let’s say that you are working with two tables: one of them has employee data and the other one has department data. The FROM statement to alias those tables could be:

FROM work_day.employees AS employees

These aliases still let you know exactly what is in these tables, but now you don’t have to manually input those long table names. Aliases can be really helpful for long, complicated queries. It is easier to read and write your queries when you have aliases that tell you what is included within your tables.

For more information

If you are interested in learning more about aliasing, here are some resources to help you get started:

  • SQL Aliases : This tutorial on aliasing is a really useful resource to have when you start practicing writing queries and aliasing tables on your own. It also demonstrates how aliasing works with real tables.
  • SQL Alias : This detailed introduction to aliasing includes multiple examples. This is another great resource to reference if you need more examples.
  • Using Column Aliasing: This is a guide that focuses on column aliasing specifically. Generally, you will be aliasing entire tables, but if you find yourself needing to alias just a column, this is a great resource to have bookmarked.

Using JOINs effectively

In this reading, you will review how JOINs are used and will be introduced to some resources that you can use to learn more about them. A JOIN combines tables by using a primary or foreign key to align the information coming from both tables in the combination process. JOINs use these keys to identify relationships and corresponding values across tables.

If you need a refresher on primary and foreign keys, refer to the glossary for this course, or go back to Databases in data analytics .

The general JOIN syntax

SELECT
  -- table columns from tables are inserted here
  table_name1.column_name
  table_name2.column_name
FROM
   table_name1
JOIN
   table_name2
ON table_name1.column_name = table_name2.column_name

As you can see from the syntax, the JOIN statement is part of the FROM clause of the query. JOIN in SQL indicates that you are going to combine data from two tables. ON in SQL identifies how the tables are to be matched for the correct information to be combined from both.

Type of JOINs

There are four general ways in which to conduct JOINs in SQL queries: INNER, LEFT, RIGHT, and FULL OUTER.

INNER JOIN

INNER is optional in this SQL query because it is the default as well as the most commonly used JOIN operation. You may see this as JOIN only. INNER JOIN returns records if the data lives in both tables. For example, if you use INNER JOIN for the 'customers' and 'orders' tables and match the data using the customer_id key, you would combine the data for each customer_id that exists in both tables. If a customer_id exists in the customers table but not the orders table, data for that customer_id isn’t joined or returned by the query.

SELECT
  customers.customer_name,
  orders_product_id,
  orders.ship_date
FROM
  customers
INNER JOIN
  orders
ON customers.customer_id = orders_customer_id

The results from the query might look like the following, where customer_name is from the customers table and product_id and ship_date are from the orders table:

customer_name product_id ship_date
Martin's Ice Cream 043998 2021-02-23
Beachside Treats 872012 2021-02-25
Mona's Natural Flavors 724956 2021-02-28
... etc. ... etc. ... etc.

The data from both tables was joined together by matching the customer_id common to both tables. Notice that customer_id doesn’t show up in the query results. It is simply used to establish the relationship between the data in the two tables so the data can be joined and returned.

LEFT JOIN

You may see this as LEFT OUTER JOIN, but most users prefer LEFT JOIN. Both are correct syntax. LEFT JOIN returns all the records from the left table and only the matching records from the right table. Use LEFT JOIN whenever you need the data from the entire first table and values from the second table, if they exist. For example, in the query below, LEFT JOIN will return customer_name with the corresponding sales_rep, if it is available. If there is a customer who did not interact with a sales representative, that customer would still show up in the query results but with a NULL value for sales_rep.

SELECT
   customers.customer_name,
   sales.sales_rep
FROM
   customers
LEFT JOIN
   sales
ON customers.customer_id = sales_customer_id

The results from the query might look like the following where customer_name is from the customers table and sales_rep is from the sales table. Again, the data from both tables was joined together by matching the customer_id common to both tables even though customer_id wasn't returned in the query results.

customer_name sales_rep
Martin's Ice Cream Luis Reyes
Beachside Treats NULL
Mona's Natural Flavors Geri Hall
... etc. ... etc.

RIGHT JOIN

You may see this as RIGHT OUTER JOIN or RIGHT JOIN. RIGHT JOIN returns all records from the right table and the corresponding records from the left table. Practically speaking, RIGHT JOIN is rarely used. Most people simply switch the tables and stick with LEFT JOIN. But using the previous example for LEFT JOIN, the query using RIGHT JOIN would look like the following:

SELECT
   sales.sales_rep,
   customers.customer_name
FROM
   sales
RIGHT JOIN
   customers
ON sales.customer_id = customers.customer_id

The query results are the same as the previous LEFT JOIN example.

customer_name sales_rep
Martin's Ice Cream Luis Reyes
Beachside Treats NULL
Mona's Natural Flavors Geri Hall
... etc. ... etc.

FULL OUTER JOIN

You may sometimes see this as FULL JOIN. FULL OUTER JOIN returns all records from the specified tables. You can combine tables this way, but remember that this can potentially be a large data pull as a result. FULL OUTER JOIN returns all records from both tables even if data isn’t populated in one of the tables. For example, in the query below, you will get all customers and their products’ shipping dates. Because you are using a FULL OUTER JOIN, you may get customers returned without corresponding shipping dates or shipping dates without corresponding customers. A NULL value is returned if corresponding data doesn’t exist in either table.

SELECT
   customers.customer_name,
   orders.ship_date
FROM
   customers
FULL OUTER JOIN
   orders
ON customers.customer_id = orders.customer_id

The results from the query might look like the following.

customer_name ship_date
Martin's Ice Cream 2021-02-23
Beachside Treats 2021-02-25
NULL 2021-02-25
The Daily Scoop NULL
Mountain Ice Cream NULL
Mona's Natual Flavors 2021-02-28
... etc. ... etc.

For more information

JOINs are going to be useful for working with relational databases and SQL—and you will have plenty of opportunities to practice them on your own. Here are a few other resources that can give you more information about JOINs and how to use them:

  • SQL JOINs : This is a good basic explanation of JOINs with examples. If you need a quick reminder of what the different JOINs do, this is a great resource to bookmark and come back to later.
  • Database JOINs - Introduction to JOIN Types and Concepts: This is a really thorough introduction to JOINs. Not only does this article explain what JOINs are and how to use them, but it also explains the various scenarios in more detail of when and why you would use the different JOINs. This is a great resource if you are interested in learning more about the logic behind JOINing.
  • SQL JOIN Types Explained in Visuals: This resource has a visual representation of the different JOINs. This is a really useful way to think about JOINs if you are a visual learner, and it can be a really useful way to remember the different JOINs.
  • SQL JOINs: Bringing Data Together One Join at a Time: Not only does this resource have a detailed explanation of JOINs with examples, but it also provides example data that you can use to follow along with their step-by-step guide. This is a useful way to practice JOINs with some real data.
  • SQL JOIN: This is another resource that provides a clear explanation of JOINs and uses examples to demonstrate how they work. The examples also combine JOINs with aliasing. This is a great opportunity to see how JOINs can be combined with other SQL concepts that you have been learning about in this course.

Hands-On Activity: Queries for JOINS

Question 1

Activity overview

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

You’ve come a long way in working with relational databases and SQL. Now, you’ll gain practice writing queries that join multiple tables together.

In this activity, you’ll work with the World Bank’s International Education Dataset. By mastering JOIN statements, you’ll be able to fully harness the power of relational databases by combining data from tables linked by keys.

Load and examine the dataset

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

  1. Log in to BigQuery Sandbox. If you have a free trial version of BigQuery, you can use that instead. On the BigQuery page, click the Go to BigQuery button.
  • Note: BigQuery Sandbox frequently updates its user interface. The latest changes may not be reflected in the screenshots presented in this activity, but the principles remain the same. Adapting to changes in software updates is an essential skill for data analysts, and it’s helpful for you to practice troubleshooting. You can also reach out to your community of learners on the discussion forum for help.
  1. If you have never created a BigQuery project before, click CREATE PROJECT on the right side of the screen. If you have created a project before, you can use an existing one or create a new one by clicking the project drop-down in the blue header bar and selecting NEW PROJECT.
  2. Name your project something that will help you identify it later. You can give it a unique project ID or use an auto-generated one. Don’t worry about selecting an organization if you don’t know what to put.
  3. Now, you’ll see the Editor interface. In the middle of the screen is a window where you can type code, and to the left is the Explorer menu where you can search for datasets.

nKrbQFquT_a7l2f1uhRb2w_9ace07ec78804c48baecfa2feb15fdf1_1 Explorer

Before you begin joining two tables together, you’ll first need to figure out which tables to join together. Recall that two tables must be connected in order to join them. Two tables can be joined if the primary key for one table is included in the other table as a foreign key.

To determine what information the tables contain and identify keys you can join them on, review the schema of the tables. To access the schema:

  1. Click + ADD at the top right of the Explorer menu, then scroll down and click on the Public Datasets option from the resulting drop-down list.

IF3dXVDlQza8mq15hy-o1Q_403b9daa480041f199387f0c1b9fb9f1_1 PublicDatasets

  1. In the Search Marketplace search bar, type international education and press return to find the search results.

CRrNgPJvTLii7mznOYmJrA_0e9c902bd7c544c68cb4748cb5330ff1_2 marketplace_intedu

  1. Click the first result, The World Bank's International Education dataset.

sEos92bURwmpdOgFRg0w-w_6bcf4acc006446639bdeb4b56192ebf1_3 inteducationlogo

  1. Click View Dataset. This will bring you back to the BigQuery Sandbox interface in a new tab.
  • Note: You may want to star the bigquery-public-data drop-down to the Explorer menu. You can use this to browse datasets and tables without navigating to the data marketplace in the future.
  1. In the Explorer menu, search for world_bank_intl_education. Click the drop-down arrow to expand the dataset. You may also need to click on the SHOW MORE button for the additional subsets to appear.

XPJ8Ub_XTY-QzlXrJRcwQA_b3a3d7b8edee4e1c82d6b8f09a6b9ff1_1 worldbank

  1. Click the international_education table. This will bring up the table’s schema. If the schema doesn’t appear, click on the Schema tab in the table viewer.
  2. Next, select the country_summary table and examine its schema. You’ll find that the country_code column appears in both the table schemas.
  • Note: Foreign keys don’t always have the same names across tables. If you’re ever unsure if the columns are the same, you can always double-check. You can click the Details tab from each table and confirm that they contain the same kinds of information.

Review JOINS

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

Before you join the two tables with queries, take a moment to review the various kinds of JOIN statements.

fI4oxJLYRwaOKMSS2IcGcQ_40bd0a350be44bbfa9e575de2f80ea2a_DAC5M3L3SR1---Queries-for-JOINS-2

7OOXvooNTXGjl76KDV1xsg_1ddca5ba2b754e55bf4b4fd3a1e6444d_DAC5M3L3SR1---Queries-for-JOINS-1

The two most common kinds of JOIN statements are INNER JOINs and OUTER LEFT JOINs (also known simply as LEFT JOINs). As a review:

  • INNER JOIN: Returns only the rows where the target appears in both tables.
  • LEFT JOIN: Returns every row from the left table, as well as any rows from the right table with matching keys found in the left table.

Note the difference between the INNER JOIN and LEFT JOIN and the implications for when each should be used. Consider the following queries:

-- Let's say table_1 has 100 rows and table_2 has 10 rows.
-- They share 10 keys in common.

-- Using INNER JOIN --> We get 10 rows in our results.
SELECT
    COUNT(*)
FROM
    table_1
INNER JOIN
    table_2
ON table_1.key = table_2.key;

-- Using LEFT JOIN --> We get 100 rows in our results.
SELECT
    COUNT(*)
FROM
    table_1
LEFT JOIN
    table_2
ON table_1.key = table_2.key;

The takeaway here is that the sort of JOIN you use matters. When writing a query, you can draw out a Venn diagram like the example graphic above to help you decide which sort of JOIN you need.

Queries with JOINS and aliases

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

Now, it’s time to actually query the dataset. As a starting point, try a query that pulls information from both the international_education and country_summary tables. Copy, paste, and run the following query:

SELECT 
    `bigquery-public-data.world_bank_intl_education.international_education`.country_name, 
    `bigquery-public-data.world_bank_intl_education.country_summary`.country_code, 
    `bigquery-public-data.world_bank_intl_education.international_education`.value
FROM 
    `bigquery-public-data.world_bank_intl_education.international_education`
INNER JOIN 
    `bigquery-public-data.world_bank_intl_education.country_summary` 
ON `bigquery-public-data.world_bank_intl_education.country_summary`.country_code = `bigquery-public-data.world_bank_intl_education.international_education`.country_code

This basic query joins the tables on the country_code foreign key, and returns the country name, country code, and value column. This is quite a long, unwieldy query for such a basic result! The length of each table name (which must include the full address for each table for BigQuery to know where to pull the data from) makes this hard to read and work with.

However, you can solve this by setting an alias for each table.

Use descriptive aliases

Try using descriptive aliases that tell you what they represent. This next query is the same query as the previous one, but with aliases to improve readability. Copy, paste, and run the following query:

SELECT 
    edu.country_name,
    summary.country_code,
    edu.value
FROM 
    bigquery-public-data.world_bank_intl_education.international_education AS edu
INNER JOIN 
    bigquery-public-data.world_bank_intl_education.country_summary AS summary
ON edu.country_code = summary.country_code

This query is much easier to read and understand. Recall that you can set aliases for tables by specifying the alias for the table after the table’s name in FROM and/or JOIN statements.

For this example, the international_education table was renamed as edu, and the country_summary table as summary. Using descriptive aliases is a best practice and will help you keep your queries clean, readable, and easy to work with.

Use a JOIN to answer a question

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

Now that you’ve confirmed that the JOIN statement works, try to answer an actual data question using this dataset. What is the average amount of money spent per region on education? Copy, paste, and run the following query:

SELECT 
    AVG(edu.value) average_value, summary.region
FROM 
    `bigquery-public-data.world_bank_intl_education.international_education` AS edu
INNER JOIN 
    `bigquery-public-data.world_bank_intl_education.country_summary` AS summary
ON edu.country_code = summary.country_code
WHERE summary.region IS NOT null
GROUP BY summary.region
ORDER BY average_value DESC

Your results should appear like this:

qglYtONSStaJWLTjUnrWSg_2df6d0871ab74e20be0dd04582d34bf1_Screenshot-2021-05-02-4 46 56-PM

Notice how in this query, an alias is also set to give the AVG(edu.value) a more descriptive name for the temporary table the query returns.

Also note that the WHERE statement excludes rows with any null information. This is necessary to present the data succinctly and display only seven rows for the seven regions represented in the data. However, this WHERE statement means that the results will return the same regardless of which JOIN you use. In the next section, you’ll explore a situation where you need to use a specific kind of join in your query...

INNER JOINS versus OUTER JOINS

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

In the last query, you used an INNER JOIN to find the average amount of money spent per region on education. Because of the WHERE statement in this query, using any kind of JOIN produces the same result.

Now, you will write a LEFT JOIN, a type of OUTER JOIN, for a situation where the type of query you use will change the result you return.

Consider this scenario:

You have been tasked to provide data for a feature sports article on Michael Jordan’s basketball career. The writer wants to include a funny twist and asks you to find out if Michael Jordan played better at schools with animal mascots.

To analyze his early career, you start with the years he played basketball in college. You need to examine National Collegiate Athletic Association (NCAA) college basketball stats from 1984.

You’ll need a list of all NCAA Division I colleges and universities; their mascots, if applicable; and their number of wins and losses. You can find this information by typing ncaa_basketball in the Explorer tab dataset search bar on BigQuery.

Next, you will start a new query tab by clicking on the blue + button. Your query should join the season statistics from one table with the mascot information from another. You need to use a LEFT JOIN instead of an INNER JOIN because not all teams have mascots. If you use an INNER JOIN, you would exclude teams with no mascot.

To demonstrate this, copy, paste, and run the following query:

SELECT
 seasons.market AS university,
 seasons.name AS team_name,
 seasons.wins,
 seasons.losses,
 seasons.ties,
 mascots.mascot AS team_mascot
FROM
 `bigquery-public-data.ncaa_basketball.mbb_historical_teams_seasons` AS seasons
LEFT JOIN
 `bigquery-public-data.ncaa_basketball.mascots` AS mascots
ON
 seasons.team_id = mascots.id
WHERE
 seasons.season = 1984
 AND seasons.division = 1
ORDER BY
 seasons.market

This is an example of when a LEFT JOIN is more helpful than an INNER JOIN. With this query, you can look at college basketball statistics to get a better sense of Michael Jordan’s early career, find out more information about which teams had mascots, and answer your business question.

Confirmation and reflection

G-KePoi0R2Sinj6ItBdkMg_2d69ab4b929f40f2b472a78fdd5ed880_line-y

In the last query, you use a LEFT JOIN instead of an INNER JOIN to find the correct information. Beneath the query results, you’ll find that the number of rows in your joined table is 281. If you rerun the query with an INNER JOIN instead of a LEFT JOIN, how many rows would it return?

A. 274

B. 281

C. 301

D. 324

The correct answer is A. 274. Explain: The number of rows returned by an INNER JOIN is 274. When you run the query with an INNER JOIN instead of a LEFT JOIN, you exclude universities without mascots and return fewer rows of data. Knowing which JOIN to use is very important for analyzing data. Going forward, you can use your knowledge of JOINs to properly combine data from multiple tables.

Question 2

In this activity, you used JOIN statements to combine data from multiple tables. In the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions.

  • Why do you think JOIN statements are important for working with databases?
  • How do you distinguish INNER JOINS from OUTER JOINS?

Explain: Congratulations on completing this hands-on activity! A good response would include that JOINs allow you to combine data from linked tables, which helps you make comparisons and answer business questions.

Mastering JOIN statements is one of the most important parts of SQL, as combining data from multiple database tables is a core skill for data analysts. And when you apply for jobs, remember that JOIN statements are a common topic in data analyst interviews! The more JOIN statements you write, the more prepared you will be for a data analyst role.

Optional: Upload the warehouse dataset to BigQuery

The next video demonstrates how to use COUNT and COUNT DISTINCT in SQL to count and return the number of certain values in a dataset.

If you would like to follow along with the instructor, you will need to log in to your BigQuery account and upload the warehouse data provided as two CSV files. If you have hopped around courses, Using BigQuery in the Prepare Data for Exploration course covers how to set up a BigQuery account.

Prepare for the next video

  • First, download the two CSV files from the attachments below:

Warehouse Orders - Warehouse

Warehouse Orders - Orders

  • Next, complete the following steps in your BigQuery console to upload the Warehouse Orders dataset with the two Warehouse and Orders tables.

Step 1: Open your BigQuery console and click on the project you want to upload the data to.

Step 2: In the Explorer on the left, click the Actions icon (three vertical dots) next to your project name and select Create dataset.

pYj9p7XFS-uI_ae1xYvr6w_05852f3ecf20415fba255e844eff49f1_actions_icon_create_dataset

Step 3: In the upcoming video, the name "warehouse_orders" will be used for the dataset. If you plan to follow along with the video, enter warehouse_orders for the Dataset ID.

gQd3h1iJThCHd4dYid4QoQ_7956d57d4c464acdad3adec2379ee7f1_warehouse_orders

Step 4: Click CREATE DATASET (blue button) to add the dataset to your project.

Step 5: In the Explorer on the left, click to expand your project, and then click the warehouse_orders dataset you just created.

Step 6: Click the Actions icon (three vertical dots) next to warehouse_orders and select Open.

Step 7: Click the blue + icon at the top right to open the Create table window.

gwqLOvPyQo2Kizrz8tKNDA_2acc3838a6ed418596c2828d5a007df1_create_table_icon

Step 8: Under Source, for the Create table from selection, choose where the data will be coming from.

  • Select Upload.
  • Click Browse to select the Warehouse Orders - Warehouse CSV file you downloaded.
  • Choose CSV from the file format drop-down.

Step 9: For Table name, enter Warehouse if you plan to follow along with the video.

Step 10: For Schema, click the Auto detect check box.

Step 11: Click Create table (blue button). You will now see the Warehouse table under your warehouse_orders dataset in your project.

Step 12: Click the warehouse_orders dataset again.

Step 13: Click the icon to open the Create table window again.

Step 14: Under Source, for the Create table from selection, choose where the data will be coming from.

  • Select Upload.
  • Click Browse to select the Warehouse Orders - Orders CSV file you downloaded.
  • Choose CSV from the file format drop-down.

Step 15: For Table name, enter Orders if you plan to follow along with the video.

Step 16: For Schema, click the Auto detect check box.

Step 17: Click Create table (blue button). You will now see the Orders table under your warehouse_orders dataset in your project.

Step 18: Click the Warehouse table and click the Preview tab to verify that you have 10 rows of data.

HItXPe74Sh6LVz3u-Moegw_9474b2eb1d4d4d71864944613b7b35f1_Warehouse

Step 19: Click the Orders table and click the Preview tab to verify that you have the data shown below.

gxm9QwxrTuWZvUMMa47lpQ_45f359d9a046449ea566e3a65ca84cf1_Orders

If your data previews match, you are ready to follow along with the next video.

Test your knowledge on using JOINS to aggregate data

Question 1

A data analyst wants to retrieve only records from a database that have matching values in two different tables. Which JOIN function should they use?

A. RIGHT JOIN

B. INNER JOIN

C. LEFT JOIN

D. OUTER JOIN

The correct answer is B. INNER JOIN. Explain: To retrieve only records from a database that have matching values in two different tables, the analyst should use INNER JOIN.

Question 2

You are writing a SQL query to instruct a database to count values in a specified range. You only want to count each value once, even if it appears multiple times. Which function should you include in your query?

A. COUNT VALUES

B. COUNT DISTINCT

C. COUNT RANGE

D. COUNT

The correct answer is B. COUNT DISTINCT. Explain: To tell a database to return distinct values in a specified range, the analyst should use COUNT DISTINCT in their query.

Question 3

A data analyst wants to temporarily name a column in their query to make it easier to read and write. What technique should they use?

A. Naming

B. Tagging

C. Filtering

D. Aliasing

The correct answer is D. Aliasing. Explain: To temporarily name a column in a query to make it easier to read and write, the analyst should use aliasing.