5.3.3.Work with subqueries - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki

Queries within queries

Subquery

  • A SQL query that is nested inside a larger query

The inner query executes first so that the results can be passed on to the outer query to use. Subqueries can get a little confusing because there's so many layers. But if you keep in mind that the innermost query executes first, it'll be easier to order your subqueries when you want them to execute.

Subqueries can also be nested inside all sorts of other queries. Usually you'll find subqueries nested in FROM or WHERE clauses.

Example:

SELECT
    station_id,
    num_bikes_available,
    (SELECT
        AVG(num_bikes_available)
    FROM bigquery-public-data.new_york_citibike.citibike_stations) AS avg_num_bike_available
FROM
    bigquery-public-data.new_york_citibike.citibike_stations
SELECT
    station_id,
    name,
    number_of_rides AS number_of_rides_starting_at_station
FROM
    (
        SELECT
            start_station_id,
            COUNT(*) number_of_rides
        FROM
            bigquery-public-data.new_york_citibike.citibike_trips
        GROUP BY 
            start_station_id
    )
AS station_num_trips
INNER JOIN
    bigquery-public-data.new_york_citibike.citibike_stations ON station_id = start_station_id
ORDER BY
    number_of_rides DESC
SELECT
    station_id,
    name
FROM
    bigquery-public-data.new_york_citibike.citibike_stations
WHERE
    station_id IN
    (
        SELECT
            start_station_id
        FROM
            bigquery-public-data.new_york_citibike.citibike_trips
        WHERE
            usertype = 'Subscriber'
    )

Using subqueries to aggregate data

HAVING

  • Allows you to add a filter to your query instead of the underlying table that can only be used with aggregate functions

CASE

  • Returns records with your conditions by allowing you to include if/then statements in your query

Example:

SELECT
    Warehouse.warehouse_id,
    CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name,
    COUNT(Orders.order_id) AS number_of_orders,
    (SELECT
        COUNT(*)
    FROM warehouse_orders.Orders AS Orders) AS total_orders,
    CASE 
        WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders AS Orders) <= 0.20
        THEN "Fulfilled 0-20% of Orders"
        WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders AS Orders) > 0.20
        AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders AS Orders) <= 0.60
        THEN "Fulfilled 21-60% of Orders"
    ELSE "Fulffiled more than 60% of Orders"
    END AS fulfillment_summary
FROM warehouse_orders.Warehouse AS Warehouse
LEFT JOIN warehouse_orders.Orders AS Orders
    ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
    Warehouse.warehouse_id, warehouse_name
HAVING
    COUNT(Orders.order_id) > 0

SQL functions and subqueries: A functional friendship

How do SQL functions, function?

SQL functions are what help make data aggregation possible. (As a reminder, data aggregation is the process of gathering data from multiple sources in order to combine it into a single, summarized collection.) So, how do SQL functions work? Going back to W3Schools, let’s review some of these functions to get a better understanding of how to run these queries:

  • SQL HAVING: This is an overview of the HAVING clause, including what it is and a tutorial on how and when it works.
  • SQL CASE: Explore the usage of the CASE statement and examples of how it works.
  • SQL IF: This is a tutorial of the IF function and offers examples that you can practice with.
  • SQL COUNT: The COUNT function is just as important as all the rest, and this tutorial offers multiple examples to review.

Subqueries - the cherry on top

Think of a query as a cake. A cake can have multiple layers contained within it and even layers within those layers. Each of these layers are our subqueries, and when you put all of the layers together, you get a cake (query). Usually, you will find subqueries nested in the SELECT, FROM, and/or WHERE clauses. There is no general syntax for subqueries, but the syntax for a basic subquery is as follows:

SELECT account_table.*
FROM (
        SELECT *
        FROM transactino.sf_model_feature_2014_01
        WHERE day_of_week = 'Friday'
    ) account_table
WHERE account_table.availability = 'YES'

You will find that, within the first SELECT clause is another SELECT clause. The second SELECT clause marks the start of the subquery in this statement. There are many different ways in which you can make use of subqueries, and resources referenced will provide additional guidance as you learn. But first, let’s recap the subquery rules.

There are a few rules that subqueries must follow:

  • Subqueries must be enclosed within parentheses
  • A subquery can have only one column specified in the SELECT clause. But if you want a subquery to compare multiple columns, those columns must be selected in the main query.
  • Subqueries that return more than one row can only be used with multiple value operators, such as the IN operator which allows you to specify multiple values in a WHERE clause.
  • A subquery can’t be nested in a SET command. The SET command is used with UPDATE to specify which columns (and values) are to be updated in a table.

Additional resources

The following resources offer more guidance into subqueries and their usage:

  • SQL subqueries: This detailed introduction includes the definition of a subquery, its purpose in SQL, when and how to use it, and what the results will be
  • Writing subqueries in SQL: Explore the basics of subqueries in this interactive tutorial, including examples and practice problems that you can work through

As you continue to learn more about using SQL, functions, and subqueries, you will realize how much time you can truly save when memorizing these tips and tricks.

Test your knowledge on working with subqueries

TOTAL POINTS 3

Question 1

Which of the following queries contain subqueries? Select all that apply.

  1. SELECT 
        price
    FROM 
        sales
    WHERE 
        price = (SELECT MAX(salary) 
    	         FROM sales)
    
  2. SELECT 
        call
    FROM 
        recordings
    ORDER BY 
        call.employee_id, 
        call.start_time
    
  3. SELECT 
        first_name, last_name
    FROM 
        customers
    WHERE 
        customer_id NOT IN (SELECT customer_id 
                            FROM customers
                            WHERE store = 704)
    
  4. SELECT
        employee_id
    FROM
        employees
    WHERE
        department_id IN (SELECT department_id
                          FROM departments
                          WHERE location_id = 1000)
    

ANSWER: 1,3,4

Correct. The three queries with statements in parentheses contain subqueries.

Question 2

Fill in the blank: A data analyst uses aliasing to make it easier to read and write a query. Aliasing involves temporarily _____ a table or column in a query.

  • removing
  • naming
  • copying
  • hiding

Correct. Aliasing involves temporarily naming a table or column in a query.

Question 3

When working with subqueries, the outer query executes first.

  • True
  • False

Correct. The inner query executes first, then the results are passed onto the outer query to use.