5.3.3.Work with subqueries - sj50179/Google-Data-Analytics-Professional-Certificate GitHub Wiki
Queries within queries
Subquery
- A SQL query that is nested inside a larger query
The inner query executes first so that the results can be passed on to the outer query to use. Subqueries can get a little confusing because there's so many layers. But if you keep in mind that the innermost query executes first, it'll be easier to order your subqueries when you want them to execute.
Subqueries can also be nested inside all sorts of other queries. Usually you'll find subqueries nested in FROM or WHERE clauses.
Example:
SELECT
station_id,
num_bikes_available,
(SELECT
AVG(num_bikes_available)
FROM bigquery-public-data.new_york_citibike.citibike_stations) AS avg_num_bike_available
FROM
bigquery-public-data.new_york_citibike.citibike_stations
SELECT
station_id,
name,
number_of_rides AS number_of_rides_starting_at_station
FROM
(
SELECT
start_station_id,
COUNT(*) number_of_rides
FROM
bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY
start_station_id
)
AS station_num_trips
INNER JOIN
bigquery-public-data.new_york_citibike.citibike_stations ON station_id = start_station_id
ORDER BY
number_of_rides DESC
SELECT
station_id,
name
FROM
bigquery-public-data.new_york_citibike.citibike_stations
WHERE
station_id IN
(
SELECT
start_station_id
FROM
bigquery-public-data.new_york_citibike.citibike_trips
WHERE
usertype = 'Subscriber'
)
Using subqueries to aggregate data
HAVING
- Allows you to add a filter to your query instead of the underlying table that can only be used with aggregate functions
CASE
- Returns records with your conditions by allowing you to include if/then statements in your query
Example:
SELECT
Warehouse.warehouse_id,
CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name,
COUNT(Orders.order_id) AS number_of_orders,
(SELECT
COUNT(*)
FROM warehouse_orders.Orders AS Orders) AS total_orders,
CASE
WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders AS Orders) <= 0.20
THEN "Fulfilled 0-20% of Orders"
WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders AS Orders) > 0.20
AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders AS Orders) <= 0.60
THEN "Fulfilled 21-60% of Orders"
ELSE "Fulffiled more than 60% of Orders"
END AS fulfillment_summary
FROM warehouse_orders.Warehouse AS Warehouse
LEFT JOIN warehouse_orders.Orders AS Orders
ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
Warehouse.warehouse_id, warehouse_name
HAVING
COUNT(Orders.order_id) > 0
SQL functions and subqueries: A functional friendship
How do SQL functions, function?
SQL functions are what help make data aggregation possible. (As a reminder, data aggregation is the process of gathering data from multiple sources in order to combine it into a single, summarized collection.) So, how do SQL functions work? Going back to W3Schools, let’s review some of these functions to get a better understanding of how to run these queries:
- SQL HAVING: This is an overview of the HAVING clause, including what it is and a tutorial on how and when it works.
- SQL CASE: Explore the usage of the CASE statement and examples of how it works.
- SQL IF: This is a tutorial of the IF function and offers examples that you can practice with.
- SQL COUNT: The COUNT function is just as important as all the rest, and this tutorial offers multiple examples to review.
Subqueries - the cherry on top
Think of a query as a cake. A cake can have multiple layers contained within it and even layers within those layers. Each of these layers are our subqueries, and when you put all of the layers together, you get a cake (query). Usually, you will find subqueries nested in the SELECT, FROM, and/or WHERE clauses. There is no general syntax for subqueries, but the syntax for a basic subquery is as follows:
SELECT account_table.*
FROM (
SELECT *
FROM transactino.sf_model_feature_2014_01
WHERE day_of_week = 'Friday'
) account_table
WHERE account_table.availability = 'YES'
You will find that, within the first SELECT clause is another SELECT clause. The second SELECT clause marks the start of the subquery in this statement. There are many different ways in which you can make use of subqueries, and resources referenced will provide additional guidance as you learn. But first, let’s recap the subquery rules.
There are a few rules that subqueries must follow:
- Subqueries must be enclosed within parentheses
- A subquery can have only one column specified in the SELECT clause. But if you want a subquery to compare multiple columns, those columns must be selected in the main query.
- Subqueries that return more than one row can only be used with multiple value operators, such as the IN operator which allows you to specify multiple values in a WHERE clause.
- A subquery can’t be nested in a SET command. The SET command is used with UPDATE to specify which columns (and values) are to be updated in a table.
Additional resources
The following resources offer more guidance into subqueries and their usage:
- SQL subqueries: This detailed introduction includes the definition of a subquery, its purpose in SQL, when and how to use it, and what the results will be
- Writing subqueries in SQL: Explore the basics of subqueries in this interactive tutorial, including examples and practice problems that you can work through
As you continue to learn more about using SQL, functions, and subqueries, you will realize how much time you can truly save when memorizing these tips and tricks.
Test your knowledge on working with subqueries
TOTAL POINTS 3
Question 1
Which of the following queries contain subqueries? Select all that apply.
-
SELECT price FROM sales WHERE price = (SELECT MAX(salary) FROM sales)
-
SELECT call FROM recordings ORDER BY call.employee_id, call.start_time
-
SELECT first_name, last_name FROM customers WHERE customer_id NOT IN (SELECT customer_id FROM customers WHERE store = 704)
-
SELECT employee_id FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location_id = 1000)
ANSWER: 1,3,4
Correct. The three queries with statements in parentheses contain subqueries.
Question 2
Fill in the blank: A data analyst uses aliasing to make it easier to read and write a query. Aliasing involves temporarily _____ a table or column in a query.
removing- naming
copyinghiding
Correct. Aliasing involves temporarily naming a table or column in a query.
Question 3
When working with subqueries, the outer query executes first.
True- False
Correct. The inner query executes first, then the results are passed onto the outer query to use.