GroupBy - rFronteddu/general_wiki GitHub Wiki
Aggregators
The main idea behind an aggregate function is to take multiple inputs and return a single output.
The most common aggregate functions:
- AVG() - returns average value (you can use ROUND() to specify precision)
- COUNT() - returns the number of rows; COUNT(*) counts every row, while COUNT(column) counts only the rows where that column is not NULL
- MAX() - returns maximum value
- MIN() - returns minimum value
- SUM() - returns the sum of all values
Aggregate function calls are allowed only in the SELECT, HAVING, and ORDER BY clauses, not in WHERE.
SELECT MIN(replacement_cost) FROM film;
SELECT MAX(replacement_cost), MIN(replacement_cost) FROM film;
SELECT ROUND(AVG(replacement_cost), 2)
FROM film;
SELECT SUM(replacement_cost)
FROM film;
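The difference between COUNT(*) and COUNT(column) only shows up when NULLs are present. Here is a minimal sketch, assuming a PostgreSQL-style dialect; the table demo_film and its rows are hypothetical and are not part of the film table queried above.

```sql
-- Hypothetical sample table, made up for illustration only.
CREATE TEMPORARY TABLE demo_film (
    title            VARCHAR(50),
    replacement_cost NUMERIC(5, 2)  -- NULL means the cost is unknown
);

INSERT INTO demo_film (title, replacement_cost) VALUES
    ('Film A', 10.00),
    ('Film B', 20.00),
    ('Film C', NULL);

-- COUNT(*) counts rows; COUNT(column) counts only non-NULL values.
-- AVG() also skips NULLs, so the average is taken over two rows, not three.
SELECT COUNT(*)                        AS total_rows,   -- 3
       COUNT(replacement_cost)         AS known_costs,  -- 2
       ROUND(AVG(replacement_cost), 2) AS average_cost  -- 15.00 in PostgreSQL
FROM demo_film;
```

Because the NULL row is skipped, the average is 30 / 2 rather than 30 / 3.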
GROUP BY
GROUP BY allows us to aggregate columns per some category.
Note that GROUP BY must appear right after the FROM clause, or right after the WHERE clause if there is one.
Note that in the SELECT clause, every column must either be wrapped in an aggregate function or appear in the GROUP BY clause.
- If we select a category column without an aggregate function, it must also appear in the GROUP BY clause.
- If a column has an aggregate function, it does not need to be included in the GROUP BY clause.
- WHERE clauses should not refer to the aggregation result (use HAVING for that).
-- this shows the sum of sales for each company and, within each company, for each division
SELECT company, division, SUM(sales)
FROM finance_table
GROUP BY company, division;
- If you want to sort results based on the aggregate, make sure to reference the entire function
SELECT company, SUM(sales)
FROM finance_table
GROUP BY company
ORDER BY SUM(sales)
LIMIT 5;
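-- General pattern: AGG stands for any aggregate function
SELECT category_col, AGG(data_col)
FROM table
GROUP BY category_col;
-- Same pattern with a WHERE filter applied to the rows before grouping
SELECT category_col, AGG(data_col)
FROM table
WHERE category_col != 'A'
GROUP BY category_col;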
SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
ORDER BY SUM(amount);
-- How much each customer spent with each staff member (staff_id)
SELECT customer_id, staff_id, SUM(amount)
FROM payment
GROUP BY staff_id, customer_id
ORDER BY customer_id;
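To make the rules about SELECT and GROUP BY concrete, the following sketch runs them against a small hypothetical table. The name demo_finance and its rows are made up for illustration; only the column names mirror finance_table above. A PostgreSQL-style dialect is assumed.

```sql
-- Hypothetical sample data, not taken from finance_table.
CREATE TEMPORARY TABLE demo_finance (
    company  VARCHAR(20),
    division VARCHAR(20),
    sales    INTEGER
);

INSERT INTO demo_finance (company, division, sales) VALUES
    ('Acme',   'Sales', 100),
    ('Acme',   'Sales', 150),
    ('Acme',   'IT',     50),
    ('Globex', 'Sales', 400);

-- Valid: company is in GROUP BY and sales is wrapped in an aggregate.
SELECT company, SUM(sales) AS total_sales
FROM demo_finance
GROUP BY company
ORDER BY SUM(sales) DESC;
-- Globex | 400
-- Acme   | 300

-- Rejected by PostgreSQL: division is neither aggregated nor in GROUP BY.
-- (Some other engines accept this and return an arbitrary division per group.)
-- SELECT company, division, SUM(sales)
-- FROM demo_finance
-- GROUP BY company;
```

Adding division to the GROUP BY clause makes the commented-out query valid and returns one row per company and division pair.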
HAVING
The HAVING clause allows us to filter after an aggregation has taken place.
We can filter rows before the GROUP BY with WHERE, but when the condition depends on an aggregate result, we have to use HAVING.
* We cannot use WHERE to filter on aggregate results, because aggregation happens after WHERE is executed.
* Whenever possible, filter with WHERE first to reduce the number of rows that have to be grouped.
SELECT company, SUM(sales) FROM finance_table GROUP BY company HAVING SUM(sales) > 1000;
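WHERE and HAVING can be combined in one query. Below is a minimal sketch on hypothetical data, assuming a PostgreSQL-style dialect; the table demo_sales and its rows are invented for illustration. WHERE drops rows before grouping, and HAVING then drops whole groups based on the aggregate.

```sql
-- Hypothetical sample data, made up for illustration only.
CREATE TEMPORARY TABLE demo_sales (
    company  VARCHAR(20),
    division VARCHAR(20),
    sales    INTEGER
);

INSERT INTO demo_sales (company, division, sales) VALUES
    ('Acme',    'Sales', 100),
    ('Acme',    'Sales', 150),
    ('Acme',    'IT',    500),
    ('Globex',  'Sales', 120),
    ('Globex',  'IT',     90),
    ('Initech', 'Sales', 300);

SELECT company, SUM(sales) AS total_sales
FROM demo_sales
WHERE division <> 'IT'    -- row filter: runs before grouping
GROUP BY company
HAVING SUM(sales) > 200   -- group filter: runs after aggregation
ORDER BY company;
-- Acme    | 250
-- Initech | 300
```

Globex is excluded because its remaining rows sum to 120. Acme's 500 in IT never reaches the sum because WHERE removed that row first.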