A Comprehensive Guide to SQL Window Functions - maxxts7/Techbook GitHub Wiki
Window functions are a powerful feature in SQL that allow you to perform calculations across a set of rows that are related to the current row. They are incredibly useful for complex analytical queries and can significantly simplify your SQL code. In this comprehensive guide, we'll explore all the major window functions available in SQL.
Understanding Window Functions
Before we dive into specific functions, let's understand what window functions are and how they work.
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is similar to the type of calculation you can do with a GROUP BY clause. However, unlike GROUP BY, the rows retain their separate identities and are not grouped into a single output row.
The general syntax for a window function is:
function_name(argument1, argument2, ...) OVER (
[PARTITION BY partition_expression, ...]
[ORDER BY sort_expression [ASC | DESC], ...]
[frame_clause]
)
Now, let's explore the different types of window functions available in SQL.
1. Ranking Functions
RANK()
Assigns a rank to each row within a partition of a result set. Ties are assigned the same rank, and the next rank(s) are skipped.
SELECT product_name, category, price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM products;
DENSE_RANK()
Similar to RANK(), but doesn't skip ranks in case of ties.
SELECT product_name, category, price,
DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_dense_rank
FROM products;
ROW_NUMBER()
Assigns a unique number to each row within a partition.
SELECT product_name, category, price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS price_row_number
FROM products;
NTILE(n)
Divides the rows into n
roughly equal groups.
SELECT product_name, category, price,
NTILE(4) OVER (PARTITION BY category ORDER BY price DESC) AS price_quartile
FROM products;
2. Offset Functions
LAG(column, offset, default)
Accesses data from a previous row in the result set.
SELECT sale_date, amount,
LAG(amount, 1, 0) OVER (ORDER BY sale_date) AS previous_day_sales
FROM sales;
LEAD(column, offset, default)
Accesses data from a subsequent row in the result set.
SELECT sale_date, amount,
LEAD(amount, 1, 0) OVER (ORDER BY sale_date) AS next_day_sales
FROM sales;
3. Aggregate Window Functions
These functions perform aggregate operations over a window of rows.
SUM()
SELECT sale_date, amount,
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_3day_sum
FROM sales;
AVG()
SELECT product_name, category, price,
AVG(price) OVER (PARTITION BY category) AS category_avg_price
FROM products;
COUNT()
SELECT customer_id, order_date,
COUNT(*) OVER (PARTITION BY customer_id ORDER BY order_date) AS customer_order_count
FROM orders;
MIN() and MAX()
SELECT product_name, category, price,
MIN(price) OVER (PARTITION BY category) AS category_min_price,
MAX(price) OVER (PARTITION BY category) AS category_max_price
FROM products;
4. Distribution Functions
PERCENT_RANK()
Calculates the percentile rank of a row within a partition.
SELECT product_name, category, price,
PERCENT_RANK() OVER (PARTITION BY category ORDER BY price) AS price_percent_rank
FROM products;
CUME_DIST()
Calculates the cumulative distribution of a value within a partition.
SELECT product_name, category, price,
CUME_DIST() OVER (PARTITION BY category ORDER BY price) AS price_cume_dist
FROM products;
5. Value Functions
FIRST_VALUE() and LAST_VALUE()
Returns the first or last value in an ordered set of values.
SELECT product_name, category, price,
FIRST_VALUE(price) OVER (PARTITION BY category ORDER BY price) AS lowest_price_in_category,
LAST_VALUE(price) OVER (PARTITION BY category ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS highest_price_in_category
FROM products;
NTH_VALUE(column, n)
Returns the nth value in an ordered set of values.
SELECT product_name, category, price,
NTH_VALUE(price, 2) OVER (PARTITION BY category ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_lowest_price
FROM products;
Conclusion
Window functions are a powerful tool in SQL that can greatly simplify complex analytical queries. They allow you to perform calculations across sets of rows while still maintaining the granularity of your data. Whether you're doing ranking, running totals, moving averages, or complex statistical calculations, window functions can often provide a concise and efficient solution.
As with any advanced SQL feature, the key to mastering window functions is practice. Try incorporating these functions into your queries and see how they can provide new insights into your data. Happy querying!
artifact https://claude.site/artifacts/b1c9a142-8e00-489b-8660-2cd5dcc942dd