SQL Query Methods : CTEs & Window Functions - Govarthan-Boopalan/Customer_Behaviour_Analysis GitHub Wiki
Let’s break down the key SQL techniques used in the scripts (CTEs and Window Functions) with simple analogies, examples, and a step-by-step guide.
1. Common Table Expressions (CTEs)
What is a CTE?
A CTE (Common Table Expression) is like a temporary, named result set you can reference within a SQL query. Think of it as writing a draft section of a report that you’ll reference later in the final version.
Analogy
Imagine planning a road trip:
- You first list all cities you’ll visit (CTE).
- Then, you plan the route between those cities (main query).
The CTE is your list of cities; the main query is the route built using that list.
Example from the Scripts
In 05_Cx_experience_Analysis_tables.py
, this CTE calculates customer retention:
WITH customer_retention AS (
SELECT
CustomerID,
COUNT(DISTINCT VisitDate) AS Visits,
SUM(CASE WHEN Stage = 'Checkout' AND Action = 'Purchase' THEN 1 ELSE 0 END) AS Purchases
FROM customer_journey
GROUP BY CustomerID
)
SELECT
COUNT(DISTINCT CustomerID) AS Total_Customers,
SUM(CASE WHEN Visits > 1 THEN 1 ELSE 0 END) AS Retained_Customers,
ROUND((SUM(CASE WHEN Visits > 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT CustomerID)), 2) AS Retention_Rate
FROM customer_retention;
Step-by-Step Breakdown
- Define the CTE:
- Name it
customer_retention
. - Calculate total visits and purchases per customer.
- Name it
- Use the CTE in the main query:
- Calculate retention rate using the precomputed
Visits
andPurchases
from the CTE.
- Calculate retention rate using the precomputed
Why Use CTEs?
- Simplify complex logic: Break queries into readable chunks.
- Reuse subqueries: Avoid repeating the same subquery multiple times.
2. Window Functions
What is a Window Function?
A window function performs calculations across a set of rows related to the current row, without collapsing rows. Think of it as analyzing data through a "sliding window."
Analogy
Imagine ranking students in a class:
- You sort them by grades (ORDER BY Grade).
- Assign each a rank based on their position in the sorted list (ROW_NUMBER()).
The "window" is the entire class, and the ranking is done within that window.
Example (Hypothetical)
While the scripts don’t explicitly use window functions, here’s how they might be applied:
SELECT
ProductID,
ProductName,
Rating,
AVG(Rating) OVER (PARTITION BY CategoryID) AS AvgCategoryRating,
ROW_NUMBER() OVER (ORDER BY Rating DESC) AS Rank
FROM products;
Step-by-Step Breakdown
- Define the Window:
OVER (PARTITION BY CategoryID)
: Group products by category.OVER (ORDER BY Rating DESC)
: Sort all products by rating.
- Apply the Function:
AVG(Rating) OVER (...)
: Calculate average rating per category.ROW_NUMBER() OVER (...)
: Rank products by rating.
Key Window Functions
Function | Purpose | Example |
---|---|---|
ROW_NUMBER() |
Assign a unique rank to each row. | Rank products by sales. |
RANK() |
Rank rows with gaps for ties. | Rank employees by salary. |
SUM() OVER() |
Running total. | Calculate cumulative sales. |
LEAD()/LAG() |
Compare current row to next/previous row. | Track month-over-month growth. |
Step-by-Step Guide to Writing SQL Queries
1. Start with the End Goal
Ask: "What do I need to show?"
- Example: "Find the top 5 worst-rated products."
2. Identify Data Sources
List tables and columns needed:
customer_reviews
→ProductID
,Rating
products
→ProductName
3. Break Down Logic
- Filter: Ratings ≤ 2.
- Aggregate: Average rating per product.
- Sort: Order by average rating ascending.
4. Choose the Right Tool
- Use a CTE for multi-step calculations.
- Use Window Functions for rankings or running totals.
Example Query Using Both CTE & Window Function
Goal: "Find products with below-average ratings in their category."
WITH CategoryAverages AS (
SELECT
CategoryID,
AVG(Rating) AS AvgCategoryRating
FROM products
GROUP BY CategoryID
)
SELECT
p.ProductID,
p.ProductName,
p.Rating,
ca.AvgCategoryRating,
CASE WHEN p.Rating < ca.AvgCategoryRating THEN 'Below Average' ELSE 'Above Average' END AS Status
FROM products p
JOIN CategoryAverages ca ON p.CategoryID = ca.CategoryID;
Steps Explained
- CTE (
CategoryAverages
): Calculate average rating per category. - Main Query: Compare each product’s rating to its category average.
When to Use CTEs vs. Window Functions
Scenario | Use CTE | Use Window Function |
---|---|---|
Reusing a subquery multiple times | ✅ Yes | ❌ No |
Calculating running totals | ❌ No | ✅ Yes (e.g., SUM() OVER() ) |
Breaking down complex logic | ✅ Yes (simplifies code) | ❌ No |
Ranking or partitioning data | ❌ No | ✅ Yes (e.g., ROW_NUMBER() , RANK() ) |
Key Takeaways
- CTEs organize complex logic into reusable blocks.
- Window Functions analyze data in partitions without collapsing rows.
- Combine both for advanced analytics (e.g., "Compare sales to category averages using a CTE, then rank results with a window function").
By mastering these techniques, you’ll write SQL queries like a chef crafts a recipe: breaking tasks into steps (CTEs) and adding precise flavor (window functions)! 🧑🍳🔍