SQL Query Methods : CTEs & Window Functions - Govarthan-Boopalan/Customer_Behaviour_Analysis GitHub Wiki


Let’s break down the key SQL techniques used in the scripts (CTEs and Window Functions) with simple analogies, examples, and a step-by-step guide.


1. Common Table Expressions (CTEs)

What is a CTE?

A CTE (Common Table Expression) is like a temporary, named result set you can reference within a SQL query. Think of it as writing a draft section of a report that you’ll reference later in the final version.

Analogy

Imagine planning a road trip:

  • You first list all cities you’ll visit (CTE).
  • Then, you plan the route between those cities (main query).
    The CTE is your list of cities; the main query is the route built using that list.

Example from the Scripts

In 05_Cx_experience_Analysis_tables.py, this CTE calculates customer retention:

WITH customer_retention AS (
    SELECT 
        CustomerID, 
        COUNT(DISTINCT VisitDate) AS Visits,
        SUM(CASE WHEN Stage = 'Checkout' AND Action = 'Purchase' THEN 1 ELSE 0 END) AS Purchases 
    FROM customer_journey 
    GROUP BY CustomerID
)
SELECT 
    COUNT(DISTINCT CustomerID) AS Total_Customers,
    SUM(CASE WHEN Visits > 1 THEN 1 ELSE 0 END) AS Retained_Customers,
    ROUND((SUM(CASE WHEN Visits > 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT CustomerID)), 2) AS Retention_Rate 
FROM customer_retention;

Step-by-Step Breakdown

  1. Define the CTE:
    • Name it customer_retention.
    • Calculate total visits and purchases per customer.
  2. Use the CTE in the main query:
    • Calculate retention rate using the precomputed Visits and Purchases from the CTE.

Why Use CTEs?

  • Simplify complex logic: Break queries into readable chunks.
  • Reuse subqueries: Avoid repeating the same subquery multiple times.

2. Window Functions

What is a Window Function?

A window function performs calculations across a set of rows related to the current row, without collapsing rows. Think of it as analyzing data through a "sliding window."

Analogy

Imagine ranking students in a class:

  • You sort them by grades (ORDER BY Grade).
  • Assign each a rank based on their position in the sorted list (ROW_NUMBER()).
    The "window" is the entire class, and the ranking is done within that window.

Example (Hypothetical)

While the scripts don’t explicitly use window functions, here’s how they might be applied:

SELECT 
    ProductID,
    ProductName,
    Rating,
    AVG(Rating) OVER (PARTITION BY CategoryID) AS AvgCategoryRating,
    ROW_NUMBER() OVER (ORDER BY Rating DESC) AS Rank
FROM products;

Step-by-Step Breakdown

  1. Define the Window:
    • OVER (PARTITION BY CategoryID): Group products by category.
    • OVER (ORDER BY Rating DESC): Sort all products by rating.
  2. Apply the Function:
    • AVG(Rating) OVER (...): Calculate average rating per category.
    • ROW_NUMBER() OVER (...): Rank products by rating.

Key Window Functions

Function Purpose Example
ROW_NUMBER() Assign a unique rank to each row. Rank products by sales.
RANK() Rank rows with gaps for ties. Rank employees by salary.
SUM() OVER() Running total. Calculate cumulative sales.
LEAD()/LAG() Compare current row to next/previous row. Track month-over-month growth.

Step-by-Step Guide to Writing SQL Queries

1. Start with the End Goal

Ask: "What do I need to show?"

  • Example: "Find the top 5 worst-rated products."

2. Identify Data Sources

List tables and columns needed:

  • customer_reviewsProductID, Rating
  • productsProductName

3. Break Down Logic

  • Filter: Ratings ≤ 2.
  • Aggregate: Average rating per product.
  • Sort: Order by average rating ascending.

4. Choose the Right Tool

  • Use a CTE for multi-step calculations.
  • Use Window Functions for rankings or running totals.

Example Query Using Both CTE & Window Function

Goal: "Find products with below-average ratings in their category."

WITH CategoryAverages AS (
    SELECT 
        CategoryID,
        AVG(Rating) AS AvgCategoryRating
    FROM products
    GROUP BY CategoryID
)
SELECT 
    p.ProductID,
    p.ProductName,
    p.Rating,
    ca.AvgCategoryRating,
    CASE WHEN p.Rating < ca.AvgCategoryRating THEN 'Below Average' ELSE 'Above Average' END AS Status
FROM products p
JOIN CategoryAverages ca ON p.CategoryID = ca.CategoryID;

Steps Explained

  1. CTE (CategoryAverages): Calculate average rating per category.
  2. Main Query: Compare each product’s rating to its category average.

When to Use CTEs vs. Window Functions

Scenario Use CTE Use Window Function
Reusing a subquery multiple times ✅ Yes ❌ No
Calculating running totals ❌ No ✅ Yes (e.g., SUM() OVER())
Breaking down complex logic ✅ Yes (simplifies code) ❌ No
Ranking or partitioning data ❌ No ✅ Yes (e.g., ROW_NUMBER(), RANK())

Key Takeaways

  1. CTEs organize complex logic into reusable blocks.
  2. Window Functions analyze data in partitions without collapsing rows.
  3. Combine both for advanced analytics (e.g., "Compare sales to category averages using a CTE, then rank results with a window function").

By mastering these techniques, you’ll write SQL queries like a chef crafts a recipe: breaking tasks into steps (CTEs) and adding precise flavor (window functions)! 🧑🍳🔍