DAY 27: SQL - Advanced Analytics (Cohort, Retention, RFM Analysis)

Guru Charan Pathalla

DAY 27: SQL - Advanced Analytics (Cohort, Retention, RFM Analysis)

Learning Objectives: - Perform cohort analysis - Calculate retention rates - Conduct RFM analysis - Build customer segmentation - Apply advanced analytical techniques

Topics Covered (60 min):

1. Cohort Analysis - Group customers by acquisition date - Track behavior over time by cohort - Example: Month 0, Month 1, Month 2 retention - Use DATEDIFF and CASE for cohort buckets - Pivot results for cohort matrix

2. Retention Analysis - Calculate repeat purchase rate - Monthly/quarterly retention cohorts - Churn rate calculation - Retention curve visualization - Identify patterns in customer lifecycle

3. RFM Analysis - Recency: Days since last purchase - Frequency: Number of purchases - Monetary: Total spending - Score each dimension (1-5 scale) - Combine for customer segments - Example: 555 = Champions, 111 = Lost

4. Customer Segmentation - Use NTILE for percentile-based scoring - Create meaningful segments - Action strategies per segment - Lifetime value estimation - Predictive scoring

Video Resources:

  1. Excel Macros & VBA for Beginners (1:14)
    https://www.youtube.com/watch?v=m_FgV7i7X3I

  2. Excel Macros & VBA - Tutorial for Beginners (0:29)
    https://www.youtube.com/watch?v=IJQHMFLXk_c

  3. Excel VBA Macros - Beginner to PRO Masterclass (3:05)
    https://www.youtube.com/watch?v=7hd1Nn4cNyI

AI Learning Prompts (Concept Understanding Only):

Understanding Cohort Analysis:

I'm learning cohort and retention analysis. Please explain:

1. What is cohort analysis and why is it useful?
2. How do I group customers by signup month and track retention?
3. Show me how to calculate month-over-month retention
4. What is churn rate and how do I calculate it?
5. Provide SQL examples using customer order data

Use SQL Server syntax with clear examples.

Understanding RFM Analysis:

Help me understand RFM analysis:

1. What is RFM and what do Recency, Frequency, Monetary mean?
2. How do I calculate RFM scores (1-5 for each)?
3. Show me how to segment customers based on RFM
4. What actions should I take for each segment?
5. Provide complete SQL example with customer data

Explain with step-by-step implementation.

Guided Practice (60 min):

📝 How to approach Guided Practice:

  • Follow step-by-step: Work through each exercise in order
  • Use provided resources: Reference the video tutorials and AI prompts if you get stuck
  • Verify your work: Check your results match expected outcomes
  • Take notes: Document key learnings and shortcuts you discover
  • Ask for help: Use AI assistants to clarify concepts you don't understand

Exercise 1: Build Cohort Analysis for Customer Retention

Goal: Create a cohort retention matrix showing what percentage of customers return in subsequent months.

Step-by-Step Instructions:

  1. Create sample customer orders data:
    CREATE TABLE CustomerOrders (
        OrderID INT,
        CustomerID INT,
        OrderDate DATE,
        Amount DECIMAL(10,2)
    );
    
    -- Insert sample data spanning multiple months
    INSERT INTO CustomerOrders VALUES
    (1, 101, '2024-01-15', 100),
    (2, 102, '2024-01-20', 150),
    (3, 101, '2024-02-10', 120), -- Customer 101 returns
    (4, 103, '2024-02-05', 200),
    (5, 102, '2024-03-15', 130), -- Customer 102 returns
    -- Add more data...
  2. Identify each customer's first purchase month (cohort):
    WITH FirstPurchase AS (
        SELECT
            CustomerID,
            MIN(OrderDate) AS FirstOrderDate,
            DATE_TRUNC('month', MIN(OrderDate)) AS CohortMonth
        FROM CustomerOrders
        GROUP BY CustomerID
    )
  3. Calculate months since first purchase for each order:
    WITH FirstPurchase AS (
        SELECT CustomerID, MIN(OrderDate) AS FirstOrderDate,
               DATE_TRUNC('month', MIN(OrderDate)) AS CohortMonth
        FROM CustomerOrders
        GROUP BY CustomerID
    ),
    CohortData AS (
        SELECT
            fp.CohortMonth,
            co.CustomerID,
            DATE_TRUNC('month', co.OrderDate) AS OrderMonth,
            DATEDIFF(month, fp.CohortMonth, DATE_TRUNC('month', co.OrderDate)) AS MonthNumber
        FROM CustomerOrders co
        JOIN FirstPurchase fp ON co.CustomerID = fp.CustomerID
    )
  4. Create retention matrix:
    SELECT
        CohortMonth,
        COUNT(DISTINCT CASE WHEN MonthNumber = 0 THEN CustomerID END) AS Month_0,
        COUNT(DISTINCT CASE WHEN MonthNumber = 1 THEN CustomerID END) AS Month_1,
        COUNT(DISTINCT CASE WHEN MonthNumber = 2 THEN CustomerID END) AS Month_2,
        COUNT(DISTINCT CASE WHEN MonthNumber = 3 THEN CustomerID END) AS Month_3,
        COUNT(DISTINCT CASE WHEN MonthNumber = 4 THEN CustomerID END) AS Month_4
    FROM CohortData
    GROUP BY CohortMonth
    ORDER BY CohortMonth;
  5. Calculate retention percentages:
    SELECT
        CohortMonth,
        Month_0 AS Cohort_Size,
        ROUND(100.0 * Month_1 / Month_0, 1) AS Month_1_Pct,
        ROUND(100.0 * Month_2 / Month_0, 1) AS Month_2_Pct,
        ROUND(100.0 * Month_3 / Month_0, 1) AS Month_3_Pct
    FROM (previous query) AS CohortCounts;

💡 Tips:

  • Cohort = acquisition month: Groups customers by when they first purchased
  • DATEDIFF calculates month gaps: How many months since first purchase
  • CASE WHEN for pivoting: Creates columns for each month
  • Visualize in Excel: Export to Excel, create heat map with conditional formatting
  • Look for patterns: Does retention improve over time? Which cohorts are strongest?

✅ Expected Result: Cohort retention matrix showing customer return rates by month, revealing retention patterns

🔄 Connection to Previous Learning: Uses CTEs (Day 16), date functions (Day 12), and CASE statements (Day 18)

⚠️ SQLite Syntax Note: The query below is adapted for SQLite. We use strftime('%Y-%m-01', OrderDate) instead of DATE_TRUNC and calculate months mathematically.

Exercise 2: Calculate RFM (Recency, Frequency, Monetary) Scores

Goal: Score customers based on how recently they purchased, how often, and how much they spend.

Step-by-Step Instructions:

  1. Calculate RFM metrics for each customer:
    WITH RFM_Calc AS (
        SELECT
            CustomerID,
            DATEDIFF(day, MAX(OrderDate), GETDATE()) AS Recency_Days,
            COUNT(DISTINCT OrderID) AS Frequency,
            SUM(Amount) AS Monetary_Value
        FROM CustomerOrders
        GROUP BY CustomerID
    )
  2. Score customers using NTILE (divide into 5 quintiles):
    WITH RFM_Calc AS (
        -- Previous query
    ),
    RFM_Scores AS (
        SELECT
            CustomerID,
            Recency_Days,
            Frequency,
            Monetary_Value,
            NTILE(5) OVER (ORDER BY Recency_Days ASC) AS R_Score,  -- Lower recency = better
            NTILE(5) OVER (ORDER BY Frequency DESC) AS F_Score,    -- Higher frequency = better
            NTILE(5) OVER (ORDER BY Monetary_Value DESC) AS M_Score -- Higher monetary = better
        FROM RFM_Calc
    )
  3. Create combined RFM score:
    SELECT
        CustomerID,
        Recency_Days,
        Frequency,
        Monetary_Value,
        R_Score,
        F_Score,
        M_Score,
        CAST(R_Score AS VARCHAR) + CAST(F_Score AS VARCHAR) + CAST(M_Score AS VARCHAR) AS RFM_Score,
        (R_Score + F_Score + M_Score) AS Total_Score
    FROM RFM_Scores
    ORDER BY Total_Score DESC;
  4. Interpret scores: - RFM Score "555" = Best customers (recent, frequent, high-value) - RFM Score "111" = At-risk customers (not recent, infrequent, low-value) - RFM Score "511" = Recent buyers but low frequency/value (growth potential)

💡 Tips:

  • NTILE(5) creates quintiles: Divides customers into 5 equal groups (1=worst, 5=best)
  • Recency inverted: Lower days = more recent = better score
  • Adjust scoring: Use NTILE(10) for deciles, or manual CASE for custom ranges
  • Focus on high-value segments: RFM 4-5-5, 5-4-5, 5-5-4 are your VIP customers
  • Automate campaigns: Use RFM scores to trigger targeted marketing

✅ Expected Result: Each customer scored 1-5 on Recency, Frequency, and Monetary value for targeted marketing

🔄 Connection to Previous Learning: Uses window functions NTILE (Day 12) and aggregations (Day 8)

Exercise 3: Create Customer Segments Based on RFM

Goal: Classify customers into actionable segments like Champions, Loyal, At Risk, and Lost.

Step-by-Step Instructions:

  1. Build on RFM scores from Exercise 2
  2. Create segment classifications using CASE:
    WITH RFM_Scores AS (
        -- Use query from Exercise 2
    ),
    CustomerSegments AS (
        SELECT
            CustomerID,
            R_Score,
            F_Score,
            M_Score,
            CASE
                WHEN R_Score >= 4 AND F_Score >= 4 AND M_Score >= 4 THEN 'Champions'
                WHEN R_Score >= 3 AND F_Score >= 3 AND M_Score >= 3 THEN 'Loyal Customers'
                WHEN R_Score >= 4 AND F_Score <= 2 THEN 'New Customers'
                WHEN R_Score >= 3 AND F_Score <= 2 AND M_Score <= 2 THEN 'Promising'
                WHEN R_Score <= 2 AND F_Score >= 3 THEN 'At Risk'
                WHEN R_Score <= 2 AND F_Score <= 2 AND M_Score >= 3 THEN 'Can't Lose Them'
                WHEN R_Score <= 2 AND F_Score <= 2 THEN 'Lost Customers'
                ELSE 'Others'
            END AS Customer_Segment,
            Recency_Days,
            Frequency,
            Monetary_Value
        FROM RFM_Scores
    )
  3. Analyze segment distribution:
    SELECT
        Customer_Segment,
        COUNT(*) AS Customer_Count,
        ROUND(AVG(Recency_Days), 0) AS Avg_Recency,
        ROUND(AVG(Frequency), 1) AS Avg_Frequency,
        ROUND(AVG(Monetary_Value), 2) AS Avg_Monetary,
        SUM(Monetary_Value) AS Total_Revenue,
        ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 1) AS Pct_of_Customers,
        ROUND(100.0 * SUM(Monetary_Value) / SUM(SUM(Monetary_Value)) OVER(), 1) AS Pct_of_Revenue
    FROM CustomerSegments
    GROUP BY Customer_Segment
    ORDER BY Total_Revenue DESC;
  4. Create actionable recommendations:
    SELECT
        Customer_Segment,
        COUNT(*) AS Customer_Count,
        CASE Customer_Segment
            WHEN 'Champions' THEN 'Reward with VIP benefits, early access to new products'
            WHEN 'Loyal Customers' THEN 'Upsell premium products, loyalty program'
            WHEN 'At Risk' THEN 'Win-back campaign with personalized offers'
            WHEN 'Lost Customers' THEN 'Re-engagement campaign, survey for feedback'
            WHEN 'New Customers' THEN 'Onboarding series, educate about product range'
            ELSE 'Standard marketing'
        END AS Recommended_Action
    FROM CustomerSegments
    GROUP BY Customer_Segment;

💡 Tips:

  • Customize segments: Adjust thresholds based on your business model
  • Champions are gold: Focus retention efforts here - they drive revenue
  • "Can't Lose Them": High spenders who haven't bought recently - urgent action needed
  • Automate tagging: Save segment as customer attribute for marketing automation
  • Track segment movement: Monitor customers moving between segments monthly

✅ Expected Result: Clear customer segments with specific marketing actions for each group

🔄 Connection to Previous Learning: Uses complex CASE logic (Day 18) and window functions for percentages

Exercise 4: Calculate Customer Lifetime Value (CLV)

Goal: Estimate the total value a customer will bring over their entire relationship with your business.

Step-by-Step Instructions:

  1. Calculate average order metrics:
    WITH CustomerMetrics AS (
        SELECT
            CustomerID,
            COUNT(DISTINCT OrderID) AS Total_Orders,
            SUM(Amount) AS Total_Revenue,
            AVG(Amount) AS Avg_Order_Value,
            DATEDIFF(day, MIN(OrderDate), MAX(OrderDate)) AS Customer_Lifespan_Days,
            DATEDIFF(day, MIN(OrderDate), MAX(OrderDate)) * 1.0 / NULLIF(COUNT(DISTINCT OrderID) - 1, 0) AS Avg_Days_Between_Orders
        FROM CustomerOrders
        GROUP BY CustomerID
        HAVING COUNT(DISTINCT OrderID) >= 2  -- Need at least 2 orders to calculate frequency
    )
  2. Calculate purchase frequency per year:
    SELECT
        CustomerID,
        Total_Orders,
        Total_Revenue,
        Avg_Order_Value,
        Customer_Lifespan_Days,
        Avg_Days_Between_Orders,
        365.0 / Avg_Days_Between_Orders AS Purchases_Per_Year
    FROM CustomerMetrics;
  3. Estimate Customer Lifetime Value:
    WITH CustomerMetrics AS (
        -- Previous query
    ),
    CLV_Calculation AS (
        SELECT
            CustomerID,
            Avg_Order_Value,
            365.0 / Avg_Days_Between_Orders AS Purchases_Per_Year,
            3.0 AS Estimated_Lifetime_Years,  -- Assumption: average customer stays 3 years
            0.20 AS Profit_Margin,            -- Assumption: 20% profit margin
    
            -- CLV Formula: AOV × Purchase Frequency × Lifetime × Profit Margin
            Avg_Order_Value *
            (365.0 / Avg_Days_Between_Orders) *
            3.0 *
            0.20 AS Customer_Lifetime_Value
        FROM CustomerMetrics
    )
    SELECT
        CustomerID,
        ROUND(Customer_Lifetime_Value, 2) AS CLV,
        CASE
            WHEN Customer_Lifetime_Value > 1000 THEN 'High Value'
            WHEN Customer_Lifetime_Value > 500 THEN 'Medium Value'
            ELSE 'Low Value'
        END AS Value_Segment
    FROM CLV_Calculation
    ORDER BY Customer_Lifetime_Value DESC;
  4. Compare CLV to acquisition cost:
    SELECT
        Value_Segment,
        COUNT(*) AS Customer_Count,
        ROUND(AVG(CLV), 2) AS Avg_CLV,
        50.00 AS Avg_Acquisition_Cost,  -- Your actual CAC
        ROUND(AVG(CLV) / 50.00, 2) AS CLV_to_CAC_Ratio
    FROM (previous query)
    GROUP BY Value_Segment;
    Ideal ratio: CLV should be 3x+ acquisition cost

💡 Tips:

  • CLV is an estimate: Based on historical behavior, not guaranteed future
  • Adjust assumptions: Lifetime years and profit margin vary by industry
  • Focus acquisition on high-CLV segments: Know which customer types are most valuable
  • Monitor CLV trends: Track monthly to see if value is increasing or decreasing
  • Use in budgeting: Set max acquisition cost based on CLV (aim for 1:3 ratio)

✅ Expected Result: Estimated lifetime value for each customer, guiding acquisition and retention budgets

🔄 Connection to Previous Learning: Complex calculations building on aggregations and date functions

Exercise 5: Build Complete Retention Dashboard Query

Goal: Create one comprehensive query that provides all key retention metrics for executive reporting.

Step-by-Step Instructions:

  1. Combine all metrics into one dashboard query:
    -- Overall Retention Metrics Dashboard
    WITH CustomerStats AS (
        SELECT
            COUNT(DISTINCT CustomerID) AS Total_Customers,
            COUNT(DISTINCT CASE WHEN OrderDate >= DATEADD(month, -3, GETDATE()) THEN CustomerID END) AS Active_Last_3_Months,
            COUNT(DISTINCT CASE WHEN OrderDate >= DATEADD(month, -1, GETDATE()) THEN CustomerID END) AS Active_Last_Month
        FROM CustomerOrders
    ),
    ChurnMetrics AS (
        SELECT
            COUNT(DISTINCT CustomerID) AS Churned_Customers
        FROM CustomerOrders
        WHERE CustomerID NOT IN (
            SELECT DISTINCT CustomerID
            FROM CustomerOrders
            WHERE OrderDate >= DATEADD(month, -6, GETDATE())
        )
    ),
    RFM_Summary AS (
        SELECT
            COUNT(DISTINCT CASE WHEN R_Score >= 4 AND F_Score >= 4 THEN CustomerID END) AS Champions_Count,
            COUNT(DISTINCT CASE WHEN R_Score <= 2 AND F_Score <= 2 THEN CustomerID END) AS At_Risk_Count
        FROM (
            SELECT
                CustomerID,
                NTILE(5) OVER (ORDER BY MAX(OrderDate) DESC) AS R_Score,
                NTILE(5) OVER (ORDER BY COUNT(*) DESC) AS F_Score
            FROM CustomerOrders
            GROUP BY CustomerID
        ) rfm
    ),
    RevenueMetrics AS (
        SELECT
            AVG(Amount) AS Avg_Order_Value,
            SUM(Amount) AS Total_Revenue,
            SUM(CASE WHEN OrderDate >= DATEADD(month, -1, GETDATE()) THEN Amount END) AS Revenue_Last_Month
        FROM CustomerOrders
    )
    
    -- Final Dashboard
    SELECT
        'Customer Metrics' AS Category,
        cs.Total_Customers,
        cs.Active_Last_3_Months,
        ROUND(100.0 * cs.Active_Last_3_Months / cs.Total_Customers, 1) AS Active_Rate_Pct,
        cm.Churned_Customers,
        ROUND(100.0 * cm.Churned_Customers / cs.Total_Customers, 1) AS Churn_Rate_Pct,
        rfm.Champions_Count,
        rfm.At_Risk_Count,
        ROUND(rm.Avg_Order_Value, 2) AS Avg_Order_Value,
        ROUND(rm.Total_Revenue, 2) AS Total_Revenue,
        ROUND(rm.Revenue_Last_Month, 2) AS Revenue_Last_Month
    FROM CustomerStats cs
    CROSS JOIN ChurnMetrics cm
    CROSS JOIN RFM_Summary rfm
    CROSS JOIN RevenueMetrics rm;
  2. Create monthly trend view:
    SELECT
        DATE_TRUNC('month', OrderDate) AS Month,
        COUNT(DISTINCT CustomerID) AS Active_Customers,
        COUNT(DISTINCT OrderID) AS Total_Orders,
        SUM(Amount) AS Revenue,
        ROUND(AVG(Amount), 2) AS Avg_Order_Value
    FROM CustomerOrders
    WHERE OrderDate >= DATEADD(month, -12, GETDATE())
    GROUP BY DATE_TRUNC('month', OrderDate)
    ORDER BY Month;
  3. Export to Excel and visualize: - Import both queries into Excel Power Query - Create KPI cards for key metrics - Line chart for monthly trends - Bar chart comparing segments

💡 Tips:

  • CROSS JOIN for single-row summary: Combines metrics from different CTEs
  • Dashboard queries are expensive: Consider creating materialized view or caching
  • Schedule daily refresh: Automate this query to run daily for up-to-date metrics
  • Add year-over-year comparisons: Include same metrics from last year for trend analysis
  • Alert on thresholds: Set up alerts when churn rate exceeds acceptable level

✅ Expected Result: Single comprehensive query providing all key retention and engagement metrics for leadership

🔄 Connection to Previous Learning: Culmination of all advanced SQL techniques (CTEs, window functions, aggregations)

Independent Practice (60 min):

💡 Hints for Independent Practice:

  • Start simple: Begin with basic variations before attempting complex modifications
  • Break it down: Divide complex problems into smaller, manageable steps
  • Test incrementally: Verify each step works before moving to the next
  • Use documentation: Refer to official docs or tutorials when needed
  • Experiment: Try different approaches to find what works best

Apply the concepts learned to different datasets and scenarios.

Daily Assignment (60 min):

Assignment:

Expected Outcomes:


📝 Day 27 Quick Quiz - Test Your Knowledge!

Q1. As you near the end of the program, what does Day 27 emphasize?

Q2. How should you approach Day 27 exercises?

Q3. What makes Day 27 content valuable for your career?

Q4. By Day 27, what should you be comfortable with?

Q5. What's the best mindset for Day 27?

Have feedback or questions?