DAY 27: SQL - Advanced Analytics (Cohort, Retention, RFM Analysis)
Learning Objectives: - Perform cohort analysis - Calculate retention rates - Conduct RFM analysis - Build customer segmentation - Apply advanced analytical techniques
Topics Covered (60 min):
1. Cohort Analysis - Group customers by acquisition date - Track behavior over time by cohort - Example: Month 0, Month 1, Month 2 retention - Use DATEDIFF and CASE for cohort buckets - Pivot results for cohort matrix
2. Retention Analysis - Calculate repeat purchase rate - Monthly/quarterly retention cohorts - Churn rate calculation - Retention curve visualization - Identify patterns in customer lifecycle
3. RFM Analysis - Recency: Days since last purchase - Frequency: Number of purchases - Monetary: Total spending - Score each dimension (1-5 scale) - Combine for customer segments - Example: 555 = Champions, 111 = Lost
4. Customer Segmentation - Use NTILE for percentile-based scoring - Create meaningful segments - Action strategies per segment - Lifetime value estimation - Predictive scoring
Video Resources:
-
Excel Macros & VBA for Beginners (1:14)
https://www.youtube.com/watch?v=m_FgV7i7X3I -
Excel Macros & VBA - Tutorial for Beginners (0:29)
https://www.youtube.com/watch?v=IJQHMFLXk_c -
Excel VBA Macros - Beginner to PRO Masterclass (3:05)
https://www.youtube.com/watch?v=7hd1Nn4cNyI
AI Learning Prompts (Concept Understanding Only):
Understanding Cohort Analysis:
I'm learning cohort and retention analysis. Please explain:
1. What is cohort analysis and why is it useful?
2. How do I group customers by signup month and track retention?
3. Show me how to calculate month-over-month retention
4. What is churn rate and how do I calculate it?
5. Provide SQL examples using customer order data
Use SQL Server syntax with clear examples.
Understanding RFM Analysis:
Help me understand RFM analysis:
1. What is RFM and what do Recency, Frequency, Monetary mean?
2. How do I calculate RFM scores (1-5 for each)?
3. Show me how to segment customers based on RFM
4. What actions should I take for each segment?
5. Provide complete SQL example with customer data
Explain with step-by-step implementation.
Guided Practice (60 min):
📝 How to approach Guided Practice:
- Follow step-by-step: Work through each exercise in order
- Use provided resources: Reference the video tutorials and AI prompts if you get stuck
- Verify your work: Check your results match expected outcomes
- Take notes: Document key learnings and shortcuts you discover
- Ask for help: Use AI assistants to clarify concepts you don't understand
Exercise 1: Build Cohort Analysis for Customer Retention
Goal: Create a cohort retention matrix showing what percentage of customers return in subsequent months.
Step-by-Step Instructions:
- Create sample customer orders data:
CREATE TABLE CustomerOrders ( OrderID INT, CustomerID INT, OrderDate DATE, Amount DECIMAL(10,2) ); -- Insert sample data spanning multiple months INSERT INTO CustomerOrders VALUES (1, 101, '2024-01-15', 100), (2, 102, '2024-01-20', 150), (3, 101, '2024-02-10', 120), -- Customer 101 returns (4, 103, '2024-02-05', 200), (5, 102, '2024-03-15', 130), -- Customer 102 returns -- Add more data... - Identify each customer's first purchase month (cohort):
WITH FirstPurchase AS ( SELECT CustomerID, MIN(OrderDate) AS FirstOrderDate, DATE_TRUNC('month', MIN(OrderDate)) AS CohortMonth FROM CustomerOrders GROUP BY CustomerID ) - Calculate months since first purchase for each order:
WITH FirstPurchase AS ( SELECT CustomerID, MIN(OrderDate) AS FirstOrderDate, DATE_TRUNC('month', MIN(OrderDate)) AS CohortMonth FROM CustomerOrders GROUP BY CustomerID ), CohortData AS ( SELECT fp.CohortMonth, co.CustomerID, DATE_TRUNC('month', co.OrderDate) AS OrderMonth, DATEDIFF(month, fp.CohortMonth, DATE_TRUNC('month', co.OrderDate)) AS MonthNumber FROM CustomerOrders co JOIN FirstPurchase fp ON co.CustomerID = fp.CustomerID ) - Create retention matrix:
SELECT CohortMonth, COUNT(DISTINCT CASE WHEN MonthNumber = 0 THEN CustomerID END) AS Month_0, COUNT(DISTINCT CASE WHEN MonthNumber = 1 THEN CustomerID END) AS Month_1, COUNT(DISTINCT CASE WHEN MonthNumber = 2 THEN CustomerID END) AS Month_2, COUNT(DISTINCT CASE WHEN MonthNumber = 3 THEN CustomerID END) AS Month_3, COUNT(DISTINCT CASE WHEN MonthNumber = 4 THEN CustomerID END) AS Month_4 FROM CohortData GROUP BY CohortMonth ORDER BY CohortMonth; - Calculate retention percentages:
SELECT CohortMonth, Month_0 AS Cohort_Size, ROUND(100.0 * Month_1 / Month_0, 1) AS Month_1_Pct, ROUND(100.0 * Month_2 / Month_0, 1) AS Month_2_Pct, ROUND(100.0 * Month_3 / Month_0, 1) AS Month_3_Pct FROM (previous query) AS CohortCounts;
💡 Tips:
- Cohort = acquisition month: Groups customers by when they first purchased
- DATEDIFF calculates month gaps: How many months since first purchase
- CASE WHEN for pivoting: Creates columns for each month
- Visualize in Excel: Export to Excel, create heat map with conditional formatting
- Look for patterns: Does retention improve over time? Which cohorts are strongest?
✅ Expected Result: Cohort retention matrix showing customer return rates by month, revealing retention patterns
🔄 Connection to Previous Learning: Uses CTEs (Day 16), date functions (Day 12), and CASE statements (Day 18)
strftime('%Y-%m-01', OrderDate) instead of DATE_TRUNC and calculate months
mathematically.
Exercise 2: Calculate RFM (Recency, Frequency, Monetary) Scores
Goal: Score customers based on how recently they purchased, how often, and how much they spend.
Step-by-Step Instructions:
- Calculate RFM metrics for each customer:
WITH RFM_Calc AS ( SELECT CustomerID, DATEDIFF(day, MAX(OrderDate), GETDATE()) AS Recency_Days, COUNT(DISTINCT OrderID) AS Frequency, SUM(Amount) AS Monetary_Value FROM CustomerOrders GROUP BY CustomerID ) - Score customers using NTILE (divide into 5 quintiles):
WITH RFM_Calc AS ( -- Previous query ), RFM_Scores AS ( SELECT CustomerID, Recency_Days, Frequency, Monetary_Value, NTILE(5) OVER (ORDER BY Recency_Days ASC) AS R_Score, -- Lower recency = better NTILE(5) OVER (ORDER BY Frequency DESC) AS F_Score, -- Higher frequency = better NTILE(5) OVER (ORDER BY Monetary_Value DESC) AS M_Score -- Higher monetary = better FROM RFM_Calc ) - Create combined RFM score:
SELECT CustomerID, Recency_Days, Frequency, Monetary_Value, R_Score, F_Score, M_Score, CAST(R_Score AS VARCHAR) + CAST(F_Score AS VARCHAR) + CAST(M_Score AS VARCHAR) AS RFM_Score, (R_Score + F_Score + M_Score) AS Total_Score FROM RFM_Scores ORDER BY Total_Score DESC; - Interpret scores: - RFM Score "555" = Best customers (recent, frequent, high-value) - RFM Score "111" = At-risk customers (not recent, infrequent, low-value) - RFM Score "511" = Recent buyers but low frequency/value (growth potential)
💡 Tips:
- NTILE(5) creates quintiles: Divides customers into 5 equal groups (1=worst, 5=best)
- Recency inverted: Lower days = more recent = better score
- Adjust scoring: Use NTILE(10) for deciles, or manual CASE for custom ranges
- Focus on high-value segments: RFM 4-5-5, 5-4-5, 5-5-4 are your VIP customers
- Automate campaigns: Use RFM scores to trigger targeted marketing
✅ Expected Result: Each customer scored 1-5 on Recency, Frequency, and Monetary value for targeted marketing
🔄 Connection to Previous Learning: Uses window functions NTILE (Day 12) and aggregations (Day 8)
Exercise 3: Create Customer Segments Based on RFM
Goal: Classify customers into actionable segments like Champions, Loyal, At Risk, and Lost.
Step-by-Step Instructions:
- Build on RFM scores from Exercise 2
- Create segment classifications using CASE:
WITH RFM_Scores AS ( -- Use query from Exercise 2 ), CustomerSegments AS ( SELECT CustomerID, R_Score, F_Score, M_Score, CASE WHEN R_Score >= 4 AND F_Score >= 4 AND M_Score >= 4 THEN 'Champions' WHEN R_Score >= 3 AND F_Score >= 3 AND M_Score >= 3 THEN 'Loyal Customers' WHEN R_Score >= 4 AND F_Score <= 2 THEN 'New Customers' WHEN R_Score >= 3 AND F_Score <= 2 AND M_Score <= 2 THEN 'Promising' WHEN R_Score <= 2 AND F_Score >= 3 THEN 'At Risk' WHEN R_Score <= 2 AND F_Score <= 2 AND M_Score >= 3 THEN 'Can't Lose Them' WHEN R_Score <= 2 AND F_Score <= 2 THEN 'Lost Customers' ELSE 'Others' END AS Customer_Segment, Recency_Days, Frequency, Monetary_Value FROM RFM_Scores ) - Analyze segment distribution:
SELECT Customer_Segment, COUNT(*) AS Customer_Count, ROUND(AVG(Recency_Days), 0) AS Avg_Recency, ROUND(AVG(Frequency), 1) AS Avg_Frequency, ROUND(AVG(Monetary_Value), 2) AS Avg_Monetary, SUM(Monetary_Value) AS Total_Revenue, ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 1) AS Pct_of_Customers, ROUND(100.0 * SUM(Monetary_Value) / SUM(SUM(Monetary_Value)) OVER(), 1) AS Pct_of_Revenue FROM CustomerSegments GROUP BY Customer_Segment ORDER BY Total_Revenue DESC; - Create actionable recommendations:
SELECT Customer_Segment, COUNT(*) AS Customer_Count, CASE Customer_Segment WHEN 'Champions' THEN 'Reward with VIP benefits, early access to new products' WHEN 'Loyal Customers' THEN 'Upsell premium products, loyalty program' WHEN 'At Risk' THEN 'Win-back campaign with personalized offers' WHEN 'Lost Customers' THEN 'Re-engagement campaign, survey for feedback' WHEN 'New Customers' THEN 'Onboarding series, educate about product range' ELSE 'Standard marketing' END AS Recommended_Action FROM CustomerSegments GROUP BY Customer_Segment;
💡 Tips:
- Customize segments: Adjust thresholds based on your business model
- Champions are gold: Focus retention efforts here - they drive revenue
- "Can't Lose Them": High spenders who haven't bought recently - urgent action needed
- Automate tagging: Save segment as customer attribute for marketing automation
- Track segment movement: Monitor customers moving between segments monthly
✅ Expected Result: Clear customer segments with specific marketing actions for each group
🔄 Connection to Previous Learning: Uses complex CASE logic (Day 18) and window functions for percentages
Exercise 4: Calculate Customer Lifetime Value (CLV)
Goal: Estimate the total value a customer will bring over their entire relationship with your business.
Step-by-Step Instructions:
- Calculate average order metrics:
WITH CustomerMetrics AS ( SELECT CustomerID, COUNT(DISTINCT OrderID) AS Total_Orders, SUM(Amount) AS Total_Revenue, AVG(Amount) AS Avg_Order_Value, DATEDIFF(day, MIN(OrderDate), MAX(OrderDate)) AS Customer_Lifespan_Days, DATEDIFF(day, MIN(OrderDate), MAX(OrderDate)) * 1.0 / NULLIF(COUNT(DISTINCT OrderID) - 1, 0) AS Avg_Days_Between_Orders FROM CustomerOrders GROUP BY CustomerID HAVING COUNT(DISTINCT OrderID) >= 2 -- Need at least 2 orders to calculate frequency ) - Calculate purchase frequency per year:
SELECT CustomerID, Total_Orders, Total_Revenue, Avg_Order_Value, Customer_Lifespan_Days, Avg_Days_Between_Orders, 365.0 / Avg_Days_Between_Orders AS Purchases_Per_Year FROM CustomerMetrics; - Estimate Customer Lifetime Value:
WITH CustomerMetrics AS ( -- Previous query ), CLV_Calculation AS ( SELECT CustomerID, Avg_Order_Value, 365.0 / Avg_Days_Between_Orders AS Purchases_Per_Year, 3.0 AS Estimated_Lifetime_Years, -- Assumption: average customer stays 3 years 0.20 AS Profit_Margin, -- Assumption: 20% profit margin -- CLV Formula: AOV × Purchase Frequency × Lifetime × Profit Margin Avg_Order_Value * (365.0 / Avg_Days_Between_Orders) * 3.0 * 0.20 AS Customer_Lifetime_Value FROM CustomerMetrics ) SELECT CustomerID, ROUND(Customer_Lifetime_Value, 2) AS CLV, CASE WHEN Customer_Lifetime_Value > 1000 THEN 'High Value' WHEN Customer_Lifetime_Value > 500 THEN 'Medium Value' ELSE 'Low Value' END AS Value_Segment FROM CLV_Calculation ORDER BY Customer_Lifetime_Value DESC; - Compare CLV to acquisition cost:
Ideal ratio: CLV should be 3x+ acquisition costSELECT Value_Segment, COUNT(*) AS Customer_Count, ROUND(AVG(CLV), 2) AS Avg_CLV, 50.00 AS Avg_Acquisition_Cost, -- Your actual CAC ROUND(AVG(CLV) / 50.00, 2) AS CLV_to_CAC_Ratio FROM (previous query) GROUP BY Value_Segment;
💡 Tips:
- CLV is an estimate: Based on historical behavior, not guaranteed future
- Adjust assumptions: Lifetime years and profit margin vary by industry
- Focus acquisition on high-CLV segments: Know which customer types are most valuable
- Monitor CLV trends: Track monthly to see if value is increasing or decreasing
- Use in budgeting: Set max acquisition cost based on CLV (aim for 1:3 ratio)
✅ Expected Result: Estimated lifetime value for each customer, guiding acquisition and retention budgets
🔄 Connection to Previous Learning: Complex calculations building on aggregations and date functions
Exercise 5: Build Complete Retention Dashboard Query
Goal: Create one comprehensive query that provides all key retention metrics for executive reporting.
Step-by-Step Instructions:
- Combine all metrics into one dashboard query:
-- Overall Retention Metrics Dashboard WITH CustomerStats AS ( SELECT COUNT(DISTINCT CustomerID) AS Total_Customers, COUNT(DISTINCT CASE WHEN OrderDate >= DATEADD(month, -3, GETDATE()) THEN CustomerID END) AS Active_Last_3_Months, COUNT(DISTINCT CASE WHEN OrderDate >= DATEADD(month, -1, GETDATE()) THEN CustomerID END) AS Active_Last_Month FROM CustomerOrders ), ChurnMetrics AS ( SELECT COUNT(DISTINCT CustomerID) AS Churned_Customers FROM CustomerOrders WHERE CustomerID NOT IN ( SELECT DISTINCT CustomerID FROM CustomerOrders WHERE OrderDate >= DATEADD(month, -6, GETDATE()) ) ), RFM_Summary AS ( SELECT COUNT(DISTINCT CASE WHEN R_Score >= 4 AND F_Score >= 4 THEN CustomerID END) AS Champions_Count, COUNT(DISTINCT CASE WHEN R_Score <= 2 AND F_Score <= 2 THEN CustomerID END) AS At_Risk_Count FROM ( SELECT CustomerID, NTILE(5) OVER (ORDER BY MAX(OrderDate) DESC) AS R_Score, NTILE(5) OVER (ORDER BY COUNT(*) DESC) AS F_Score FROM CustomerOrders GROUP BY CustomerID ) rfm ), RevenueMetrics AS ( SELECT AVG(Amount) AS Avg_Order_Value, SUM(Amount) AS Total_Revenue, SUM(CASE WHEN OrderDate >= DATEADD(month, -1, GETDATE()) THEN Amount END) AS Revenue_Last_Month FROM CustomerOrders ) -- Final Dashboard SELECT 'Customer Metrics' AS Category, cs.Total_Customers, cs.Active_Last_3_Months, ROUND(100.0 * cs.Active_Last_3_Months / cs.Total_Customers, 1) AS Active_Rate_Pct, cm.Churned_Customers, ROUND(100.0 * cm.Churned_Customers / cs.Total_Customers, 1) AS Churn_Rate_Pct, rfm.Champions_Count, rfm.At_Risk_Count, ROUND(rm.Avg_Order_Value, 2) AS Avg_Order_Value, ROUND(rm.Total_Revenue, 2) AS Total_Revenue, ROUND(rm.Revenue_Last_Month, 2) AS Revenue_Last_Month FROM CustomerStats cs CROSS JOIN ChurnMetrics cm CROSS JOIN RFM_Summary rfm CROSS JOIN RevenueMetrics rm; - Create monthly trend view:
SELECT DATE_TRUNC('month', OrderDate) AS Month, COUNT(DISTINCT CustomerID) AS Active_Customers, COUNT(DISTINCT OrderID) AS Total_Orders, SUM(Amount) AS Revenue, ROUND(AVG(Amount), 2) AS Avg_Order_Value FROM CustomerOrders WHERE OrderDate >= DATEADD(month, -12, GETDATE()) GROUP BY DATE_TRUNC('month', OrderDate) ORDER BY Month; - Export to Excel and visualize: - Import both queries into Excel Power Query - Create KPI cards for key metrics - Line chart for monthly trends - Bar chart comparing segments
💡 Tips:
- CROSS JOIN for single-row summary: Combines metrics from different CTEs
- Dashboard queries are expensive: Consider creating materialized view or caching
- Schedule daily refresh: Automate this query to run daily for up-to-date metrics
- Add year-over-year comparisons: Include same metrics from last year for trend analysis
- Alert on thresholds: Set up alerts when churn rate exceeds acceptable level
✅ Expected Result: Single comprehensive query providing all key retention and engagement metrics for leadership
🔄 Connection to Previous Learning: Culmination of all advanced SQL techniques (CTEs, window functions, aggregations)
Independent Practice (60 min):
💡 Hints for Independent Practice:
- Start simple: Begin with basic variations before attempting complex modifications
- Break it down: Divide complex problems into smaller, manageable steps
- Test incrementally: Verify each step works before moving to the next
- Use documentation: Refer to official docs or tutorials when needed
- Experiment: Try different approaches to find what works best
Apply the concepts learned to different datasets and scenarios.
Daily Assignment (60 min):
Assignment:
- Retention Metrics:Calculate overall retention rate, churn rate, average customer lifetime
- Segmentation:Create segments (Champions, Loyal, At Risk, Lost) based on RFM scores
- Recommendations:Provide action plan for each customer segment
Expected Outcomes:
- Perform cohort analysis
- Calculate retention rates
- Conduct RFM analysis
- Build customer segmentation
- Apply advanced analytical techniques
📝 Day 27 Quick Quiz - Test Your Knowledge!
Q1. As you near the end of the program, what does Day 27 emphasize?
Q2. How should you approach Day 27 exercises?
Q3. What makes Day 27 content valuable for your career?
Q4. By Day 27, what should you be comfortable with?
Q5. What's the best mindset for Day 27?