DAY 12: SQL - Date Functions & Window Functions Part 1 (ROW_NUMBER, RANK)

Guru Charan Pathalla

DAY 12: SQL - Date Functions & Window Functions Part 1 (ROW_NUMBER, RANK)

Learning Objectives: - Master SQL date and time functions - Understand window functions and OVER clause - Use ROW_NUMBER for ranking and pagination - Apply RANK and DENSE_RANK for competitive ranking

Topics Covered (60 min):

1. SQL Date Functions - GETDATE() / CURRENT_DATE - Current date/time - DATEADD(interval, number, date) - Add/subtract dates - DATEDIFF(interval, start_date, end_date) - Date difference - DATEPART(part, date) - Extract year, month, day, etc. - EOMONTH(date, months) - End of month calculation

2. Date Formatting & Extraction - YEAR(date), MONTH(date), DAY(date) - DATENAME(part, date) - Get month/day names - FORMAT(date, 'format') - Custom date formatting - Example: WHERE YEAR(OrderDate) = 2023 AND MONTH(OrderDate) = 12

3. Window Functions - ROW_NUMBER - Syntax: ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) - Assigns unique sequential numbers within partitions - Use for: pagination, top N per group, duplicate detection - Example: Number orders per customer

4. Window Functions - RANK & DENSE_RANK - RANK() - Leaves gaps after ties (1, 2, 2, 4) - DENSE_RANK() - No gaps (1, 2, 2, 3) - Use PARTITION BY to rank within groups - Example: Rank salespeople by revenue per region

Video Resources:

  1. SQL Window Functions | Clearly Explained (7:16)
    https://www.youtube.com/watch?v=rIcB4zMYMas

  2. SQL Window Functions Basics (Visually Explained) (18:47)
    https://www.youtube.com/watch?v=o666k19mZwE

  3. SQL Window Function | RANK, ROW_NUMBER, LAG, LEAD (24:54)
    https://www.youtube.com/watch?v=Ww71knvhQ-s

AI Learning Prompts (Concept Understanding Only):

Understanding SQL Date Functions:

I'm learning SQL date functions. Please explain:

1. What is the difference between DATEADD and DATEDIFF?
2. How do I calculate "days since last order" for each customer?
3. Show me how to extract year, quarter, month from dates
4. How do I filter for "last 30 days" or "current month"?
5. Provide examples for calculating customer age from birthdate

Use SQL Server syntax with clear examples.

Understanding Window Functions:

Help me understand SQL window functions:

1. What does the OVER clause do?
2. Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()
3. What is PARTITION BY and how does it work?
4. Show me how to find "top 3 products per category by sales"
5. How do I use ROW_NUMBER for pagination?
6. Provide examples using sales and customer data

Explain with simple examples suitable for beginners.

Guided Practice (60 min):

📝 How to approach Guided Practice:

  • Follow step-by-step: Work through each exercise in order
  • Use provided resources: Reference the video tutorials and AI prompts if you get stuck
  • Verify your work: Check your results match expected outcomes
  • Take notes: Document key learnings and shortcuts you discover
  • Ask for help: Use AI assistants to clarify concepts you don't understand

📊 Dataset: Superstore Sales with columns: OrderID, OrderDate, CustomerID, CustomerName, Region, Category, ProductName, Sales

Exercise 1: Calculate Days Since Last Order (DATEDIFF and MAX)

Goal: Learn to use DATEDIFF function with aggregates to calculate customer recency.

Step-by-Step Instructions:

  1. Open your SQL editor and connect to your database
  2. Understand the requirement: For each customer, find their most recent order date and calculate how many days ago that was
  3. Start with finding the last order date per customer:
    SELECT CustomerID,
           MAX(OrderDate) AS LastOrderDate
    FROM Orders
    GROUP BY CustomerID;
    Execute this to verify it shows the most recent order for each customer
  4. Now add the days calculation using DATEDIFF:
    SELECT CustomerID,
           MAX(OrderDate) AS LastOrderDate,
           DATEDIFF(day, MAX(OrderDate), GETDATE()) AS DaysSinceLastOrder
    FROM Orders
    GROUP BY CustomerID
    ORDER BY DaysSinceLastOrder DESC;
  5. Analyze the results: Sort by DaysSinceLastOrder DESC to see customers who haven't ordered in the longest time
  6. Add customer names for readability:
    SELECT CustomerID,
           MAX(CustomerName) AS CustomerName,
           MAX(OrderDate) AS LastOrderDate,
           DATEDIFF(day, MAX(OrderDate), GETDATE()) AS DaysSinceLastOrder
    FROM Orders
    GROUP BY CustomerID
    ORDER BY DaysSinceLastOrder DESC
    LIMIT 20;

💡 Tips:

  • DATEDIFF syntax: DATEDIFF(unit, start_date, end_date) - unit can be day, month, year, hour, etc.
  • GETDATE() vs CURRENT_DATE: Use GETDATE() for SQL Server, CURRENT_DATE for PostgreSQL, NOW() for MySQL
  • MAX with GROUP BY: MAX(OrderDate) finds the most recent date for each CustomerID group
  • Why MAX(CustomerName)?: Since CustomerName is same for all orders of a customer, MAX just picks that value
  • Database differences: PostgreSQL uses CURRENT_DATE - MAX(OrderDate), MySQL uses DATEDIFF(CURDATE(), MAX(OrderDate))
  • Business use case: This query identifies "at-risk" customers who haven't ordered recently

✅ Expected Result: A list of customers with their last order date and the number of days since that order (e.g., CustomerID: 101, LastOrder: 2024-10-15, DaysSince: 45)

🔄 Connection to Excel: Similar to Excel's TODAY()-MAX(dates) calculation from Day 11, but applied across groups!

Exercise 2: Extract Year, Quarter, Month from Dates (Date Part Functions)

Goal: Learn to extract date components for time-series analysis and reporting.

Step-by-Step Instructions:

  1. Start with extracting the year from OrderDate:
    SELECT OrderID,
           OrderDate,
           YEAR(OrderDate) AS OrderYear
    FROM Orders
    LIMIT 10;
    This shows how YEAR() extracts just the year part
  2. Extract multiple date parts:
    SELECT OrderID,
           OrderDate,
           YEAR(OrderDate) AS OrderYear,
           DATEPART(quarter, OrderDate) AS OrderQuarter,
           MONTH(OrderDate) AS OrderMonth,
           DAY(OrderDate) AS OrderDay
    FROM Orders
    LIMIT 10;
  3. Now aggregate sales by year and quarter:
    SELECT YEAR(OrderDate) AS OrderYear,
           DATEPART(quarter, OrderDate) AS OrderQuarter,
           SUM(Sales) AS TotalSales,
           COUNT(*) AS OrderCount
    FROM Orders
    GROUP BY YEAR(OrderDate), DATEPART(quarter, OrderDate)
    ORDER BY OrderYear, OrderQuarter;
  4. Add formatted quarter labels (Q1, Q2, Q3, Q4):
    SELECT YEAR(OrderDate) AS OrderYear,
           DATEPART(quarter, OrderDate) AS OrderQuarter,
           CONCAT('Q', DATEPART(quarter, OrderDate), ' ', YEAR(OrderDate)) AS QuarterLabel,
           SUM(Sales) AS TotalSales,
           COUNT(*) AS OrderCount
    FROM Orders
    GROUP BY YEAR(OrderDate), DATEPART(quarter, OrderDate)
    ORDER BY OrderYear, OrderQuarter;
    This creates labels like "Q1 2023", "Q2 2023", etc.
  5. Create a monthly sales summary:
    SELECT YEAR(OrderDate) AS OrderYear,
           MONTH(OrderDate) AS OrderMonth,
           DATENAME(month, OrderDate) AS MonthName,
           SUM(Sales) AS TotalSales
    FROM Orders
    GROUP BY YEAR(OrderDate), MONTH(OrderDate), DATENAME(month, OrderDate)
    ORDER BY OrderYear, OrderMonth;

💡 Tips:

  • YEAR vs DATEPART: YEAR(date) is shorthand for DATEPART(year, date) - both work the same
  • Quarter calculation: Q1 = Jan-Mar (1-3), Q2 = Apr-Jun (4-6), Q3 = Jul-Sep (7-9), Q4 = Oct-Dec (10-12)
  • DATENAME vs DATEPART: DATENAME returns text ("January"), DATEPART returns numbers (1)
  • GROUP BY requirement: In GROUP BY, you must repeat the date function - can't just use the alias
  • Database differences: MySQL uses EXTRACT(YEAR FROM date), PostgreSQL supports both syntaxes
  • Fiscal quarters: If your fiscal year starts in July, add CASE logic to adjust quarters

✅ Expected Result: Sales totals grouped by year and quarter (e.g., 2023 Q1: $45,230, 2023 Q2: $52,180, etc.)

Exercise 3: Filter for Recent Orders (Last 30/60/90 Days)

Goal: Learn to filter data using date ranges with DATEADD or INTERVAL.

Step-by-Step Instructions:

  1. Find orders from the last 30 days:
    -- SQL Server syntax
    SELECT OrderID, OrderDate, CustomerName, Sales
    FROM Orders
    WHERE OrderDate >= DATEADD(day, -30, GETDATE())
    ORDER BY OrderDate DESC;
  2. Alternative PostgreSQL/MySQL syntax:
    -- PostgreSQL
    SELECT OrderID, OrderDate, CustomerName, Sales
    FROM Orders
    WHERE OrderDate >= CURRENT_DATE - INTERVAL '30 days'
    ORDER BY OrderDate DESC;
    
    -- MySQL
    SELECT OrderID, OrderDate, CustomerName, Sales
    FROM Orders
    WHERE OrderDate >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
    ORDER BY OrderDate DESC;
  3. Compare different time periods (30, 60, 90 days):
    SELECT
        SUM(CASE WHEN OrderDate >= DATEADD(day, -30, GETDATE()) THEN Sales ELSE 0 END) AS Last30Days,
        SUM(CASE WHEN OrderDate >= DATEADD(day, -60, GETDATE()) THEN Sales ELSE 0 END) AS Last60Days,
        SUM(CASE WHEN OrderDate >= DATEADD(day, -90, GETDATE()) THEN Sales ELSE 0 END) AS Last90Days
    FROM Orders;
    This shows sales totals for each time period
  4. Find orders in the current month:
    SELECT OrderID, OrderDate, CustomerName, Sales
    FROM Orders
    WHERE YEAR(OrderDate) = YEAR(GETDATE())
      AND MONTH(OrderDate) = MONTH(GETDATE())
    ORDER BY OrderDate DESC;
  5. Filter for a specific quarter:
    SELECT OrderID, OrderDate, Sales
    FROM Orders
    WHERE YEAR(OrderDate) = 2024
      AND DATEPART(quarter, OrderDate) = 1  -- Q1 only
    ORDER BY OrderDate;

💡 Tips:

  • DATEADD parameters: DATEADD(unit, number, date) - use negative numbers to go back in time
  • Performance tip: Date filtering on indexed columns is fast - avoid wrapping column in functions like YEAR(OrderDate) = 2024
  • Better performance: Use WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01' instead of WHERE YEAR(OrderDate) = 2024
  • Current month start: Use DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1) for the first day of current month
  • Inclusive ranges: Be careful with date ranges - decide if you want >= or >, <= or <

✅ Expected Result: Orders filtered to specific date ranges, showing only recent activity

Exercise 4: Ranking Products with ROW_NUMBER (Introduction to Window Functions)

Goal: Learn your first window function - ROW_NUMBER() to rank products within categories.

Step-by-Step Instructions:

  1. First, calculate total sales per product (without ranking):
    SELECT Category,
           ProductName,
           SUM(Sales) AS TotalSales
    FROM Orders
    GROUP BY Category, ProductName
    ORDER BY Category, TotalSales DESC;
    This shows products grouped by category, sorted by sales
  2. Now add ROW_NUMBER to rank products within each category:
    SELECT Category,
           ProductName,
           SUM(Sales) AS TotalSales,
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY SUM(Sales) DESC) AS ProductRank
    FROM Orders
    GROUP BY Category, ProductName
    ORDER BY Category, ProductRank;
  3. Understand the OVER clause:
    • PARTITION BY Category - Creates separate ranking "groups" for each category
    • ORDER BY SUM(Sales) DESC - Within each category, rank by sales (highest first)
    • Result: #1, #2, #3... restarts for each category
  4. Filter to show only top 3 products per category:
    WITH RankedProducts AS (
        SELECT Category,
               ProductName,
               SUM(Sales) AS TotalSales,
               ROW_NUMBER() OVER (PARTITION BY Category ORDER BY SUM(Sales) DESC) AS ProductRank
        FROM Orders
        GROUP BY Category, ProductName
    )
    SELECT Category, ProductName, TotalSales, ProductRank
    FROM RankedProducts
    WHERE ProductRank <= 3
    ORDER BY Category, ProductRank;
    This uses a CTE (Common Table Expression) from Day 16 preview
  5. Alternative without CTE (using subquery):
    SELECT *
    FROM (
        SELECT Category,
               ProductName,
               SUM(Sales) AS TotalSales,
               ROW_NUMBER() OVER (PARTITION BY Category ORDER BY SUM(Sales) DESC) AS ProductRank
        FROM Orders
        GROUP BY Category, ProductName
    ) AS RankedProducts
    WHERE ProductRank <= 3
    ORDER BY Category, ProductRank;

💡 Tips:

  • Window functions don't reduce rows: Unlike GROUP BY alone, window functions add calculations without collapsing rows
  • PARTITION BY is optional: Without it, ranking applies to entire result set (no groups)
  • ORDER BY is required: Window functions need to know how to order data for ranking/numbering
  • Can't use WHERE with window functions: That's why we need CTE or subquery to filter by rank
  • ROW_NUMBER vs RANK: ROW_NUMBER always gives unique ranks (1, 2, 3, 4...), RANK allows ties (1, 2, 2, 4...)
  • Performance: Window functions are powerful but can be slower on very large datasets

✅ Expected Result: Products ranked 1-3 within each category (e.g., Electronics: #1 Laptop, #2 Tablet, #3 Phone; Furniture: #1 Chair, #2 Desk, #3 Table)

🔄 Connection to Excel: This is like adding a rank column in Excel Pivot Tables from Day 7!

Exercise 5: Top 3 Customers Per Region (Complex Window Function with CTE)

Goal: Apply window functions to find top performers in each group using CTE pattern.

Step-by-Step Instructions:

  1. First, calculate total sales per customer by region:
    SELECT Region,
           CustomerName,
           SUM(Sales) AS TotalSales
    FROM Orders
    GROUP BY Region, CustomerName
    ORDER BY Region, TotalSales DESC;
  2. Add ROW_NUMBER to rank customers within each region:
    SELECT Region,
           CustomerName,
           SUM(Sales) AS TotalSales,
           ROW_NUMBER() OVER (PARTITION BY Region ORDER BY SUM(Sales) DESC) AS CustomerRank
    FROM Orders
    GROUP BY Region, CustomerName
    ORDER BY Region, CustomerRank;
    Now each region has customers ranked 1, 2, 3, 4...
  3. Use CTE to filter for top 3 per region:
    WITH RankedCustomers AS (
        SELECT Region,
               CustomerName,
               SUM(Sales) AS TotalSales,
               ROW_NUMBER() OVER (PARTITION BY Region ORDER BY SUM(Sales) DESC) AS Rank
        FROM Orders
        GROUP BY Region, CustomerName
    )
    SELECT Region, CustomerName, TotalSales, Rank
    FROM RankedCustomers
    WHERE Rank <= 3
    ORDER BY Region, Rank;
  4. Enhance with additional metrics:
    WITH RankedCustomers AS (
        SELECT Region,
               CustomerName,
               SUM(Sales) AS TotalSales,
               COUNT(*) AS OrderCount,
               AVG(Sales) AS AvgOrderValue,
               ROW_NUMBER() OVER (PARTITION BY Region ORDER BY SUM(Sales) DESC) AS Rank
        FROM Orders
        GROUP BY Region, CustomerName
    )
    SELECT Region, CustomerName, TotalSales, OrderCount, AvgOrderValue, Rank
    FROM RankedCustomers
    WHERE Rank <= 3
    ORDER BY Region, Rank;
    This adds more context about why customers are top performers
  5. Try with RANK() instead of ROW_NUMBER():
    WITH RankedCustomers AS (
        SELECT Region,
               CustomerName,
               SUM(Sales) AS TotalSales,
               RANK() OVER (PARTITION BY Region ORDER BY SUM(Sales) DESC) AS Rank
        FROM Orders
        GROUP BY Region, CustomerName
    )
    SELECT *
    FROM RankedCustomers
    WHERE Rank <= 3
    ORDER BY Region, Rank;
    RANK() will show ties (e.g., if two customers have same sales, both get rank 2, next is rank 4)

💡 Tips:

  • CTE (Common Table Expression): WITH clause creates a temporary named result set - makes complex queries more readable
  • When to use CTEs: When you need to filter or reference window function results
  • ROW_NUMBER vs RANK vs DENSE_RANK: - ROW_NUMBER: 1, 2, 3, 4 (always unique) - RANK: 1, 2, 2, 4 (skips after tie) - DENSE_RANK: 1, 2, 2, 3 (no skip after tie)
  • Multiple PARTITION BY columns: You can partition by multiple columns: PARTITION BY Region, Category
  • Multiple ORDER BY columns: Break ties with: ORDER BY SUM(Sales) DESC, CustomerName ASC
  • Real-world use case: Regional sales reports, MVP identification, performance dashboards

✅ Expected Result: Top 3 customers in each region with their ranks (e.g., East: 1. John ($50K), 2. Sarah ($48K), 3. Mike ($45K); West: 1. Lisa ($55K), 2. Tom ($52K), 3. Amy ($49K))

🔄 Connection to Previous Learning: This combines GROUP BY from Day 8, subqueries from Day 10, and introduces window functions - powerful SQL!

🎮 Interactive SQL Playground 1: Days Since Last Order

Calculate days since last order using date functions. Try changing CURRENT_DATE to see different results!

🎮 Interactive SQL Playground 2: Monthly Sales Trends

Extract month/year from dates and aggregate sales. Try different date parts like YEAR or DAY!

🎮 Interactive SQL Playground 3: ROW_NUMBER for Ranking

Rank orders by sales using ROW_NUMBER. Try changing the ORDER BY to rank differently!

🎮 Interactive SQL Playground 4: RANK with Ties

Compare RANK vs ROW_NUMBER behavior with ties. Notice how RANK handles duplicate values!

🎮 Interactive SQL Playground 5: Top 3 Per Region with CTE

Find top performers per region using window functions and CTEs. Try changing the rank limit or adding more columns!

Independent Practice (60 min):

💡 Hints for Independent Practice:

  • Start simple: Begin with basic variations before attempting complex modifications
  • Break it down: Divide complex problems into smaller, manageable steps
  • Test incrementally: Verify each step works before moving to the next
  • Use documentation: Refer to official docs or tutorials when needed
  • Experiment: Try different approaches to find what works best

Challenge 1: Customer Recency Analysis - Find customers who haven't ordered in 90+ days - Calculate average days between orders per customer - Identify "at risk" customers (no order in 6 months) - Compare current month sales vs same month last year

Challenge 2: Product Performance Ranking - Rank products by sales within each category - Find the #1 product in each subcategory - Identify products in bottom 10% by sales - Number orders sequentially for each customer

Daily Assignment (60 min):

Assignment: Sales Performance Dashboard Query

Create a comprehensive SQL report with:

- Sales by year, quarter, month

- Year-over-year growth calculation

- Orders placed in last 30/60/90 days

- Top 5 products by sales (overall)

- Top 3 customers per region (use ROW_NUMBER)

- Rank categories by profitability

- Customers by recency (0-30, 31-60, 61-90, 91+ days)

- Use DATEDIFF to calculate days since last order

- Count customers in each recency bucket

Requirements: - Use at least 3 different date functions - Use ROW_NUMBER with PARTITION BY - Use RANK or DENSE_RANK for one analysis - Include comments explaining each query - Format dates in output (MM/DD/YYYY)

Expected Outcomes:


📝 Day 12 Quick Quiz - Test Your Knowledge!

Q1. What does ROW_NUMBER() do in SQL?

Q2. What's the difference between ROW_NUMBER() and RANK()?

Q3. What does PARTITION BY do in window functions?

Q4. Which SQL function extracts the year from a date?

Q5. What does DENSE_RANK() do differently than RANK()?

Have feedback or questions?