DAY 22: SQL - Advanced Aggregations (GROUPING SETS, CUBE, ROLLUP)

Guru Charan Pathalla

DAY 22: SQL - Advanced Aggregations (GROUPING SETS, CUBE, ROLLUP)

Learning Objectives: - Use GROUPING SETS for multiple aggregations - Create summary reports with ROLLUP - Generate cross-tabulations with CUBE - Understand subtotals and grand totals - Master multi-dimensional analysis

Topics Covered (60 min):

1. GROUPING SETS - Multiple GROUP BY in single query - Syntax: GROUP BY GROUPING SETS ((col1), (col2), (col1, col2)) - Replaces multiple UNION ALL queries - Example: Sales by Region, by Product, by Both

2. ROLLUP - Hierarchical subtotals - Syntax: GROUP BY ROLLUP (col1, col2) - Creates hierarchy: (col1, col2), (col1), () - Grand total row - Example: Sales by Region → City with subtotals

3. CUBE - All possible grouping combinations - Syntax: GROUP BY CUBE (col1, col2) - Creates: (col1, col2), (col1), (col2), () - Cross-dimensional analysis - Example: Sales by Region AND Product (all combinations)

4. GROUPING and GROUPING_ID - Identify NULL vs subtotal rows - GROUPING(column) returns 1 for subtotal - GROUPING_ID(col1, col2) binary identifier - Use with CASE to label subtotals

Video Resources:

  1. SQL Stored Procedure and Functions - Basic to Advance (6:14)
    https://www.youtube.com/watch?v=B8wNclO2cSU

  2. Simple Explanation with Examples for Beginners | 2025 (0:00)
    https://www.youtube.com/watch?v=FDX9ErnWRp8

  3. SQL Stored Procedure (Visually Explained) (0:00)
    https://www.youtube.com/watch?v=DX8I5SmB6jo

AI Learning Prompts (Concept Understanding Only):

Understanding GROUPING SETS:

I'm learning SQL advanced aggregations. Please explain:

1. What is GROUPING SETS and why use it instead of UNION ALL?
2. Show me how to get sales by Category, by Region, and Total in one query
3. What is the difference between GROUPING SETS, ROLLUP, and CUBE?
4. Provide examples using sales data with multiple dimensions
5. When should I use each method?

Use SQL Server syntax with clear examples.

Understanding ROLLUP and CUBE:

Help me understand ROLLUP and CUBE:

1. What does ROLLUP do and how does it create hierarchies?
2. Show me ROLLUP with Region → State → City
3. What is CUBE and when should I use it?
4. How do I identify subtotal rows with GROUPING()?
5. Provide practical examples for multi-dimensional sales reports

Explain with step-by-step examples.

Guided Practice (60 min):

📝 How to approach Guided Practice:

  • Follow step-by-step: Work through each exercise in order
  • Use provided resources: Reference the video tutorials and AI prompts if you get stuck
  • Verify your work: Check your results match expected outcomes
  • Take notes: Document key learnings and shortcuts you discover
  • Ask for help: Use AI assistants to clarify concepts you don't understand

Exercise 1: Create Multi-Level Summary with GROUPING SETS

Goal: Learn to use GROUPING SETS to create multiple aggregation levels in a single query without using UNION ALL.

Step-by-Step Instructions:

  1. Open your SQL editor (SQL Server Management Studio, pgAdmin, or online SQL environment)
  2. Create a sample Sales table with data to practice:
    CREATE TABLE Sales (
        SaleID INT,
        Region VARCHAR(50),
        Category VARCHAR(50),
        Amount DECIMAL(10,2)
    );
    
    INSERT INTO Sales VALUES
    (1, 'East', 'Electronics', 1500),
    (2, 'East', 'Furniture', 800),
    (3, 'West', 'Electronics', 2000),
    (4, 'West', 'Furniture', 1200),
    (5, 'East', 'Electronics', 1800),
    (6, 'West', 'Electronics', 2200);
  3. First, understand what we want: Sales totals by Region only, by Category only, and by both combined
  4. Write the GROUPING SETS query:
    SELECT
        Region,
        Category,
        SUM(Amount) AS TotalSales
    FROM Sales
    GROUP BY GROUPING SETS (
        (Region),           -- Sales by Region only
        (Category),         -- Sales by Category only
        (Region, Category), -- Sales by both
        ()                  -- Grand total
    )
    ORDER BY Region, Category;
  5. Run the query and observe how you get multiple aggregation levels in one result set
⚠️ Note on SQLite Support: The browser-based SQL engine (SQLite) does not natively support GROUPING SETS. The query below uses UNION ALL to simulate the result you would get in SQL Server or PostgreSQL.

💡 Tips:

  • GROUPING SETS replaces multiple queries: One query instead of 3-4 separate GROUP BY queries with UNION ALL
  • NULL indicates aggregation level: NULL in a column means that column was aggregated out
  • Order matters in GROUPING SETS: List the most detailed grouping first for better readability
  • Performance benefit: Single table scan instead of multiple scans with UNION ALL
  • Use empty () for grand total: The empty set gives you the overall total

✅ Expected Result: A result set showing sales aggregated by Region, by Category, by both, and a grand total - all in one query

🔄 Connection to Previous Learning: Similar to creating multiple pivot tables in Excel (Day 7), but in SQL

Exercise 2: Build Hierarchical Totals with ROLLUP

Goal: Master ROLLUP to create hierarchical subtotals automatically from left to right in the GROUP BY columns.

Step-by-Step Instructions:

  1. Expand the Sales table with more hierarchical data:
    CREATE TABLE SalesHierarchy (
        SaleID INT,
        Year INT,
        Quarter INT,
        Month INT,
        Amount DECIMAL(10,2)
    );
    
    INSERT INTO SalesHierarchy VALUES
    (1, 2024, 1, 1, 5000),
    (2, 2024, 1, 2, 5500),
    (3, 2024, 1, 3, 6000),
    (4, 2024, 2, 4, 7000),
    (5, 2024, 2, 5, 7500),
    (6, 2024, 2, 6, 8000);
  2. Understand the hierarchy: Year → Quarter → Month (from general to specific)
  3. Write the ROLLUP query:
    SELECT
        Year,
        Quarter,
        Month,
        SUM(Amount) AS TotalSales
    FROM SalesHierarchy
    GROUP BY ROLLUP (Year, Quarter, Month)
    ORDER BY Year, Quarter, Month;
  4. Analyze the results: You'll see: - Individual months - Subtotals for each quarter - Subtotals for each year - Grand total (all NULLs)
  5. Compare with manual approach: This one query replaces 4 separate GROUP BY queries!
⚠️ Note on SQLite Support: Browser-based SQLite does not support ROLLUP. Below is a UNION ALL simulation to demonstrate the hierarchical aggregation structure.

💡 Tips:

  • ROLLUP creates right-to-left hierarchy: It removes columns from right to left for aggregation
  • Column order is critical: Put the most general column first (Year), most specific last (Month)
  • Subtotal rows have NULLs: Quarter subtotals show NULL for Month, Year subtotals show NULL for Quarter and Month
  • Perfect for financial reports: Common in sales reports, budgets, and hierarchical data
  • ROLLUP is a shorthand: It's equivalent to specific GROUPING SETS but more concise

✅ Expected Result: A hierarchical report showing totals at each level: monthly, quarterly, yearly, and grand total

🔄 Connection to Previous Learning: Similar to Excel subtotals feature (Day 2), but more powerful and automated

Exercise 3: Generate Cross-Tabulations with CUBE

Goal: Use CUBE to create all possible combinations of groupings for comprehensive multi-dimensional analysis.

Step-by-Step Instructions:

  1. Use the Sales table from Exercise 1 or create new data:
    -- Reuse Sales table from Exercise 1
    -- Or insert more diverse data
  2. Understand what CUBE does: It creates ALL possible combinations of the specified columns
  3. Write the CUBE query:
    SELECT
        Region,
        Category,
        SUM(Amount) AS TotalSales,
        COUNT(*) AS SalesCount
    FROM Sales
    GROUP BY CUBE (Region, Category)
    ORDER BY
        CASE WHEN Region IS NULL THEN 1 ELSE 0 END,
        Region,
        CASE WHEN Category IS NULL THEN 1 ELSE 0 END,
        Category;
  4. Analyze the output - you'll get: - Sales by Region AND Category (both specified) - Sales by Region only (Category = NULL) - Sales by Category only (Region = NULL) - Grand total (both NULL)
  5. Count the combinations: With 2 columns, CUBE creates 2² = 4 grouping combinations
⚠️ Note on SQLite Support: Browser-based SQLite does not support CUBE. Below is a simulation showing all combinations.

💡 Tips:

  • CUBE creates all combinations: For N columns, you get 2^N combinations (including grand total)
  • More powerful than ROLLUP: ROLLUP is hierarchical, CUBE is comprehensive across all dimensions
  • Perfect for pivot-style reports: When you need to analyze data from multiple angles
  • Can be expensive: With many columns (5+), CUBE generates many rows and can impact performance
  • Use with caution: For 3+ columns, consider if you need ALL combinations or just specific GROUPING SETS

✅ Expected Result: All possible aggregate combinations showing sales from every analytical angle

🔄 Connection to Previous Learning: Similar to Excel's multi-dimensional pivot tables (Day 7) showing row and column totals

Exercise 4: Identify Subtotal Rows with GROUPING() Function

Goal: Learn to use the GROUPING() function to distinguish between NULL values from data and NULL values from aggregation.

Step-by-Step Instructions:

  1. Start with a ROLLUP query from Exercise 2
  2. Understand the problem: How do we know if a NULL is a subtotal or actual NULL data?
  3. Add GROUPING() to identify aggregation NULLs:
    SELECT
        Year,
        Quarter,
        Month,
        SUM(Amount) AS TotalSales,
        GROUPING(Year) AS IsYearTotal,
        GROUPING(Quarter) AS IsQuarterTotal,
        GROUPING(Month) AS IsMonthTotal
    FROM SalesHierarchy
    GROUP BY ROLLUP (Year, Quarter, Month)
    ORDER BY Year, Quarter, Month;
  4. Interpret the GROUPING() results: - Returns 1 if the column is aggregated out (subtotal row) - Returns 0 if the column has an actual value
  5. Use GROUPING_ID for combined identifier:
    SELECT
        Year,
        Quarter,
        Month,
        SUM(Amount) AS TotalSales,
        GROUPING_ID(Year, Quarter, Month) AS GroupLevel
    FROM SalesHierarchy
    GROUP BY ROLLUP (Year, Quarter, Month)
    ORDER BY Year, Quarter, Month;
    The GroupLevel will be: - 0 (binary 000) = Detail row - 1 (binary 001) = Month total - 3 (binary 011) = Quarter total - 7 (binary 111) = Grand total
⚠️ Note: Since we are simulating ROLLUP/CUBE, we also simulate the GROUPING_ID logic. In a real environment, GROUPING_ID is calculated automatically by the database engine.

💡 Tips:

  • GROUPING() returns 1 or 0: 1 means it's a subtotal, 0 means it's actual data
  • Essential for labeling: Use with CASE statements to add descriptive labels to subtotal rows
  • GROUPING_ID is more compact: Combines multiple GROUPING() calls into one binary number
  • Binary representation: Each bit in GROUPING_ID represents one column's aggregation status
  • Useful in WHERE clauses: Filter to show only certain aggregation levels

✅ Expected Result: Clear identification of which rows are detail data vs. various subtotal levels

🔄 Connection to Previous Learning: Similar to Excel's subtotal levels (Day 7) where you can expand/collapse different levels

Exercise 5: Create Labeled Summary Report with CASE and GROUPING()

Goal: Combine GROUPING SETS with CASE statements to create professional, labeled summary reports.

Step-by-Step Instructions:

  1. Use the Sales table from Exercise 1
  2. Goal: Create a report with descriptive labels instead of NULLs for subtotals
  3. Write the enhanced query with labels:
    SELECT
        CASE
            WHEN GROUPING(Region) = 1 THEN '**ALL REGIONS**'
            ELSE ISNULL(Region, 'Unknown')
        END AS Region,
        CASE
            WHEN GROUPING(Category) = 1 THEN '**ALL CATEGORIES**'
            ELSE ISNULL(Category, 'Unknown')
        END AS Category,
        SUM(Amount) AS TotalSales,
        COUNT(*) AS TransactionCount,
        AVG(Amount) AS AvgSale,
        CASE
            WHEN GROUPING(Region) = 1 AND GROUPING(Category) = 1 THEN 'Grand Total'
            WHEN GROUPING(Region) = 1 THEN 'Region Subtotal'
            WHEN GROUPING(Category) = 1 THEN 'Category Subtotal'
            ELSE 'Detail'
        END AS RowType
    FROM Sales
    GROUP BY ROLLUP (Region, Category)
    ORDER BY
        GROUPING(Region),
        Region,
        GROUPING(Category),
        Category;
  4. Run and review the report: Notice how subtotals are clearly labeled and easy to understand
  5. Enhance further with formatting:
    SELECT
        CASE
            WHEN GROUPING(Region) = 1 THEN '**ALL REGIONS**'
            ELSE Region
        END AS Region,
        CASE
            WHEN GROUPING(Category) = 1 THEN '**ALL CATEGORIES**'
            ELSE Category
        END AS Category,
        FORMAT(SUM(Amount), 'C') AS TotalSales,
        COUNT(*) AS Transactions,
        CASE GROUPING_ID(Region, Category)
            WHEN 0 THEN '📊 Detail'
            WHEN 1 THEN '📈 Category Total'
            WHEN 2 THEN '📊 Region Total'
            WHEN 3 THEN '🎯 GRAND TOTAL'
        END AS Level
    FROM Sales
    GROUP BY ROLLUP (Region, Category)
    ORDER BY GROUPING(Region), Region, GROUPING(Category), Category;

💡 Tips:

  • Always label subtotal rows: Makes reports much more readable for end users
  • Use bold markers for totals: Convention like '**ALL REGIONS**' or '**TOTAL**' helps them stand out
  • Combine multiple aggregates: Include SUM, COUNT, AVG for comprehensive insights
  • Use GROUPING_ID in CASE: Create different labels/formatting for each aggregation level
  • Format currency properly: Use FORMAT() function for currency, percentages, etc.
  • Order matters for readability: Sort by GROUPING() first to group detail and summary rows

✅ Expected Result: A professional, clearly labeled summary report with multiple aggregation levels and descriptive labels

🔄 Connection to Previous Learning: Combines SQL aggregation with report formatting, similar to Excel's subtotal feature with custom labels (Day 7)

Independent Practice (60 min):

💡 Hints for Independent Practice:

  • Start simple: Begin with basic variations before attempting complex modifications
  • Break it down: Divide complex problems into smaller, manageable steps
  • Test incrementally: Verify each step works before moving to the next
  • Use documentation: Refer to official docs or tutorials when needed
  • Experiment: Try different approaches to find what works best

Apply the concepts learned to different datasets and scenarios.

Daily Assignment (60 min):

Assignment:

Expected Outcomes:


📝 Day 22 Quick Quiz - Test Your Knowledge!

Q1. What advanced concept is introduced in Day 22?

Q2. How can Day 22 skills improve your workflow?

Q3. What prerequisite knowledge does Day 22 assume?

Q4. What's a real-world application of Day 22 content?

Q5. What distinguishes Day 22 from earlier lessons?

Have feedback or questions?