DAY 22: SQL - Advanced Aggregations (GROUPING SETS, CUBE, ROLLUP)
Learning Objectives: - Use GROUPING SETS for multiple aggregations - Create summary reports with ROLLUP - Generate cross-tabulations with CUBE - Understand subtotals and grand totals - Master multi-dimensional analysis
Topics Covered (60 min):
1. GROUPING SETS
- Multiple GROUP BY in single query
- Syntax: GROUP BY GROUPING SETS ((col1), (col2), (col1, col2))
- Replaces multiple UNION ALL queries
- Example: Sales by Region, by Product, by Both
2. ROLLUP
- Hierarchical subtotals
- Syntax: GROUP BY ROLLUP (col1, col2)
- Creates hierarchy: (col1, col2), (col1), ()
- Grand total row
- Example: Sales by Region → City with subtotals
3. CUBE
- All possible grouping combinations
- Syntax: GROUP BY CUBE (col1, col2)
- Creates: (col1, col2), (col1), (col2), ()
- Cross-dimensional analysis
- Example: Sales by Region AND Product (all combinations)
4. GROUPING and GROUPING_ID
- Identify NULL vs subtotal rows
- GROUPING(column) returns 1 for subtotal
- GROUPING_ID(col1, col2) binary identifier
- Use with CASE to label subtotals
Video Resources:
-
SQL Stored Procedure and Functions - Basic to Advance (6:14)
https://www.youtube.com/watch?v=B8wNclO2cSU -
Simple Explanation with Examples for Beginners | 2025 (0:00)
https://www.youtube.com/watch?v=FDX9ErnWRp8 -
SQL Stored Procedure (Visually Explained) (0:00)
https://www.youtube.com/watch?v=DX8I5SmB6jo
AI Learning Prompts (Concept Understanding Only):
Understanding GROUPING SETS:
I'm learning SQL advanced aggregations. Please explain:
1. What is GROUPING SETS and why use it instead of UNION ALL?
2. Show me how to get sales by Category, by Region, and Total in one query
3. What is the difference between GROUPING SETS, ROLLUP, and CUBE?
4. Provide examples using sales data with multiple dimensions
5. When should I use each method?
Use SQL Server syntax with clear examples.
Understanding ROLLUP and CUBE:
Help me understand ROLLUP and CUBE:
1. What does ROLLUP do and how does it create hierarchies?
2. Show me ROLLUP with Region → State → City
3. What is CUBE and when should I use it?
4. How do I identify subtotal rows with GROUPING()?
5. Provide practical examples for multi-dimensional sales reports
Explain with step-by-step examples.
Guided Practice (60 min):
📝 How to approach Guided Practice:
- Follow step-by-step: Work through each exercise in order
- Use provided resources: Reference the video tutorials and AI prompts if you get stuck
- Verify your work: Check your results match expected outcomes
- Take notes: Document key learnings and shortcuts you discover
- Ask for help: Use AI assistants to clarify concepts you don't understand
Exercise 1: Create Multi-Level Summary with GROUPING SETS
Goal: Learn to use GROUPING SETS to create multiple aggregation levels in a single query without using UNION ALL.
Step-by-Step Instructions:
- Open your SQL editor (SQL Server Management Studio, pgAdmin, or online SQL environment)
- Create a sample Sales table with data to practice:
CREATE TABLE Sales ( SaleID INT, Region VARCHAR(50), Category VARCHAR(50), Amount DECIMAL(10,2) ); INSERT INTO Sales VALUES (1, 'East', 'Electronics', 1500), (2, 'East', 'Furniture', 800), (3, 'West', 'Electronics', 2000), (4, 'West', 'Furniture', 1200), (5, 'East', 'Electronics', 1800), (6, 'West', 'Electronics', 2200); - First, understand what we want: Sales totals by Region only, by Category only, and by both combined
- Write the GROUPING SETS query:
SELECT Region, Category, SUM(Amount) AS TotalSales FROM Sales GROUP BY GROUPING SETS ( (Region), -- Sales by Region only (Category), -- Sales by Category only (Region, Category), -- Sales by both () -- Grand total ) ORDER BY Region, Category; - Run the query and observe how you get multiple aggregation levels in one result set
GROUPING SETS. The query below uses UNION ALL to simulate the result you
would get in SQL Server or PostgreSQL.
💡 Tips:
- GROUPING SETS replaces multiple queries: One query instead of 3-4 separate GROUP BY queries with UNION ALL
- NULL indicates aggregation level: NULL in a column means that column was aggregated out
- Order matters in GROUPING SETS: List the most detailed grouping first for better readability
- Performance benefit: Single table scan instead of multiple scans with UNION ALL
- Use empty () for grand total: The empty set gives you the overall total
✅ Expected Result: A result set showing sales aggregated by Region, by Category, by both, and a grand total - all in one query
🔄 Connection to Previous Learning: Similar to creating multiple pivot tables in Excel (Day 7), but in SQL
Exercise 2: Build Hierarchical Totals with ROLLUP
Goal: Master ROLLUP to create hierarchical subtotals automatically from left to right in the GROUP BY columns.
Step-by-Step Instructions:
- Expand the Sales table with more hierarchical data:
CREATE TABLE SalesHierarchy ( SaleID INT, Year INT, Quarter INT, Month INT, Amount DECIMAL(10,2) ); INSERT INTO SalesHierarchy VALUES (1, 2024, 1, 1, 5000), (2, 2024, 1, 2, 5500), (3, 2024, 1, 3, 6000), (4, 2024, 2, 4, 7000), (5, 2024, 2, 5, 7500), (6, 2024, 2, 6, 8000); - Understand the hierarchy: Year → Quarter → Month (from general to specific)
- Write the ROLLUP query:
SELECT Year, Quarter, Month, SUM(Amount) AS TotalSales FROM SalesHierarchy GROUP BY ROLLUP (Year, Quarter, Month) ORDER BY Year, Quarter, Month; - Analyze the results: You'll see: - Individual months - Subtotals for each quarter - Subtotals for each year - Grand total (all NULLs)
- Compare with manual approach: This one query replaces 4 separate GROUP BY queries!
ROLLUP. Below
is a UNION ALL simulation to demonstrate the hierarchical aggregation structure.
💡 Tips:
- ROLLUP creates right-to-left hierarchy: It removes columns from right to left for aggregation
- Column order is critical: Put the most general column first (Year), most specific last (Month)
- Subtotal rows have NULLs: Quarter subtotals show NULL for Month, Year subtotals show NULL for Quarter and Month
- Perfect for financial reports: Common in sales reports, budgets, and hierarchical data
- ROLLUP is a shorthand: It's equivalent to specific GROUPING SETS but more concise
✅ Expected Result: A hierarchical report showing totals at each level: monthly, quarterly, yearly, and grand total
🔄 Connection to Previous Learning: Similar to Excel subtotals feature (Day 2), but more powerful and automated
Exercise 3: Generate Cross-Tabulations with CUBE
Goal: Use CUBE to create all possible combinations of groupings for comprehensive multi-dimensional analysis.
Step-by-Step Instructions:
- Use the Sales table from Exercise 1 or create new data:
-- Reuse Sales table from Exercise 1 -- Or insert more diverse data - Understand what CUBE does: It creates ALL possible combinations of the specified columns
- Write the CUBE query:
SELECT Region, Category, SUM(Amount) AS TotalSales, COUNT(*) AS SalesCount FROM Sales GROUP BY CUBE (Region, Category) ORDER BY CASE WHEN Region IS NULL THEN 1 ELSE 0 END, Region, CASE WHEN Category IS NULL THEN 1 ELSE 0 END, Category; - Analyze the output - you'll get: - Sales by Region AND Category (both specified) - Sales by Region only (Category = NULL) - Sales by Category only (Region = NULL) - Grand total (both NULL)
- Count the combinations: With 2 columns, CUBE creates 2² = 4 grouping combinations
CUBE. Below
is a simulation showing all combinations.
💡 Tips:
- CUBE creates all combinations: For N columns, you get 2^N combinations (including grand total)
- More powerful than ROLLUP: ROLLUP is hierarchical, CUBE is comprehensive across all dimensions
- Perfect for pivot-style reports: When you need to analyze data from multiple angles
- Can be expensive: With many columns (5+), CUBE generates many rows and can impact performance
- Use with caution: For 3+ columns, consider if you need ALL combinations or just specific GROUPING SETS
✅ Expected Result: All possible aggregate combinations showing sales from every analytical angle
🔄 Connection to Previous Learning: Similar to Excel's multi-dimensional pivot tables (Day 7) showing row and column totals
Exercise 4: Identify Subtotal Rows with GROUPING() Function
Goal: Learn to use the GROUPING() function to distinguish between NULL values from data and NULL values from aggregation.
Step-by-Step Instructions:
- Start with a ROLLUP query from Exercise 2
- Understand the problem: How do we know if a NULL is a subtotal or actual NULL data?
- Add GROUPING() to identify aggregation NULLs:
SELECT Year, Quarter, Month, SUM(Amount) AS TotalSales, GROUPING(Year) AS IsYearTotal, GROUPING(Quarter) AS IsQuarterTotal, GROUPING(Month) AS IsMonthTotal FROM SalesHierarchy GROUP BY ROLLUP (Year, Quarter, Month) ORDER BY Year, Quarter, Month; - Interpret the GROUPING() results: - Returns 1 if the column is aggregated out (subtotal row) - Returns 0 if the column has an actual value
- Use GROUPING_ID for combined identifier:
The GroupLevel will be: - 0 (binary 000) = Detail row - 1 (binary 001) = Month total - 3 (binary 011) = Quarter total - 7 (binary 111) = Grand totalSELECT Year, Quarter, Month, SUM(Amount) AS TotalSales, GROUPING_ID(Year, Quarter, Month) AS GroupLevel FROM SalesHierarchy GROUP BY ROLLUP (Year, Quarter, Month) ORDER BY Year, Quarter, Month;
GROUPING_ID
logic. In a real environment, GROUPING_ID is calculated automatically by the database engine.
💡 Tips:
- GROUPING() returns 1 or 0: 1 means it's a subtotal, 0 means it's actual data
- Essential for labeling: Use with CASE statements to add descriptive labels to subtotal rows
- GROUPING_ID is more compact: Combines multiple GROUPING() calls into one binary number
- Binary representation: Each bit in GROUPING_ID represents one column's aggregation status
- Useful in WHERE clauses: Filter to show only certain aggregation levels
✅ Expected Result: Clear identification of which rows are detail data vs. various subtotal levels
🔄 Connection to Previous Learning: Similar to Excel's subtotal levels (Day 7) where you can expand/collapse different levels
Exercise 5: Create Labeled Summary Report with CASE and GROUPING()
Goal: Combine GROUPING SETS with CASE statements to create professional, labeled summary reports.
Step-by-Step Instructions:
- Use the Sales table from Exercise 1
- Goal: Create a report with descriptive labels instead of NULLs for subtotals
- Write the enhanced query with labels:
SELECT CASE WHEN GROUPING(Region) = 1 THEN '**ALL REGIONS**' ELSE ISNULL(Region, 'Unknown') END AS Region, CASE WHEN GROUPING(Category) = 1 THEN '**ALL CATEGORIES**' ELSE ISNULL(Category, 'Unknown') END AS Category, SUM(Amount) AS TotalSales, COUNT(*) AS TransactionCount, AVG(Amount) AS AvgSale, CASE WHEN GROUPING(Region) = 1 AND GROUPING(Category) = 1 THEN 'Grand Total' WHEN GROUPING(Region) = 1 THEN 'Region Subtotal' WHEN GROUPING(Category) = 1 THEN 'Category Subtotal' ELSE 'Detail' END AS RowType FROM Sales GROUP BY ROLLUP (Region, Category) ORDER BY GROUPING(Region), Region, GROUPING(Category), Category; - Run and review the report: Notice how subtotals are clearly labeled and easy to understand
- Enhance further with formatting:
SELECT CASE WHEN GROUPING(Region) = 1 THEN '**ALL REGIONS**' ELSE Region END AS Region, CASE WHEN GROUPING(Category) = 1 THEN '**ALL CATEGORIES**' ELSE Category END AS Category, FORMAT(SUM(Amount), 'C') AS TotalSales, COUNT(*) AS Transactions, CASE GROUPING_ID(Region, Category) WHEN 0 THEN '📊 Detail' WHEN 1 THEN '📈 Category Total' WHEN 2 THEN '📊 Region Total' WHEN 3 THEN '🎯 GRAND TOTAL' END AS Level FROM Sales GROUP BY ROLLUP (Region, Category) ORDER BY GROUPING(Region), Region, GROUPING(Category), Category;
💡 Tips:
- Always label subtotal rows: Makes reports much more readable for end users
- Use bold markers for totals: Convention like '**ALL REGIONS**' or '**TOTAL**' helps them stand out
- Combine multiple aggregates: Include SUM, COUNT, AVG for comprehensive insights
- Use GROUPING_ID in CASE: Create different labels/formatting for each aggregation level
- Format currency properly: Use FORMAT() function for currency, percentages, etc.
- Order matters for readability: Sort by GROUPING() first to group detail and summary rows
✅ Expected Result: A professional, clearly labeled summary report with multiple aggregation levels and descriptive labels
🔄 Connection to Previous Learning: Combines SQL aggregation with report formatting, similar to Excel's subtotal feature with custom labels (Day 7)
Independent Practice (60 min):
💡 Hints for Independent Practice:
- Start simple: Begin with basic variations before attempting complex modifications
- Break it down: Divide complex problems into smaller, manageable steps
- Test incrementally: Verify each step works before moving to the next
- Use documentation: Refer to official docs or tutorials when needed
- Experiment: Try different approaches to find what works best
Apply the concepts learned to different datasets and scenarios.
Daily Assignment (60 min):
Assignment:
- GROUPING SETS:Create report showing sales by (Category), (Region), (Category + Region), (Total)
- ROLLUP:Hierarchical summary by Year → Quarter → Month with subtotals
- CUBE:Cross-tabulation of sales by Region and Product Category (all combinations)
- Labeled Subtotals:Use GROUPING() with CASE to label rows as 'Category Total', 'Region Total', 'Grand Total'
- Performance Analysis:Compare performance of GROUPING SETS vs multiple queries
Expected Outcomes:
- Use GROUPING SETS for multiple aggregations
- Create summary reports with ROLLUP
- Generate cross-tabulations with CUBE
- Understand subtotals and grand totals
- Master multi-dimensional analysis
📝 Day 22 Quick Quiz - Test Your Knowledge!
Q1. What advanced concept is introduced in Day 22?
Q2. How can Day 22 skills improve your workflow?
Q3. What prerequisite knowledge does Day 22 assume?
Q4. What's a real-world application of Day 22 content?
Q5. What distinguishes Day 22 from earlier lessons?