DAY 8: SQL - GROUP BY & HAVING

Guru Charan Pathalla

DAY 8: SQL - GROUP BY & HAVING

Learning Objectives: - Master GROUP BY for data aggregation - Use HAVING for group filtering - Group by multiple columns - Combine GROUP BY with JOINs (Mirror Excel Pivot Tables)

Topics Covered (60 min):

1. GROUP BY (Mirror Excel Pivot Tables) - Syntax: SELECT Region, SUM(Sales) FROM Orders GROUP BY Region - Groups rows with same values - Must use with aggregate functions - Like Pivot Table’s Row area + Values area

2. Multiple Column Grouping - GROUP BY Region, Category - Creates groups for each unique combination - Like Pivot Table with multiple row fields

3. HAVING (Filter Groups) - HAVING SUM(Sales) > 10000 - Filters groups (after aggregation) - WHERE filters rows (before aggregation) - HAVING uses aggregate functions

4. WITH JOINs - Combine tables then group - Example: SELECT C.Region, SUM(O.Sales) FROM Orders O JOIN Customers C ON ... GROUP BY C.Region

Query Order:

SELECT columns
FROM table1
JOIN table2 ON condition
WHERE row_filter
GROUP BY columns
HAVING group_filter
ORDER BY columns

Video Resources:

  1. Subqueries & Common Table Expressions (CTEs) (28:38)
    https://www.youtube.com/watch?v=l3l_5i2QVIM

  2. Write Modular T-SQL Like a Pro (18:10)
    https://www.youtube.com/watch?v=O7qHUaLh28Q

  3. Advanced SQL Tutorial | Subqueries (22:01)
    https://www.youtube.com/watch?v=m1KcNV-Zhmc

AI Learning Prompts (Concept Understanding Only):

Understanding GROUP BY (Mirror Pivot Tables):

Yesterday I learned Excel Pivot Tables. Help me understand SQL GROUP BY:

1. How is GROUP BY similar to Excel Pivot Table's Row area?
2. What does GROUP BY do to the data?
3. What's the syntax: SELECT column, SUM(value) FROM table GROUP BY column?
4. Why do I need aggregate functions (SUM, COUNT, AVG) with GROUP BY?
5. Show me examples using Employee data (count employees by department, average salary by role)
6. What columns can I put in SELECT when using GROUP BY?

Explain the parallel between Pivot Tables and GROUP BY clearly.

Understanding HAVING Clause:

Explain SQL HAVING clause:

1. What is HAVING and why do I need it?
2. How is HAVING different from WHERE?
3. When do I use WHERE vs HAVING?
4. Show me the order: SELECT β†’ FROM β†’ WHERE β†’ GROUP BY β†’ HAVING β†’ ORDER BY
5. Give examples using Store data (find stores with sales > $10,000)
6. Can I use both WHERE and HAVING in the same query?

Use retail or school data for examples.

Understanding Multiple Grouping:

Explain grouping by multiple columns:

1. How do I GROUP BY more than one column?
2. What does GROUP BY col1, col2 create?
3. How is this like Excel Pivot Table with multiple row fields?
4. Show me examples: GROUP BY department, job_title (using HR data)
5. Does the order of columns in GROUP BY matter?

Use organizational or survey data for examples.

Understanding GROUP BY with JOINs:

Explain combining JOINs and GROUP BY:

1. How do I use GROUP BY after joining tables?
2. What's the typical pattern: FROM table1 JOIN table2 ... GROUP BY?
3. Can I GROUP BY columns from different tables?
4. Show me examples: Join Authors and Books, count books per author
5. Which comes first: JOIN or GROUP BY?

Use library or publishing data for examples.

Guided Practice (60 min):

πŸ“ How to approach Guided Practice:

  • Follow step-by-step: Work through each exercise in order
  • Use provided resources: Reference the video tutorials and AI prompts if you get stuck
  • Verify your work: Check your results match expected outcomes
  • Take notes: Document key learnings and shortcuts you discover
  • Ask for help: Use AI assistants to clarify concepts you don't understand

Using Superstore Sales dataset - SQL Practice

πŸ“ Setup Instructions:

  1. Open SQLite Browser with your Superstore database
  2. Today you'll learn SQL aggregation - the equivalent of Excel Pivot Tables!
  3. GROUP BY groups rows together; HAVING filters grouped results

πŸ’‘ GROUP BY = SQL's Pivot Table: Remember Day 7 (Excel Pivot Tables)? GROUP BY does the same thing in SQL - summarize data by categories! It's one of the most powerful SQL features.

Exercise 1: GROUP BY Basics - Total Sales by Region

Step-by-Step Instructions:

  1. Type this query:
    SELECT
        Region,
        SUM(Sales) AS TotalSales
    FROM Orders
    GROUP BY Region
    ORDER BY TotalSales DESC;
  2. Click Execute (▢️)
  3. You'll see total sales for each region!

πŸ” Understanding GROUP BY:

  • GROUP BY Region = Group all rows with the same Region together
  • SUM(Sales) = Calculate total sales for each group
  • Result: One row per region with aggregated data
  • This is like dragging Region to Rows and Sales to Values in a Pivot Table!

πŸ’‘ Tips:

  • Columns in SELECT must either be in GROUP BY or use aggregate functions
  • Common aggregates: SUM, AVG, COUNT, MAX, MIN
  • GROUP BY comes after WHERE, before ORDER BY

βœ… Expected Result: 4 rows (one per region) with total sales for each

Exercise 2: Multiple Aggregates - Comprehensive Summary

Step-by-Step Instructions:

  1. Calculate multiple metrics for each category:
    SELECT
        Category,
        COUNT(*) AS TotalOrders,
        SUM(Sales) AS TotalSales,
        AVG(Sales) AS AvgOrderValue,
        MAX(Sales) AS HighestSale,
        MIN(Sales) AS LowestSale
    FROM Orders
    GROUP BY Category
    ORDER BY TotalSales DESC;
  2. This gives you a complete statistical summary per category!

πŸ’‘ Multiple Aggregates:

  • You can calculate many metrics in one query
  • Each aggregate operates on the grouped rows
  • Much more efficient than separate queries

βœ… Expected Result: Each category with 6 different metrics

Exercise 3: GROUP BY Multiple Columns - Two Dimensions

Step-by-Step Instructions:

  1. Group by both Region AND Category:
    SELECT
        Region,
        Category,
        COUNT(*) AS OrderCount,
        SUM(Sales) AS TotalSales
    FROM Orders
    GROUP BY Region, Category
    ORDER BY Region, TotalSales DESC;
  2. You now see sales broken down by both dimensions!
  3. This is like having both Region and Category in Rows in a Pivot Table

🎯 Multi-Column Grouping:

  • Creates unique combinations of all grouped columns
  • Order matters: GROUP BY Region, Category groups by Region first
  • Can group by 3+ columns: GROUP BY Region, Category, Year

βœ… Expected Result: Sales for each Region + Category combination (e.g., "East - Furniture", "East - Technology", etc.)

Exercise 4: Combining WHERE and GROUP BY - Filtered Aggregation

Step-by-Step Instructions:

  1. Calculate sales by region, but only for profitable orders:
    SELECT
        Region,
        COUNT(*) AS ProfitableOrders,
        SUM(Sales) AS TotalSales,
        SUM(Profit) AS TotalProfit
    FROM Orders
    WHERE Profit > 0
    GROUP BY Region
    ORDER BY TotalProfit DESC;
  2. WHERE filters BEFORE grouping (filters individual rows)

πŸ“Œ WHERE vs HAVING:

  • WHERE: Filters rows BEFORE grouping (filters raw data)
  • HAVING: Filters groups AFTER aggregation (filters summarized data)
  • Example: WHERE Profit > 0 (row level) vs HAVING SUM(Profit) > 1000 (group level)

βœ… Expected Result: Regional sales including only profitable orders

Exercise 5: HAVING Clause - Filter Aggregated Results

Step-by-Step Instructions:

  1. Find regions with total sales exceeding $50,000:
    SELECT
        Region,
        SUM(Sales) AS TotalSales,
        COUNT(*) AS OrderCount
    FROM Orders
    GROUP BY Region
    HAVING SUM(Sales) > 50000
    ORDER BY TotalSales DESC;
  2. HAVING filters groups based on aggregate calculations
  3. More examples:
    -- Customers with more than 10 orders
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM Orders
    GROUP BY CustomerID
    HAVING COUNT(*) > 10;
    
    -- Categories with average sale > $200
    SELECT Category, AVG(Sales) AS AvgSale
    FROM Orders
    GROUP BY Category
    HAVING AVG(Sales) > 200;

🎯 HAVING Use Cases:

  • Find top customers (by order count or total spend)
  • Identify high-performing products/categories
  • Filter out insignificant groups (e.g., < 5 orders)
  • Quality control (e.g., products with high defect rates)

πŸ’‘ Query Order:

SELECT columns
FROM table
WHERE row_filters
GROUP BY columns
HAVING group_filters
ORDER BY columns
LIMIT n;

βœ… Expected Result: Only regions meeting the $50,000 threshold

🎯 Practice Goal Checkpoint:

By completing these 5 exercises, you should now be able to:

  • Use GROUP BY to aggregate data by categories
  • Apply multiple aggregate functions (SUM, AVG, COUNT, MAX, MIN)
  • Group by multiple columns for multi-dimensional analysis
  • Combine WHERE with GROUP BY for filtered aggregation
  • Use HAVING to filter grouped results
  • Understand the difference between WHERE and HAVING

Key Insight: GROUP BY is SQL's answer to Excel Pivot Tables. While Excel uses drag-and-drop, SQL uses code - but both summarize data by categories!

Excel Day 7 vs SQL Day 8:

  • Excel Pivot: Drag Region to Rows, Sales to Values
  • SQL GROUP BY: SELECT Region, SUM(Sales) FROM Orders GROUP BY Region
  • Same result, different interface!

Common GROUP BY Patterns:

  • Sales analysis by time period, region, product
  • Customer segmentation and ranking
  • Inventory summaries
  • Performance metrics and KPIs

Take a 5-minute break, then move on to Independent Practice!

Independent Practice (60 min):

πŸ’‘ Hints for Independent Practice:

  • Start simple: Begin with basic variations before attempting complex modifications
  • Break it down: Divide complex problems into smaller, manageable steps
  • Test incrementally: Verify each step works before moving to the next
  • Use documentation: Refer to official docs or tutorials when needed
  • Experiment: Try different approaches to find what works best
  1. Total revenue by product category
  2. Monthly sales totals (GROUP BY month)
  3. Customers with more than 5 orders (HAVING)
  4. Average discount by category
  5. Top 5 regions by total profit
  6. Quarterly sales (GROUP BY quarter)
  7. Categories with average sale > $500
  8. Customer lifetime value (SUM sales per customer)
  9. Products sold per region (JOIN + GROUP BY)
  10. Year-over-year growth by category

Daily Assignment (60 min):

Expected Outcomes:

βœ“ Use GROUP BY for data aggregation
βœ“ Understand parallel between Pivot Tables and GROUP BY
βœ“ Filter groups with HAVING
βœ“ Submit: Day08_SQL_Practice.sql


πŸ“ Day 8 Quick Quiz - Test Your Knowledge!

Q1. What does GROUP BY do?

Q2. What's the difference between WHERE and HAVING?

Q3. Which is correct syntax for GROUP BY?

Q4. Can you use aggregate functions in WHERE clause?

Q5. What does COUNT(*) return when used with GROUP BY?

Have feedback or questions?