DAY 24: SQL - Query Optimization & Performance Tuning

Guru Charan Pathalla

DAY 24: SQL - Query Optimization & Performance Tuning

Learning Objectives: - Analyze query execution plans - Identify performance bottlenecks - Optimize JOIN operations - Use appropriate indexing strategies - Write efficient WHERE clauses

Topics Covered (60 min):

1. Execution Plan Analysis - View Actual Execution Plan (Ctrl+M) - Read plan from right to left - Identify expensive operations (Table Scan, Sort) - Look for warnings (missing index, type conversion) - % of total cost

2. Indexing Strategy - Clustered index (1 per table, sorts data) - Non-clustered index (pointers to data) - Composite indexes (multiple columns) - Include columns for covering index - When to rebuild/reorganize indexes

3. Query Optimization Techniques - Use WHERE before JOIN - Avoid SELECT *, specify needed columns - Use EXISTS instead of IN for large datasets - Avoid functions on indexed columns in WHERE - Use UNION ALL instead of UNION when possible

4. JOIN Optimization - Join order matters (smaller table first) - Use appropriate join types - Index foreign key columns - Avoid Cartesian products - Consider temp tables for complex joins

Video Resources:

  1. SQL Query Optimization Techniques (7:20)
    https://www.youtube.com/watch?v=MpczBuIk7R8

  2. Secret To Optimizing SQL Queries - Understand The SQL Execution Plan (12:11)
    https://www.youtube.com/watch?v=BHwzDmr6d7s

  3. 5-Minute SQL Performance Boost Using Execution Plans (2:24)
    https://www.youtube.com/watch?v=Kb1Qz265YSE

AI Learning Prompts (Concept Understanding Only):

Understanding Query Optimization:

I'm learning SQL query optimization. Please explain:

1. How do I read an execution plan?
2. What is a table scan vs index seek and which is better?
3. Show me how to identify slow queries
4. What are the most common performance mistakes?
5. Provide examples of before/after query optimization

Use SQL Server with practical examples.

Understanding Indexing:

Help me understand SQL indexing:

1. What is the difference between clustered and non-clustered indexes?
2. When should I create an index?
3. Can too many indexes be bad?
4. Show me how to find missing indexes
5. Explain covering indexes and when to use them

Provide beginner-friendly explanations.

Guided Practice (60 min):

📝 How to approach Guided Practice:

  • Follow step-by-step: Work through each exercise in order
  • Use provided resources: Reference the video tutorials and AI prompts if you get stuck
  • Verify your work: Check your results match expected outcomes
  • Take notes: Document key learnings and shortcuts you discover
  • Ask for help: Use AI assistants to clarify concepts you don't understand

Exercise 1: Read and Analyze SQL Execution Plans

Goal: Learn to read execution plans to identify performance bottlenecks like table scans, missing indexes, and expensive operations.

Step-by-Step Instructions:

  1. Create sample tables with data:
    CREATE TABLE Customers (
        CustomerID INT PRIMARY KEY,
        Name VARCHAR(100),
        Region VARCHAR(50),
        Email VARCHAR(100)
    );
    
    CREATE TABLE Orders (
        OrderID INT PRIMARY KEY,
        CustomerID INT,
        OrderDate DATE,
        Amount DECIMAL(10,2)
    );
    
    -- Insert sample data (1000+ rows for meaningful analysis)
    INSERT INTO Customers SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
        'Customer' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR),
        CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 0 THEN 'East'
             WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 1 THEN 'West'
             WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 2 THEN 'North'
             ELSE 'South' END,
        'customer' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR) + '@email.com'
    FROM master..spt_values; -- Generates rows
  2. Run a slow query without viewing the plan first:
    SELECT c.Name, c.Region, SUM(o.Amount) AS TotalSales
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE c.Region = 'East'
    GROUP BY c.Name, c.Region;
  3. Enable execution plan viewing: - SQL Server: Ctrl+M or click "Display Estimated Execution Plan" - Or add: SET SHOWPLAN_TEXT ON; before the query
  4. Re-run the query and analyze the plan: - Look for Table Scan (bad) vs Index Seek (good) - Check the cost percentage of each operation - Identify operations with highest cost (thickest arrows)
  5. Key indicators to look for: - Table Scan = Reading entire table (slow) - Index Scan = Reading entire index (better than table scan) - Index Seek = Using index efficiently (best) - Sort operations = Can be expensive - Nested Loops vs Hash Match vs Merge Join

💡 Tips:

  • Read right to left: Execution plans show operations from right to left, bottom to top
  • Thicker arrows = more rows: Hover over arrows to see row counts
  • Look for warnings: Yellow exclamation marks indicate potential issues
  • Cost percentages matter: Focus optimization on operations with highest cost %
  • Actual vs Estimated: Use "Include Actual Execution Plan" (Ctrl+M) for real numbers

✅ Expected Result: Ability to read execution plans and identify the most expensive operations causing slow queries

🔄 Connection to Previous Learning: Builds on JOIN operations (Day 6) by showing how they execute under the hood

⚠️ SQLite Note: We use EXPLAIN QUERY PLAN to visualize operations. Look for SCAN TABLE (Full Scan) vs SEARCH TABLE (Index Usage).

Exercise 2: Create and Use Indexes for Performance

Goal: Master creating appropriate indexes to speed up queries and understanding when to use clustered vs non-clustered indexes.

Step-by-Step Instructions:

  1. Test query performance WITHOUT index:
    -- Turn on statistics
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;
    
    -- Run query without index
    SELECT * FROM Orders
    WHERE CustomerID = 500;
    
    -- Note the execution time and logical reads
  2. Create a non-clustered index on CustomerID:
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON Orders(CustomerID);
  3. Re-run the same query and compare:
    SELECT * FROM Orders
    WHERE CustomerID = 500;
    
    -- Compare execution time and logical reads - should be much faster!
  4. Create a covering index for better performance:
    -- Drop the previous index
    DROP INDEX IX_Orders_CustomerID ON Orders;
    
    -- Create covering index that includes columns we need
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
    ON Orders(CustomerID)
    INCLUDE (OrderDate, Amount);
  5. Test a query that benefits from covering index:
    SELECT CustomerID, OrderDate, Amount
    FROM Orders
    WHERE CustomerID = 500;
    
    -- No need to access the table data - all columns in index!
  6. Create a composite index for multiple columns:
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Date
    ON Orders(CustomerID, OrderDate);

💡 Tips:

  • Clustered vs Non-clustered: Clustered sorts the table itself (1 per table), Non-clustered creates separate structure
  • Covering indexes avoid lookups: Include frequently queried columns in INCLUDE clause
  • Index column order matters: Most selective column first, then less selective
  • Don't over-index: Indexes speed up reads but slow down writes (INSERT/UPDATE/DELETE)
  • WHERE and JOIN columns are candidates: Index columns frequently used in WHERE, JOIN, and ORDER BY
  • Monitor index usage: Use sys.dm_db_index_usage_stats to see which indexes are actually used

✅ Expected Result: Dramatic performance improvement (10x-100x faster) for queries using appropriate indexes

🔄 Connection to Previous Learning: Optimizes the filtering and joining concepts from Days 4 and 6

Exercise 3: Optimize JOIN Operations and Query Structure

Goal: Learn to write efficient JOINs and restructure queries to minimize data processing.

Step-by-Step Instructions:

  1. Start with an inefficient query:
    -- BAD: Filtering after JOIN (processes all rows first)
    SELECT c.Name, o.OrderDate, o.Amount
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE c.Region = 'East'
      AND o.OrderDate >= '2024-01-01';
  2. Optimize by filtering early:
    -- BETTER: Filter before JOIN (fewer rows to join)
    SELECT c.Name, o.OrderDate, o.Amount
    FROM (SELECT CustomerID, Name FROM Customers WHERE Region = 'East') c
    JOIN (SELECT CustomerID, OrderDate, Amount FROM Orders WHERE OrderDate >= '2024-01-01') o
    ON c.CustomerID = o.CustomerID;
  3. Or use CTEs for readability:
    -- BEST: Clear intent, optimized by query engine
    WITH EastCustomers AS (
        SELECT CustomerID, Name
        FROM Customers
        WHERE Region = 'East'
    ),
    RecentOrders AS (
        SELECT CustomerID, OrderDate, Amount
        FROM Orders
        WHERE OrderDate >= '2024-01-01'
    )
    SELECT c.Name, o.OrderDate, o.Amount
    FROM EastCustomers c
    JOIN RecentOrders o ON c.CustomerID = o.CustomerID;
  4. Avoid functions on indexed columns:
    -- BAD: Function prevents index usage
    SELECT * FROM Orders
    WHERE YEAR(OrderDate) = 2024;
    
    -- GOOD: Index can be used
    SELECT * FROM Orders
    WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
  5. Use EXISTS instead of IN for large datasets:
    -- LESS EFFICIENT with large subquery results
    SELECT * FROM Customers
    WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE Amount > 1000);
    
    -- MORE EFFICIENT - stops at first match
    SELECT * FROM Customers c
    WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID AND o.Amount > 1000);

💡 Tips:

  • Filter early, filter often: Reduce data volume before expensive operations like JOINs
  • Avoid SELECT *: Specify only needed columns to reduce I/O
  • No functions on WHERE columns: YEAR(date), UPPER(name) prevent index usage
  • EXISTS vs IN: EXISTS is faster for large subqueries (short-circuits on first match)
  • Use INNER JOIN over subqueries: Modern optimizers handle JOINs better than nested subqueries
  • Avoid OR in WHERE: Use UNION instead, or restructure with separate queries

✅ Expected Result: Queries that process minimal data and use indexes effectively, resulting in faster execution

🔄 Connection to Previous Learning: Applies optimization principles to CTEs (Day 16) and subqueries (Day 10)

Exercise 4: Use Query Hints and Force Optimization

Goal: Learn when and how to use query hints to force specific execution strategies when the optimizer makes poor choices.

Step-by-Step Instructions:

  1. Understand the problem: Sometimes the query optimizer chooses a suboptimal plan
  2. Force an index usage with INDEX hint:
    -- Force use of specific index
    SELECT OrderID, CustomerID, Amount
    FROM Orders WITH (INDEX(IX_Orders_CustomerID))
    WHERE CustomerID = 500;
  3. Use FORCESEEK to force index seek:
    -- Prevent table/index scan, force seek
    SELECT OrderID, CustomerID, Amount
    FROM Orders WITH (FORCESEEK)
    WHERE CustomerID = 500;
  4. Control JOIN algorithm with hints:
    -- Force hash join for large datasets
    SELECT c.Name, SUM(o.Amount) AS Total
    FROM Customers c
    INNER HASH JOIN Orders o ON c.CustomerID = o.CustomerID
    GROUP BY c.Name;
    
    -- Force loop join for small datasets
    SELECT c.Name, o.Amount
    FROM Customers c
    INNER LOOP JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE c.Region = 'East';
  5. Use MAXDOP to control parallelism:
    -- Limit to single thread (useful for debugging)
    SELECT CustomerID, SUM(Amount) AS Total
    FROM Orders
    GROUP BY CustomerID
    OPTION (MAXDOP 1);
    
    -- Allow up to 4 parallel threads
    SELECT CustomerID, SUM(Amount) AS Total
    FROM Orders
    GROUP BY CustomerID
    OPTION (MAXDOP 4);
  6. Recompile for parameter-sensitive queries:
    -- Force recompile for accurate statistics
    SELECT * FROM Orders
    WHERE Amount > @threshold
    OPTION (RECOMPILE);

💡 Tips:

  • Use hints sparingly: Let optimizer do its job; only hint when you know better
  • Document why you hint: Future maintainers need to know why you overrode the optimizer
  • Test with and without: Always compare performance with and without hints
  • Hints can become outdated: As data grows, hint assumptions may become invalid
  • NOLOCK for dirty reads: WITH (NOLOCK) allows reading uncommitted data (faster but risky)
  • Parameter sniffing: Use OPTION (RECOMPILE) or OPTIMIZE FOR when parameter values vary widely

✅ Expected Result: Understanding of when to override the optimizer and how to force specific execution strategies

🔄 Connection to Previous Learning: Advanced technique building on all previous SQL optimization concepts

⚠️ Note: SQL Server query hints (WITH INDEX, MAXDOP) are not supported in SQLite. However, you can see how the optimizer creates proper plans automatically.

Exercise 5: Monitor and Measure Query Performance

Goal: Learn to measure query performance accurately and identify the actual bottlenecks using SQL Server tools.

Step-by-Step Instructions:

  1. Enable detailed statistics:
    SET STATISTICS TIME ON;  -- Show execution time
    SET STATISTICS IO ON;    -- Show I/O operations
    
    SELECT c.Name, COUNT(*) AS OrderCount, SUM(o.Amount) AS TotalSpent
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    GROUP BY c.Name
    HAVING SUM(o.Amount) > 10000;
    
    -- Review output:
    -- SQL Server Execution Times:
    --   CPU time = 156 ms,  elapsed time = 234 ms.
    -- Table 'Orders'. Scan count 1, logical reads 2450
  2. Interpret the statistics: - CPU time: Actual processing time - Elapsed time: Total time including waits - Logical reads: Pages read from memory (lower is better) - Physical reads: Pages read from disk (should be 0 after first run)
  3. Measure baseline before optimization:
    -- Create a performance test script
    DECLARE @StartTime DATETIME = GETDATE();
    
    -- Your query here
    SELECT ...
    
    DECLARE @EndTime DATETIME = GETDATE();
    SELECT DATEDIFF(MILLISECOND, @StartTime, @EndTime) AS ExecutionTimeMS;
  4. Use query store to track performance over time:
    -- Enable Query Store (SQL Server 2016+)
    ALTER DATABASE YourDatabase
    SET QUERY_STORE = ON;
    
    -- View top resource-consuming queries
    SELECT TOP 10
        q.query_id,
        qt.query_sql_text,
        rs.avg_duration/1000 AS avg_duration_ms,
        rs.avg_logical_io_reads
    FROM sys.query_store_query q
    JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
    JOIN sys.query_store_plan p ON q.query_id = p.query_id
    JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
    ORDER BY rs.avg_duration DESC;
  5. Find missing indexes:
    -- SQL Server suggests missing indexes
    SELECT
        migs.avg_user_impact,
        migs.avg_total_user_cost,
        migs.user_seeks,
        mid.statement AS TableName,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns
    FROM sys.dm_db_missing_index_groups mig
    JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
    JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
    WHERE migs.avg_user_impact > 50
    ORDER BY migs.avg_user_impact DESC;
  6. Document before/after metrics:
    -- Before optimization
    -- Execution time: 2,340 ms
    -- Logical reads: 45,000
    -- CPU time: 1,890 ms
    
    -- After adding index and rewriting query
    -- Execution time: 187 ms (12.5x faster!)
    -- Logical reads: 1,200 (37.5x fewer reads!)
    -- CPU time: 156 ms (12x less CPU)

💡 Tips:

  • Logical reads are key metric: Focus on reducing logical reads, not just execution time
  • Run queries multiple times: First run may be slower due to cold cache
  • Clear cache for testing: Use DBCC DROPCLEANBUFFERS (dev only!) for accurate testing
  • Production monitoring: Use Query Store or Extended Events for ongoing monitoring
  • Benchmark systematically: Always measure before and after each optimization
  • Set performance goals: "Query must complete in < 500ms" gives clear target

✅ Expected Result: Systematic approach to measuring and documenting query performance improvements

🔄 Connection to Previous Learning: Provides metrics to validate all previous optimization techniques

Independent Practice (60 min):

💡 Hints for Independent Practice:

  • Start simple: Begin with basic variations before attempting complex modifications
  • Break it down: Divide complex problems into smaller, manageable steps
  • Test incrementally: Verify each step works before moving to the next
  • Use documentation: Refer to official docs or tutorials when needed
  • Experiment: Try different approaches to find what works best

Apply the concepts learned to different datasets and scenarios.

Daily Assignment (60 min):

Assignment:

Expected Outcomes:


📝 Day 24 Quick Quiz - Test Your Knowledge!

Q1. What advanced concept is introduced in Day 24?

Q2. How can Day 24 skills improve your workflow?

Q3. What prerequisite knowledge does Day 24 assume?

Q4. What's a real-world application of Day 24 content?

Q5. What distinguishes Day 24 from earlier lessons?

Have feedback or questions?