DAY 24: SQL - Query Optimization & Performance Tuning
Learning Objectives: - Analyze query execution plans - Identify performance bottlenecks - Optimize JOIN operations - Use appropriate indexing strategies - Write efficient WHERE clauses
Topics Covered (60 min):
1. Execution Plan Analysis - View Actual Execution Plan (Ctrl+M) - Read plan from right to left - Identify expensive operations (Table Scan, Sort) - Look for warnings (missing index, type conversion) - % of total cost
2. Indexing Strategy - Clustered index (1 per table, sorts data) - Non-clustered index (pointers to data) - Composite indexes (multiple columns) - Include columns for covering index - When to rebuild/reorganize indexes
3. Query Optimization Techniques - Use WHERE before JOIN - Avoid SELECT *, specify needed columns - Use EXISTS instead of IN for large datasets - Avoid functions on indexed columns in WHERE - Use UNION ALL instead of UNION when possible
4. JOIN Optimization - Join order matters (smaller table first) - Use appropriate join types - Index foreign key columns - Avoid Cartesian products - Consider temp tables for complex joins
Video Resources:
-
SQL Query Optimization Techniques (7:20)
https://www.youtube.com/watch?v=MpczBuIk7R8 -
Secret To Optimizing SQL Queries - Understand The SQL Execution Plan (12:11)
https://www.youtube.com/watch?v=BHwzDmr6d7s -
5-Minute SQL Performance Boost Using Execution Plans (2:24)
https://www.youtube.com/watch?v=Kb1Qz265YSE
AI Learning Prompts (Concept Understanding Only):
Understanding Query Optimization:
I'm learning SQL query optimization. Please explain:
1. How do I read an execution plan?
2. What is a table scan vs index seek and which is better?
3. Show me how to identify slow queries
4. What are the most common performance mistakes?
5. Provide examples of before/after query optimization
Use SQL Server with practical examples.
Understanding Indexing:
Help me understand SQL indexing:
1. What is the difference between clustered and non-clustered indexes?
2. When should I create an index?
3. Can too many indexes be bad?
4. Show me how to find missing indexes
5. Explain covering indexes and when to use them
Provide beginner-friendly explanations.
Guided Practice (60 min):
📝 How to approach Guided Practice:
- Follow step-by-step: Work through each exercise in order
- Use provided resources: Reference the video tutorials and AI prompts if you get stuck
- Verify your work: Check your results match expected outcomes
- Take notes: Document key learnings and shortcuts you discover
- Ask for help: Use AI assistants to clarify concepts you don't understand
Exercise 1: Read and Analyze SQL Execution Plans
Goal: Learn to read execution plans to identify performance bottlenecks like table scans, missing indexes, and expensive operations.
Step-by-Step Instructions:
- Create sample tables with data:
CREATE TABLE Customers ( CustomerID INT PRIMARY KEY, Name VARCHAR(100), Region VARCHAR(50), Email VARCHAR(100) ); CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, OrderDate DATE, Amount DECIMAL(10,2) ); -- Insert sample data (1000+ rows for meaningful analysis) INSERT INTO Customers SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'Customer' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR), CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 0 THEN 'East' WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 1 THEN 'West' WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 2 THEN 'North' ELSE 'South' END, 'customer' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR) + '@email.com' FROM master..spt_values; -- Generates rows - Run a slow query without viewing the plan first:
SELECT c.Name, c.Region, SUM(o.Amount) AS TotalSales FROM Customers c JOIN Orders o ON c.CustomerID = o.CustomerID WHERE c.Region = 'East' GROUP BY c.Name, c.Region; - Enable execution plan viewing:
- SQL Server: Ctrl+M or click "Display Estimated Execution Plan"
- Or add:
SET SHOWPLAN_TEXT ON;before the query - Re-run the query and analyze the plan: - Look for Table Scan (bad) vs Index Seek (good) - Check the cost percentage of each operation - Identify operations with highest cost (thickest arrows)
- Key indicators to look for: - Table Scan = Reading entire table (slow) - Index Scan = Reading entire index (better than table scan) - Index Seek = Using index efficiently (best) - Sort operations = Can be expensive - Nested Loops vs Hash Match vs Merge Join
💡 Tips:
- Read right to left: Execution plans show operations from right to left, bottom to top
- Thicker arrows = more rows: Hover over arrows to see row counts
- Look for warnings: Yellow exclamation marks indicate potential issues
- Cost percentages matter: Focus optimization on operations with highest cost %
- Actual vs Estimated: Use "Include Actual Execution Plan" (Ctrl+M) for real numbers
✅ Expected Result: Ability to read execution plans and identify the most expensive operations causing slow queries
🔄 Connection to Previous Learning: Builds on JOIN operations (Day 6) by showing how they execute under the hood
EXPLAIN QUERY PLAN to visualize operations. Look for
SCAN TABLE (Full Scan) vs SEARCH TABLE (Index Usage).
Exercise 2: Create and Use Indexes for Performance
Goal: Master creating appropriate indexes to speed up queries and understanding when to use clustered vs non-clustered indexes.
Step-by-Step Instructions:
- Test query performance WITHOUT index:
-- Turn on statistics SET STATISTICS TIME ON; SET STATISTICS IO ON; -- Run query without index SELECT * FROM Orders WHERE CustomerID = 500; -- Note the execution time and logical reads - Create a non-clustered index on CustomerID:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders(CustomerID); - Re-run the same query and compare:
SELECT * FROM Orders WHERE CustomerID = 500; -- Compare execution time and logical reads - should be much faster! - Create a covering index for better performance:
-- Drop the previous index DROP INDEX IX_Orders_CustomerID ON Orders; -- Create covering index that includes columns we need CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering ON Orders(CustomerID) INCLUDE (OrderDate, Amount); - Test a query that benefits from covering index:
SELECT CustomerID, OrderDate, Amount FROM Orders WHERE CustomerID = 500; -- No need to access the table data - all columns in index! - Create a composite index for multiple columns:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Date ON Orders(CustomerID, OrderDate);
💡 Tips:
- Clustered vs Non-clustered: Clustered sorts the table itself (1 per table), Non-clustered creates separate structure
- Covering indexes avoid lookups: Include frequently queried columns in INCLUDE clause
- Index column order matters: Most selective column first, then less selective
- Don't over-index: Indexes speed up reads but slow down writes (INSERT/UPDATE/DELETE)
- WHERE and JOIN columns are candidates: Index columns frequently used in WHERE, JOIN, and ORDER BY
- Monitor index usage: Use sys.dm_db_index_usage_stats to see which indexes are actually used
✅ Expected Result: Dramatic performance improvement (10x-100x faster) for queries using appropriate indexes
🔄 Connection to Previous Learning: Optimizes the filtering and joining concepts from Days 4 and 6
Exercise 3: Optimize JOIN Operations and Query Structure
Goal: Learn to write efficient JOINs and restructure queries to minimize data processing.
Step-by-Step Instructions:
- Start with an inefficient query:
-- BAD: Filtering after JOIN (processes all rows first) SELECT c.Name, o.OrderDate, o.Amount FROM Customers c JOIN Orders o ON c.CustomerID = o.CustomerID WHERE c.Region = 'East' AND o.OrderDate >= '2024-01-01'; - Optimize by filtering early:
-- BETTER: Filter before JOIN (fewer rows to join) SELECT c.Name, o.OrderDate, o.Amount FROM (SELECT CustomerID, Name FROM Customers WHERE Region = 'East') c JOIN (SELECT CustomerID, OrderDate, Amount FROM Orders WHERE OrderDate >= '2024-01-01') o ON c.CustomerID = o.CustomerID; - Or use CTEs for readability:
-- BEST: Clear intent, optimized by query engine WITH EastCustomers AS ( SELECT CustomerID, Name FROM Customers WHERE Region = 'East' ), RecentOrders AS ( SELECT CustomerID, OrderDate, Amount FROM Orders WHERE OrderDate >= '2024-01-01' ) SELECT c.Name, o.OrderDate, o.Amount FROM EastCustomers c JOIN RecentOrders o ON c.CustomerID = o.CustomerID; - Avoid functions on indexed columns:
-- BAD: Function prevents index usage SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024; -- GOOD: Index can be used SELECT * FROM Orders WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01'; - Use EXISTS instead of IN for large datasets:
-- LESS EFFICIENT with large subquery results SELECT * FROM Customers WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE Amount > 1000); -- MORE EFFICIENT - stops at first match SELECT * FROM Customers c WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID AND o.Amount > 1000);
💡 Tips:
- Filter early, filter often: Reduce data volume before expensive operations like JOINs
- Avoid SELECT *: Specify only needed columns to reduce I/O
- No functions on WHERE columns: YEAR(date), UPPER(name) prevent index usage
- EXISTS vs IN: EXISTS is faster for large subqueries (short-circuits on first match)
- Use INNER JOIN over subqueries: Modern optimizers handle JOINs better than nested subqueries
- Avoid OR in WHERE: Use UNION instead, or restructure with separate queries
✅ Expected Result: Queries that process minimal data and use indexes effectively, resulting in faster execution
🔄 Connection to Previous Learning: Applies optimization principles to CTEs (Day 16) and subqueries (Day 10)
Exercise 4: Use Query Hints and Force Optimization
Goal: Learn when and how to use query hints to force specific execution strategies when the optimizer makes poor choices.
Step-by-Step Instructions:
- Understand the problem: Sometimes the query optimizer chooses a suboptimal plan
- Force an index usage with INDEX hint:
-- Force use of specific index SELECT OrderID, CustomerID, Amount FROM Orders WITH (INDEX(IX_Orders_CustomerID)) WHERE CustomerID = 500; - Use FORCESEEK to force index seek:
-- Prevent table/index scan, force seek SELECT OrderID, CustomerID, Amount FROM Orders WITH (FORCESEEK) WHERE CustomerID = 500; - Control JOIN algorithm with hints:
-- Force hash join for large datasets SELECT c.Name, SUM(o.Amount) AS Total FROM Customers c INNER HASH JOIN Orders o ON c.CustomerID = o.CustomerID GROUP BY c.Name; -- Force loop join for small datasets SELECT c.Name, o.Amount FROM Customers c INNER LOOP JOIN Orders o ON c.CustomerID = o.CustomerID WHERE c.Region = 'East'; - Use MAXDOP to control parallelism:
-- Limit to single thread (useful for debugging) SELECT CustomerID, SUM(Amount) AS Total FROM Orders GROUP BY CustomerID OPTION (MAXDOP 1); -- Allow up to 4 parallel threads SELECT CustomerID, SUM(Amount) AS Total FROM Orders GROUP BY CustomerID OPTION (MAXDOP 4); - Recompile for parameter-sensitive queries:
-- Force recompile for accurate statistics SELECT * FROM Orders WHERE Amount > @threshold OPTION (RECOMPILE);
💡 Tips:
- Use hints sparingly: Let optimizer do its job; only hint when you know better
- Document why you hint: Future maintainers need to know why you overrode the optimizer
- Test with and without: Always compare performance with and without hints
- Hints can become outdated: As data grows, hint assumptions may become invalid
- NOLOCK for dirty reads: WITH (NOLOCK) allows reading uncommitted data (faster but risky)
- Parameter sniffing: Use OPTION (RECOMPILE) or OPTIMIZE FOR when parameter values vary widely
✅ Expected Result: Understanding of when to override the optimizer and how to force specific execution strategies
🔄 Connection to Previous Learning: Advanced technique building on all previous SQL optimization concepts
Exercise 5: Monitor and Measure Query Performance
Goal: Learn to measure query performance accurately and identify the actual bottlenecks using SQL Server tools.
Step-by-Step Instructions:
- Enable detailed statistics:
SET STATISTICS TIME ON; -- Show execution time SET STATISTICS IO ON; -- Show I/O operations SELECT c.Name, COUNT(*) AS OrderCount, SUM(o.Amount) AS TotalSpent FROM Customers c JOIN Orders o ON c.CustomerID = o.CustomerID GROUP BY c.Name HAVING SUM(o.Amount) > 10000; -- Review output: -- SQL Server Execution Times: -- CPU time = 156 ms, elapsed time = 234 ms. -- Table 'Orders'. Scan count 1, logical reads 2450 - Interpret the statistics: - CPU time: Actual processing time - Elapsed time: Total time including waits - Logical reads: Pages read from memory (lower is better) - Physical reads: Pages read from disk (should be 0 after first run)
- Measure baseline before optimization:
-- Create a performance test script DECLARE @StartTime DATETIME = GETDATE(); -- Your query here SELECT ... DECLARE @EndTime DATETIME = GETDATE(); SELECT DATEDIFF(MILLISECOND, @StartTime, @EndTime) AS ExecutionTimeMS; - Use query store to track performance over time:
-- Enable Query Store (SQL Server 2016+) ALTER DATABASE YourDatabase SET QUERY_STORE = ON; -- View top resource-consuming queries SELECT TOP 10 q.query_id, qt.query_sql_text, rs.avg_duration/1000 AS avg_duration_ms, rs.avg_logical_io_reads FROM sys.query_store_query q JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id JOIN sys.query_store_plan p ON q.query_id = p.query_id JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id ORDER BY rs.avg_duration DESC; - Find missing indexes:
-- SQL Server suggests missing indexes SELECT migs.avg_user_impact, migs.avg_total_user_cost, migs.user_seeks, mid.statement AS TableName, mid.equality_columns, mid.inequality_columns, mid.included_columns FROM sys.dm_db_missing_index_groups mig JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle WHERE migs.avg_user_impact > 50 ORDER BY migs.avg_user_impact DESC; - Document before/after metrics:
-- Before optimization -- Execution time: 2,340 ms -- Logical reads: 45,000 -- CPU time: 1,890 ms -- After adding index and rewriting query -- Execution time: 187 ms (12.5x faster!) -- Logical reads: 1,200 (37.5x fewer reads!) -- CPU time: 156 ms (12x less CPU)
💡 Tips:
- Logical reads are key metric: Focus on reducing logical reads, not just execution time
- Run queries multiple times: First run may be slower due to cold cache
- Clear cache for testing: Use DBCC DROPCLEANBUFFERS (dev only!) for accurate testing
- Production monitoring: Use Query Store or Extended Events for ongoing monitoring
- Benchmark systematically: Always measure before and after each optimization
- Set performance goals: "Query must complete in < 500ms" gives clear target
✅ Expected Result: Systematic approach to measuring and documenting query performance improvements
🔄 Connection to Previous Learning: Provides metrics to validate all previous optimization techniques
Independent Practice (60 min):
💡 Hints for Independent Practice:
- Start simple: Begin with basic variations before attempting complex modifications
- Break it down: Divide complex problems into smaller, manageable steps
- Test incrementally: Verify each step works before moving to the next
- Use documentation: Refer to official docs or tutorials when needed
- Experiment: Try different approaches to find what works best
Apply the concepts learned to different datasets and scenarios.
Daily Assignment (60 min):
Assignment:
- Analysis:Identify bottlenecks (table scans, missing indexes, inefficient joins)
- Optimization:Add indexes, rewrite queries, apply best practices
- Comparison:Measure performance improvement (before/after execution time)
- Documentation:Document each optimization with explanation
Expected Outcomes:
- Analyze query execution plans
- Identify performance bottlenecks
- Optimize JOIN operations
- Use appropriate indexing strategies
- Write efficient WHERE clauses
📝 Day 24 Quick Quiz - Test Your Knowledge!
Q1. What advanced concept is introduced in Day 24?
Q2. How can Day 24 skills improve your workflow?
Q3. What prerequisite knowledge does Day 24 assume?
Q4. What's a real-world application of Day 24 content?
Q5. What distinguishes Day 24 from earlier lessons?