DAY 24: SQL - Query Optimization & Performance Tuning

Learning Objectives: - Analyze query execution plans - Identify performance bottlenecks - Optimize JOIN operations - Use appropriate indexing strategies - Write efficient WHERE clauses

Topics Covered (60 min):

1. Execution Plan Analysis - View Actual Execution Plan (Ctrl+M) - Read plan from right to left - Identify expensive operations (Table Scan, Sort) - Look for warnings (missing index, type conversion) - % of total cost

2. Indexing Strategy - Clustered index (1 per table, sorts data) - Non-clustered index (pointers to data) - Composite indexes (multiple columns) - Include columns for covering index - When to rebuild/reorganize indexes

3. Query Optimization Techniques - Use WHERE before JOIN - Avoid SELECT *, specify needed columns - Use EXISTS instead of IN for large datasets - Avoid functions on indexed columns in WHERE - Use UNION ALL instead of UNION when possible

4. JOIN Optimization - Join order matters (smaller table first) - Use appropriate join types - Index foreign key columns - Avoid Cartesian products - Consider temp tables for complex joins

Video Resources:

SQL Query Optimization Techniques (7:20)
https://www.youtube.com/watch?v=MpczBuIk7R8
Secret To Optimizing SQL Queries - Understand The SQL Execution Plan (12:11)
https://www.youtube.com/watch?v=BHwzDmr6d7s
5-Minute SQL Performance Boost Using Execution Plans (2:24)
https://www.youtube.com/watch?v=Kb1Qz265YSE

AI Learning Prompts (Concept Understanding Only):

Understanding Query Optimization:

I'm learning SQL query optimization. Please explain:

1. How do I read an execution plan?
2. What is a table scan vs index seek and which is better?
3. Show me how to identify slow queries
4. What are the most common performance mistakes?
5. Provide examples of before/after query optimization

Use SQL Server with practical examples.

Understanding Indexing:

Help me understand SQL indexing:

1. What is the difference between clustered and non-clustered indexes?
2. When should I create an index?
3. Can too many indexes be bad?
4. Show me how to find missing indexes
5. Explain covering indexes and when to use them

Provide beginner-friendly explanations.

Guided Practice (60 min):

📝 How to approach Guided Practice:

Follow step-by-step: Work through each exercise in order
Use provided resources: Reference the video tutorials and AI prompts if you get stuck
Verify your work: Check your results match expected outcomes
Take notes: Document key learnings and shortcuts you discover
Ask for help: Use AI assistants to clarify concepts you don't understand

Exercise 1: Read and Analyze SQL Execution Plans

Goal: Learn to read execution plans to identify performance bottlenecks like table scans, missing indexes, and expensive operations.

Step-by-Step Instructions:

Create sample tables with data:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    Region VARCHAR(50),
    Email VARCHAR(100)
);

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    Amount DECIMAL(10,2)
);

-- Insert sample data (1000+ rows for meaningful analysis)
INSERT INTO Customers SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    'Customer' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR),
    CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 0 THEN 'East'
         WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 1 THEN 'West'
         WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 4 = 2 THEN 'North'
         ELSE 'South' END,
    'customer' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR) + '@email.com'
FROM master..spt_values; -- Generates rows

Run a slow query without viewing the plan first:

SELECT c.Name, c.Region, SUM(o.Amount) AS TotalSales
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE c.Region = 'East'
GROUP BY c.Name, c.Region;

Enable execution plan viewing: - SQL Server: Ctrl+M or click "Display Estimated Execution Plan" - Or add: SET SHOWPLAN_TEXT ON; before the query
Re-run the query and analyze the plan: - Look for Table Scan (bad) vs Index Seek (good) - Check the cost percentage of each operation - Identify operations with highest cost (thickest arrows)
Key indicators to look for: - Table Scan = Reading entire table (slow) - Index Scan = Reading entire index (better than table scan) - Index Seek = Using index efficiently (best) - Sort operations = Can be expensive - Nested Loops vs Hash Match vs Merge Join

💡 Tips:

Read right to left: Execution plans show operations from right to left, bottom to top
Thicker arrows = more rows: Hover over arrows to see row counts
Look for warnings: Yellow exclamation marks indicate potential issues
Cost percentages matter: Focus optimization on operations with highest cost %
Actual vs Estimated: Use "Include Actual Execution Plan" (Ctrl+M) for real numbers

✅ Expected Result: Ability to read execution plans and identify the most expensive operations causing slow queries

🔄 Connection to Previous Learning: Builds on JOIN operations (Day 6) by showing how they execute under the hood

⚠️ SQLite Note: We use EXPLAIN QUERY PLAN to visualize operations. Look for SCAN TABLE (Full Scan) vs SEARCH TABLE (Index Usage).

💻 Try Exercise 1:

Exercise 2: Create and Use Indexes for Performance

Goal: Master creating appropriate indexes to speed up queries and understanding when to use clustered vs non-clustered indexes.

Step-by-Step Instructions:

Test query performance WITHOUT index:

-- Turn on statistics
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Run query without index
SELECT * FROM Orders
WHERE CustomerID = 500;

-- Note the execution time and logical reads

Create a non-clustered index on CustomerID:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders(CustomerID);

Re-run the same query and compare:

SELECT * FROM Orders
WHERE CustomerID = 500;

-- Compare execution time and logical reads - should be much faster!

Create a covering index for better performance:

-- Drop the previous index
DROP INDEX IX_Orders_CustomerID ON Orders;

-- Create covering index that includes columns we need
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON Orders(CustomerID)
INCLUDE (OrderDate, Amount);

Test a query that benefits from covering index:

SELECT CustomerID, OrderDate, Amount
FROM Orders
WHERE CustomerID = 500;

-- No need to access the table data - all columns in index!

Create a composite index for multiple columns:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Date
ON Orders(CustomerID, OrderDate);

💡 Tips:

Clustered vs Non-clustered: Clustered sorts the table itself (1 per table), Non-clustered creates separate structure
Covering indexes avoid lookups: Include frequently queried columns in INCLUDE clause
Index column order matters: Most selective column first, then less selective
Don't over-index: Indexes speed up reads but slow down writes (INSERT/UPDATE/DELETE)
WHERE and JOIN columns are candidates: Index columns frequently used in WHERE, JOIN, and ORDER BY
Monitor index usage: Use sys.dm_db_index_usage_stats to see which indexes are actually used

✅ Expected Result: Dramatic performance improvement (10x-100x faster) for queries using appropriate indexes

🔄 Connection to Previous Learning: Optimizes the filtering and joining concepts from Days 4 and 6

💻 Try Exercise 2: Compare Plans

Exercise 3: Optimize JOIN Operations and Query Structure

Goal: Learn to write efficient JOINs and restructure queries to minimize data processing.

Step-by-Step Instructions:

Start with an inefficient query:

-- BAD: Filtering after JOIN (processes all rows first)
SELECT c.Name, o.OrderDate, o.Amount
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE c.Region = 'East'
  AND o.OrderDate >= '2024-01-01';

Optimize by filtering early:

-- BETTER: Filter before JOIN (fewer rows to join)
SELECT c.Name, o.OrderDate, o.Amount
FROM (SELECT CustomerID, Name FROM Customers WHERE Region = 'East') c
JOIN (SELECT CustomerID, OrderDate, Amount FROM Orders WHERE OrderDate >= '2024-01-01') o
ON c.CustomerID = o.CustomerID;

Or use CTEs for readability:

-- BEST: Clear intent, optimized by query engine
WITH EastCustomers AS (
    SELECT CustomerID, Name
    FROM Customers
    WHERE Region = 'East'
),
RecentOrders AS (
    SELECT CustomerID, OrderDate, Amount
    FROM Orders
    WHERE OrderDate >= '2024-01-01'
)
SELECT c.Name, o.OrderDate, o.Amount
FROM EastCustomers c
JOIN RecentOrders o ON c.CustomerID = o.CustomerID;

Avoid functions on indexed columns:

-- BAD: Function prevents index usage
SELECT * FROM Orders
WHERE YEAR(OrderDate) = 2024;

-- GOOD: Index can be used
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';

Use EXISTS instead of IN for large datasets:

-- LESS EFFICIENT with large subquery results
SELECT * FROM Customers
WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE Amount > 1000);

-- MORE EFFICIENT - stops at first match
SELECT * FROM Customers c
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID AND o.Amount > 1000);

💡 Tips:

Filter early, filter often: Reduce data volume before expensive operations like JOINs
Avoid SELECT *: Specify only needed columns to reduce I/O
No functions on WHERE columns: YEAR(date), UPPER(name) prevent index usage
EXISTS vs IN: EXISTS is faster for large subqueries (short-circuits on first match)
Use INNER JOIN over subqueries: Modern optimizers handle JOINs better than nested subqueries
Avoid OR in WHERE: Use UNION instead, or restructure with separate queries

✅ Expected Result: Queries that process minimal data and use indexes effectively, resulting in faster execution

🔄 Connection to Previous Learning: Applies optimization principles to CTEs (Day 16) and subqueries (Day 10)

💻 Try Exercise 3: Optimize JOIN

Exercise 4: Use Query Hints and Force Optimization

Goal: Learn when and how to use query hints to force specific execution strategies when the optimizer makes poor choices.

Step-by-Step Instructions:

Understand the problem: Sometimes the query optimizer chooses a suboptimal plan

Force an index usage with INDEX hint:

-- Force use of specific index
SELECT OrderID, CustomerID, Amount
FROM Orders WITH (INDEX(IX_Orders_CustomerID))
WHERE CustomerID = 500;

Use FORCESEEK to force index seek:

-- Prevent table/index scan, force seek
SELECT OrderID, CustomerID, Amount
FROM Orders WITH (FORCESEEK)
WHERE CustomerID = 500;

Control JOIN algorithm with hints:

-- Force hash join for large datasets
SELECT c.Name, SUM(o.Amount) AS Total
FROM Customers c
INNER HASH JOIN Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.Name;

-- Force loop join for small datasets
SELECT c.Name, o.Amount
FROM Customers c
INNER LOOP JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE c.Region = 'East';

Use MAXDOP to control parallelism:

-- Limit to single thread (useful for debugging)
SELECT CustomerID, SUM(Amount) AS Total
FROM Orders
GROUP BY CustomerID
OPTION (MAXDOP 1);

-- Allow up to 4 parallel threads
SELECT CustomerID, SUM(Amount) AS Total
FROM Orders
GROUP BY CustomerID
OPTION (MAXDOP 4);

Recompile for parameter-sensitive queries:

-- Force recompile for accurate statistics
SELECT * FROM Orders
WHERE Amount > @threshold
OPTION (RECOMPILE);

💡 Tips:

Use hints sparingly: Let optimizer do its job; only hint when you know better
Document why you hint: Future maintainers need to know why you overrode the optimizer
Test with and without: Always compare performance with and without hints
Hints can become outdated: As data grows, hint assumptions may become invalid
NOLOCK for dirty reads: WITH (NOLOCK) allows reading uncommitted data (faster but risky)
Parameter sniffing: Use OPTION (RECOMPILE) or OPTIMIZE FOR when parameter values vary widely

✅ Expected Result: Understanding of when to override the optimizer and how to force specific execution strategies

🔄 Connection to Previous Learning: Advanced technique building on all previous SQL optimization concepts

⚠️ Note: SQL Server query hints (WITH INDEX, MAXDOP) are not supported in SQLite. However, you can see how the optimizer creates proper plans automatically.

💻 Try Exercise 4: Query Analyzer

Exercise 5: Monitor and Measure Query Performance

Goal: Learn to measure query performance accurately and identify the actual bottlenecks using SQL Server tools.

Step-by-Step Instructions:

Enable detailed statistics:

SET STATISTICS TIME ON;  -- Show execution time
SET STATISTICS IO ON;    -- Show I/O operations

SELECT c.Name, COUNT(*) AS OrderCount, SUM(o.Amount) AS TotalSpent
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.Name
HAVING SUM(o.Amount) > 10000;

-- Review output:
-- SQL Server Execution Times:
--   CPU time = 156 ms,  elapsed time = 234 ms.
-- Table 'Orders'. Scan count 1, logical reads 2450

Interpret the statistics: - CPU time: Actual processing time - Elapsed time: Total time including waits - Logical reads: Pages read from memory (lower is better) - Physical reads: Pages read from disk (should be 0 after first run)

Measure baseline before optimization:

-- Create a performance test script
DECLARE @StartTime DATETIME = GETDATE();

-- Your query here
SELECT ...

DECLARE @EndTime DATETIME = GETDATE();
SELECT DATEDIFF(MILLISECOND, @StartTime, @EndTime) AS ExecutionTimeMS;

Use query store to track performance over time:

-- Enable Query Store (SQL Server 2016+)
ALTER DATABASE YourDatabase
SET QUERY_STORE = ON;

-- View top resource-consuming queries
SELECT TOP 10
    q.query_id,
    qt.query_sql_text,
    rs.avg_duration/1000 AS avg_duration_ms,
    rs.avg_logical_io_reads
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;

Find missing indexes:

-- SQL Server suggests missing indexes
SELECT
    migs.avg_user_impact,
    migs.avg_total_user_cost,
    migs.user_seeks,
    mid.statement AS TableName,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_user_impact > 50
ORDER BY migs.avg_user_impact DESC;

Document before/after metrics:

-- Before optimization
-- Execution time: 2,340 ms
-- Logical reads: 45,000
-- CPU time: 1,890 ms

-- After adding index and rewriting query
-- Execution time: 187 ms (12.5x faster!)
-- Logical reads: 1,200 (37.5x fewer reads!)
-- CPU time: 156 ms (12x less CPU)

💡 Tips:

Logical reads are key metric: Focus on reducing logical reads, not just execution time
Run queries multiple times: First run may be slower due to cold cache
Clear cache for testing: Use DBCC DROPCLEANBUFFERS (dev only!) for accurate testing
Production monitoring: Use Query Store or Extended Events for ongoing monitoring
Benchmark systematically: Always measure before and after each optimization
Set performance goals: "Query must complete in < 500ms" gives clear target

✅ Expected Result: Systematic approach to measuring and documenting query performance improvements

🔄 Connection to Previous Learning: Provides metrics to validate all previous optimization techniques

💻 Try Exercise 5: Measure Performance (Time)

Independent Practice (60 min):

💡 Hints for Independent Practice:

Start simple: Begin with basic variations before attempting complex modifications
Break it down: Divide complex problems into smaller, manageable steps
Test incrementally: Verify each step works before moving to the next
Use documentation: Refer to official docs or tutorials when needed
Experiment: Try different approaches to find what works best

Apply the concepts learned to different datasets and scenarios.

Daily Assignment (60 min):

Assignment:

Analysis:Identify bottlenecks (table scans, missing indexes, inefficient joins)
Optimization:Add indexes, rewrite queries, apply best practices
Comparison:Measure performance improvement (before/after execution time)
Documentation:Document each optimization with explanation

Expected Outcomes:

Analyze query execution plans
Identify performance bottlenecks
Optimize JOIN operations
Use appropriate indexing strategies
Write efficient WHERE clauses