Indexes & Performance
Learn how to make your SQL queries lightning fast! Master the secret weapon of database performance.
🚀 What Are Indexes?
Imagine searching for a specific name in a 1,000-page phone book. Without an alphabetical index, you'd have to flip through every single page! Indexes in databases work the same way - they help the database find data super quickly without scanning every row.
🎯 Real-Life Analogy: Indexes are Like a Book's Index Page
Think about how you use a textbook:
- Without an index: You flip through every page to find "Photosynthesis" (slow!)
- With an index: You look at the index page, see "Photosynthesis - Page 87", jump directly there (fast!)
Database indexes work exactly like this! They create a quick lookup table so the database doesn't have to scan every row.
What is a Database Index?
An index is a special data structure that stores a sorted copy of selected columns from a table, along with pointers to the actual rows.
- Purpose: Speed up data retrieval (SELECT queries)
- Trade-off: Uses extra disk space and slows down INSERT/UPDATE/DELETE slightly
- When it helps: When searching, sorting, or filtering large tables
Real-World Example: Finding a Customer
Scenario: You have a table with 1 million customer records.
Without an index:
SELECT * FROM customers WHERE email = 'alice@example.com';
❌ Database scans all 1,000,000 rows one-by-one (Full Table Scan) - SLOW!
With an index on email:
SELECT * FROM customers WHERE email = 'alice@example.com';
✅ Database uses the index, finds the row instantly - FAST! (like using a phone book's alphabetical ordering)
🔍 Quick Summary:
- Index = Speed Boost for searching and filtering
- Works like a sorted lookup table
- Most useful on columns you frequently search/filter by (WHERE, JOIN, ORDER BY)
🔍 How Indexes Work Behind the Scenes
Let's peek under the hood and understand exactly how indexes make queries faster!
🎯 Real-Life Analogy: Library Card Catalog
Think of an old-school library:
- Books on shelves: Your actual data (table rows)
- Card catalog: The index (sorted by title, author, or subject)
- Finding a book: Instead of walking through every aisle, you look in the card catalog, which tells you exactly which shelf to go to!
The card catalog doesn't contain the entire book - just a reference to where the book is located. Same with database indexes!
Step-by-Step: How a Query Uses an Index
Example Table: employees
| id | name | salary | |
|---|---|---|---|
| 1 | Alice | alice@company.com | 75000 |
| 2 | Bob | bob@company.com | 82000 |
| 3 | Carol | carol@company.com | 91000 |
Imagine this table has 100,000 rows instead of just 3...
Query WITHOUT an Index:
SELECT * FROM employees WHERE email = 'bob@company.com';
What the database does (Full Table Scan):
- Start at Row 1 → Check if email = 'bob@company.com' → No, keep going
- Go to Row 2 → Check if email = 'bob@company.com' → Yes! Found it!
- Problem: Had to check rows sequentially until it found the match
- With 100,000 rows, it might scan 50,000 rows on average (very slow!)
Query WITH an Index on email:
Index Structure (simplified):
Index on email column:
alice@company.com → Row 1
bob@company.com → Row 2
carol@company.com → Row 3
(sorted alphabetically!)
What the database does (Index Lookup):
- Look at the index (which is sorted like a dictionary)
- Use binary search to find 'bob@company.com' in the index (very fast!)
- Index says "bob@company.com is in Row 2"
- Jump directly to Row 2, retrieve the data
- Result: Found in just a few comparisons instead of scanning all rows!
🔍 Technical Detail: B-Tree Index Structure
Most databases use a B-Tree (Balanced Tree) data structure for indexes:
- Sorted structure: Data is kept in sorted order
- Fast lookups: Can find any value in O(log N) time (very fast, even for millions of rows)
- Balanced: Tree stays balanced, ensuring consistent performance
Analogy: Think of it like a game of "20 Questions" - each step eliminates half the possibilities, so you find the answer super quickly!
Full Table Scan
Without Index
Time: O(N) - Linear
Checks every single row
Index Lookup
With Index
Time: O(log N) - Logarithmic
Binary search in sorted index
Speed Comparison Example:
For a table with 1,000,000 rows:
- Full Table Scan: ~500,000 comparisons on average
- Index Lookup: ~20 comparisons (log₂ 1,000,000 ≈ 20)
- Speed improvement: ~25,000x faster!
🛠️ Creating Indexes
Now let's learn how to actually create indexes in SQL!
CREATE INDEX Syntax:
CREATE INDEX index_name
ON table_name (column_name);
Try It: Create Your First Index
🔍 What Just Happened?
- We created a table called
products - We created an index named
idx_categoryon thecategorycolumn - Now when we search by category, the database uses the index for faster lookups!
- The index is automatically maintained - when you INSERT/UPDATE/DELETE, the index updates too
Creating Indexes on Multiple Columns (Composite Index):
Try It: Composite Index
🎯 Composite Index Explained:
A composite index on (customer_id, status) works like a phone book sorted by:
- First: Last Name (customer_id in our case)
- Then: First Name (status in our case)
Good for queries that filter by:
- ✅ customer_id only
- ✅ customer_id AND status
- ❌ status only (can't use the index efficiently - like trying to find all "Johns" in a phone book sorted by last name)
Unique Indexes:
A UNIQUE index not only speeds up queries but also enforces uniqueness (like a UNIQUE constraint):
CREATE UNIQUE INDEX idx_email ON users(email);
This ensures no two users can have the same email AND makes email lookups fast!
Removing Indexes:
DROP INDEX index_name;
Use this if you no longer need an index or if it's slowing down INSERT/UPDATE operations too much.
⚖️ When to Use Indexes (And When NOT To)
Indexes are powerful, but they're not always the right choice. Let's learn when to use them!
🎯 Real-Life Analogy: Index Trade-offs
Think about organizing a small personal library vs. a huge city library:
- Small library (10 books): No need for a card catalog - just look at the shelf!
- Huge library (1 million books): Definitely need a card catalog - you'd never find anything otherwise!
Same with database indexes: Small tables don't need indexes, but large tables do!
✅ GOOD Reasons to Create an Index:
1. Large Tables
Tables with thousands or millions of rows
Benefit: Huge speed improvement
2. Frequent Searches
Columns used often in WHERE clauses
Example: WHERE email = '...'
3. JOIN Columns
Foreign keys used in JOINs
Example: customer_id, product_id
4. ORDER BY
Columns used for sorting
Example: ORDER BY created_date
5. Unique Constraints
Enforce uniqueness AND speed up lookups
Example: username, email
❌ BAD Reasons to Create an Index:
1. Small Tables
Tables with < 1000 rows
Why? Full scan is already fast enough
2. Rarely Queried Columns
Columns you never search by
Why? Wastes space, slows down writes
3. High Write/Low Read Tables
Tables with many INSERT/UPDATE operations
Why? Index maintenance overhead
4. Low Selectivity Columns
Columns with few unique values
Example: gender (M/F) - index won't help much
🔍 The Trade-offs of Indexes:
| Benefits | Costs | |
|---|---|---|
| SELECT | ✅ Much faster queries | - |
| INSERT | - | ❌ Slower (must update index) |
| UPDATE | - | ❌ Slower (must update index) |
| DELETE | - | ❌ Slower (must update index) |
| Storage | - | ❌ Uses extra disk space |
Rule of Thumb:
Create indexes on columns where:
- You frequently use WHERE, JOIN, or ORDER BY
- The table has many rows (> 1000)
- The column has high selectivity (many unique values)
- You read data more often than you write it
📚 Types of Indexes
Different types of indexes for different needs!
1. Primary Key Index
Automatically created when you define PRIMARY KEY
CREATE TABLE users (
id INTEGER PRIMARY KEY
);
Always unique, never NULL
2. Unique Index
Enforces uniqueness AND speeds up lookups
CREATE UNIQUE INDEX
idx_email
ON users(email);
No duplicate values allowed
3. Single-Column Index
Index on one column
CREATE INDEX idx_name
ON users(last_name);
Most common type
4. Composite Index
Index on multiple columns
CREATE INDEX idx_name_age
ON users(last_name, age);
Good for multi-column filters
5. Partial Index
Index only certain rows (SQLite 3.8+)
CREATE INDEX idx_active
ON users(email)
WHERE status = 'active';
Saves space, faster updates
6. Full-Text Index
For text search (FTS in SQLite)
CREATE VIRTUAL TABLE
articles_fts USING fts5(title, content);
Great for search features
🔍 Which Index Type Should You Use?
- Primary Key: Always use for the main identifier
- Unique Index: For columns that must be unique (email, username)
- Single-Column: For columns you frequently search by alone
- Composite: When you often filter by multiple columns together
- Partial: When you only care about certain rows (e.g., active users)
- Full-Text: For searching within text content (articles, comments)
⚠️ Common Performance Mistakes
Learn from these common pitfalls that slow down your queries!
🎯 Real-Life Analogy: Performance Mistakes are Like Traffic Jams
Just like a traffic jam can be caused by:
- Missing highway signs: No indexes = everyone searching randomly
- Too many toll booths: Too many indexes = delays at every turn
- Taking the long route: Poor query design = wasted time
Let's learn how to avoid these performance "traffic jams"!
Mistake #1: Forgetting to Index Foreign Keys
❌ Bad Practice:
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER, -- ❌ No index on foreign key!
product_id INTEGER, -- ❌ No index on foreign key!
order_date TEXT
);
-- This JOIN query will be SLOW without indexes
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 12345;
Problem: JOINs on un-indexed columns force full table scans!
✅ Good Practice:
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER,
product_id INTEGER,
order_date TEXT
);
-- Create indexes on foreign keys
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_product_id ON orders(product_id);
-- Now the JOIN is FAST!
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 12345;
Why it matters: Foreign keys are almost always used in JOINs. Without indexes, every JOIN becomes a full table scan, making queries exponentially slower as tables grow!
Mistake #2: Using SELECT * Instead of Specific Columns
❌ Bad Practice:
-- Fetches ALL 50 columns even if you only need 2!
SELECT * FROM users WHERE id = 100;
Problems:
- Transfers unnecessary data over the network
- Uses more memory
- Slower query execution
- Can break code if table structure changes
✅ Good Practice:
-- Only fetch what you need
SELECT id, name, email FROM users WHERE id = 100;
See the Difference:
Mistake #3: Using Functions on Indexed Columns in WHERE
❌ Bad Practice:
-- Index CANNOT be used because of UPPER() function
CREATE INDEX idx_name ON users(name);
SELECT * FROM users WHERE UPPER(name) = 'ALICE';
Problem: Functions prevent index usage! Database must scan all rows, apply UPPER() to each, then compare.
✅ Good Practice:
-- Index CAN be used - much faster!
CREATE INDEX idx_name ON users(name);
SELECT * FROM users WHERE name = 'Alice';
-- Or use COLLATE for case-insensitive search
SELECT * FROM users WHERE name = 'alice' COLLATE NOCASE;
Rule: Never apply functions to indexed columns in WHERE clauses. If needed, apply the function to the comparison value instead, or create a functional index (advanced).
Mistake #4: Over-Indexing (Too Many Indexes)
❌ Bad Practice:
-- Creating indexes on EVERY column (overkill!)
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT,
category TEXT,
price REAL,
description TEXT
);
CREATE INDEX idx_name ON products(name);
CREATE INDEX idx_category ON products(category);
CREATE INDEX idx_price ON products(price);
CREATE INDEX idx_description ON products(description);
-- Too many indexes!
Problems:
- Every INSERT/UPDATE/DELETE must update 5 indexes!
- Wastes disk space
- Index maintenance overhead
- Query planner gets confused with too many choices
✅ Good Practice:
-- Only index what you actually search/filter by
CREATE TABLE products (
id INTEGER PRIMARY KEY, -- Already indexed
name TEXT,
category TEXT,
price REAL,
description TEXT
);
-- Only index frequently queried columns
CREATE INDEX idx_category ON products(category);
CREATE INDEX idx_price ON products(price);
-- That's enough for most use cases!
Mistake #5: Not Using LIMIT on Large Result Sets
❌ Bad Practice:
-- Fetches ALL 10 million rows when you only need 20!
SELECT * FROM logs ORDER BY created_at DESC;
Problem: Wastes memory, bandwidth, and time fetching data you won't use.
✅ Good Practice:
-- Only fetch what you need
SELECT * FROM logs ORDER BY created_at DESC LIMIT 20;
Practice: Avoid Common Mistakes
🔍 Quick Mistake Checklist:
Before running a query in production, ask yourself:
- ✓ Are my foreign keys indexed?
- ✓ Am I using SELECT columns instead of SELECT *?
- ✓ Am I avoiding functions on indexed columns in WHERE?
- ✓ Do I have too many indexes (more than 3-5 per table)?
- ✓ Am I using LIMIT when I don't need all rows?
- ✓ Are my WHERE columns indexed?
⚡ Query Optimization Basics
Learn to write queries that run blazing fast!
🎯 What is Query Optimization?
Query optimization is the art of writing SQL queries that execute as efficiently as possible. It's like taking the fastest route on a road trip instead of the scenic route!
Optimization Techniques:
1. Use Indexes
Create indexes on frequently queried columns
Impact: 100x-1000x faster
2. SELECT Only What You Need
Avoid SELECT *
-- Bad
SELECT * FROM users;
-- Good
SELECT id, name FROM users;
3. Use LIMIT
Limit results when you don't need all rows
SELECT * FROM logs
ORDER BY created_at DESC
LIMIT 100;
4. Avoid Functions in WHERE
Functions prevent index usage
-- Bad (can't use index)
WHERE UPPER(name) = 'ALICE'
-- Good (can use index)
WHERE name = 'Alice'
5. Use EXISTS Instead of IN
For subqueries, EXISTS is often faster
-- Good
WHERE EXISTS (
SELECT 1 FROM orders
WHERE customer_id = c.id
)
6. Avoid OR, Use UNION
OR can prevent index usage
-- Instead of OR, use UNION
SELECT * FROM users
WHERE city = 'NYC'
UNION
SELECT * FROM users
WHERE age > 65;
Before & After Optimization Example:
-- ❌ SLOW Query (No index, SELECT *, unnecessary sorting)
SELECT * FROM customers
WHERE LOWER(email) LIKE '%@gmail.com%'
ORDER BY created_at;
-- ✅ FAST Query (Uses index, selects only needed columns, added LIMIT)
CREATE INDEX idx_email ON customers(email);
CREATE INDEX idx_created ON customers(created_at);
SELECT id, email, name FROM customers
WHERE email LIKE '%@gmail.com'
ORDER BY created_at DESC
LIMIT 100;
🔍 Using EXPLAIN QUERY PLAN
Learn to diagnose exactly how your queries are executed!
🎯 What is EXPLAIN QUERY PLAN?
EXPLAIN QUERY PLAN is like getting X-ray vision into your queries! It shows you exactly how the database will execute your query before it runs.
Real-Life Analogy: It's like getting GPS directions before a road trip. You can see if the route is efficient or if you're taking unnecessary detours!
Basic Syntax:
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'alice@example.com';
Understanding the Output:
❌ SCAN TABLE
Bad! Full table scan
Database checks every single row
Slow for large tables
✅ SEARCH USING INDEX
Good! Index lookup
Database uses index for fast search
Fast even for millions of rows
Try It: See EXPLAIN in Action
Run these queries and observe the difference:
Real EXPLAIN QUERY PLAN Examples:
Example 1: Full Table Scan (Slow)
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 100;
-- Output would show:
-- SCAN TABLE orders
-- ❌ This means it's checking every row - SLOW!
Example 2: Index Usage (Fast)
CREATE INDEX idx_customer ON orders(customer_id);
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 100;
-- Output would show:
-- SEARCH TABLE orders USING INDEX idx_customer (customer_id=?)
-- ✅ Using the index - FAST!
Example 3: JOIN with Indexes
EXPLAIN QUERY PLAN
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Good output:
-- SEARCH TABLE orders
-- SEARCH TABLE customers USING INDEX PRIMARY KEY (id=?)
-- ✅ Both tables using indexes!
🔍 What to Look For in EXPLAIN Output:
- ✅ GOOD Signs:
- "SEARCH" instead of "SCAN"
- "USING INDEX" appears
- Index names are mentioned
- ❌ BAD Signs:
- "SCAN TABLE" (full table scan)
- No mention of indexes
- "TEMP B-TREE" for sorts (might need ORDER BY index)
💡 When to Use EXPLAIN QUERY PLAN:
- Before deploying: Check new queries are optimized
- When debugging: Find out why a query is slow
- After indexing: Verify indexes are being used
- Performance testing: Compare different query approaches
🔧 Troubleshooting Slow Queries
Step-by-step guide to identify and fix performance problems!
🎯 The Troubleshooting Process
Think of it like diagnosing a car problem:
- Identify symptoms: Query is slow
- Check the basics: Are indexes present?
- Use diagnostics: EXPLAIN QUERY PLAN
- Apply fix: Add index or rewrite query
- Test: Measure improvement
Problem #1: Query Takes Too Long
Symptoms:
- Query hangs or times out
- Takes several seconds when it should be instant
- Application becomes unresponsive
Diagnosis Steps:
- Check for indexes on WHERE columns:
-- If this query is slow: SELECT * FROM orders WHERE customer_id = 100; -- Check if index exists: -- In a real database, you'd query system tables -- In SQLite, there's no easy way in sandbox, but you can try: -- SELECT name FROM sqlite_master WHERE type='index'; - Look for full table scans:
-- Use EXPLAIN to see execution plan EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 100; -- If you see "SCAN TABLE" → Problem found! - Apply the fix:
-- Create the missing index CREATE INDEX idx_customer ON orders(customer_id); -- Retry query - should be fast now! SELECT * FROM orders WHERE customer_id = 100;
Problem #2: JOIN Queries Are Slow
Common Cause:
Missing indexes on foreign key columns!
-- Slow JOIN query:
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Solution: Index the foreign key
CREATE INDEX idx_customer_id ON orders(customer_id);
-- Now the JOIN is fast!
Problem #3: ORDER BY Takes Forever
Common Cause:
No index on the column being sorted!
-- Slow sorting:
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;
-- Solution: Index the sort column
CREATE INDEX idx_created ON logs(created_at);
-- Much faster now!
Problem #4: Complex Queries with Multiple WHERE Conditions
Solution: Composite Index
-- Slow query with multiple filters:
SELECT * FROM products
WHERE category = 'Electronics'
AND price < 500
AND in_stock = 1;
-- Create composite index on all filter columns
CREATE INDEX idx_cat_price_stock
ON products(category, price, in_stock);
-- Query now uses the composite index!
🔧 Troubleshooting Flowchart:
Query is slow?
↓
Check: Does table have > 1000 rows?
↓ YES
Check: Are there indexes on WHERE/JOIN/ORDER BY columns?
↓ NO
Create appropriate indexes
↓
Retest query
↓ Still slow?
Check: Using SELECT *?
↓ YES
Change to SELECT specific columns
↓
Check: Using functions in WHERE clause?
↓ YES
Remove functions or use functional index
↓
Check: Missing LIMIT on large results?
↓ YES
Add LIMIT clause
↓
Query should now be fast!
Practice: Fix a Slow Query
Identify and fix the performance problem:
🔍 Quick Troubleshooting Checklist:
When a query is slow, check these in order:
- ✓ Do I have indexes on all WHERE columns?
- ✓ Do I have indexes on all JOIN columns (foreign keys)?
- ✓ Do I have indexes on ORDER BY columns?
- ✓ Am I using SELECT specific columns (not SELECT *)?
- ✓ Am I using LIMIT when appropriate?
- ✓ Am I avoiding functions on indexed columns in WHERE?
- ✓ Is my table properly normalized (no huge text fields)?
Most common fix: Add an index! ~80% of slow queries are solved by proper indexing.
💡 Performance Tips & Best Practices
Practical tips to keep your database running at peak performance!
✅ DO: Index Foreign Keys
Always index columns used in JOINs
CREATE INDEX idx_customer
ON orders(customer_id);
✅ DO: Keep Indexes Small
Index smaller data types when possible
INTEGER index is faster than TEXT
✅ DO: Monitor Performance
Use EXPLAIN to check query plans
Measure before/after optimization
❌ DON'T: Over-Index
Too many indexes slow down writes
Only index what you actually use
❌ DON'T: Index Small Tables
Tables with < 1000 rows don't benefit
Full scan is fast enough
❌ DON'T: Index Low-Selectivity
Columns with few unique values
Example: boolean flags, gender
Quick Performance Checklist:
- ✅ Do I have indexes on columns used in WHERE clauses?
- ✅ Do I have indexes on foreign key columns?
- ✅ Am I selecting only the columns I need?
- ✅ Am I using LIMIT when I don't need all rows?
- ✅ Are my queries avoiding functions on indexed columns?
- ✅ Have I tested with EXPLAIN QUERY PLAN?
🔍 Real-World Example: E-Commerce Database
Common Query: "Show me all orders for customer #12345"
-- Without index: Scans all 10 million order rows (SLOW!)
SELECT * FROM orders WHERE customer_id = 12345;
-- With index: Uses index to find matching rows instantly (FAST!)
CREATE INDEX idx_customer_id ON orders(customer_id);
SELECT * FROM orders WHERE customer_id = 12345;
Impact: Query time reduced from 5 seconds to 0.01 seconds!
Practice Exercises
Part 1: Guided Exercises
Guided Exercise 1: Create an Index for Faster Lookups
Task: Create a table called "customers" with id, name, email, and city. Then create an index on the email column to speed up email searches.
Guided Exercise 2: Composite Index for Multi-Column Queries
Task: Create a composite index on category and price for faster multi-filter queries.
Guided Exercise 3: Optimize a JOIN Query
Task: Create necessary indexes to make a JOIN query fast.
Guided Exercise 4: Fix Slow ORDER BY
Task: Add an index to speed up sorting operations.
Part 2: Independent Practice
Challenge 1: Optimize a Slow Query
Difficulty: Medium
Task: You have a "products" table with product_id, name, category, and price. Create a composite index that would speed up queries filtering by both category and price range.
Challenge 2: Social Media Posts Performance
Difficulty: Medium
Task: Create a "social_posts" table with post_id, user_id, content, created_at, and likes_count. Add appropriate indexes to optimize queries that: (1) show a user's posts ordered by date, and (2) find posts by creation date.
Challenge 3: E-Commerce Order Analytics
Difficulty: Hard
Task: Design an optimized schema for order analytics. Create "customers", "orders", and "order_items" tables with proper indexes. Optimize for queries like: (1) customer order history, (2) orders by date range, and (3) product sales reports.
Challenge 4: Search Engine Optimization
Difficulty: Medium
Task: Create an "articles" table with article_id, title, author, content, published_date, and view_count. Add indexes to optimize: (1) searching articles by author, (2) finding recent articles, and (3) finding most-viewed articles.
Challenge 5: Multi-Tenant Application
Difficulty: Hard
Task: Create a multi-tenant "tasks" table (task_id, tenant_id, user_id, title, status, due_date). Add composite indexes to optimize queries filtering by tenant first (data isolation), then by user or status.
📝 Knowledge Check Quiz
1. What is the main purpose of a database index?
2. What is a trade-off of adding many indexes to a table?
3. When is it NOT beneficial to create an index?
4. You have a composite index on (customer_id, order_date). Which query will benefit MOST from this index?
5. Why should you avoid using functions on indexed columns in WHERE clauses?
6. In EXPLAIN QUERY PLAN output, what does 'SEARCH TABLE USING INDEX' indicate?
7. What's the main problem with using SELECT * in production queries?
8. Why is it important to index foreign key columns?