DAY 10: SQL - Subqueries
Learning Objectives: - Use subqueries in WHERE clause - Create subqueries in SELECT - Understand IN and EXISTS operators - Learn correlated vs non-correlated subqueries
Topics Covered (60 min):
1. WHERE Subquery (Non-Correlated) -
SELECT * FROM Orders WHERE Sales > (SELECT AVG(Sales) FROM Orders)
- Find records above/below aggregate - Inner query runs once
2. IN Operator -
WHERE CustomerID IN (SELECT ID FROM Customers WHERE State='CA')
- Tests if value is in subquery results - Returns multiple values
3. EXISTS Operator -
WHERE EXISTS (SELECT 1 FROM OrderDetails WHERE OrderID = Orders.ID)
- Tests if subquery returns ANY rows - Usually faster than IN for large
datasets - Returns TRUE/FALSE
4. SELECT Subquery (Scalar) -
SELECT OrderID, (SELECT Name FROM Customers WHERE ID=Orders.CustomerID) AS Customer
- Must return single value - Adds calculated column
5. Correlated Subquery - References outer query -
WHERE Salary > (SELECT AVG(Salary) FROM Employees E2 WHERE E2.Department = E1.Department)
- Runs for each row (slower)
Video Resources:
CASE Statement and UNION SQL interview questions (10:00)
https://www.youtube.com/watch?v=F2JJWM40Dh4How to Use SQL Case Statements (13:40)
https://www.youtube.com/watch?v=iq-geVaMOosSQL Tutorial #36 - SQL UNION Operator (7:49)
https://www.youtube.com/watch?v=7QaIPZvu1NI
AI Learning Prompts (Concept Understanding Only):
Understanding Subqueries Basics:
Explain SQL subqueries:
1. What is a subquery and why use it?
2. Where can subqueries be placed (WHERE, SELECT, FROM)?
3. How do I write a subquery: SELECT ... WHERE col > (SELECT ...)?
4. Show me examples using Employee data (find employees with salary > average)
5. What's the difference between subquery and JOIN?
6. When should I use subquery vs JOIN?
Use HR or academic data for examples (not sales).
Understanding IN and EXISTS:
Explain IN and EXISTS with subqueries:
1. How does IN operator work with subqueries?
2. What's the syntax: WHERE column IN (SELECT ...)?
3. How does EXISTS differ from IN?
4. When is EXISTS more efficient than IN?
5. Show me examples: Find students who are enrolled in at least one class
6. What's the difference between NOT IN and NOT EXISTS?
Use school enrollment or membership data for examples.
Understanding Correlated Subqueries:
Explain correlated subqueries:
1. What makes a subquery "correlated"?
2. How does it reference the outer query?
3. Show me examples: Find employees earning more than average in their department
4. Why are correlated subqueries slower?
5. Can I rewrite correlated subqueries as JOINs?
Use departmental or grouped data for examples.
Understanding Subqueries in SELECT:
Explain subqueries in the SELECT clause:
1. How do I use subqueries to add calculated columns?
2. What are scalar subqueries (returning single value)?
3. Show me examples: Add "Department Average Salary" column to employee list
4. When should I use subquery vs JOIN for this?
Use organizational hierarchy examples.
Guided Practice (60 min):
📝 How to approach Guided Practice:
- Follow step-by-step: Work through each exercise in order
- Use provided resources: Reference the video tutorials and AI prompts if you get stuck
- Verify your work: Check your results match expected outcomes
- Take notes: Document key learnings and shortcuts you discover
- Ask for help: Use AI assistants to clarify concepts you don't understand
Exercise 1: Find Orders with Sales Above Average (Basic Subquery in WHERE)
Goal: Learn to use subqueries in the WHERE clause to compare against aggregate values.
Step-by-Step Instructions:
- Open your SQL editor (pgAdmin, MySQL Workbench, or online SQL environment)
- Start with the inner subquery first to understand what value it returns:
This calculates the average sales across all orders (e.g., $500)SELECT AVG(Sales) FROM Orders; - Now write the full query with the subquery in the WHERE clause:
SELECT OrderID, CustomerID, Sales, OrderDate FROM Orders WHERE Sales > (SELECT AVG(Sales) FROM Orders) ORDER BY Sales DESC; - Execute the query (press F5 or click Execute)
- Analyze the results: Notice all returned orders have sales greater than the average
💡 Tips:
- Test the subquery separately first - Run just the
SELECT AVG(Sales)...to see what value it returns - Subquery runs once: Non-correlated subqueries execute only one time, then the result is used for comparison
- Parentheses are required: Always wrap subqueries in parentheses
(SELECT ...) - ORDER BY helps verification: Sort by Sales DESC to easily verify all values are above average
- Alternative approach: You could use a CTE or JOIN with a derived table, but subqueries are cleaner for this use case
✅ Expected Result: A list of orders where Sales value is higher than the average (e.g., if average is $500, you'll see only orders with $501+)
🔄 Connection to Previous Learning: This is similar to Excel's AVERAGEIF from Day 3, but more flexible - you're filtering based on a calculated aggregate.
Exercise 2: Select Customers Who Have Placed Orders (Using IN Subquery)
Goal: Learn to use IN operator with subqueries to filter based on a list of values.
Step-by-Step Instructions:
- Understand the scenario: You have a Customers table and an Orders table. You want to find only customers who have placed at least one order
- First, test the subquery alone to see what it returns:
This returns a list of all customer IDs that appear in Orders (e.g., 101, 105, 108...)SELECT DISTINCT CustomerID FROM Orders; - Now write the full query using IN:
SELECT CustomerID, CustomerName, State, City FROM Customers WHERE CustomerID IN (SELECT DISTINCT CustomerID FROM Orders) ORDER BY CustomerName; - Execute and verify - All returned customers should have orders
- Try the opposite (NOT IN) to find customers without orders:
SELECT CustomerID, CustomerName, State, City FROM Customers WHERE CustomerID NOT IN (SELECT DISTINCT CustomerID FROM Orders);
💡 Tips:
- IN vs JOIN: This could also be done with
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID, but IN is more intuitive for this use case - Use DISTINCT in subquery: Prevents duplicate customer IDs in the list (though IN handles duplicates fine)
- NULL handling: Be careful with NOT IN if the subquery can return NULL - it can cause unexpected results. Use
WHERE CustomerID IS NOT NULLin subquery - Performance: For large datasets, EXISTS might be faster than IN
- Debugging tip: Run the subquery separately to verify it returns the expected list
✅ Expected Result: A list of customers from the Customers table who have at least one corresponding order in the Orders table
🔄 Connection to Previous Learning: Similar to Excel VLOOKUP filtering from Day 5, but here you're finding matching records across tables.
Exercise 3: Find Products That Have Been Sold (Using EXISTS)
Goal: Learn to use EXISTS operator for existence checks - often more efficient than IN.
Step-by-Step Instructions:
- Understand EXISTS: Returns TRUE if the subquery returns any rows, FALSE if it returns zero rows
- Write the EXISTS query to find products that have been sold:
SELECT ProductID, ProductName, Category, Price FROM Products P WHERE EXISTS ( SELECT 1 FROM OrderDetails OD WHERE OD.ProductID = P.ProductID ) ORDER BY ProductName; - Note the correlation: The subquery references
P.ProductIDfrom the outer query - this makes it a correlated subquery - Execute and analyze - You'll see only products that appear in OrderDetails
- Try NOT EXISTS to find products that have never been sold:
SELECT ProductID, ProductName, Category, Price FROM Products P WHERE NOT EXISTS ( SELECT 1 FROM OrderDetails OD WHERE OD.ProductID = P.ProductID ); - Compare with IN approach:
Both return same results, but EXISTS is typically faster for large datasets-- Alternative using IN SELECT ProductID, ProductName, Category, Price FROM Products WHERE ProductID IN (SELECT ProductID FROM OrderDetails);
💡 Tips:
- SELECT 1 vs SELECT *: In EXISTS, we use
SELECT 1because we only care if rows exist, not their values - more efficient - Performance advantage: EXISTS stops searching after finding first match; IN evaluates entire subquery
- Correlated subquery: Notice
OD.ProductID = P.ProductID- the subquery references outer query, so it runs for each product - Readability: EXISTS is more readable for "does this record exist" questions
- When to use: Prefer EXISTS over IN when checking existence in large tables or when subquery might return NULLs
✅ Expected Result: A list of products from the Products table that have at least one sale recorded in OrderDetails (or with NOT EXISTS, products with zero sales)
🔄 Connection to Previous Learning: This relates to Day 6 JOIN concepts - you're essentially checking for matching records between tables.
Exercise 4: Add Customer Name Using Subquery in SELECT (Scalar Subquery)
Goal: Learn to use subqueries in the SELECT clause to add calculated columns.
Step-by-Step Instructions:
- Understand scalar subqueries: A subquery in SELECT must return exactly one value (one row, one column)
- Start with a basic Orders query:
This shows orders but only CustomerID, not the nameSELECT OrderID, CustomerID, Sales, OrderDate FROM Orders; - Add customer name using a scalar subquery:
SELECT OrderID, CustomerID, (SELECT CustomerName FROM Customers C WHERE C.CustomerID = Orders.CustomerID) AS CustomerName, Sales, OrderDate FROM Orders ORDER BY OrderDate DESC LIMIT 10; - Execute and verify - Each order now shows the customer name retrieved from Customers table
- Add multiple scalar subqueries:
SELECT OrderID, (SELECT CustomerName FROM Customers C WHERE C.CustomerID = Orders.CustomerID) AS CustomerName, (SELECT State FROM Customers C WHERE C.CustomerID = Orders.CustomerID) AS CustomerState, Sales, OrderDate FROM Orders WHERE Sales > 1000 ORDER BY Sales DESC; - Compare with JOIN approach:
Both approaches work, but JOIN is typically better for performance-- Alternative using JOIN (usually more efficient) SELECT O.OrderID, C.CustomerName, C.State AS CustomerState, O.Sales, O.OrderDate FROM Orders O INNER JOIN Customers C ON O.CustomerID = C.CustomerID WHERE O.Sales > 1000 ORDER BY O.Sales DESC;
💡 Tips:
- Must return single value: If subquery returns multiple rows or columns, you'll get an error
- Correlated subquery: Notice
C.CustomerID = Orders.CustomerID- runs once per order row - Performance consideration: This runs the subquery for every row in Orders - can be slow for large datasets
- When to use: Use scalar subqueries when you need just one or two extra columns; use JOINs when you need multiple columns from related table
- NULL handling: If no matching customer found, subquery returns NULL
- Alternative approach: For better performance, use JOIN instead (shown in step 6)
✅ Expected Result: Orders list with CustomerName column added, showing the name from Customers table for each order
🔄 Connection to Previous Learning: This is the SQL equivalent of Excel's VLOOKUP from Day 5 - looking up related data from another table.
Exercise 5: Find Top 10% of Orders by Revenue (Advanced Subquery)
Goal: Learn to use subqueries with percentiles and ranking logic for advanced filtering.
Step-by-Step Instructions:
- Understand the goal: Find orders in the top 10% by sales value (e.g., if you have 1000 orders, find the top 100)
- Method 1 - Using LIMIT with subquery (for MySQL/PostgreSQL):
This finds the sales value at the 90th percentile, then returns all orders >= that valueSELECT OrderID, CustomerID, Sales, OrderDate FROM Orders WHERE Sales >= ( SELECT Sales FROM Orders ORDER BY Sales DESC LIMIT 1 OFFSET (SELECT COUNT(*) * 0.1 FROM Orders) ) ORDER BY Sales DESC; - Method 2 - Using PERCENT_RANK window function (more elegant):
This calculates the percentile for each order, then filters to top 10%SELECT OrderID, CustomerID, Sales, OrderDate FROM ( SELECT OrderID, CustomerID, Sales, OrderDate, PERCENT_RANK() OVER (ORDER BY Sales DESC) AS percentile FROM Orders ) AS RankedOrders WHERE percentile <= 0.10 ORDER BY Sales DESC; - Method 3 - Using COUNT subquery (works in all SQL databases):
This counts how many orders have higher sales than current order, keeps those in top 10%SELECT OrderID, CustomerID, Sales, OrderDate FROM Orders O1 WHERE ( SELECT COUNT(*) FROM Orders O2 WHERE O2.Sales > O1.Sales ) <= (SELECT COUNT(*) * 0.10 FROM Orders) ORDER BY Sales DESC; - Execute your preferred method and analyze the results
- Verify the percentage:
The second count should be ~10% of the first-- Count total orders SELECT COUNT(*) AS TotalOrders FROM Orders; -- Count orders in your result SELECT COUNT(*) AS Top10PercentOrders FROM [your query here]; - Extend to find bottom 10%:
-- Using Method 2, just change the WHERE clause SELECT OrderID, CustomerID, Sales, OrderDate FROM ( SELECT OrderID, CustomerID, Sales, OrderDate, PERCENT_RANK() OVER (ORDER BY Sales ASC) AS percentile FROM Orders ) AS RankedOrders WHERE percentile <= 0.10 ORDER BY Sales ASC;
💡 Tips:
- Choose the right method: Method 2 (PERCENT_RANK) is most efficient for modern databases; Method 3 works everywhere but slower
- Subquery in FROM: Method 2 uses a subquery in FROM clause (called derived table or inline view) - very powerful technique
- Window functions: PERCENT_RANK is a window function (preview of Day 12) - provides percentile without grouping
- Performance: Method 1 and 2 are fast; Method 3 with correlated subquery can be slow on large datasets
- Exact vs approximate: Depending on ties in Sales values, you might get slightly more or less than exactly 10%
- Database differences: LIMIT syntax varies (LIMIT for MySQL/PostgreSQL, TOP for SQL Server, FETCH FIRST for Oracle)
✅ Expected Result: A list of orders representing the top 10% by sales value - if you have 1000 orders, you should see approximately 100 orders with the highest sales values
🔄 Connection to Previous Learning: This combines concepts from Day 8 (aggregates) with subqueries to create sophisticated filtering - similar to Excel's top 10% filtering with AutoFilter from Day 3.
🎮 Interactive SQL Playground 1: Above Average Sales
Practice finding orders with sales above average. Try modifying the query to find below-average sales or use different aggregates!
🎮 Interactive SQL Playground 2: Customers with Orders (IN)
Find customers who have placed orders using IN operator. Try using NOT IN to find customers without orders!
🎮 Interactive SQL Playground 3: Products Sold (EXISTS)
Find products that have been sold using EXISTS. Compare performance with IN approach!
🎮 Interactive SQL Playground 4: Scalar Subquery in SELECT
Add customer name to orders using scalar subquery. Try adding more columns like State or City!
🎮 Interactive SQL Playground 5: Top 10% by Revenue
Find orders in the top 10% by sales. Experiment with different percentiles or find bottom performers!
Independent Practice (60 min):
💡 Hints for Independent Practice:
- Start simple: Begin with basic variations before attempting complex modifications
- Break it down: Divide complex problems into smaller, manageable steps
- Test incrementally: Verify each step works before moving to the next
- Use documentation: Refer to official docs or tutorials when needed
- Experiment: Try different approaches to find what works best
- Customers with above-average order value
- Products in top 3 revenue categories (subquery)
- Orders from California customers (IN subquery)
- Customers with more than 5 orders (subquery + HAVING)
- Products never ordered (NOT EXISTS)
- Orders above category average (correlated)
- High-value customers (top 20% by total sales)
- Products with above-average profit margin
- Regions performing above company average
- Customers with orders in last 30 days (subquery + date)
- Products in profitable categories only
- Orders from best customers (defined by criteria)
Daily Assignment (60 min):
- Create complex analysis using subqueries for comparisons
- Compare with Excel Day 9 multi-criteria formulas
- Find patterns using subqueries (above average, top performers, etc.)
- Save as: Day10_SQL_Practice.sql
- WEEK 2 TEST: Complete Week 2 assessment (provided separately by instructor)
Expected Outcomes:
✓ Write subqueries in various clauses
✓ Use IN and EXISTS effectively
✓ Understand when to use subqueries vs JOINs
✓ Submit: Day10_SQL_Practice.sql
📝 Day 10 Quick Quiz - Test Your Knowledge!
Q1. What is a subquery?
Q2. Where can subqueries be used?
Q3. What is a correlated subquery?
Q4. What does the IN operator do with a subquery?
Q5. Which is more efficient for checking existence?