DAY 10: SQL - Subqueries

Learning Objectives: - Use subqueries in WHERE clause - Create subqueries in SELECT - Understand IN and EXISTS operators - Learn correlated vs non-correlated subqueries

Topics Covered (60 min):

1. WHERE Subquery (Non-Correlated) - SELECT * FROM Orders WHERE Sales > (SELECT AVG(Sales) FROM Orders) - Find records above/below aggregate - Inner query runs once

2. IN Operator - WHERE CustomerID IN (SELECT ID FROM Customers WHERE State='CA') - Tests if value is in subquery results - Returns multiple values

3. EXISTS Operator - WHERE EXISTS (SELECT 1 FROM OrderDetails WHERE OrderID = Orders.ID) - Tests if subquery returns ANY rows - Usually faster than IN for large datasets - Returns TRUE/FALSE

4. SELECT Subquery (Scalar) - SELECT OrderID, (SELECT Name FROM Customers WHERE ID=Orders.CustomerID) AS Customer - Must return single value - Adds calculated column

5. Correlated Subquery - References outer query - WHERE Salary > (SELECT AVG(Salary) FROM Employees E2 WHERE E2.Department = E1.Department) - Runs for each row (slower)

Video Resources:

CASE Statement and UNION SQL interview questions (10:00)
https://www.youtube.com/watch?v=F2JJWM40Dh4
How to Use SQL Case Statements (13:40)
https://www.youtube.com/watch?v=iq-geVaMOos
SQL Tutorial #36 - SQL UNION Operator (7:49)
https://www.youtube.com/watch?v=7QaIPZvu1NI

AI Learning Prompts (Concept Understanding Only):

Understanding Subqueries Basics:

Explain SQL subqueries:

1. What is a subquery and why use it?
2. Where can subqueries be placed (WHERE, SELECT, FROM)?
3. How do I write a subquery: SELECT ... WHERE col > (SELECT ...)?
4. Show me examples using Employee data (find employees with salary > average)
5. What's the difference between subquery and JOIN?
6. When should I use subquery vs JOIN?

Use HR or academic data for examples (not sales).

Understanding IN and EXISTS:

Explain IN and EXISTS with subqueries:

1. How does IN operator work with subqueries?
2. What's the syntax: WHERE column IN (SELECT ...)?
3. How does EXISTS differ from IN?
4. When is EXISTS more efficient than IN?
5. Show me examples: Find students who are enrolled in at least one class
6. What's the difference between NOT IN and NOT EXISTS?

Use school enrollment or membership data for examples.

Understanding Correlated Subqueries:

Explain correlated subqueries:

1. What makes a subquery "correlated"?
2. How does it reference the outer query?
3. Show me examples: Find employees earning more than average in their department
4. Why are correlated subqueries slower?
5. Can I rewrite correlated subqueries as JOINs?

Use departmental or grouped data for examples.

Understanding Subqueries in SELECT:

Explain subqueries in the SELECT clause:

1. How do I use subqueries to add calculated columns?
2. What are scalar subqueries (returning single value)?
3. Show me examples: Add "Department Average Salary" column to employee list
4. When should I use subquery vs JOIN for this?

Use organizational hierarchy examples.

Guided Practice (60 min):

📝 How to approach Guided Practice:

Follow step-by-step: Work through each exercise in order
Use provided resources: Reference the video tutorials and AI prompts if you get stuck
Verify your work: Check your results match expected outcomes
Take notes: Document key learnings and shortcuts you discover
Ask for help: Use AI assistants to clarify concepts you don't understand

Exercise 1: Find Orders with Sales Above Average (Basic Subquery in WHERE)

Goal: Learn to use subqueries in the WHERE clause to compare against aggregate values.

Step-by-Step Instructions:

Open your SQL editor (pgAdmin, MySQL Workbench, or online SQL environment)
Start with the inner subquery first to understand what value it returns:
```
SELECT AVG(Sales) FROM Orders;
```
This calculates the average sales across all orders (e.g., $500)

Now write the full query with the subquery in the WHERE clause:

SELECT OrderID, CustomerID, Sales, OrderDate
FROM Orders
WHERE Sales > (SELECT AVG(Sales) FROM Orders)
ORDER BY Sales DESC;

Execute the query (press F5 or click Execute)
Analyze the results: Notice all returned orders have sales greater than the average

💡 Tips:

Test the subquery separately first - Run just the SELECT AVG(Sales)... to see what value it returns
Subquery runs once: Non-correlated subqueries execute only one time, then the result is used for comparison
Parentheses are required: Always wrap subqueries in parentheses (SELECT ...)
ORDER BY helps verification: Sort by Sales DESC to easily verify all values are above average
Alternative approach: You could use a CTE or JOIN with a derived table, but subqueries are cleaner for this use case

✅ Expected Result: A list of orders where Sales value is higher than the average (e.g., if average is $500, you'll see only orders with $501+)

🔄 Connection to Previous Learning: This is similar to Excel's AVERAGEIF from Day 3, but more flexible - you're filtering based on a calculated aggregate.

Exercise 2: Select Customers Who Have Placed Orders (Using IN Subquery)

Goal: Learn to use IN operator with subqueries to filter based on a list of values.

Step-by-Step Instructions:

Understand the scenario: You have a Customers table and an Orders table. You want to find only customers who have placed at least one order
First, test the subquery alone to see what it returns:
```
SELECT DISTINCT CustomerID FROM Orders;
```
This returns a list of all customer IDs that appear in Orders (e.g., 101, 105, 108...)

Now write the full query using IN:

SELECT CustomerID, CustomerName, State, City
FROM Customers
WHERE CustomerID IN (SELECT DISTINCT CustomerID FROM Orders)
ORDER BY CustomerName;

Execute and verify - All returned customers should have orders

Try the opposite (NOT IN) to find customers without orders:

SELECT CustomerID, CustomerName, State, City
FROM Customers
WHERE CustomerID NOT IN (SELECT DISTINCT CustomerID FROM Orders);

💡 Tips:

IN vs JOIN: This could also be done with INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID, but IN is more intuitive for this use case
Use DISTINCT in subquery: Prevents duplicate customer IDs in the list (though IN handles duplicates fine)
NULL handling: Be careful with NOT IN if the subquery can return NULL - it can cause unexpected results. Use WHERE CustomerID IS NOT NULL in subquery
Performance: For large datasets, EXISTS might be faster than IN
Debugging tip: Run the subquery separately to verify it returns the expected list

✅ Expected Result: A list of customers from the Customers table who have at least one corresponding order in the Orders table

🔄 Connection to Previous Learning: Similar to Excel VLOOKUP filtering from Day 5, but here you're finding matching records across tables.

Exercise 3: Find Products That Have Been Sold (Using EXISTS)

Goal: Learn to use EXISTS operator for existence checks - often more efficient than IN.

Step-by-Step Instructions:

Understand EXISTS: Returns TRUE if the subquery returns any rows, FALSE if it returns zero rows

Write the EXISTS query to find products that have been sold:

SELECT ProductID, ProductName, Category, Price
FROM Products P
WHERE EXISTS (
    SELECT 1
    FROM OrderDetails OD
    WHERE OD.ProductID = P.ProductID
)
ORDER BY ProductName;

Note the correlation: The subquery references P.ProductID from the outer query - this makes it a correlated subquery
Execute and analyze - You'll see only products that appear in OrderDetails

Try NOT EXISTS to find products that have never been sold:

SELECT ProductID, ProductName, Category, Price
FROM Products P
WHERE NOT EXISTS (
    SELECT 1
    FROM OrderDetails OD
    WHERE OD.ProductID = P.ProductID
);

Compare with IN approach:

-- Alternative using IN
SELECT ProductID, ProductName, Category, Price
FROM Products
WHERE ProductID IN (SELECT ProductID FROM OrderDetails);

Both return same results, but EXISTS is typically faster for large datasets

💡 Tips:

SELECT 1 vs SELECT *: In EXISTS, we use SELECT 1 because we only care if rows exist, not their values - more efficient
Performance advantage: EXISTS stops searching after finding first match; IN evaluates entire subquery
Correlated subquery: Notice OD.ProductID = P.ProductID - the subquery references outer query, so it runs for each product
Readability: EXISTS is more readable for "does this record exist" questions
When to use: Prefer EXISTS over IN when checking existence in large tables or when subquery might return NULLs

✅ Expected Result: A list of products from the Products table that have at least one sale recorded in OrderDetails (or with NOT EXISTS, products with zero sales)

🔄 Connection to Previous Learning: This relates to Day 6 JOIN concepts - you're essentially checking for matching records between tables.

Exercise 4: Add Customer Name Using Subquery in SELECT (Scalar Subquery)

Goal: Learn to use subqueries in the SELECT clause to add calculated columns.

Step-by-Step Instructions:

Understand scalar subqueries: A subquery in SELECT must return exactly one value (one row, one column)
Start with a basic Orders query:
```
SELECT OrderID, CustomerID, Sales, OrderDate
FROM Orders;
```
This shows orders but only CustomerID, not the name

Add customer name using a scalar subquery:

SELECT
    OrderID,
    CustomerID,
    (SELECT CustomerName
     FROM Customers C
     WHERE C.CustomerID = Orders.CustomerID) AS CustomerName,
    Sales,
    OrderDate
FROM Orders
ORDER BY OrderDate DESC
LIMIT 10;

Execute and verify - Each order now shows the customer name retrieved from Customers table

Add multiple scalar subqueries:

SELECT
    OrderID,
    (SELECT CustomerName FROM Customers C WHERE C.CustomerID = Orders.CustomerID) AS CustomerName,
    (SELECT State FROM Customers C WHERE C.CustomerID = Orders.CustomerID) AS CustomerState,
    Sales,
    OrderDate
FROM Orders
WHERE Sales > 1000
ORDER BY Sales DESC;

Compare with JOIN approach:

-- Alternative using JOIN (usually more efficient)
SELECT
    O.OrderID,
    C.CustomerName,
    C.State AS CustomerState,
    O.Sales,
    O.OrderDate
FROM Orders O
INNER JOIN Customers C ON O.CustomerID = C.CustomerID
WHERE O.Sales > 1000
ORDER BY O.Sales DESC;

Both approaches work, but JOIN is typically better for performance

💡 Tips:

Must return single value: If subquery returns multiple rows or columns, you'll get an error
Correlated subquery: Notice C.CustomerID = Orders.CustomerID - runs once per order row
Performance consideration: This runs the subquery for every row in Orders - can be slow for large datasets
When to use: Use scalar subqueries when you need just one or two extra columns; use JOINs when you need multiple columns from related table
NULL handling: If no matching customer found, subquery returns NULL
Alternative approach: For better performance, use JOIN instead (shown in step 6)

✅ Expected Result: Orders list with CustomerName column added, showing the name from Customers table for each order

🔄 Connection to Previous Learning: This is the SQL equivalent of Excel's VLOOKUP from Day 5 - looking up related data from another table.

Exercise 5: Find Top 10% of Orders by Revenue (Advanced Subquery)

Goal: Learn to use subqueries with percentiles and ranking logic for advanced filtering.

Step-by-Step Instructions:

Understand the goal: Find orders in the top 10% by sales value (e.g., if you have 1000 orders, find the top 100)

Method 1 - Using LIMIT with subquery (for MySQL/PostgreSQL):

SELECT OrderID, CustomerID, Sales, OrderDate
FROM Orders
WHERE Sales >= (
    SELECT Sales
    FROM Orders
    ORDER BY Sales DESC
    LIMIT 1 OFFSET (SELECT COUNT(*) * 0.1 FROM Orders)
)
ORDER BY Sales DESC;

This finds the sales value at the 90th percentile, then returns all orders >= that value

Method 2 - Using PERCENT_RANK window function (more elegant):

SELECT OrderID, CustomerID, Sales, OrderDate
FROM (
    SELECT
        OrderID,
        CustomerID,
        Sales,
        OrderDate,
        PERCENT_RANK() OVER (ORDER BY Sales DESC) AS percentile
    FROM Orders
) AS RankedOrders
WHERE percentile <= 0.10
ORDER BY Sales DESC;

This calculates the percentile for each order, then filters to top 10%

Method 3 - Using COUNT subquery (works in all SQL databases):

SELECT OrderID, CustomerID, Sales, OrderDate
FROM Orders O1
WHERE (
    SELECT COUNT(*)
    FROM Orders O2
    WHERE O2.Sales > O1.Sales
) <= (SELECT COUNT(*) * 0.10 FROM Orders)
ORDER BY Sales DESC;

This counts how many orders have higher sales than current order, keeps those in top 10%

Execute your preferred method and analyze the results

Verify the percentage:

-- Count total orders
SELECT COUNT(*) AS TotalOrders FROM Orders;

-- Count orders in your result
SELECT COUNT(*) AS Top10PercentOrders
FROM [your query here];

The second count should be ~10% of the first

Extend to find bottom 10%:

-- Using Method 2, just change the WHERE clause
SELECT OrderID, CustomerID, Sales, OrderDate
FROM (
    SELECT
        OrderID,
        CustomerID,
        Sales,
        OrderDate,
        PERCENT_RANK() OVER (ORDER BY Sales ASC) AS percentile
    FROM Orders
) AS RankedOrders
WHERE percentile <= 0.10
ORDER BY Sales ASC;

💡 Tips:

Choose the right method: Method 2 (PERCENT_RANK) is most efficient for modern databases; Method 3 works everywhere but slower
Subquery in FROM: Method 2 uses a subquery in FROM clause (called derived table or inline view) - very powerful technique
Window functions: PERCENT_RANK is a window function (preview of Day 12) - provides percentile without grouping
Performance: Method 1 and 2 are fast; Method 3 with correlated subquery can be slow on large datasets
Exact vs approximate: Depending on ties in Sales values, you might get slightly more or less than exactly 10%
Database differences: LIMIT syntax varies (LIMIT for MySQL/PostgreSQL, TOP for SQL Server, FETCH FIRST for Oracle)

✅ Expected Result: A list of orders representing the top 10% by sales value - if you have 1000 orders, you should see approximately 100 orders with the highest sales values

🔄 Connection to Previous Learning: This combines concepts from Day 8 (aggregates) with subqueries to create sophisticated filtering - similar to Excel's top 10% filtering with AutoFilter from Day 3.

🎮 Interactive SQL Playground 1: Above Average Sales

Practice finding orders with sales above average. Try modifying the query to find below-average sales or use different aggregates!

🎮 Interactive SQL Playground 2: Customers with Orders (IN)

Find customers who have placed orders using IN operator. Try using NOT IN to find customers without orders!

🎮 Interactive SQL Playground 3: Products Sold (EXISTS)

Find products that have been sold using EXISTS. Compare performance with IN approach!

🎮 Interactive SQL Playground 4: Scalar Subquery in SELECT

Add customer name to orders using scalar subquery. Try adding more columns like State or City!

🎮 Interactive SQL Playground 5: Top 10% by Revenue

Find orders in the top 10% by sales. Experiment with different percentiles or find bottom performers!

Independent Practice (60 min):

💡 Hints for Independent Practice:

Start simple: Begin with basic variations before attempting complex modifications
Break it down: Divide complex problems into smaller, manageable steps
Test incrementally: Verify each step works before moving to the next
Use documentation: Refer to official docs or tutorials when needed
Experiment: Try different approaches to find what works best

Customers with above-average order value
Products in top 3 revenue categories (subquery)
Orders from California customers (IN subquery)
Customers with more than 5 orders (subquery + HAVING)
Products never ordered (NOT EXISTS)
Orders above category average (correlated)
High-value customers (top 20% by total sales)
Products with above-average profit margin
Regions performing above company average
Customers with orders in last 30 days (subquery + date)
Products in profitable categories only
Orders from best customers (defined by criteria)

Daily Assignment (60 min):

Create complex analysis using subqueries for comparisons
Compare with Excel Day 9 multi-criteria formulas
Find patterns using subqueries (above average, top performers, etc.)
Save as: Day10_SQL_Practice.sql
WEEK 2 TEST: Complete Week 2 assessment (provided separately by instructor)

Expected Outcomes:

✓ Write subqueries in various clauses
✓ Use IN and EXISTS effectively
✓ Understand when to use subqueries vs JOINs
✓ Submit: Day10_SQL_Practice.sql