Chapter 12

Advanced Analysis Types

Explore correlation, cohort, funnel, and scenario analysis

Introduction to Advanced Analysis

You've learned descriptive analysis (what happened?), comparative analysis (how do things differ?), and trend analysis (what's changing over time?). Now it's time to explore advanced analysis techniques that answer more specialized questions.

These techniques help you understand:

  • What drives the total? (Contribution Analysis)
  • Which groups behave differently? (Segmentation Analysis)
  • Do two things move together? (Correlation Analysis)
  • How do groups change over time? (Cohort Analysis)
  • Where are we losing people? (Funnel Analysis)
  • What do customers buy together? (Basket Analysis)
  • What if we change something? (What-If Analysis)

Why learn advanced analysis?

These techniques are used daily in business, marketing, product development, and operations. They help you:

  • Make data-driven decisions with confidence
  • Understand customer behavior patterns
  • Optimize processes and improve conversion
  • Test different scenarios before committing resources
  • Discover hidden relationships in your data

Let's explore each technique with real-world examples and practical applications.

1. Contribution Analysis

What it is: Contribution analysis breaks down a total into its parts and shows what percentage each part contributes to the whole. It answers the question: "What drives the total?"

When to use it:

  • Understanding which products/categories generate the most revenue
  • Analyzing budget allocation by department
  • Determining where website traffic comes from
  • Identifying which customer segments are most valuable

Example 1: Revenue by Product Category

Question: "Which product categories contribute most to our total revenue?"

Category Revenue % of Total
Electronics $125,000 50%
Clothing $62,500 25%
Home Goods $37,500 15%
Books $17,500 7%
Sports $7,500 3%
Total $250,000 100%

Insight: "Electronics alone accounts for half of our revenue. The top 2 categories (Electronics + Clothing) drive 75% of total revenue."

How to calculate percentage of total:

Electronics: ($125,000 ÷ $250,000) × 100 = 50%
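
The same calculation can be applied to every category at once. Here is a minimal Python sketch using the figures from the table above; the function name `contribution_percentages` is an illustrative choice, not part of any library.

```python
def contribution_percentages(parts):
    """Return each part's share of the total, as a percentage."""
    total = sum(parts.values())
    if total == 0:
        raise ValueError("total must be non-zero")
    return {name: value / total * 100 for name, value in parts.items()}


# Revenue figures from the product category table above
revenue = {
    "Electronics": 125_000,
    "Clothing": 62_500,
    "Home Goods": 37_500,
    "Books": 17_500,
    "Sports": 7_500,
}

shares = contribution_percentages(revenue)

# Print categories from largest to smallest contribution
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.0f}%")
```

Sorting by share puts the biggest drivers at the top, which is usually the first thing you want to see in a contribution table.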

Example 2: Website Traffic by Source

Question: "Where do our visitors come from?"

Traffic Source Visitors % of Total
Organic Search 4,200 42%
Direct 2,500 25%
Social Media 1,800 18%
Email 1,000 10%
Paid Ads 500 5%
Total 10,000 100%

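Insight: "Organic search is our biggest traffic driver (42%), followed by direct visits (25%). Paid ads contribute only 5%. Whether to shift budget toward SEO depends on what each channel costs and how well its visitors convert, which traffic share alone doesn't show."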

💡 Real-Life Analogy: Your Monthly Budget

Imagine your monthly income is $3,000. You want to know where it goes:

  • Rent: $1,200 (40%)
  • Food: $600 (20%)
  • Transportation: $450 (15%)
  • Entertainment: $300 (10%)
  • Savings: $300 (10%)
  • Other: $150 (5%)

This shows you that rent is your biggest expense, consuming 40% of your income. This is contribution analysis.

Key takeaway: Contribution analysis helps you focus on what matters most. Often, a small number of categories (like the top 2-3) drive the majority of your total. This pattern is often described by the Pareto Principle, or 80/20 rule: roughly 80% of results tend to come from about 20% of causes.

2. Segmentation Analysis

What it is: Segmentation divides your data into meaningful groups based on shared characteristics. It answers: "Which groups behave differently?"

When to use it:

  • Understanding different customer types (by age, location, behavior)
  • Comparing performance across regions or departments
  • Personalizing marketing messages for different audiences
  • Identifying high-value vs. low-value customer segments

Example 1: Customer Segmentation by Age

Question: "Do different age groups spend differently?"

Age Segment Customers Avg. Order Value Total Revenue
18-24 1,200 $35 $42,000
25-34 2,500 $58 $145,000
35-44 1,800 $72 $129,600
45-54 1,000 $65 $65,000
55+ 500 $48 $24,000

Insight: "The 35-44 age group has the highest average order value ($72), while 25-34 drives the most total revenue due to volume. The 18-24 segment spends least per order ($35), suggesting we should offer entry-level products for this group."

Example 2: Geographic Segmentation

Question: "Which regions have the highest conversion rates?"

Region Visitors Purchases Conversion Rate
Northeast 5,000 400 8.0%
Southeast 6,500 325 5.0%
Midwest 4,200 294 7.0%
West 8,300 581 7.0%

Insight: "The Northeast has the highest conversion rate (8%), despite having fewer visitors than the West. The Southeast has the lowest conversion rate (5%), indicating we may need to improve our offering or messaging in that region."
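
To compare segments like this in code, compute the same metric for each group and then rank the groups. The sketch below uses the regional figures from the table above; the variable and function names are illustrative.

```python
# (visitors, purchases) per region, from the table above
regions = {
    "Northeast": (5_000, 400),
    "Southeast": (6_500, 325),
    "Midwest": (4_200, 294),
    "West": (8_300, 581),
}


def conversion_rate(visitors, purchases):
    """Purchases as a percentage of visitors."""
    return purchases / visitors * 100


# Same metric, computed separately for every segment
rates = {name: conversion_rate(v, p) for name, (v, p) in regions.items()}
best = max(rates, key=rates.get)
worst = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print(f"Highest: {best}, lowest: {worst}")
```

Note that the ranking is by rate, not by volume: the West has the most purchases but not the highest conversion rate.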

Types of segmentation:

1. Demographic Segmentation

Group by age, gender, income, education

Example: Luxury brands targeting high-income customers

2. Geographic Segmentation

Group by location, region, country, climate

Example: Clothing retailers offering different products for warm vs. cold climates

3. Behavioral Segmentation

Group by purchase history, usage frequency, loyalty

Example: Identifying "power users" who use your app daily vs. occasional users

4. Psychographic Segmentation

Group by lifestyle, values, interests, personality

Example: Outdoor brands targeting adventure seekers vs. casual hikers

Why segmentation reveals hidden insights: Averages can hide important differences between groups. For example, "average customer age is 35" doesn't tell you that you have two distinct segments: college students (age 20) and parents (age 50). Segmentation makes these groups visible.

3. Correlation Analysis

What it is: Correlation measures how strongly, and in which direction, two variables move together. It answers: "Are these two variables related?"

Types of correlation:

  • Positive correlation: When one variable increases, the other also increases
  • Negative correlation: When one variable increases, the other decreases
  • No correlation: The variables show no consistent pattern of moving together, so knowing one tells you little about the other

Positive Correlation

Definition: Both variables move in the same direction

Examples:

  • Study hours ↑ → Test scores ↑
  • Temperature ↑ → Ice cream sales ↑
  • Ad spending ↑ → Website traffic ↑

Visual pattern: Data points form an upward slope

Negative Correlation

Definition: Variables move in opposite directions

Examples:

  • Price ↑ → Sales volume ↓
  • Distance from store ↑ → Visit frequency ↓
  • Product defects ↑ → Customer satisfaction ↓

Visual pattern: Data points form a downward slope

No Correlation

Definition: No predictable relationship between variables

Examples:

  • Shoe size → Intelligence (among adults)
  • Number of siblings → Salary
  • Hair color → Math skills

Visual pattern: Data points scattered randomly

Example: Marketing Spend vs. Revenue

Question: "Does spending more on marketing increase revenue?"

Month Marketing Spend Revenue
Jan $5,000 $45,000
Feb $8,000 $62,000
Mar $6,500 $54,000
Apr $10,000 $78,000
May $7,500 $59,000
Jun $12,000 $89,000

Insight: "There appears to be a positive correlation—months with higher marketing spend tend to have higher revenue. When we spent $12,000 (June), revenue was $89,000. When we spent $5,000 (January), revenue was only $45,000."
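
"Appears to be" can be made more precise with a correlation coefficient. The sketch below computes the Pearson correlation coefficient (a standard measure that ranges from -1 to +1) for the six months in the table, using only the Python standard library. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Jan-Jun figures from the table above
marketing_spend = [5_000, 8_000, 6_500, 10_000, 7_500, 12_000]
revenue = [45_000, 62_000, 54_000, 78_000, 59_000, 89_000]

r = pearson_r(marketing_spend, revenue)
print(f"r = {r:.3f}")
```

For this data, r comes out just under 1 (about 0.997), a very strong positive correlation. With only six data points, treat it as a pattern worth investigating rather than proof that marketing spend drives revenue.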

⚠️ CRITICAL: Correlation Does NOT Equal Causation

This is one of the most important concepts in data analysis. Just because two things are correlated doesn't mean one causes the other.

Classic Example: Ice Cream Sales vs. Drowning Incidents

Observation: There's a strong positive correlation between ice cream sales and drowning incidents. When ice cream sales go up, drownings also go up.

❌ Wrong conclusion: "Ice cream causes drowning!" or "Drowning causes people to buy ice cream!"

✓ Correct explanation: Both are caused by a third variable: hot weather. When it's hot, people buy more ice cream AND more people go swimming (increasing drowning risk).

Hot Weather (Cause) → Ice Cream Sales ↑
Hot Weather (Cause) → Swimming ↑ → Drownings ↑

(Ice cream sales and drownings are correlated, but neither causes the other)

More Examples of Misleading Correlations

1. Nicolas Cage Movies vs. Pool Drownings

Correlation: The number of Nicolas Cage movies per year correlates with pool drownings

Reality: Pure coincidence. No causal relationship.

2. Firefighters vs. Fire Damage

Correlation: More firefighters at a fire → More damage

Reality: Big fires cause both more firefighters to respond AND more damage. Fire size is the hidden cause.

3. Shoe Size vs. Reading Ability (in children)

Correlation: Bigger shoe size correlates with better reading

Reality: Age is the hidden variable. Older children have bigger feet AND read better.

Key takeaway: Correlation is useful for finding patterns, but always ask: "Could there be a hidden third variable?", "Could this be reverse causation?", and "Could this be pure coincidence?" Before claiming one thing causes another, you need stronger evidence than correlation alone, ideally a controlled experiment.

4. Cohort Analysis

What it is: Cohort analysis follows specific groups of people (cohorts) over time to understand how their behavior changes. A cohort is a group that shares a common characteristic during a specific time period.

When to use it:

  • Tracking retention rates for customers acquired in different months
  • Comparing performance of different student classes over time
  • Understanding how user engagement changes after signup
  • Measuring the long-term value of customers from different acquisition channels

Example: Customer Retention by Signup Month

Question: "Do customers who signed up in January stay longer than those who signed up in February?"

Cohort definition: Month of signup

Cohort Month 0 Month 1 Month 2 Month 3
January 100% (1,000 users) 60% (600 users) 45% (450 users) 38% (380 users)
February 100% (1,200 users) 70% (840 users) 58% (696 users) 50% (600 users)
March 100% (900 users) 68% (612 users) 55% (495 users) 48% (432 users)

How to read this:

  • Month 0: All cohorts start at 100% (their signup month)
  • Month 1: 60% of January signups are still active, 70% of February signups are still active
  • Month 3: January cohort has 38% retention, February has 50% retention

Insight: "The February cohort has better retention than January at every time period. After 3 months, 50% of February users are still active vs. only 38% of January users. We should investigate what was different about February (better onboarding? Different traffic source? Product improvements?)"
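
The retention percentages in the table are each month's active users divided by the cohort's starting size. A minimal Python sketch of that calculation, using the user counts from the table (the names `cohorts` and `retention_rates` are illustrative):

```python
# Active users per cohort at months 0-3, from the table above
cohorts = {
    "January": [1_000, 600, 450, 380],
    "February": [1_200, 840, 696, 600],
    "March": [900, 612, 495, 432],
}


def retention_rates(active_by_month):
    """Percent of the starting cohort still active in each month (rounded)."""
    start = active_by_month[0]
    return [round(active / start * 100) for active in active_by_month]


retention = {name: retention_rates(counts) for name, counts in cohorts.items()}

for name, rates in retention.items():
    print(name, " ".join(f"{r}%" for r in rates))
```

Dividing by each cohort's own starting size is what makes cohorts of different sizes comparable: February starts with more users than January, but the percentages put them on the same scale.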

Example: Revenue per Cohort Over Time

Question: "How much revenue does each cohort generate over time?"

Acquisition Cohort Month 0 Month 1 Month 2 Total
Q1 Customers $25,000 $18,000 $15,000 $58,000
Q2 Customers $30,000 $24,000 $22,000 $76,000

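Insight: "Q2 customers generated more revenue: they spent more initially ($30K vs. $25K) and maintained higher spending over time. If the two cohorts are similar in size, this suggests Q2 acquisition efforts attracted higher-value customers. If Q2 is simply a larger cohort, compare revenue per customer instead."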

🎓 Real-Life Analogy: Graduating Classes

Think of your high school graduating class as a cohort:

  • Class of 2020 (cohort defined by graduation year)
  • You track them: 1 year later, 5 years later, 10 years later
  • You measure: % employed, average salary, % in grad school
  • You compare Class of 2020 vs. Class of 2021 to see if outcomes differ

This is cohort analysis—following a specific group over time and comparing different groups.

Why cohort analysis is powerful: It reveals trends that simple averages hide. For example, overall retention might look stable, but cohort analysis could show that recent cohorts are performing worse—a warning sign you'd miss otherwise.

5. Funnel Analysis

What it is: Funnel analysis tracks how people progress through a series of steps, measuring how many drop off at each stage. It's called a "funnel" because the number of people decreases at each step, like a funnel shape.

When to use it:

  • E-commerce: Tracking visitors → cart → checkout → purchase
  • Sales: Leads → qualified → demo → proposal → closed
  • Hiring: Applicants → interview → offer → hired
  • Onboarding: Signup → profile setup → first action → active user

Example 1: E-commerce Purchase Funnel

Question: "Where are we losing potential customers?"

Step Stage People % of Visitors Drop-off from previous step
1 Visitors 10,000 100% -
2 Add to Cart 4,000 40% 60%
3 Checkout 3,000 30% 25%
4 Purchase 2,000 20% 33%

Conversion rates:

  • Visit → Cart: 40%
  • Cart → Checkout: 75% (3,000 ÷ 4,000)
  • Checkout → Purchase: 67% (2,000 ÷ 3,000)
  • Overall conversion: 20% (2,000 purchases ÷ 10,000 visitors)

Insight: "Our biggest drop-off is at the first step—60% of visitors never add anything to their cart. We should focus on improving product pages, images, and descriptions to encourage cart adds."
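
The step-by-step rates above all follow one rule: divide each stage's count by the count of the stage before it. The sketch below applies that rule to the e-commerce funnel and picks out the largest drop-off; `funnel_report` and its dictionary keys are illustrative names.

```python
# Stage counts from the e-commerce funnel above
funnel = [
    ("Visitors", 10_000),
    ("Add to Cart", 4_000),
    ("Checkout", 3_000),
    ("Purchase", 2_000),
]


def funnel_report(stages):
    """Step conversion, drop-off, and share of the first stage (all in %)."""
    first_count = stages[0][1]
    rows = []
    # Pair each stage with the one that follows it
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        step_conversion = count / prev_count * 100
        rows.append({
            "step": f"{prev_name} -> {name}",
            "conversion": step_conversion,
            "drop_off": 100 - step_conversion,
            "of_first": count / first_count * 100,
        })
    return rows


report = funnel_report(funnel)
for row in report:
    print(f"{row['step']}: {row['conversion']:.0f}% continue, "
          f"{row['drop_off']:.0f}% drop off")

biggest = max(report, key=lambda row: row["drop_off"])
print("Biggest drop-off:", biggest["step"])
```

The `of_first` value for the last step is the overall conversion rate (purchases as a share of all visitors).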

Example 2: Job Application Funnel

Question: "How efficient is our hiring process?"

Stage Candidates Pass Rate % of Original
Applications 500 - 100%
Phone Screen 100 20% 20%
Technical Interview 30 30% 6%
Final Interview 12 40% 2.4%
Offer Made 5 42% 1.0%
Offer Accepted 4 80% 0.8%

Insight: "We need about 125 applicants to make 1 hire (4 hires from 500 applications, a 0.8% overall conversion). Our biggest filter is the phone screen (only 20% pass). One concern: only 80% of offers are accepted, so we may be losing good candidates to competitors at the final stage."

How to optimize a funnel:

  • Identify the biggest drop-offs: Focus on stages with the largest percentage loss
  • Ask why people leave: Survey drop-offs, run user tests, analyze behavior
  • Test improvements: Simplify forms, improve messaging, reduce friction
  • Measure impact: Track conversion rate changes after each improvement

6. Basket Analysis

What it is: Basket analysis (also called market basket analysis or association analysis) identifies which items are frequently purchased together. It answers: "What do customers buy together?"

When to use it:

  • Product recommendations: "Customers who bought X also bought Y"
  • Store layout: Placing related items near each other
  • Bundle pricing: Creating product bundles
  • Cross-selling: Suggesting complementary products at checkout

Example 1: Grocery Store Basket Analysis

Question: "What products are purchased together?"

Illustrative findings:

If customer buys... They also often buy... % of transactions with the first item
Diapers Beer 35%
Pasta Pasta sauce 68%
Hot dogs Hot dog buns 72%
Chips Soda 45%
Coffee Cream 52%

Insight: "72% of customers who buy hot dogs also buy buns. We should place these items near each other and create a 'BBQ Bundle' promotion. The diapers-beer pairing is a famous, often-retold example: the story goes that young parents buying diapers often grab beer in the same trip."

Example 2: E-commerce Product Associations

Question: "What should we recommend when someone views a laptop?"

Analysis results:

  • Laptop buyers also bought:
    • Laptop bag (42% of transactions)
    • Mouse (38% of transactions)
    • External hard drive (28% of transactions)
    • Screen protector (22% of transactions)

Action: "When someone adds a laptop to their cart, show: 'Customers also bought: Laptop Bag, Mouse, External Hard Drive.' This increases average order value."

🍔 Real-Life Analogy: Fast Food Combos

Fast food restaurants use basket analysis to create combo meals:

  • Data shows: 80% of burger buyers also buy fries and a drink
  • Action: Create a "Combo Meal" (burger + fries + drink) at a slight discount
  • Result: Customers are happy (convenience + savings), restaurant increases sales

This is basket analysis in action—using purchase patterns to create bundled offers.

Key metrics in basket analysis:

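  • Support: What share of all transactions contain both items? (How common the pair is)
  • Confidence: If someone buys A, what's the probability they also buy B?
  • Lift: How much more likely is B to be purchased when A is purchased, compared with how often B is purchased overall? (A lift above 1 means the items appear together more often than chance would suggest)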

Don't worry about the math for now—just understand that basket analysis finds patterns in what customers buy together.
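
For readers who do want to see the arithmetic, here is an optional sketch of the three metrics in Python. The transactions are hypothetical toy data invented purely for illustration (they are not the grocery figures above), and `basket_metrics` is an illustrative name.

```python
# Hypothetical toy transactions, invented purely for illustration
transactions = [
    {"hot dogs", "buns", "soda"},
    {"hot dogs", "buns"},
    {"hot dogs", "chips"},
    {"buns", "chips"},
    {"chips", "soda"},
    {"pasta", "pasta sauce"},
    {"hot dogs", "buns", "chips"},
    {"soda"},
]


def basket_metrics(transactions, a, b):
    """Support, confidence, and lift for the rule 'buys a -> also buys b'.

    Assumes both items appear in at least one transaction.
    """
    n = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_both = sum(1 for t in transactions if a in t and b in t)
    support = count_both / n            # share of all baskets with both items
    confidence = count_both / count_a   # share of a-baskets that also contain b
    lift = confidence / (count_b / n)   # confidence relative to b's overall rate
    return support, confidence, lift


support, confidence, lift = basket_metrics(transactions, "hot dogs", "buns")
print(f"support={support:.3f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

In this toy data, 3 of 8 baskets contain both items (support 0.375), 3 of the 4 hot dog baskets also contain buns (confidence 0.75), and buns appear in half of all baskets, so the lift is 0.75 ÷ 0.5 = 1.5.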

7. What-If Analysis (Scenario Planning)

What it is: What-if analysis tests different scenarios by changing input variables to see how outputs change. It answers: "What would happen if we changed X?"

When to use it:

  • Pricing decisions: "What if we increase price by 10%?"
  • Staffing: "What if we hire 2 more salespeople?"
  • Budgeting: "What if costs increase 15%?"
  • Risk planning: Best case / expected case / worst case scenarios

Example 1: Price Increase Scenario

Question: "What happens to revenue if we raise prices?"

Current situation:

  • Price: $50 per unit
  • Sales volume: 2,000 units/month
  • Revenue: $50 × 2,000 = $100,000

Scenario Price Estimated Volume Revenue Change
Current $50 2,000 $100,000 -
5% increase $52.50 1,900 (-5%) $99,750 -$250
10% increase $55 1,800 (-10%) $99,000 -$1,000
15% increase $57.50 1,700 (-15%) $97,750 -$2,250

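Insight: "If volume falls by the same percentage that price rises, raising prices slightly reduces revenue. Under that assumption, other strategies (reduce costs, improve product, find new markets) look better than a price increase. The conclusion changes if customers are less price-sensitive, so we should estimate the likely volume drop before deciding."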
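
A what-if table like this is easy to regenerate in code, which makes it simple to try other assumptions. The sketch below reproduces the three scenarios; the function name `price_scenario` and the convention of writing changes as fractions (0.05 for +5%) are illustrative choices.

```python
def price_scenario(base_price, base_volume, price_change, volume_change):
    """Revenue after fractional changes to price and volume (0.05 means +5%)."""
    price = base_price * (1 + price_change)
    volume = base_volume * (1 + volume_change)
    return price * volume


base_price, base_volume = 50, 2_000
base_revenue = base_price * base_volume

# Assumption from the table: volume falls by the same percentage that price rises
for pct in (0.05, 0.10, 0.15):
    revenue = price_scenario(base_price, base_volume, pct, -pct)
    print(f"+{pct:.0%} price: revenue ${revenue:,.0f} "
          f"({revenue - base_revenue:+,.0f})")
```

The volume response is the input that matters most. For example, if a 10% price increase cost only 5% of volume, `price_scenario(50, 2_000, 0.10, -0.05)` gives $55 × 1,900 = $104,500, which is higher than the current $100,000.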

Example 2: Hiring Decision Scenario

Question: "Should we hire 2 more salespeople?"

Current situation:

  • Salespeople: 5
  • Average sales per person: $200,000/year
  • Total sales: $1,000,000/year
  • Salary per salesperson: $60,000/year
  • Total cost: $300,000/year
  • Profit margin: 30%
  • Net contribution: ($1M × 30%) - $300K = $0

Scenario Salespeople Total Sales Salary Cost Profit (30%) Net
Current 5 $1,000,000 $300,000 $300,000 $0
Add 2 (optimistic) 7 $1,400,000 $420,000 $420,000 +$0
Add 2 (realistic) 7 $1,300,000 $420,000 $390,000 -$30,000
Add 2 (pessimistic) 7 $1,200,000 $420,000 $360,000 -$60,000

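Insight: "New salespeople might not hit $200K in year 1. In the realistic scenario, we'd lose $30K. Each rep must sell $200K ($60,000 salary ÷ 30% margin) just to cover their own salary, so we should only hire if we're confident new reps can reach that level. An average of $171K per rep ($1.2M ÷ 7) is the pessimistic scenario, which loses $60K."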
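
The hiring table can be sketched the same way. The code below recomputes the Net column from the salary and margin figures given above, and also works out the sales level at which one rep's margin covers one salary. The constant and function names are illustrative.

```python
SALARY = 60_000   # per salesperson, per year (from the example above)
MARGIN = 0.30     # profit margin on sales, before salaries


def net_contribution(total_sales, salespeople):
    """Profit on sales minus total salary cost."""
    return total_sales * MARGIN - salespeople * SALARY


# (total sales, number of salespeople) for each scenario in the table
scenarios = {
    "Current": (1_000_000, 5),
    "Add 2 (optimistic)": (1_400_000, 7),
    "Add 2 (realistic)": (1_300_000, 7),
    "Add 2 (pessimistic)": (1_200_000, 7),
}

for name, (sales, reps) in scenarios.items():
    print(f"{name}: net {net_contribution(sales, reps):+,.0f}")

# Sales one rep must generate for their margin to cover their own salary
break_even_sales_per_rep = SALARY / MARGIN
print(f"Break-even sales per rep: ${break_even_sales_per_rep:,.0f}")
```

Because every input is a named value, testing a different salary or margin only means changing one line and rerunning.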

Three scenario approach:

  • Best case: Optimistic assumptions (highest sales, lowest costs)
  • Expected case: Realistic assumptions (most likely outcome)
  • Worst case: Pessimistic assumptions (lowest sales, highest costs)

Planning for all three scenarios helps you make robust decisions and prepare for different outcomes.

Interactive Analysis Type Matcher

Test your understanding! For each business question, select the most appropriate analysis type.

1. "Which product categories generate the most revenue?"

2. "Do older customers spend more than younger customers?"

3. "Is there a relationship between ad spend and website traffic?"

4. "Where in our checkout process are we losing customers?"

5. "What products do customers frequently buy together?"

6. "Do customers who signed up in Q1 have better retention than Q2 customers?"

7. "What would happen to profit if we increased prices by 15%?"

8. "What percentage of our budget is spent on marketing vs. operations?"

9. "How many applicants make it from initial application to final interview?"

10. "Do customers in different regions have different purchasing patterns?"

Choosing the Right Analysis Type

Use this decision guide to select the appropriate analysis technique for your question:

❓ Question Type: "What drives the total?"

Use: Contribution Analysis

Examples: Revenue by category, traffic by source, budget by department

❓ Question Type: "How do groups differ?"

Use: Segmentation Analysis

Examples: Comparing age groups, regions, customer types

❓ Question Type: "Are two things related?"

Use: Correlation Analysis

Examples: Ad spend vs. sales, temperature vs. ice cream sales

Warning: Remember, correlation ≠ causation!

❓ Question Type: "How do groups change over time?"

Use: Cohort Analysis

Examples: Retention by signup month, performance by graduating class

❓ Question Type: "Where are we losing people in a process?"

Use: Funnel Analysis

Examples: Checkout process, hiring pipeline, onboarding flow

❓ Question Type: "What do people buy/use together?"

Use: Basket Analysis

Examples: Product recommendations, bundle creation, store layout

❓ Question Type: "What would happen if we changed X?"

Use: What-If Analysis

Examples: Price changes, hiring decisions, budget scenarios

Practice Exercises

Apply what you've learned with these real-world scenarios.

Exercise 1: Contribution Analysis

A company's quarterly revenue by region:

  • North: $180,000
  • South: $120,000
  • East: $90,000
  • West: $60,000

Calculate:

  • a) Total revenue
  • b) Percentage contribution of each region
  • c) Which region(s) should the company focus on?

Exercise 2: Segmentation Analysis

Email campaign results by customer segment:

Segment Emails Sent Opened Clicked
New Customers 5,000 1,000 150
Active Customers 8,000 3,200 640
Inactive Customers 12,000 1,200 60

Calculate open rates and click rates for each segment. What do you notice?

Exercise 3: Correlation vs. Causation

A study finds: "Cities with more Starbucks locations have higher average incomes."

Questions:

  • a) Is this positive or negative correlation?
  • b) Does Starbucks cause higher incomes?
  • c) What's a more likely explanation?

Exercise 4: Funnel Analysis

App onboarding funnel:

  • Downloaded app: 10,000
  • Created account: 6,000
  • Completed profile: 3,000
  • Made first action: 1,500
  • Active after 7 days: 600

Questions:

  • a) What's the biggest drop-off point?
  • b) What's the overall conversion rate (download to active)?
  • c) Where should the company focus improvements?

Exercise 5: What-If Analysis

Current situation: Selling 500 widgets/month at $100 each. Cost per widget: $60. Fixed costs: $10,000/month.

Calculate profit for these scenarios:

  • a) Current scenario
  • b) Increase price to $120 (expect 10% volume drop)
  • c) Reduce cost to $50 (keep price at $100)

Exercise 6: Choosing Analysis Types

Match each business question to the best analysis type:

  1. "Our Q1 customers seem to churn faster than Q2 customers. Is this true?"
  2. "If we cut our ad budget by 20%, how would it affect leads?"
  3. "Which customer segments contribute most to our profit?"
  4. "Do people who buy our premium product also buy accessories?"

Key Takeaways

  • Contribution Analysis: Shows what percentage each part contributes to the total (e.g., revenue by category)
  • Segmentation Analysis: Divides data into groups to find different patterns (e.g., customer age groups)
  • Correlation Analysis: Measures if two variables move together—but correlation ≠ causation!
  • Cohort Analysis: Follows specific groups over time (e.g., customers by signup month)
  • Funnel Analysis: Tracks progression through steps to find drop-off points (e.g., checkout process)
  • Basket Analysis: Identifies items purchased together (e.g., product recommendations)
  • What-If Analysis: Tests different scenarios to plan decisions (e.g., price changes)
  • Choose the right tool: Match your question to the appropriate analysis technique

📝 Knowledge Check

1. What does contribution analysis help you understand?

2. Which analysis type would you use to compare customer behavior across different age groups?

3. "Ice cream sales and drowning incidents are correlated." What's the most likely explanation?

4. What is the main purpose of cohort analysis?

5. A website tracks: 10,000 visitors → 4,000 add to cart → 3,000 checkout → 2,000 purchase. What analysis is this?

6. "Customers who buy laptops often buy laptop bags." Which analysis identifies this pattern?

7. "What would happen to profit if we raised prices by 10%?" Which analysis answers this?

8. Two variables have a positive correlation. What does this mean?

9. In a funnel analysis showing 10,000 visitors → 2,000 purchases, what is the overall conversion rate?

10. What is the key difference between segmentation and cohort analysis?