Chapter 16

Advanced Charts - Part 1

Explore scatter plots and area charts

🎯 Learning Objectives

📊 Scatter Plots

Understand how to show relationships between two numerical variables

📈 Correlation Patterns

Learn to identify positive, negative, and no correlation

🌊 Area Charts

Master visualizing volume and cumulative trends over time

🎨 Chart Selection

Know when to use scatter vs. area vs. line charts

PART 1: SCATTER PLOTS

Introduction to Scatter Plots

A scatter plot (also called scatter chart or scatter diagram) displays values for two numerical variables as dots on a two-dimensional graph. Each dot represents one observation from your data.

When to Use Scatter Plots:
  • Showing relationships between two numerical variables
  • Finding correlations and patterns
  • Identifying outliers that don't fit the pattern
  • Exploring dependencies between variables

Data Requirements:
  • X-axis: Numerical variable (often independent)
  • Y-axis: Numerical variable (often dependent)
  • Each point = one observation/record
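As a minimal sketch of the data layout described above, the snippet below stores each observation as an (x, y) pair and splits the pairs into one column per axis. The sample values and the helper names `split_xy` and `draw_scatter` are hypothetical. The drawing step assumes matplotlib is installed; this chapter does not prescribe a particular tool.

```python
# Hypothetical sample: (study_hours, test_score), one tuple per student.
observations = [(1, 35), (2, 42), (4, 55), (6, 68), (8, 80), (10, 91)]


def split_xy(points):
    """Split (x, y) observations into an x column and a y column."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return xs, ys


def draw_scatter(points, x_label, y_label):
    """Draw one dot per observation (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt  # imported lazily; optional dependency

    xs, ys = split_xy(points)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig


hours, scores = split_xy(observations)
print(hours)   # x-axis values
print(scores)  # y-axis values
```

Calling `draw_scatter(observations, "Study Hours", "Test Score (points)")` would produce the figure itself, with labeled axes.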

📖 Reading Scatter Plots

Understanding the components of a scatter plot:

Example: Study Hours vs. Test Scores

[Figure: scatter plot of Study Hours (x-axis, 0 to 12) vs. Test Score (y-axis, 0 to 100) with a trend line (line of best fit). Each dot = 1 student.]

Interpretation: This scatter plot shows a positive correlation - as study hours increase, test scores tend to increase. The trend line helps visualize this overall pattern.

📊 Patterns in Scatter Plots

Scatter plots reveal different types of relationships between variables:

1. Positive Correlation

Upward slope ↗

Pattern: As X increases, Y tends to increase

Examples: Height vs. Weight, Study time vs. Score

2. Negative Correlation

Downward slope ↘

Pattern: As X increases, Y tends to decrease

Examples: Car age vs. Value, Practice time vs. Errors

3. No Correlation

Random scatter

Pattern: No clear relationship between X and Y

Examples: Shoe size vs. IQ, Day of birth (1-31) vs. Salary

4. Strong vs. Weak Correlation

Strong (points close to the line) vs. Weak (points spread out)

Strength: How closely the points follow the trend line

Strong: Points cluster tightly around the line. Weak: Points are widely scattered.
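Direction and strength can also be summarized with a single number. The sketch below computes the Pearson correlation coefficient r, which ranges from -1 to +1, and turns it into a label. The data values are invented, and the cutoffs 0.3 and 0.7 in the hypothetical `describe` helper are illustrative only; there is no universal threshold for "strong" or "weak".

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def describe(r, strong=0.7, weak=0.3):
    """Label direction and strength; the cutoffs are illustrative only."""
    if abs(r) < weak:
        return "no clear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong else "weak"
    return f"{strength} {direction} correlation"


hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 73]   # rises with study hours
car_age = [1, 2, 3, 4, 5, 6]
value = [30, 26, 23, 19, 16, 12]    # falls with car age

print(describe(pearson_r(hours, scores)))   # strong positive correlation
print(describe(pearson_r(car_age, value)))  # strong negative correlation
```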

🎯 Use Cases for Scatter Plots

Real-World Applications:

  1. Health & Fitness: Height vs. Weight, Exercise hours vs. Calories burned
  2. Business: Advertising spend vs. Sales revenue, Price vs. Demand
  3. Education: Study hours vs. Test scores, Attendance vs. Grades
  4. Science: Temperature vs. Ice cream sales, Rainfall vs. Crop yield
  5. Economics: Income vs. Spending, Unemployment rate vs. Crime rate

Example: Advertising Spend vs. Sales

[Figure: scatter plot of Ad Spend (x-axis, up to $20k) vs. Sales Revenue (y-axis, $0 to $100k) with a trend line.]

Insight: Strong positive correlation - higher ad spend is generally associated with higher sales. The trend line helps estimate expected sales for a given ad budget.

📈 Adding Trend Lines

A trend line (or line of best fit) is a straight line drawn through the data points to show the general direction of the relationship.

Purpose of Trend Lines:
  • Shows overall direction: Upward, downward, or flat
  • Helps make predictions: Estimate Y for a given X value
  • Quantifies strength: How closely points cluster around the line
  • Identifies outliers: Points far from the trend line

⚠️ Important Note: Trend lines should only be added when there's a clear linear relationship. Don't force a trend line on data with no correlation or non-linear patterns.
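One common way to choose the line of best fit is ordinary least squares, sketched below. The chapter does not say which method a charting tool uses, so treat this as an assumption. The function name `fit_trend_line` and the ad-spend numbers are hypothetical, and the data is exactly linear so the arithmetic is easy to check by hand.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: ad spend and sales revenue, both in $k.
ad_spend = [5, 10, 15, 20]
sales = [30, 50, 70, 90]

slope, intercept = fit_trend_line(ad_spend, sales)
predicted = slope * 12 + intercept   # estimated sales for a $12k budget
print(slope, intercept, predicted)   # 4.0 10.0 58.0
```

Predictions of this kind are most trustworthy inside the range of the observed X values.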

✅ Scatter Plot Best Practices

✅ Clear axis labels with units: Always specify what each axis represents and include units (dollars, hours, kg, etc.)

✅ Appropriate scale: Start Y-axis at zero if comparing magnitudes; adjust if focusing on variation

✅ Don't overcrowd: If you have thousands of points, consider sampling or using transparency/smaller dots

✅ Add trend line when helpful: Include trend line for linear relationships to show direction and strength

✅ Highlight outliers if relevant: Circle or annotate unusual data points that don't fit the pattern
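To decide which points to highlight, one option is to measure each point's vertical distance from the trend line (its residual) and flag the unusually large ones. The sketch below assumes a least-squares trend line and a cutoff of 2 standard deviations of the residuals; both choices, the function names, and the store data are illustrative assumptions rather than a fixed rule.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_outliers(xs, ys, k=2.0):
    """Return points farther than k residual standard deviations from the line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = sqrt(sum(r * r for r in residuals) / len(residuals))
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) > k * spread]


# Hypothetical stores: ad budget ($k) vs. sales ($k); the fifth store lags badly.
budget = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [2, 4, 6, 8, 2, 12, 14, 16]
print(find_outliers(budget, sales))   # [(5, 2)]
```

A flagged point is a prompt to investigate, not proof of an error.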

❌ Common Mistakes to Avoid

❌ Assuming correlation = causation: Just because two variables correlate doesn't mean one causes the other. Ice cream sales and drowning deaths both correlate with temperature, but ice cream doesn't cause drowning!

❌ Ignoring outliers: Outliers can reveal important insights or data errors. Don't just remove them - investigate why they exist.

❌ Using for categorical data: Scatter plots require numerical variables. Don't use categories (like "Red", "Blue") on axes.

❌ Too many points (unreadable): With 10,000+ points, the plot becomes a blob. Use sampling, binning, or heatmap-style density plots instead.
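The two remedies named above, sampling and binning, can be sketched in a few lines. The helper names and the synthetic 10,000-point dataset are hypothetical. Taking every n-th point is used here only because it is deterministic; random sampling is usually the better choice when the data has any ordering.

```python
from collections import Counter


def bin_points(points, cell_width, cell_height):
    """Count how many points fall in each grid cell (a coarse density map)."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_width), int(y // cell_height))
        counts[cell] += 1
    return counts


def sample_every(points, step):
    """Keep every `step`-th point; a simple, deterministic form of sampling."""
    return points[::step]


# Hypothetical dense data: 10,000 points on a repeating pattern.
points = [(i % 97, (i * 7) % 89) for i in range(10_000)]

density = bin_points(points, cell_width=10, cell_height=10)
thinned = sample_every(points, step=100)

print(len(points), "points reduced to", len(thinned), "sampled points")
print("busiest cell:", density.most_common(1)[0])
```

The cell counts in `density` are what a heatmap-style density plot would color by.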

PART 2: AREA CHARTS

🌊 Introduction to Area Charts

An area chart is like a line chart, but the area below the line is filled with color. This emphasizes the magnitude or volume of change over time.

When to Use Area Charts:
  • Showing volume/magnitude over time (not just trend)
  • Cumulative trends - totals that build up
  • Emphasizing quantity - make the "amount" visually prominent
  • Comparing multiple series with stacked variations

Data Requirements:
  • X-axis: Time period (days, months, years) or sequential categories
  • Y-axis: Numerical values (quantities, amounts, counts)
  • Series: One or more data series to plot

📊 Single Area Charts

Single area charts track one data series over time, with the area filled to emphasize total magnitude.

Example: Monthly Revenue Growth

[Figure: area chart titled "Monthly Revenue Growth", Jan to Jul on the x-axis, revenue on the y-axis with ticks from $10k to $50k.]

Use case: The filled area emphasizes the magnitude of revenue, making growth visually prominent. Perfect for tracking total accumulation over time.

Common Single Area Chart Uses:

  • Total revenue over time - emphasizes overall financial performance
  • Population growth - shows magnitude of total population
  • Cumulative downloads - highlights total volume
  • Website traffic - emphasizes visitor volume
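For cumulative uses such as total downloads, the values plotted are running totals rather than the raw per-period counts. A minimal sketch with invented monthly figures:

```python
from itertools import accumulate

# Hypothetical monthly download counts (not cumulative).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
monthly_downloads = [1200, 1500, 900, 2000, 2400, 1800]

# Running total: each entry is the sum of all months up to and including it.
cumulative_downloads = list(accumulate(monthly_downloads))

for month, total in zip(months, cumulative_downloads):
    print(f"{month}: {total}")
```

The final running total is 9800, the sum of all six months. If matplotlib is the plotting tool (an assumption), `ax.fill_between(range(len(months)), cumulative_downloads)` would draw the filled area from zero up to the running total.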

📚 Stacked Area Charts

Stacked area charts show multiple data series stacked on top of each other. The total height shows the combined sum, while each layer shows one series.

Example: Sales by Product Category

[Figure: stacked area chart titled "Total Sales by Product Category", Q1 to Q4 on the x-axis, sales on the y-axis with ticks from $40k to $120k. Layers: Food, Clothing, Electronics.]

How to read: The total height shows combined sales across all categories. Each colored layer shows one category's contribution. You can see both total trends and individual category performance.

Stacked Area Chart Benefits:
  • Shows total AND composition - see both overall trend and breakdown
  • Compares multiple series over time
  • Reveals changing proportions - which categories grow/shrink
  • Good for cumulative data where parts add to a whole
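Stacking means each layer starts where the previous one ends, so the top edge of the last layer is the combined total. The sketch below computes those edges by hand; the sales figures and the `stack_layers` helper are hypothetical.

```python
# Hypothetical quarterly sales in $k, one list per product category.
quarters = ["Q1", "Q2", "Q3", "Q4"]
series = {
    "Electronics": [40, 45, 50, 55],
    "Clothing": [25, 30, 30, 35],
    "Food": [15, 15, 20, 20],
}


def stack_layers(series):
    """Return {name: (bottom, top)} edges for each layer, stacked in dict order."""
    n = len(next(iter(series.values())))
    baseline = [0] * n
    layers = {}
    for name, values in series.items():
        top = [b + v for b, v in zip(baseline, values)]
        layers[name] = (baseline, top)
        baseline = top   # the next layer sits on top of this one
    return layers


layers = stack_layers(series)
totals = layers["Food"][1]   # top edge of the last layer = combined total
print(totals)                # [80, 90, 100, 110]
```

Plotting libraries normally do this bookkeeping for you. In matplotlib, for instance, `ax.stackplot(range(len(quarters)), *series.values(), labels=list(series))` stacks the series in the order given.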

💯 100% Stacked Area Charts

100% stacked area charts show proportions over time. The total always reaches 100%, allowing you to focus on relative share rather than absolute values.

Example: Market Share Evolution

[Figure: 100% stacked area chart titled "Smartphone Market Share by Company", 2020 to 2023 on the x-axis, 0% to 100% on the y-axis. Layers: Company C (Growing), Company B (Stable), Company A (Shrinking).]

Insight: Company C is gaining market share, Company A is losing share, and Company B remains stable. The 100% format makes it easy to compare relative proportions rather than absolute numbers.

When to use 100% Stacked:

  • Market share analysis - comparing competitors' relative positions
  • Budget allocation - showing how spending is distributed
  • Demographic composition - age groups, gender ratios over time
  • Portfolio mix - investment allocation changes

Key difference: Total values don't matter - only proportions. Use when relative share is more important than absolute amounts.
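To build a 100% stacked chart, each value is divided by its period total and expressed as a percentage. The sketch below uses invented unit sales chosen so that the shares come out as round numbers; `to_percent_shares` is a hypothetical helper.

```python
def to_percent_shares(series):
    """Convert absolute values to each series' percentage of the period total."""
    names = list(series)
    n = len(series[names[0]])
    totals = [sum(series[name][i] for name in names) for i in range(n)]
    return {
        name: [100 * series[name][i] / totals[i] for i in range(n)]
        for name in names
    }


# Hypothetical unit sales (millions) for 2020, 2021, 2022, 2023.
units = {
    "Company A": [50, 48, 45, 40],
    "Company B": [30, 36, 45, 60],
    "Company C": [20, 36, 60, 100],
}

shares = to_percent_shares(units)
for name, values in shares.items():
    print(name, values)
# Company A [50.0, 40.0, 30.0, 20.0]
# Company B [30.0, 30.0, 30.0, 30.0]
# Company C [20.0, 30.0, 40.0, 50.0]
```

Note what the percentages hide: in this made-up data Company B doubles its unit sales, yet its share is flat because the whole market doubled too.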

🆚 When to Use Area vs. Line Charts

Aspect          | Area Chart                        | Line Chart
Best for        | Emphasizing volume/magnitude      | Emphasizing trend/precision
Visual focus    | Total quantity (filled area)      | Rate of change (line slope)
Data type       | Cumulative totals, volumes        | Any time series data
Multiple series | Use stacked (shows total + parts) | Use multiple lines (easier to compare)
Examples        | Revenue, population, downloads    | Temperature, stock price, heart rate

✅ Use Area Chart When:

  • Showing cumulative totals
  • Emphasizing magnitude/volume
  • Data represents quantities that "fill up"
  • Comparing parts of a whole (stacked)

Example: "Total website visitors this month"

✅ Use Line Chart When:

  • Showing precise trends
  • Comparing multiple series (3+ lines)
  • Data has negative values
  • Emphasizing rate of change

Example: "Daily temperature fluctuations"
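The rules of thumb from this chapter can be condensed into a small decision function. This is a deliberately simplified sketch: the function name, its parameters, and the order of the checks are assumptions made for illustration, and real chart choices depend on the audience and the message as well.

```python
def suggest_chart(x_is_time, has_negatives=False, parts_of_whole=False,
                  emphasize_volume=False, share_only=False):
    """Suggest a chart type from the rules of thumb in this chapter."""
    if not x_is_time:
        # Two numerical variables, no time axis: look for a relationship.
        return "scatter plot"
    if has_negatives:
        # Filled areas are confusing below the baseline.
        return "line chart"
    if parts_of_whole:
        return "100% stacked area chart" if share_only else "stacked area chart"
    if emphasize_volume:
        return "area chart"
    return "line chart"


print(suggest_chart(x_is_time=False))                         # scatter plot
print(suggest_chart(x_is_time=True, emphasize_volume=True))   # area chart
print(suggest_chart(x_is_time=True, has_negatives=True))      # line chart
print(suggest_chart(x_is_time=True, parts_of_whole=True))     # stacked area chart
```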

✅ Area Chart Best Practices

✅ Use for cumulative data: Area charts work best when showing totals, volumes, or quantities that accumulate

✅ Don't stack too many series: Limit to 3-5 categories in stacked charts; more becomes unreadable

✅ Use transparent colors for overlaps: If areas overlap (not stacked), use transparency so both are visible

✅ Order matters in stacked: Place most important series at the bottom where it's easiest to read

✅ Start Y-axis at zero: Since area represents magnitude, always start at zero to avoid visual distortion

⚠️ Avoid these mistakes:
  • Using area charts for data with negative values (use line instead)
  • Stacking unrelated series that don't sum meaningfully
  • Comparing individual series in stacked charts (top series harder to read)
  • Using dark, opaque colors that hide overlapping data

🛠️ Interactive Scatter & Area Chart Builder

Practice creating scatter plots and area charts with sample datasets.

Scatter Plot Builder

Dataset: 30 Students - Study Hours vs. Test Scores

[Interactive chart: Study Hours (x-axis) vs. Test Score (y-axis, 0 to 100)]

Change the correlation type to see different patterns. Toggle the trend line to see its effect.

Area Chart Builder

Dataset: Quarterly Revenue for 3 Product Lines (2 years)

[Interactive chart: Quarterly Revenue]

Switch between chart types to see how the same data looks in different area chart formats.

✏️ Practice Exercises

Test your understanding with these hands-on exercises.

Exercise 1: Identify Correlation Type

Task: For each scenario, identify whether you'd expect positive correlation, negative correlation, or no correlation:

  1. Hours spent exercising vs. Weight loss
  2. Car's age vs. Resale value
  3. Shoe size vs. Math test score
  4. Years of education vs. Salary
  5. Distance from equator vs. Average temperature

Answers:

1. Positive correlation - More exercise typically goes with more weight loss

2. Negative correlation - Older cars are generally worth less

3. No correlation - Shoe size doesn't affect math ability

4. Positive correlation - More education often goes with a higher salary

5. Negative correlation - Farther from the equator typically means colder

Exercise 2: Scatter Plot Interpretation

Scenario: A scatter plot shows advertising budget (X-axis) vs. sales revenue (Y-axis) for 50 stores. Most points cluster tightly around an upward-sloping trend line, but 3 stores are far below the line.

Questions:

  1. What type of correlation is shown?
  2. Is the correlation strong or weak? How do you know?
  3. What might the 3 outlier stores indicate?

Answers:

1. Positive correlation - the upward slope means that as ad budget increases, sales tend to increase

2. Strong correlation - points cluster "tightly" around the trend line

3. Possible outlier explanations:

  • Poor ad targeting or execution in those stores
  • Other factors (location, competition, product issues)
  • Data errors in recording

Whatever the cause, these stores warrant investigation!

Exercise 3: Scatter vs. Area vs. Line

Task: For each scenario, choose the best chart type (Scatter, Area, or Line) and explain why:

  1. Showing relationship between employee years of experience and salary
  2. Displaying total app downloads growing from 0 to 1 million over 12 months
  3. Tracking daily stock price movements
  4. Comparing how three product lines contribute to total quarterly revenue
  5. Analyzing if there's a relationship between study hours and exam scores

Answers:

1. Scatter plot - showing relationship between two numerical variables

2. Area chart - emphasizes cumulative growth and total magnitude

3. Line chart - emphasizes precise values and rate of change in a continuous time series

4. Stacked area chart - shows total revenue AND breakdown by product

5. Scatter plot - exploring correlation between two variables

Exercise 4: Area Chart Design

Scenario: You need to show how your company's total revenue ($500k in Q1 to $800k in Q4) is split between 5 product categories over 4 quarters.

Questions:

  1. Should you use single area, stacked area, or 100% stacked area?
  2. What if you want to emphasize which categories are gaining or losing share of total revenue?
  3. What's a problem with having 5 categories?

Answers:

1. Stacked area chart - shows both total revenue growth AND category breakdown

2. 100% stacked area - makes relative proportions easier to compare

3. Five layers is at the upper limit of what stays readable (3-5), and the top layers are especially hard to read. Consider:

  • Grouping smaller categories into "Other"
  • Using a different chart type (grouped column)
  • Showing only top 3 categories

Exercise 5: Correlation vs. Causation

Scenario: A scatter plot shows a strong positive correlation between ice cream sales and drowning incidents across 100 beaches.

Questions:

  1. Does this mean ice cream causes drowning?
  2. What's a more likely explanation?
  3. What's the lesson here?

Answers:

1. No! Correlation does NOT imply causation

2. Confounding variable: Hot weather!

  • Hot weather → More people buy ice cream
  • Hot weather → More people swim → More drownings
  • Both are caused by a third factor (temperature)

3. Critical lesson: Always ask "Could there be a third variable?" before assuming one variable causes another. Correlation helps identify relationships to investigate further, but doesn't prove cause and effect.
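The "third variable" idea can be checked numerically. In the made-up data below, temperature drives both ice cream sales and drownings, and each series also has a small unrelated wiggle. The sketch compares the raw correlation with the correlation that remains after the straight-line effect of temperature is removed from both series. All numbers and helper names are hypothetical, and removing a linear effect is only one simple way to control for a confounder.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(xs, ys):
    """What is left of ys after removing its straight-line dependence on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


# Made-up numbers: temperature drives both series; the wiggles are unrelated.
temperature = [20, 22, 24, 26, 28, 30, 32, 34]
wiggle_a = [1, -1, 1, -1, 1, -1, 1, -1]
wiggle_b = [1, 1, -1, -1, 1, 1, -1, -1]
ice_cream = [10 * t + w for t, w in zip(temperature, wiggle_a)]
drownings = [0.5 * t + 0.1 * w for t, w in zip(temperature, wiggle_b)]

raw = pearson_r(ice_cream, drownings)
controlled = pearson_r(residuals(temperature, ice_cream),
                       residuals(temperature, drownings))
print(f"raw r = {raw:.2f}, after removing temperature r = {controlled:.2f}")
```

The raw correlation is close to 1, while the controlled value is close to zero: once temperature is accounted for, ice cream sales tell you almost nothing about drownings.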

Exercise 6: Reading Stacked Area Charts

Task: In a stacked area chart showing email types over time:

  • Bottom layer (red): Spam emails
  • Middle layer (blue): Work emails
  • Top layer (green): Personal emails

If the total height is decreasing but the red layer is growing, what does this tell you?

Answer:

Interpretation:

  • Total emails are decreasing (overall height going down)
  • BUT spam is increasing (red layer growing)
  • Therefore: Work and/or personal emails must be decreasing significantly

This could indicate that a spam filter is mistakenly blocking legitimate emails, or that people are simply receiving less work/personal email. Either way, the spam growth is being offset by even larger decreases in the other categories.

Exercise 7: Design Challenge

Task: You have data showing the relationship between employee satisfaction score (1-10) and number of sick days taken per year. Design a visualization:

  1. What chart type would you use?
  2. Which variable goes on which axis?
  3. Would you add a trend line? Why or why not?
  4. What pattern would you expect to see?

Answers:

1. Scatter plot - exploring relationship between two numerical variables

2. X-axis: Satisfaction score (independent variable - what we think influences)

Y-axis: Sick days (dependent variable - what we think is influenced)

3. Yes, add trend line - helps show if there's a linear relationship and its direction

4. Expected pattern: Negative correlation - higher satisfaction likely correlates with fewer sick days. However, investigate outliers (high satisfaction but many sick days could indicate serious health issues unrelated to job satisfaction).

Exercise 8: When NOT to Use

Task: For each scenario, explain why the chosen chart is WRONG:

  1. Using a scatter plot to show categories of products (Electronics, Clothing, Food) vs. sales
  2. Using an area chart for profit over time when some months show negative profit
  3. Using a 100% stacked area for unrelated metrics (temperature, sales, and number of employees)

Answers:

1. Wrong: Categories on scatter plot

  • Scatter plots require a numerical X-axis
  • Product categories are categorical, not numerical
  • Use instead: Bar chart or column chart

2. Wrong: Area chart with negative values

  • Filled area below baseline doesn't work well for negative values
  • Visual becomes confusing
  • Use instead: Line chart or column chart with positive/negative bars

3. Wrong: Stacking unrelated metrics

  • Stacked charts imply parts sum to a meaningful whole
  • Temperature + Sales + Employees doesn't sum meaningfully
  • Use instead: Multiple separate charts or multi-line chart

Knowledge Check

Test your understanding of scatter plots and area charts!

1. What type of data is required for a scatter plot?

2. In a scatter plot showing hours studied vs. test scores, if most points slope upward from left to right, what does this indicate?

3. What is the main difference between a line chart and an area chart?

4. If a scatter plot shows ice cream sales correlating with drowning incidents, what should you conclude?

5. What does a 100% stacked area chart always show?

6. When should you add a trend line to a scatter plot?

7. Which chart type is BEST for showing how three product categories contribute to total quarterly revenue over time?

8. Why should you avoid using area charts for data with negative values?

9. What does each point on a scatter plot represent?

10. When should you use an area chart instead of a line chart?