Chapter 4

Types of Data

Learn to recognize and classify different types of data to choose the right analysis methods.

🔍 Why Data Types Matter

Imagine you're organizing a party. You have a list of guests with their information:

  • Name: Sarah Johnson
  • Age: 28
  • RSVP Status: Yes
  • T-shirt Size: Medium
  • Number of Guests: 2
  • Dietary Preference: Vegetarian

Each of these pieces of information is data, but they're all different types of data. And here's the critical insight: you can't treat them all the same way.

🎯 Real-Life Analogy: Tools for Different Jobs

Think about tools in a toolbox:

  • You can't hammer in a screw (wrong tool for the job)
  • You can't measure with a screwdriver (tool doesn't match the task)
  • You need to match the tool to what you're working with

Data is the same: Different types of data require different analysis approaches. Using the wrong method gives you meaningless or misleading results!

💡 What You'll Learn

By the end of this chapter, you'll be able to:

  • Recognize and classify different data types
  • Understand the difference between quantitative and qualitative data
  • Distinguish discrete from continuous, nominal from ordinal
  • Choose appropriate analysis methods for each data type
  • Avoid common data type mistakes

🌳 The Data Type Hierarchy

All data can be organized into a hierarchy. Let's visualize this:

📊 All Data

Everything splits into two main categories:

  • Quantitative Data: numbers that measure or count things
    Examples: Age, height, price, temperature, number of items
      • Discrete: countable, whole numbers only
        Examples: Number of students (1, 2, 3...), products sold (10, 15, 20...)
      • Continuous: measurable, can have decimals
        Examples: Height (5.7 ft), temperature (98.6°F), weight (150.3 lbs)

  • Qualitative Data: categories, labels, and descriptions
    Examples: Colors, names, cities, product types, yes/no answers
      • Nominal: categories with no inherent order
        Examples: Colors (red, blue, green), cities (NYC, LA, Chicago)
      • Ordinal: categories with a meaningful order
        Examples: Ratings (1-5 stars), sizes (S, M, L, XL), education level

🔑 Key Distinction

The fundamental question: "Is it a number that you can do math with?"

  • YES → Quantitative (numbers with meaning)
  • NO → Qualitative (categories or labels)

Note: Just because something looks like a number doesn't make it quantitative! ZIP codes are numbers, but you can't add them together meaningfully.

🎯 Exercise: Categorize the Data!

Sort each example into the correct category: is it Quantitative or Qualitative?

Data Examples:
Age in years (25, 30, 45)
Favorite color (red, blue, green)
Temperature (72°F, 98.6°F)
City name (NYC, LA, Chicago)
Product price ($19.99, $50.00)
Letter grade (A, B, C, D, F)
Categories:
🔢 Quantitative

Numbers you can do math with

🏷️ Qualitative

Categories or descriptions

🎯 Advanced: Discrete vs Continuous!

These are ALL quantitative. Can you tell which are discrete (countable) vs continuous (measurable)?

Quantitative Examples:
Number of students (30, 45, 60)
Person's height (5.7 ft, 6.2 ft)
Items sold (10, 25, 100)
Weight (150.5 lbs, 180.3 lbs)
Website clicks (500, 1000, 1500)
Time elapsed (2.5 hours, 3.75 hours)
Subcategories:
📊 Discrete

Whole numbers, countable

📈 Continuous

Can have decimals, measurable

⚡ True or False: Test Your Understanding!

1. Quantitative data is always numeric and you can do math with it.

2. ZIP codes are quantitative data because they're numbers.

3. The difference between discrete and continuous is that discrete data has gaps (like 1, 2, 3) while continuous can be any value (like 1.5, 2.7, 3.14).

4. Nominal and ordinal data are the same - they're both categories.

5. A 5-star rating system (1, 2, 3, 4, 5 stars) is ordinal qualitative data.

✍️ Complete the Definitions!

Fill in the blanks to complete these data type definitions:

Word Bank: Quantitative Qualitative Discrete Continuous Nominal Ordinal order countable
1. _____ data consists of numbers that you can do math with, like age, temperature, or price.
2. _____ data consists of categories, labels, or descriptions, like colors or city names.
3. _____ data is _____ and can only take specific whole number values.
4. _____ data can take any value within a range and can have decimal places.
5. _____ data has categories with a meaningful _____, like small, medium, and large.
6. _____ data has categories with no inherent ranking or order.

🔢 Quantitative Data (Numbers)

Quantitative data represents quantities—things you can measure or count.

Discrete Data: Counting Things

Definition: Data that comes from counting. Always whole numbers, no decimals.

👥 Number of Students

Values: 0, 1, 2, 3, 4...

Why discrete? You can't have 2.5 students in a class!

🛍️ Products Sold

Values: 10, 15, 20, 100...

Why discrete? You sell whole items, not fractions.

⭐ Customer Reviews

Values: 0, 1, 2, 3... reviews received

Why discrete? You count whole reviews. (The star rating inside each review is ordinal, not a count.)

🚗 Cars in Parking Lot

Values: 0, 1, 2, 50, 100...

Why discrete? You count whole vehicles.

Continuous Data: Measuring Things

Definition: Data that comes from measuring. Can have any value within a range, including decimals.

📏 Height

Values: 5.7 ft, 5.75 ft, 5.752 ft...

Why continuous? Height can be measured to any precision.

🌡️ Temperature

Values: 72.3°F, 72.35°F, 72.351°F...

Why continuous? Temperature exists between whole numbers.

⏱️ Time Duration

Values: 2.5 hours, 2.53 hours, 2.534 hours...

Why continuous? Time can be infinitely subdivided.

💰 Price

Values: $19.99, $19.995, $19.9949...

Why continuous? Prices are recorded to the cent, but they can take so many values that they are usually analyzed as continuous.

🎮 Interactive: Classify These Examples

For each example, determine if it's Discrete or Continuous:

  1. Number of emails received today
  2. Weight of a package
  3. Number of pages in a book
  4. Speed of a car

💡 Quick Test: Discrete vs. Continuous

Ask yourself: "Can this value exist between two whole numbers?"

  • NO → Discrete (you can have 3 or 4, but not 3.5)
  • YES → Continuous (temperature can be 72.5°F)
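
The quick test above can be roughly approximated in code. This is a minimal Python sketch, not a definitive rule: the helper name `looks_discrete` is hypothetical, and checking whether every recorded value is a whole number is only a heuristic. A continuous measurement that happens to be rounded to whole numbers would be misclassified, so knowing how the data was collected (counted or measured) still matters.

```python
def looks_discrete(values):
    """Heuristic (hypothetical helper): True if every value is a whole number."""
    return all(float(v).is_integer() for v in values)

emails_received = [12, 30, 7]        # counted
package_weights = [2.5, 10.0, 3.75]  # measured, in lbs

print(looks_discrete(emails_received))  # True
print(looks_discrete(package_weights))  # False
```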

🏷️ Qualitative Data (Categories & Descriptions)

Qualitative data represents qualities: categories, labels, or descriptions rather than measured or counted quantities. The values may still be written with digits, as ZIP codes are.

Nominal Data: Categories Without Order

Definition: Categories that have no inherent ranking or order. One isn't "better" or "higher" than another.

🎨 Colors

Values: Red, Blue, Green, Yellow

Why nominal? No color is "greater" than another.

🌍 Cities

Values: New York, London, Tokyo, Sydney

Why nominal? Just different locations, no ranking.

🍕 Food Types

Values: Pizza, Pasta, Salad, Burger

Why nominal? Different categories, not ordered.

✅ Yes/No

Values: Yes, No

Why nominal? Binary choice with no order.

Ordinal Data: Categories With Order

Definition: Categories that have a meaningful order or ranking, but the distance between categories isn't necessarily equal.

⭐ Rating Scale

Values: 1 star, 2 stars, 3 stars, 4 stars, 5 stars

Why ordinal? Clear order, but is the difference between 1 and 2 stars the same as between 4 and 5?

👕 T-Shirt Sizes

Values: XS, S, M, L, XL, XXL

Why ordinal? Ordered by size, but gaps aren't uniform.

🎓 Education Level

Values: High School, Bachelor's, Master's, PhD

Why ordinal? Clear progression, but years between levels vary.

📊 Satisfaction Level

Values: Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied

Why ordinal? Ordered from negative to positive.

✅ Nominal Example

Favorite Sport: Soccer, Basketball, Tennis

Key: No sport is "higher" or "better" objectively—they're just different options.

✅ Ordinal Example

Medal Type: Bronze, Silver, Gold

Key: Clear ranking (Gold > Silver > Bronze), but the "distance" between them isn't quantified.

💡 Quick Test: Nominal vs. Ordinal

Ask yourself: "Can I rank these categories from low to high?"

  • NO → Nominal (just different labels)
  • YES → Ordinal (there's an order)
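
One practical consequence of the nominal/ordinal split is sorting. The sketch below, in Python, uses the T-shirt sizes from this chapter with a made-up list of orders. The names `SIZE_ORDER` and `rank` are illustrative assumptions. A plain sort treats the sizes as unordered labels, while sorting by an explicit rank respects their order.

```python
SIZE_ORDER = ["XS", "S", "M", "L", "XL", "XXL"]  # lowest to highest
rank = {size: position for position, size in enumerate(SIZE_ORDER)}

orders = ["L", "S", "XL", "M", "M", "S", "L"]  # made-up sample data

# Alphabetical sort: treats sizes as nominal labels, so the order is meaningless.
alphabetical = sorted(orders)

# Sort by rank: respects the ordinal ordering.
by_size = sorted(orders, key=rank.__getitem__)

# With an order defined, a middle (median) category exists for an odd-length list.
median_size = by_size[len(by_size) // 2]

print(alphabetical)  # ['L', 'L', 'M', 'M', 'S', 'S', 'XL']
print(by_size)       # ['S', 'S', 'M', 'M', 'L', 'L', 'XL']
print(median_size)   # M
```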

🎯 Special Data Types

Some data types have unique characteristics that make them worth special attention:

📅 Date/Time Data

Why it's unique: Can be treated as both categorical and quantitative!

As Quantitative:

  • Calculate time differences (days between events)
  • Measure durations
  • Analyze trends over time

As Categorical:

  • Group by day of week (Monday, Tuesday...)
  • Group by month (January, February...)
  • Group by season (Spring, Summer, Fall, Winter)

Considerations:

  • Time zones (3 PM in NYC ≠ 3 PM in Tokyo)
  • Date formats (MM/DD/YYYY vs. DD/MM/YYYY)
  • Timestamps (exact moment in time)
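
The dual nature of dates can be illustrated with Python's standard `datetime` module. This is a small sketch with arbitrarily chosen dates. The fixed UTC offsets for New York (-5) and Tokyo (+9) are a simplifying assumption that ignores daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone

start = date(2024, 1, 1)
end = date(2024, 3, 1)

# Quantitative use: subtracting dates gives a duration you can do math with.
days_between = (end - start).days

# Categorical use: the day of the week is a label you can group by.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
weekday_label = WEEKDAYS[end.weekday()]

# Time zones: the same wall-clock time is a different moment in each city.
nyc = timezone(timedelta(hours=-5))    # assumed fixed offset (winter)
tokyo = timezone(timedelta(hours=9))
three_pm_nyc = datetime(2024, 1, 15, 15, 0, tzinfo=nyc)
three_pm_tokyo = datetime(2024, 1, 15, 15, 0, tzinfo=tokyo)
gap = three_pm_nyc - three_pm_tokyo

print(days_between)   # 60
print(weekday_label)  # Friday
print(gap)            # 14:00:00
```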

📝 Text Data

Why it's unique: Unstructured and requires special processing.

Examples:

  • Customer reviews: "This product is amazing!"
  • Social media posts
  • Survey comments
  • Email content

Analysis Methods:

  • Sentiment analysis (positive, negative, neutral)
  • Keyword extraction
  • Topic modeling
  • Text mining
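
As a rough idea of what keyword extraction involves, the Python sketch below counts the most frequent words in a few reviews. Only the first review comes from this chapter. The other two, the stop-word list, and the variable names are made up for illustration, and real text analysis usually relies on dedicated libraries rather than a simple word count.

```python
import re
from collections import Counter

reviews = [
    "This product is amazing!",
    "Amazing value, fast shipping.",   # made-up sample
    "Shipping was slow.",              # made-up sample
]
STOP_WORDS = {"this", "is", "was", "the", "a"}  # tiny illustrative list

words = []
for review in reviews:
    # Lowercase the text and keep only runs of letters/apostrophes.
    for word in re.findall(r"[a-z']+", review.lower()):
        if word not in STOP_WORDS:
            words.append(word)

top_keywords = Counter(words).most_common(2)
print(top_keywords)
```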

✔️ Boolean Data

Why it's unique: Only two possible values.

Forms:

  • True / False
  • Yes / No
  • 1 / 0
  • On / Off

Common Uses:

  • Email subscription status (subscribed: yes/no)
  • Feature flags (enabled: true/false)
  • Checkbox selections
  • Filtering conditions
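
Because Boolean values map to 1 and 0, counting and filtering are straightforward. A minimal Python sketch with a made-up list of subscription flags:

```python
subscribed = [True, False, True, True, False]  # made-up sample data

# True counts as 1 and False as 0, so sum() counts the True values.
subscriber_count = sum(subscribed)
subscription_rate = subscriber_count / len(subscribed)

# Filtering: keep the positions (customers) where the flag is True.
subscriber_positions = [i for i, flag in enumerate(subscribed) if flag]

print(subscriber_count)      # 3
print(subscription_rate)     # 0.6
print(subscriber_positions)  # [0, 2, 3]
```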

🤔 Edge Case: Numbers That Aren't Quantitative

Just because something is written as a number doesn't mean it's quantitative data!

Data          | Looks Like   | Actually Is | Why?
ZIP Code      | 10001, 90210 | Nominal     | Can't do math (10001 + 90210 = meaningless)
Phone Number  | 555-1234     | Nominal     | Just an identifier, not a quantity
Student ID    | 12345        | Nominal     | Label, not a measurement
Jersey Number | #23, #10     | Nominal     | Just a label for identification

The Test: Ask "Does math on this number mean anything?" If NO → It's categorical!
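
A practical way to respect this in code is to store identifiers as text rather than numbers. The Python sketch below uses an illustrative ZIP code with a leading zero (02134, not from this chapter) to show what can go wrong when an identifier is stored as an integer.

```python
zip_as_number = int("02134")  # stored as a number: the leading zero is lost
zip_as_text = "02134"         # stored as text: kept exactly as written

print(zip_as_number)  # 2134
print(zip_as_text)    # 02134

# "Adding" text identifiers just joins them, a reminder that this is not arithmetic.
joined = "10001" + "90210"
print(joined)  # 1000190210
```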

🎯 Why Data Types Matter

Understanding data types isn't just academic—it directly impacts what you can and cannot do with your data.

Different Data Types → Different Operations

Data Type    | ✅ You CAN Do | ❌ You CANNOT Do
Quantitative | Calculate average; find sum/total; measure spread; perform arithmetic | N/A (most math works)
Discrete     | Count frequency; calculate mode; create bar charts | Some continuous-specific analyses
Continuous   | Calculate precise averages; measure exact differences; create histograms | Simple counting (need to group into ranges)
Nominal      | Count occurrences; find mode (most common); create pie charts | Calculate average; perform math; rank/order
Ordinal      | Rank/order; find median; compare greater/less | Calculate meaningful average; precise arithmetic
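
The table can be read as "pick the summary that fits the type." The Python sketch below uses the standard `statistics` module on three small made-up lists: a mean for quantitative data, a mode for nominal data, and a median for ordinal data coded as 1 to 5. The variable names and values are illustrative assumptions.

```python
import statistics

ages = [28, 35, 41, 28]                    # quantitative
colors = ["red", "blue", "red", "green"]   # nominal
ratings = [1, 2, 2, 4, 5]                  # ordinal, coded 1-5

mean_age = statistics.mean(ages)             # arithmetic is meaningful
most_common_color = statistics.mode(colors)  # only counting is meaningful
median_rating = statistics.median(ratings)   # order is meaningful

print(mean_age)           # 33
print(most_common_color)  # red
print(median_rating)      # 2
```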

🚫 Example: What Happens When You Use the Wrong Type

Scenario: Student Satisfaction Survey

Students rate satisfaction: 1 (Very Unsatisfied), 2 (Unsatisfied), 3 (Neutral), 4 (Satisfied), 5 (Very Satisfied)

Results: Student A = 5, Student B = 5, Student C = 1, Student D = 1

❌ Wrong: Treating as Quantitative

Calculation: Average = (5 + 5 + 1 + 1) / 4 = 3

Interpretation: "Average satisfaction is Neutral"

Problem: This hides the reality! Half love it, half hate it. There's no "neutral" middle ground—the average is misleading!

✅ Right: Treating as Ordinal

Analysis: Distribution

  • Very Satisfied: 50%
  • Very Unsatisfied: 50%
  • Others: 0%

Interpretation: "Opinion is polarized—half love it, half hate it"

Why better: Shows the true story!
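
The two analyses above can be reproduced in a few lines of Python. This sketch uses the four survey responses from the scenario. The names `LABELS`, `responses`, and `distribution` are illustrative.

```python
from collections import Counter
from statistics import mean

LABELS = {
    1: "Very Unsatisfied",
    2: "Unsatisfied",
    3: "Neutral",
    4: "Satisfied",
    5: "Very Satisfied",
}
responses = [5, 5, 1, 1]  # Students A, B, C, D

# Treating the codes as quantitative: the mean lands on "Neutral".
average = mean(responses)

# Treating them as ordinal categories: share of responses per level.
counts = Counter(responses)
distribution = {LABELS[score]: counts[score] / len(responses) for score in LABELS}

print(average)  # 3
for label, share in distribution.items():
    print(f"{label}: {share:.0%}")
```

The mean reports a value that no student actually chose, while the distribution shows the split directly.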

🎯 The Golden Rule

Always identify your data type BEFORE analyzing!

Using the wrong analysis for a data type can lead to:

  • Misleading results
  • Incorrect conclusions
  • Bad decisions based on faulty analysis
  • Loss of trust in your insights

🌲 Data Type Decision Tree

Not sure what type your data is? Follow this decision tree:

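Start: Look at your data

What kind of values do you have?

  • Numbers (values are numeric)
      • Does math on the values mean anything? If NO → Nominal (an identifier, like a ZIP code)
      • If YES, can a value exist between two whole numbers? NO → Discrete, YES → Continuous
  • Words/Categories (values are labels or text)
      • Can you rank the categories from low to high? NO → Nominal, YES → Ordinal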

🔄 Practice: Apply the Decision Tree

For each example, follow the decision tree:

  1. Movie genres (Action, Comedy, Drama) → ?
  2. Test scores (0-100) → ?
  3. Number of employees → ?
  4. Customer satisfaction (Poor, Fair, Good, Excellent) → ?
  5. Credit card numbers → ?

Answers: 1) Nominal, 2) Continuous, 3) Discrete, 4) Ordinal, 5) Nominal
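
The quick tests from this chapter can be chained into a single function. The Python sketch below is one possible encoding, not an official algorithm: the function name `classify` and its parameters are hypothetical, and the answers to each question still have to come from your own knowledge of the data.

```python
def classify(is_number, math_is_meaningful=False, is_counted=False, has_order=False):
    """Return a data type label from yes/no answers (hypothetical helper)."""
    if is_number and math_is_meaningful:
        # Quantitative: counted values are discrete, measured values continuous.
        return "Discrete" if is_counted else "Continuous"
    # Qualitative: labels, including numbers used only as identifiers.
    return "Ordinal" if has_order else "Nominal"

practice = {
    "Movie genres": classify(is_number=False),
    "Test scores": classify(is_number=True, math_is_meaningful=True),
    "Number of employees": classify(is_number=True, math_is_meaningful=True, is_counted=True),
    "Customer satisfaction": classify(is_number=False, has_order=True),
    "Credit card numbers": classify(is_number=True, math_is_meaningful=False),
}

for example, data_type in practice.items():
    print(f"{example}: {data_type}")
```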

⚠️ Common Mistakes with Data Types

Here are the most frequent errors people make—and how to avoid them:

❌ Mistake #1: Treating Ordinal as Nominal

The Error: Ignoring the order in ordinal data

Example: T-shirt sizes (XS, S, M, L, XL)

Wrong Approach

Treating them as unordered categories, like colors

Result: Missing insights about distribution (are most people M-L?)

Right Approach

Recognize the order and visualize accordingly

Result: Can see if sizes cluster or spread out

❌ Mistake #2: Averaging Rankings/Ratings

The Error: Calculating mean of ordinal data

Example: Restaurant ratings (1-5 stars)

Misleading

10 reviews: Five 5-stars, Five 1-stars

Average = 3 stars

Conclusion: "Average restaurant"

Problem: Hides polarization!

Accurate

Show distribution:

50% give 5 stars
50% give 1 star

Conclusion: "Polarizing—love it or hate it"

Better: Shows the reality!

❌ Mistake #3: Numbers That Aren't Quantitative

The Error: Assuming all numbers are quantitative

Example: ZIP codes

Wrong

Calculating average ZIP code

(10001 + 90210 + 60601) / 3 = 53,604

Problem: Meaningless number!

Right

Treating ZIP codes as nominal categories

Count: How many customers per ZIP?

Better: Useful information!
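
Counting customers per ZIP code takes only a few lines in Python. The sketch below uses the three ZIP codes from the example with a made-up customer list, stored as strings so they are treated as labels.

```python
from collections import Counter

# Made-up customer list; ZIP codes are stored as text, not numbers.
customer_zips = ["10001", "90210", "10001", "60601", "10001", "90210"]

customers_per_zip = Counter(customer_zips)

for zip_code, count in customers_per_zip.most_common():
    print(zip_code, count)
```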

❌ Mistake #4: Inappropriate Visualizations

The Error: Using the wrong chart for the data type

Data Type  | ❌ Wrong Chart                     | ✅ Right Chart
Nominal    | Line chart (implies order/trend)  | Bar chart or pie chart
Ordinal    | Pie chart (loses order)           | Bar chart (shows order)
Continuous | Bar chart with gaps               | Histogram or line chart
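
A histogram works for continuous data because it first groups values into ranges and then counts each range. The Python sketch below shows that grouping step with made-up heights and a text-only chart. The 0.5 ft bin width and the helper name `bin_label` are arbitrary assumptions.

```python
from collections import Counter

heights_ft = [5.2, 5.7, 5.75, 6.1, 5.9, 6.2, 5.4]  # made-up sample data
BIN_WIDTH = 0.5  # arbitrary choice for this sketch

def bin_label(value, width=BIN_WIDTH):
    """Return the range (e.g. '5.5-6.0') that a value falls into."""
    low = (value // width) * width
    return f"{low:.1f}-{low + width:.1f}"

histogram = Counter(bin_label(h) for h in heights_ft)

# Print one row of '#' marks per range, lowest range first.
for label in sorted(histogram):
    print(label, "#" * histogram[label])
```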

✅ How to Avoid These Mistakes

  1. Always identify data type FIRST before analyzing
  2. Ask: "Does this operation make sense for this data type?"
  3. Check: Would the result be meaningful and interpretable?
  4. Verify: Does my visualization match my data type?

📝 Knowledge Check

Test your understanding of data types with these questions:

1. Which of the following is an example of discrete quantitative data?

2. T-shirt sizes (XS, S, M, L, XL) are an example of what type of data?

3. Why are ZIP codes considered nominal data, not quantitative?

4. Customer satisfaction ratings (Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied) should be analyzed as:

5. Which of the following is continuous data?

6. Colors (Red, Blue, Green, Yellow) are what type of data?

7. What is the main problem with calculating the average of ordinal data like ratings?

8. Date/time data is unique because:

9. What is the fundamental difference between Quantitative and Qualitative data?

10. A common mistake is treating customer satisfaction ratings (Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied) as nominal instead of ordinal. Why is this problematic?