9.2 Interpreting statistical data

1. Overview

Interpreting statistical data is about understanding the story behind the numbers. It involves reading and extracting information from tables, charts, and graphs, calculating key statistical measures, comparing different data sets, and drawing meaningful conclusions. A crucial part is also understanding the limitations of the data and avoiding biased interpretations. This topic equips you with the skills to analyse data effectively and make informed decisions.


Key Definitions

  • Discrete Data: Data that can only take specific, separate values (e.g., number of students in a class, shoe sizes).
  • Continuous Data: Data that can take any value within a given range (e.g., height of a tree, time taken to run a race, mass of an object).
  • Inference: A conclusion reached by reasoning from the evidence in the data.
  • Average: A representative value for a data set. The three main types are Mean, Median, and Mode.
  • Spread: A measure of how much the data varies. The most common measure of spread is the Range.
  • Bias: When a data set or the way it was collected is not representative of the whole population, leading to unfair or inaccurate conclusions.

Core Content

A. Reading and Interpreting Diagrams

You must be able to accurately read and extract data from tables, bar charts, pie charts, line graphs, and pictograms. Pay close attention to the scales on the axes of graphs and the keys in pictograms.

Worked example 1 — Reading a Bar Chart

Question: The bar chart below shows the number of cars sold by a dealership each day of the week. How many cars were sold on Wednesday?

[Bar chart: number of cars sold each day. Monday: 10, Tuesday: 12, Wednesday: 15, Thursday: 8, Friday: 20.]
  • Step 1: Locate the bar representing Wednesday on the bar chart.
  • Step 2: Identify the scale on the vertical axis (number of cars sold).
  • Step 3: Read the value corresponding to the top of the Wednesday bar.

Answer: 15 cars were sold on Wednesday.

Worked example 2 — Pictogram Interpretation

Question: The pictogram below shows the number of books borrowed from a library each day. How many books were borrowed on Tuesday?

[Pictogram: number of books borrowed each day. Key: one full book symbol = 8 books. Monday shows 2 full book symbols. Tuesday shows 3 full book symbols and one quarter-book symbol.]
  • Step 1: Identify the value of one full book symbol from the key. 1 full book symbol = 8 books
  • Step 2: Count the number of full book symbols for Tuesday. 3 full book symbols
  • Step 3: Calculate the total value of the full book symbols. $3 \times 8 = 24$ books
  • Step 4: Determine the value of the quarter-book symbol. $8 \div 4 = 2$ books
  • Step 5: Add the values together to find the total number of books borrowed on Tuesday. $24 + 2 = 26$ books

Answer: 26 books were borrowed on Tuesday.
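
The steps in Worked example 2 can be checked with a short Python sketch. The function name `pictogram_total` is hypothetical and chosen only for illustration. It assumes each row can be written as a number of symbols (part symbols included as fractions) multiplied by the value given in the key.

```python
from fractions import Fraction


def pictogram_total(symbols, value_per_symbol):
    """Return the quantity shown by one row of a pictogram.

    symbols: number of symbols drawn, with part symbols as fractions
             (for example 3 + 1/4 for three full symbols and a quarter symbol).
    value_per_symbol: value of one full symbol, read from the key.
    """
    return symbols * value_per_symbol


# Tuesday in Worked example 2: 3 full symbols and one quarter symbol, key = 8 books.
tuesday_symbols = 3 + Fraction(1, 4)
tuesday_books = pictogram_total(tuesday_symbols, 8)
print(tuesday_books)  # 26
```

Using `Fraction` keeps the quarter symbol exact, so the result matches the hand calculation $24 + 2 = 26$.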

B. Comparing Sets of Data

When asked to compare two or more sets of data, you must always comment on at least one measure of average and one measure of spread.

  1. An Average: Usually the Mean or Median. This indicates which data set has the higher or lower typical value.
  2. The Spread: Usually the Range. This indicates which data set is more or less consistent.

Worked example 3 — Comparing Exam Scores

Question: The exam scores for two students, Ali and Ben, are shown below. Compare their performance.

  • Ali: Mean = 75%, Range = 20%

  • Ben: Mean = 82%, Range = 35%

  • Comparison 1 (Average): Ben performed better on average because his mean score (82%) is higher than Ali's mean score (75%).

  • Comparison 2 (Spread): Ali was more consistent because his range (20%) is smaller than Ben's range (35%).

Worked example 4 — Comparing Heights

Question: The heights (in cm) of students in two different classes are recorded. Class X has a mean height of 165 cm and a range of 25 cm. Class Y has a mean height of 160 cm and a range of 15 cm. Compare the heights of the students in the two classes.

  • Comparison 1 (Average): The students in Class X are, on average, taller than the students in Class Y because the mean height of Class X (165 cm) is higher than the mean height of Class Y (160 cm).
  • Comparison 2 (Spread): The heights of the students in Class Y are more consistent than the heights of the students in Class X because the range of Class Y (15 cm) is smaller than the range of Class X (25 cm).
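
A comparison of this kind can be sketched in Python as follows. The raw scores below are hypothetical: the worked examples give only the mean and range, so these lists were simply chosen to reproduce the summary values for Ali and Ben in Worked example 3. The helper names are also assumptions made for illustration, and the sketch assumes the two means and the two ranges are not equal.

```python
def mean(values):
    """Mean = sum of the values divided by the number of values."""
    return sum(values) / len(values)


def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


def compare(name_a, a, name_b, b):
    """Return (name with the higher mean, name with the smaller range)."""
    higher_average = name_a if mean(a) > mean(b) else name_b
    more_consistent = name_a if data_range(a) < data_range(b) else name_b
    return higher_average, more_consistent


# Hypothetical scores (%), chosen to match the summary values in Worked example 3.
ali = [65, 70, 75, 80, 85]   # mean 75, range 20
ben = [60, 80, 85, 90, 95]   # mean 82, range 35

higher_average, more_consistent = compare("Ali", ali, "Ben", ben)
print(f"{higher_average} performed better on average.")
print(f"{more_consistent} was more consistent.")
```

The function returns two separate results on purpose: a full comparison needs one statement about the average and one about the spread.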

C. Restrictions on Drawing Conclusions

Be aware that not all data allows for definitive conclusions. Consider these factors:

  • Sample Size: Is the sample size large enough to be representative of the entire population? A very small sample size may not accurately reflect the overall trend.
  • Bias: Is there any bias in the way the data was collected? For example, a survey conducted only among members of a specific group may not be representative of the general population.
  • Outliers: Are there any extreme values (outliers) that could disproportionately affect the mean or range? Outliers can skew the results and lead to misleading conclusions.
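
The effect of an outlier can be seen in a small Python sketch. The data values are hypothetical and chosen only to illustrate the point. The sketch uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

values = [12, 14, 15, 15, 16, 18]   # hypothetical data
with_outlier = values + [60]        # the same data plus one extreme value

for label, data in (("without outlier", values), ("with outlier", with_outlier)):
    print(
        label,
        "| mean:", round(mean(data), 1),
        "| median:", median(data),
        "| range:", max(data) - min(data),
    )
```

With these numbers the single extreme value raises the mean from 15 to about 21.4 and the range from 6 to 48, while the median stays at 15. This is why a conclusion based only on the mean or range can be misleading when outliers are present.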

Extended Content (Extended Only)

While the IGCSE syllabus for topic 9.2 doesn't explicitly list Extended-only objectives, Extended students are expected to apply the core skills to more complex scenarios and data presentations. This includes:

  • Interpreting grouped data: This involves working with data presented in frequency tables with class intervals. You should be able to estimate the mean from grouped data.
  • Drawing Inferences from Cumulative Frequency Diagrams: Extended students should be able to interpret cumulative frequency diagrams to compare distributions and estimate percentiles.
  • Interpreting Box Plots: Box plots (box-and-whisker plots) provide a visual representation of the median, quartiles, and range of a data set. Extended students should be able to compare data sets using box plots.
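
As a rough illustration of how cumulative frequencies are built and read, here is a minimal Python sketch. It uses the plant-height frequencies from Worked example 5. The helper `class_containing` is a hypothetical name, the first class is assumed to start at 0, and the median and quartiles are assumed to be read at positions $n/2$, $n/4$ and $3n/4$, as is usual when reading from a cumulative frequency diagram.

```python
from itertools import accumulate

upper_bounds = [10, 20, 30, 40, 50]   # upper boundary of each class (cm)
frequencies = [10, 25, 35, 20, 10]    # frequency of each class

# Running totals: each entry is the number of plants up to that upper boundary.
cumulative = list(accumulate(frequencies))


def class_containing(position):
    """Return (lower, upper) boundaries of the class holding the given position."""
    lower = 0  # assumed lower boundary of the first class
    for upper, running_total in zip(upper_bounds, cumulative):
        if position <= running_total:
            return lower, upper
        lower = upper
    raise ValueError("position is larger than the total frequency")


total = cumulative[-1]
print("Cumulative frequencies:", cumulative)
print("Median lies in:", class_containing(total / 2))
print("Lower quartile lies in:", class_containing(total / 4))
print("Upper quartile lies in:", class_containing(3 * total / 4))
```

The sketch only identifies the class that contains each value. On a cumulative frequency diagram you would read across from the position on the vertical axis to the curve and then down to the horizontal axis to estimate the value itself.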

Worked example 5 — Estimating the Mean from Grouped Data

Question: The table below shows the heights of 100 plants in a garden. Estimate the mean height.

Height, h (cm) | Frequency
0 < h ≤ 10     | 10
10 < h ≤ 20    | 25
20 < h ≤ 30    | 35
30 < h ≤ 40    | 20
40 < h ≤ 50    | 10

  • Step 1: Find the midpoint of each class interval.
    • 0 < h ≤ 10: Midpoint = $(0 + 10) / 2 = 5$
    • 10 < h ≤ 20: Midpoint = $(10 + 20) / 2 = 15$
    • 20 < h ≤ 30: Midpoint = $(20 + 30) / 2 = 25$
    • 30 < h ≤ 40: Midpoint = $(30 + 40) / 2 = 35$
    • 40 < h ≤ 50: Midpoint = $(40 + 50) / 2 = 45$
  • Step 2: Multiply each midpoint by its corresponding frequency.
    • $5 \times 10 = 50$
    • $15 \times 25 = 375$
    • $25 \times 35 = 875$
    • $35 \times 20 = 700$
    • $45 \times 10 = 450$
  • Step 3: Sum the products from Step 2. $\sum fx = 50 + 375 + 875 + 700 + 450 = 2450$
  • Step 4: Divide the sum by the total frequency (which is 100 in this case). Mean = $\frac{2450}{100} = 24.5$

Answer: The estimated mean height of the plants is 24.5 cm.
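
The four steps above can be sketched in Python to check the arithmetic. The layout of the data as `(lower, upper, frequency)` tuples and the function name `estimated_mean` are assumptions made for this illustration.

```python
# Each class is (lower boundary, upper boundary, frequency), from the table above.
classes = [
    (0, 10, 10),
    (10, 20, 25),
    (20, 30, 35),
    (30, 40, 20),
    (40, 50, 10),
]


def estimated_mean(classes):
    """Estimate the mean of grouped data using class midpoints."""
    # Steps 1 to 3: midpoint times frequency for each class, then add the products.
    total_fx = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    # Step 4: divide by the total frequency.
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f


print(estimated_mean(classes))  # 24.5
```

The result is only an estimate because the individual heights inside each class are unknown, so every plant is treated as if its height were the class midpoint.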


Key Equations

These formulas are not provided on the IGCSE formula sheet and must be memorised.

Mean = $\frac{\sum x}{n}$, where $\sum x$ is the sum of all values and $n$ is the total number of values.

Range = $\text{Highest value} - \text{Lowest value}$. This measures the spread of the data.

Angle in Pie Chart = $\frac{\text{Frequency}}{\text{Total Frequency}} \times 360^\circ$. Use a protractor for drawing, and check that the angles sum to $360^\circ$.
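
The pie chart formula can be tried out with a short Python sketch. The travel survey data is hypothetical and the function name `pie_angles` is chosen only for illustration.

```python
def pie_angles(frequencies):
    """Return the pie chart angle in degrees for each category.

    Angle = frequency / total frequency * 360.
    """
    total = sum(frequencies.values())
    return {label: f * 360 / total for label, f in frequencies.items()}


# Hypothetical survey: how 30 students travel to school.
travel = {"Car": 12, "Bus": 9, "Walk": 6, "Cycle": 3}

angles = pie_angles(travel)
for label, angle in angles.items():
    print(label, angle)

# Check from the formula note: the angles should add up to 360 degrees.
print("Total:", sum(angles.values()))
```

For this data the angles are 144°, 108°, 72° and 36°, which add up to 360°.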


Common Mistakes to Avoid

  • Wrong: Only stating the mode when asked to describe the average. ✓ Right: Choose the average that suits the data (mean, median, or mode), calculate it, and state which one you are using.

  • Wrong: Ignoring the key in a pictogram and simply counting the symbols. ✓ Right: Always check the key first to determine the value represented by each symbol or part of a symbol. For example, if the key states that one circle represents 4 items, then half a circle represents 2 items.

  • Wrong: Assuming a larger range always indicates a "better" data set. ✓ Right: A larger range indicates greater variability or inconsistency in the data. A smaller range indicates more consistent data.

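  • Wrong: Estimating the mean from grouped data by using the class boundaries as the values, or by averaging the midpoints without using the frequencies. ✓ Right: Use the midpoint of each class interval as its representative value, multiply each midpoint by its frequency, add the products, and divide by the total frequency.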


Exam Tips

  • Command Words: If the question uses the word "Interpret", provide a sentence explaining the meaning of the statistical value in the context of the problem. For example, "The mean score of 70% indicates that, on average, students performed well on the test."
  • Calculators: Use your calculator to compute the mean, especially in Paper 2 and Paper 4. However, always show your working (the sum of the values and the division by the number of values). This allows you to earn method marks even if you make a small error in the final calculation.
  • Units: Always include the appropriate units in your answer (e.g., cm, kg, seconds, $) if the units are provided in the data. Omitting units can result in a loss of marks.
  • Real-world Context: Many questions are set in real-world scenarios involving topics like weather patterns, student grades, or business profits. Ensure that your interpretations and inferences are logical and make sense within the given context.
  • Mark Allocation: For comparison questions worth multiple marks, allocate your time and effort accordingly. Typically, one mark is awarded for comparing the average (mean/median) and another mark for comparing the spread (range). Present your comparisons clearly and concisely, using bullet points if appropriate.
