Measures of dispersion: range, interquartile range, standard deviation

IGCSE Mathematics 0580 - Statistics - Measures of Dispersion

Measures of Dispersion

Measures of dispersion describe the spread or variability of a dataset. They tell us how far individual data points lie from a measure of central tendency such as the mean or median. Understanding dispersion is crucial for interpreting data and drawing sound conclusions.

1. Range

The range is the simplest measure of dispersion. It is the difference between the highest and lowest values in a dataset.

Formula:

$\text{Range} = \text{Highest Value} - \text{Lowest Value}$

Example: Consider the dataset: 5, 8, 2, 9, 1

Range = 9 - 1 = 8
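
The same calculation can be sketched in a few lines of Python. This is only an illustration; the function name `data_range` is chosen here and does not come from the syllabus.

```python
def data_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)

data = [5, 8, 2, 9, 1]
print(data_range(data))  # 8
```

The data does not need to be sorted first, because `max` and `min` scan the whole list.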

2. Interquartile Range (IQR)

The IQR is a more robust measure of dispersion than the range, as it is less affected by extreme values (outliers). It represents the spread of the middle 50% of the data.

Steps to calculate IQR:

Order the data: Arrange the data in ascending order.
Find Q1 (First Quartile): Q1 is the median of the lower half of the data.
Find Q3 (Third Quartile): Q3 is the median of the upper half of the data.
Calculate IQR: $\text{IQR} = Q3 - Q1$

Example: Consider the dataset: 1, 2, 3, 4, 5, 6, 7, 8, 9

Ordered data: 1, 2, 3, 4, 5, 6, 7, 8, 9

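The median is 5. Counting it in both halves (some textbooks leave it out instead, which would give Q1 = 2.5 and Q3 = 7.5):

Q1 = 3 (the median of 1, 2, 3, 4, 5)

Q3 = 7 (the median of 5, 6, 7, 8, 9)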

IQR = 7 - 3 = 4
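
The steps above can be sketched in Python as follows. The function name `quartiles` and the `include_median` flag are illustrative choices, not standard names. The flag only matters when the number of values is odd: it decides whether the overall median is counted in both halves or left out of both.

```python
from statistics import median

def quartiles(values, include_median=True):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    half = n // 2
    if n % 2 == 1 and include_median:
        # Odd count: the middle value belongs to both halves
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # Even count, or odd count with the middle value left out
        lower, upper = ordered[:half], ordered[n - half:]
    return median(lower), median(upper)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)  # 3 7 4

q1, q3 = quartiles(data, include_median=False)
print(q1, q3, q3 - q1)  # 2.5 7.5 5.0
```

The two conventions give different answers for small datasets, so it is worth checking which one a textbook or mark scheme uses before comparing results.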

3. Standard Deviation

The standard deviation measures how far data points typically lie from the mean: it is the square root of the average squared deviation. Unlike the range and IQR, it uses every value in the dataset.

Formula (for a sample):

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$

Formula (for a population):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}}$$

Where:

$s$ represents the sample standard deviation.
$\sigma$ represents the population standard deviation.
$x_i$ represents each individual data point.
$\bar{x}$ represents the sample mean.
$\mu$ represents the population mean.
$n$ represents the number of data points.

Steps to calculate Standard Deviation:

Calculate the mean (average) of the data.
Find the difference between each data point and the mean.
Square each of those differences.
Sum the squared differences.
Divide the sum by (n-1) for a sample or n for a population. The result is the variance.
Take the square root of the result.

Example: Consider the dataset: 4, 6, 5, 7, 8

Mean ($\bar{x}$) = (4 + 6 + 5 + 7 + 8) / 5 = 6

Differences from the mean: -2, 0, -1, 1, 2

Squared differences: 4, 0, 1, 1, 4

Sum of squared differences: 4 + 0 + 1 + 1 + 4 = 10

Variance (sample): 10 / (5 - 1) = 10 / 4 = 2.5

Standard Deviation (sample): $\sqrt{2.5} \approx 1.58$
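
The worked example can be checked with a short Python sketch that follows the same steps. The function name `standard_deviation` and its `sample` flag are illustrative; Python's standard library offers `statistics.stdev` and `statistics.pstdev` for the same purpose.

```python
from math import sqrt

def standard_deviation(values, sample=True):
    """Follow the listed steps: mean, differences, squares, sum, divide, root."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n   # n - 1 for a sample, n for a population
    variance = sum(squared_diffs) / divisor
    return sqrt(variance)

data = [4, 6, 5, 7, 8]
print(round(standard_deviation(data), 2))                # 1.58 (sample)
print(round(standard_deviation(data, sample=False), 2))  # 1.41 (population)
```

The population value is smaller because the same sum of squared differences, 10, is divided by 5 instead of 4.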

Comparison of Measures

Measure	Advantages	Disadvantages
Range	Simple to calculate	Highly sensitive to outliers
IQR	Robust to outliers	Ignores values outside the middle 50% of the data
Standard Deviation	Uses all data points, more informative	Sensitive to outliers (although less so than range), more complex to calculate

Choosing the appropriate measure of dispersion depends on the nature of the data and the presence of outliers. If outliers are present, the IQR is generally preferred. If the data is relatively symmetrical and outliers are not a major concern, the standard deviation provides a more complete picture of the data's spread.
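
To see the effect of an outlier on each measure, here is a minimal sketch using Python's `statistics` module. The second dataset is hypothetical: it is the first one with 9 replaced by 90, as if a value had been mistyped. The quartiles use the `"inclusive"` method, which is an assumption about the convention in use; other methods can give slightly different quartiles.

```python
from statistics import quantiles, stdev

def spread_summary(values):
    """Return (range, IQR, sample standard deviation) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1, stdev(values)

typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # hypothetical: 9 mistyped as 90

for label, values in (("typical", typical), ("with outlier", with_outlier)):
    data_range, iqr, sd = spread_summary(values)
    print(f"{label}: range={data_range}, IQR={iqr}, SD={sd:.2f}")
```

With these two lists the range jumps from 8 to 89 and the standard deviation rises roughly tenfold, while the IQR stays at 4.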