Measures of dispersion describe the spread or variability of a dataset. They tell us how far individual data points deviate from a measure of central tendency (e.g., the mean or median). Understanding dispersion is crucial for interpreting data and drawing informed conclusions.
The range is the simplest measure of dispersion. It is the difference between the highest and lowest values in a dataset.
Formula:
$\text{Range} = \text{Highest Value} - \text{Lowest Value}$
Example: Consider the dataset: 5, 8, 2, 9, 1
Range = 9 - 1 = 8
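As a minimal sketch, the range formula translates directly into Python. The helper name `data_range` is a hypothetical choice for illustration (it avoids shadowing Python's built-in `range`).

```python
def data_range(values):
    """Return the range: highest value minus lowest value."""
    if not values:
        raise ValueError("the range is undefined for an empty dataset")
    return max(values) - min(values)

print(data_range([5, 8, 2, 9, 1]))  # 8
```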
The interquartile range (IQR) is a more robust measure of dispersion than the range, as it is less affected by extreme values (outliers). It represents the spread of the middle 50% of the data.
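Steps to calculate the IQR:
1. Arrange the data in ascending order.
2. Find the median, which splits the data into a lower half and an upper half. When the number of values is odd, the example below includes the median in both halves; some textbooks exclude it, which can give slightly different quartiles.
3. Find the first quartile (Q1), the median of the lower half.
4. Find the third quartile (Q3), the median of the upper half.
5. Calculate IQR = Q3 - Q1.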
Example: Consider the dataset: 1, 2, 3, 4, 5, 6, 7, 8, 9
Ordered data: 1, 2, 3, 4, 5, 6, 7, 8, 9
Q1 = 3 (the median of the lower half 1, 2, 3, 4, 5)
Q3 = 7 (the median of the upper half 5, 6, 7, 8, 9)
IQR = 7 - 3 = 4
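The IQR calculation in the example above can be sketched in Python as follows. This assumes the median-of-halves convention that reproduces Q1 = 3 and Q3 = 7 for that dataset: when the count is odd, the median is included in both halves. The helper `iqr` is a hypothetical name; `statistics.median` is from Python's standard library.

```python
from statistics import median

def iqr(data):
    """Return (Q1, Q3, IQR) using the median-of-halves method.

    Assumption: for an odd number of values, the overall median is
    included in both the lower and the upper half.
    """
    values = sorted(data)            # order the data
    n = len(values)
    half = (n + 1) // 2              # half size; includes the middle value when n is odd
    lower = values[:half]
    upper = values[n - half:]
    q1 = median(lower)               # first quartile
    q3 = median(upper)               # third quartile
    return q1, q3, q3 - q1

print(iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (3, 7, 4)
```

Excluding the median from both halves instead would give Q1 = 2.5 and Q3 = 7.5 (IQR = 5) for the same data, so it is worth checking which convention a course or library uses.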
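The standard deviation measures the typical distance of data points from the mean. It is the square root of the variance, which averages the squared deviations from the mean. Because it uses every data value, it is a more sophisticated measure of dispersion than the range and IQR.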
Formula (for a sample):
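$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$
Formula (for a population):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}}$$
Where:
- $x_i$ = the $i$-th data value
- $\bar{x}$ = the sample mean; $\mu$ = the population mean
- $n$ = the number of data values (the sample size for $s$, the population size for $\sigma$)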
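Steps to calculate the standard deviation:
1. Calculate the mean.
2. Subtract the mean from each data value.
3. Square each difference.
4. Add up the squared differences.
5. Divide the sum by $n - 1$ for a sample, or by $n$ for a population; the result is the variance.
6. Take the square root of the variance.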
Example: Consider the dataset: 4, 6, 5, 7, 8
Mean ($\bar{x}$) = (4 + 6 + 5 + 7 + 8) / 5 = 6
Differences from the mean: -2, 0, -1, 1, 2
Squared differences: 4, 0, 1, 1, 4
Sum of squared differences: 4 + 0 + 1 + 1 + 4 = 10
Variance (sample): 10 / (5 - 1) = 10 / 4 = 2.5
Standard Deviation (sample): $\sqrt{2.5} \approx 1.58$
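The worked example above can be sketched in Python. The function `std_dev` and its `sample` flag are hypothetical names for illustration; setting `sample=False` divides by $n$ instead of $n - 1$, giving the population value.

```python
import math

def std_dev(data, sample=True):
    """Standard deviation of `data`.

    sample=True divides the sum of squares by n - 1 (sample formula);
    sample=False divides by n (population formula).
    """
    n = len(data)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values for this standard deviation")
    mean = sum(data) / n                                  # the mean
    squared_diffs = [(x - mean) ** 2 for x in data]       # squared differences from the mean
    total = sum(squared_diffs)                            # sum of squared differences
    variance = total / (n - 1) if sample else total / n   # variance
    return math.sqrt(variance)                            # standard deviation

data = [4, 6, 5, 7, 8]
print(round(std_dev(data), 2))                # 1.58
print(round(std_dev(data, sample=False), 2))  # 1.41
```

Python's standard library also provides `statistics.stdev` (sample) and `statistics.pstdev` (population), which give the same results for this dataset.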
Measure | Advantages | Disadvantages |
---|---|---|
Range | Simple to calculate | Highly sensitive to outliers |
IQR | Robust to outliers | Ignores values outside the middle 50% (below Q1 and above Q3) |
Standard Deviation | Uses all data points, so it reflects the whole distribution | Sensitive to outliers (although less so than the range); more work to calculate by hand |
Choosing the appropriate measure of dispersion depends on the nature of the data and the presence of outliers. If outliers are present, the IQR is generally preferred. If the data is relatively symmetrical and outliers are not a major concern, the standard deviation provides a more complete picture of the data's spread.
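To illustrate this comparison, the sketch below computes all three measures for the IQR example data (1 to 9) and for a hypothetical variant in which the largest value, 9, is replaced by an outlier, 90. The helper names are illustrative; `statistics.median` and `statistics.stdev` are standard-library functions, and the IQR uses the median-of-halves convention (median included in both halves when the count is odd).

```python
from statistics import median, stdev

def iqr(values):
    """IQR by the median-of-halves method (median in both halves when n is odd)."""
    s = sorted(values)
    half = (len(s) + 1) // 2
    return median(s[len(s) - half:]) - median(s[:half])

def summarize(values):
    """Return (range, IQR, sample standard deviation) for a dataset."""
    return max(values) - min(values), iqr(values), stdev(values)

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # hypothetical: 9 replaced by an outlier

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    r, q, s = summarize(data)
    print(f"{label:>12}: range={r}, IQR={q}, s={s:.2f}")
```

Running it shows the range jumping from 8 to 89 and the sample standard deviation from 2.74 to 28.59, while the IQR stays at 4, which is why the IQR is preferred when outliers are present.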