Lies, Damned Lies, and Statistics (21): Misleading Averages

Did you hear the joke about the statistician who put her head in the oven and her feet in the refrigerator? She said, “On average, I feel just fine.” That’s the same message as in this more widely known joke about statisticians drowning in a pond with an average depth of 3ft. And then there’s this one: did you know that the great majority of people have more than the average number of legs? It’s obvious, really: Among the 57 million people in Britain, there are probably 5,000 people who have only one leg. Therefore, the average number of legs is less than 2. In this case, the median would be a better measure than the average or the mean.
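For anyone who wants to check the leg joke, here is a small sketch of the arithmetic. It uses the two figures from the joke (57 million people, 5,000 of them with one leg) and makes the simplifying assumption that everyone has either one or two legs. The variable names are mine.

```python
# Figures taken from the joke; assumes everyone has either one or two legs.
population = 57_000_000
one_legged = 5_000
two_legged = population - one_legged

# Mean: total number of legs divided by the number of people.
total_legs = 2 * two_legged + 1 * one_legged
mean_legs = total_legs / population

# Median: more than half of the people have two legs,
# so the middle value of the sorted list is 2.
median_legs = 2 if two_legged > population / 2 else 1

# Share of people with more legs than the mean.
share_above_mean = two_legged / population

print(f"mean legs:   {mean_legs:.5f}")
print(f"median legs: {median_legs}")
print(f"share above the mean: {share_above_mean:.4%}")
```

The mean lands a hair below 2, so nearly everyone is "above average", while the median stays at exactly 2.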

But seriously now, averages can be very misleading, including in statistical work in the field of human rights. Take income data, for example. Income as such isn’t a human rights issue, but poverty is. When we look at income data, we may see that average income is rising. However, this may be due to extreme increases among the top 1% of earners. If you exclude the income increases of the top 1% of the population, you may find that the large majority of people aren’t experiencing rising income at all, and possibly even the opposite. And rising average income, even when you exclude the extremes at the top, is perfectly compatible with rising poverty for certain parts of the population.
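A toy illustration of that point, with entirely made-up numbers: a population of 100 people in which one person (the top 1%) doubles their income between two years, while the other 99 each lose a little. This is only a sketch under those assumptions, not real income data.

```python
import statistics

# Hypothetical incomes for 100 people in two consecutive years.
# 99 people earn the same modest income; 1 person is the top 1%.
year_1 = [20_000] * 99 + [500_000]
year_2 = [19_500] * 99 + [1_000_000]  # the 99 lose 500 each, the top earner doubles

mean_1 = statistics.mean(year_1)
mean_2 = statistics.mean(year_2)

# Same calculation, leaving out the single top earner.
mean_1_bottom_99 = statistics.mean(sorted(year_1)[:-1])
mean_2_bottom_99 = statistics.mean(sorted(year_2)[:-1])

print("average income, everyone: ", mean_1, "->", mean_2)
print("average income, bottom 99:", mean_1_bottom_99, "->", mean_2_bottom_99)
```

The overall average goes up even though 99 out of 100 people are worse off in the second year.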

Averages are often skewed by outliers. That is why it’s often necessary to remove outliers and calculate the averages without them. That will give you a better picture of the characteristics of the general population (the “real” average income evolution in my example). A simple way to neutralize outliers is to look at the median (the middle value of a series of values once they are sorted) rather than the average (also called the mean).
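Here is a minimal sketch of both approaches, using Python’s standard statistics module and a made-up list of ten incomes in which one value is an extreme outlier. The numbers are invented for illustration only.

```python
import statistics

# Hypothetical incomes: nine ordinary values and one extreme outlier.
incomes = [18_000, 22_000, 25_000, 27_000, 30_000,
           33_000, 36_000, 40_000, 45_000, 2_000_000]

# The plain average is dragged far upward by the single outlier.
mean_all = statistics.mean(incomes)

# Option 1: remove the outlier and average the rest.
mean_without_top = statistics.mean(sorted(incomes)[:-1])

# Option 2: use the median, which barely notices the outlier.
median_all = statistics.median(incomes)

print("mean, all values:      ", mean_all)
print("mean, outlier removed: ", round(mean_without_top))
print("median, all values:    ", median_all)
```

The mean with the outlier included is several times larger than what anyone but the top earner actually makes, whereas the median and the mean without the outlier end up close to each other.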

An average (or a median for that matter) also doesn’t say anything about the extremes (or, in stat-speak, about the variability or dispersion of the population). A high average income level can hide extremely low and extremely high income levels for certain parts of the population. So, for example, if you start to compare income levels across different countries, you’ll typically use the average income. Yet country A may have a lower average income than country B, but also lower levels of poverty than country B. That’s because the dispersion of income levels in country A is much smaller than in country B. The average in B is the result of adding together extremely low incomes (i.e. poverty) and extremely high incomes, whereas the average in A comes from the sum of incomes that are much more equal. From the point of view of poverty, average income is misleading because it identifies country A as the poorer of the two, whereas in reality there are more poor people in country B. So when looking at averages, it’s always good to look at the standard deviation as well. The standard deviation (SD) is a measure of the dispersion around the mean.
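The country A versus country B comparison can be sketched as follows. Both income lists and the poverty line are hypothetical, chosen only so that B has the higher average and the wider spread; poverty_rate is a helper I made up for this example.

```python
import statistics

POVERTY_LINE = 10_000  # hypothetical threshold, for illustration only

# Hypothetical incomes: A is fairly equal, B mixes very low and very high incomes.
country_a = [18_000, 19_000, 20_000, 21_000, 22_000]
country_b = [5_000, 6_000, 7_000, 40_000, 62_000]

def poverty_rate(incomes, line=POVERTY_LINE):
    """Share of people whose income falls below the poverty line."""
    return sum(1 for income in incomes if income < line) / len(incomes)

for name, incomes in (("A", country_a), ("B", country_b)):
    mean = statistics.mean(incomes)
    # pstdev: standard deviation treating the list as the whole population.
    sd = statistics.pstdev(incomes)
    print(f"country {name}: mean={mean}, SD={sd:.0f}, "
          f"poverty rate={poverty_rate(incomes):.0%}")
```

Country B comes out ahead on average income, yet its much larger standard deviation is the warning sign: most of its people fall below the poverty line, while nobody in country A does.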