It may seem trivial to get upset about whether you are using the median or the mean, but it can really matter. We all feel comfortable with the vague idea that an 'average' represents a sort of general tendency. However, whether the mean or the median gives the more accurate reflection of an 'average' value depends on the data.
In early April 2005 there was considerable debate in the media about whether 'average' incomes had gone up or down in the UK. The Institute for Fiscal Studies (IFS) produced a report stating that mean 'real household income' fell by 0.2% in 2003/04 compared with the previous year. This sounds very authoritative, but it is worth pausing to consider whether the mean is really the most appropriate measure.
The mean is calculated by adding together all the values and then dividing the total by the number of values. As long as the data is symmetrically distributed (that is, plotting the values on a frequency chart gives a nice symmetrical shape), this is fine, although even then a few extreme values can throw it right out. If the data is not symmetrical (i.e. it is skewed), the mean can be downright misleading.
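To make the definition concrete, here is a minimal Python sketch using made-up weekly incomes (not figures from the IFS report). The helper `arithmetic_mean` is a hypothetical name introduced for illustration; `median` comes from Python's standard `statistics` module. It shows how a single extreme value drags the mean a long way while the median hardly moves.

```python
from statistics import median

def arithmetic_mean(values):
    # Add together all the values, then divide the total by how many there are.
    return sum(values) / len(values)

# Hypothetical weekly incomes (made-up figures, for illustration only).
incomes = [200, 250, 300, 350, 400]
print(arithmetic_mean(incomes), median(incomes))  # 300.0 300

# Add one very large income: the mean jumps, the median barely moves.
with_outlier = incomes + [5000]
print(arithmetic_mean(with_outlier), median(with_outlier))
# mean is now about 1083.33, median 325.0
```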
It only takes a moment's thought to realise that more people earn low salaries than high ones, because a fairly large proportion of the population works part-time. Income data is therefore skewed, with a relatively small number of very high incomes pulling the mean upwards, so the mean is not the best 'average' to use in this case.
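The effect of this skew can be seen with a small, invented set of household incomes (in thousands) that has the shape described above: lots of modest values and a few large ones. This is a hypothetical illustration, not real income data; `share_below_mean` is simply the fraction of households earning less than the mean.

```python
from statistics import mean, median

# Hypothetical household incomes in thousands (made-up): many modest values
# and a long tail of a few high ones, i.e. a right-skewed shape.
incomes = [8, 10, 12, 14, 15, 17, 18, 20, 22, 25, 30, 45, 90, 160]

m = mean(incomes)
md = median(incomes)
# Fraction of households whose income is below the mean.
share_below_mean = sum(x < m for x in incomes) / len(incomes)
print(f"mean={m:.1f} median={md} share below mean={share_below_mean:.0%}")
```

With these made-up numbers the mean is about 34.7 while the median is 19.0, and 11 of the 14 households (roughly 79%) fall below the mean, so the mean is a poor guide to what a 'typical' household receives.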
The median, on the other hand, really is the middle value. 50% of values are above it, and 50% below it. So when the data is not symmetrical, this is the form of 'average' that gives a better idea of any general tendency in the data. The same report from the IFS states that median real household incomes rose for the same period by 0.5%.
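Finding the middle value is simple to sketch. The function below (`middle_value` is a hypothetical name) sorts the data and picks the middle element; with an even number of values there is no single middle value, so it follows the usual convention of averaging the two middle ones, which is also what Python's `statistics.median` does.

```python
from statistics import median

def middle_value(values):
    """Median: sort the values and take the middle one.

    With an even number of values, average the two middle ones.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Agrees with the standard library's statistics.median.
print(middle_value([7, 1, 3]), median([7, 1, 3]))        # 3 3
print(middle_value([4, 1, 3, 2]), median([4, 1, 3, 2]))  # 2.5 2.5
```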
The slightly shocking thing is that, where this was reported in the media, some commentators gleefully seized on this apparent fall in average incomes as an opportunity to criticise the government. (Gordon Brown, the Chancellor, was very frustrated trying to explain that the median is the measure to use for things like income, because the distribution is skewed.)
Either the media commentators didn't know that it was wrong to use the mean in this case, or they assumed that their audience wouldn't know and so could gloss over it, present a more dramatic report and score some unwarranted political points. Neither state of affairs does them credit. To be fair to the media, the IFS report does include the sentence "This is the first time that incomes have fallen since the recession in the early 1990s", when it would perhaps have been more accurate to say "This is the first time that mean incomes have fallen...". After all, median income increased. Even so, a brief reading of this section of the IFS report gives the true picture. Maybe next week's headlines will read 'Media mean median'.
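It may seem odd that the mean can fall while the median rises, but there is no contradiction. A toy example with six invented households (not the IFS data) shows how: if incomes in the middle rise slightly while the highest income falls, the median goes up and the mean comes down. `pct_change` is a hypothetical helper.

```python
from statistics import mean, median

def pct_change(old, new):
    # Percentage change from old to new.
    return 100 * (new - old) / old

# Hypothetical incomes for the same six households in two years (made-up).
year1 = [10, 12, 14, 16, 18, 100]
year2 = [10, 12, 15, 17, 18, 90]  # middle incomes rise, the top income falls

print(f"mean:   {pct_change(mean(year1), mean(year2)):+.1f}%")
print(f"median: {pct_change(median(year1), median(year2)):+.1f}%")
```

Here the mean falls by about 4.7% while the median rises by about 6.7%.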
So why is the mean quoted most of the time? Mathematically, the mean is much easier to work with than the median, particularly in more complex situations, but it carries with it an assumption that the distribution is symmetrical. Unfortunately, because the mean is seen so frequently, this distinction gets forgotten and the mean is wrongly used to summarise non-symmetrical populations.
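One concrete sense in which the mean is easier to handle: the mean of a combined group can be worked out from each subgroup's mean and size alone, whereas the median of a combined group generally cannot be recovered from the subgroup medians. A minimal sketch with invented numbers:

```python
from statistics import mean, median

# Two hypothetical groups (made-up values).
group_a = [1, 2, 9]
group_b = [3, 4, 5, 6]
combined = group_a + group_b

# The overall mean follows from each group's mean weighted by its size.
total_a = mean(group_a) * len(group_a)
total_b = mean(group_b) * len(group_b)
pooled_mean = (total_a + total_b) / (len(group_a) + len(group_b))
print(pooled_mean, mean(combined))

# No comparable rule exists for medians: averaging the group medians
# does not in general give the median of the combined data.
print((median(group_a) + median(group_b)) / 2, median(combined))
```

Here the pooled mean matches the mean of the combined data (30/7, about 4.29), while averaging the two group medians gives 3.25 against a true combined median of 4.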
Always use the median when the distribution is skewed. You can use either the mean or the median when the population is symmetrical, because then they will give almost identical results.
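A quick simulation illustrates this rule of thumb under stated assumptions: normally distributed values stand in for a symmetrical population, and log-normal values (often used as a rough model for incomes) stand in for a right-skewed one. The seed, sample size and distribution parameters are arbitrary choices for illustration.

```python
import random
from statistics import mean, median

random.seed(42)  # fixed seed so the run is repeatable
n = 10_001

# Symmetrical: normally distributed values centred on 100.
symmetric = [random.gauss(100, 15) for _ in range(n)]
# Right-skewed: log-normal values with a long tail of large values.
skewed = [random.lognormvariate(0, 1) for _ in range(n)]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{name:>9}: mean={mean(data):.2f} median={median(data):.2f}")
```

For the symmetrical sample the mean and median come out almost the same; for the skewed sample the mean sits well above the median (in theory about 1.65 against 1.0 for these log-normal parameters).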
4 April 2005
You can examine the original Institute for Fiscal Studies report, Poverty and Inequality in Britain: 2005 (which includes a chart showing the distribution of the data). The statistics considered here are in Section 2, Living standards and inequality.
Statistics for the Terrified is a tutorial which provides a thorough grounding in basic statistics for the non-mathematician, using straightforward English and common-sense explanations. It assumes that you have no prior knowledge, and will guide you through from first principles, demonstrating what Statistics is, what it does, and some common mistakes. It will enable users to read and understand statistics quoted in published articles, and can be used as a refresher and a reference manual for professionals who use Statistics in their work. The course is widely used in colleges and universities, and in commercial organisations.