What is the mean?
The mean is an average, one of several that summarise the typical value of a set of data. The mean is the grand total divided by the number of data points.
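As a minimal sketch of that definition, the calculation can be written in a few lines of Python. The function name `mean` and the sample values are illustrative assumptions, not data from any report.

```python
def mean(values):
    """Return the arithmetic mean: the grand total divided by the count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative sample (made up for this sketch).
sample = [2, 4, 4, 5, 10]
print(mean(sample))  # grand total 25 divided by 5 data points
```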
What is the median?
The median is the middle value in a sample sorted into ascending order. If the sample contains an even number of values, the median is defined as the mean of the middle two.
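The two cases in this definition (odd and even sample sizes) can be sketched as follows. The function name `median` and the sample values are illustrative assumptions.

```python
def median(values):
    """Return the middle value of the sample after sorting it.

    For an even number of values, return the mean of the middle two.
    """
    ordered = sorted(values)  # ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # sorted: 1, 5, 7 -> middle value is 5
print(median([7, 1, 5, 3]))  # sorted: 1, 3, 5, 7 -> (3 + 5) / 2
```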
Which is better?
Is it better to use the mean or the median? This may sound like an obscure technical question, but it really can matter. The short answer is "it depends" - to know which you should use, you must know how your data is distributed. The mean is the one to use with symmetrically distributed data; otherwise, use the median. If you follow this rule, you will get a more accurate reflection of an 'average' value.
This question was at the heart of a UK news story in 2005, when the choice between the median and the mean was the source of substantial national confusion. The following paragraphs tell the story.
In early April 2005 there was considerable debate in the UK media about whether 'average' incomes had gone up or down. The Institute for Fiscal Studies produced a report in which they stated that the mean 'real household income' fell by 0.2% over 2003/04 against the previous year. This sounds very authoritative, but it is worth pausing to consider if the mean is really the most appropriate measure.
The mean is calculated by adding together all the values and then dividing the total by the number of values. As long as the data is symmetrically distributed (that is, plotting it on a frequency chart gives a nice symmetrical shape) this is fine. However, the mean can be thrown right out by a few extreme values, and if the data is not symmetrical (i.e. skewed) it can be downright misleading.
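To see how much a single extreme value can move the mean, here is a small sketch using Python's standard `statistics` module. The numbers are invented purely for illustration.

```python
from statistics import mean, median

# A symmetrical set of made-up values centred on 20.
symmetric = [18, 19, 20, 21, 22]

# The same values plus one extreme value.
with_outlier = symmetric + [200]

print(mean(symmetric), median(symmetric))        # both sit at the centre
print(mean(with_outlier), median(with_outlier))  # the mean jumps, the median barely moves
```

In the second set the mean lands at 50, a value larger than five of the six data points, while the median stays close to 20.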
It only takes a moment's thought to realise that more people earn low salaries than high ones, because a fairly large proportion of the population works part-time - so the data will not be symmetrically distributed. Therefore the mean is not the best 'average' to use in this case.
The median, on the other hand, really is the middle value. 50% of values are above it, and 50% below it. So when the data is not symmetrical, this is the form of 'average' that gives a better idea of any general tendency in the data. The same report from the IFS states that median real household incomes rose for the same period by 0.5%.
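It may seem odd that the mean can fall while the median rises, so here is a toy sketch of how that happens. The five "household incomes" below are entirely hypothetical (they are not IFS figures); they only show that a drop at the top of a skewed distribution can pull the mean down even when most households are slightly better off.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of pounds (invented for illustration).
year_1 = [10, 20, 30, 40, 150]
year_2 = [11, 21, 31, 41, 140]  # four households gain 1, the richest loses 10

mean_change = mean(year_2) - mean(year_1)
median_change = median(year_2) - median(year_1)

print(f"mean change:   {mean_change:+.1f}")    # negative: the mean fell
print(f"median change: {median_change:+.1f}")  # positive: the median rose
```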
The slightly shocking thing is that when this was reported in the media, some commentators gloried in this apparent reduction of average incomes as an opportunity to criticise the government. (Gordon Brown, who was the Chancellor at the time, was very frustrated trying to explain that the median is the measure you use for things like income, because the distribution is skewed.)
Either the media commentators didn't know that it was wrong to use the mean in this case, or they assumed that their audience wouldn't know, so they could gloss over it, present a more dramatic report and score some unwarranted political points. Neither state of affairs does them credit. To be fair to the media, the IFS report does include the sentence "This is the first time that incomes have fallen since the recession in the early 1990s", when perhaps it would have been more accurate to say "This is the first time that mean incomes have fallen...". After all, median income increased. Despite this, even a brief reading of this section of the IFS report gives the true picture. It would have been nice if the next week's headlines had read 'Media mean median'.
We are all much more familiar with the mean - why? People like using the mean because it is a much easier thing to deal with than the median, mathematically, particularly in more complex situations: but as we have seen, it carries an assumption with it that the distribution is symmetrical. A great deal of data is symmetrical; the normal distribution is so named for this reason. But unfortunately because the mean is seen so frequently this distinction gets forgotten - in many people's minds the 'average' is the mean - and then the mean is wrongly used to summarise non-symmetrical populations.
So remember:
Always use the median when the distribution is skewed. You can use either the mean or the median when the population is symmetrical, because then they will give almost identical results.
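One rough way to check which situation you are in is to compare the mean with the median directly. The sketch below uses Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, which is near zero for symmetrical data and moves away from zero as the data becomes skewed. The function name, the sample data, and the use of the sample standard deviation are assumptions made for this illustration; it is a quick diagnostic, not a substitute for plotting the data.

```python
from statistics import mean, median, stdev


def median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Made-up samples for illustration.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 2, 2, 3, 3, 4, 20, 90]

print(median_skewness(symmetric))  # zero: mean and median coincide
print(median_skewness(skewed))     # positive: the mean is pulled above the median
```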