Taking an Average of Averages. Why it’s Wrong

I frequently create reports that show the average (mean) value of a metric based on any number of dimensions. For example, I might report the average cost of a product by state. Usually, I include the overall average cost at the bottom of the report, and one of the first things people do to “check” my work is to highlight the entire row of average costs, to have Excel show them the average of the entire group. These numbers rarely match because taking an average of averages is wrong.

The reason an average of averages is wrong is that it doesn’t take into account how many units went into each average. For example: Let’s say we only have orders in 2 states, New York and Pennsylvania. The average cost of the product in NY is $100, and the average cost in PA is $50. Now…let’s say order volume in NY is 10 orders, and volume in PA is 2 orders. Taking the average of the two averages would give us $75 as the overall average cost. (($100+$50 = $150)/2) = $75 However, if we take volume into account, as we should, the average is $91.67. 10 NY orders at $100 = $1,000 + 2 PA orders at $50 = $100, for a total cost of $1,100, divided by all 12 orders = $91.67. That’s a difference of over $16! Now what if we had 100,000 vs 20,000 orders and we were trying to budget or plan for next year? Instead of reporting the actual total costs of $11,400,000, we would inaccurately report that our total costs are expected to be around $9 Million. Finding out late next year that you miscalculated by $2.4 Million might be considered a big deal…

math, education, chalkboard-1500720.jpg

If you must calculate an average of these averages, and you have volume information, there’s a handy Excel function called sumproduct. SumProduct is a function that takes two arrays (vectors) and multiplies them together, line by line, before adding up the total. In the example above, assuming our state is in column A, Volume is in Column B, and Avg Cost in Column C, we’d use the formula =sumproduct(B2:B3,C2:C3). This takes B2*C2 + B3*C3, giving us the total of $1,100, as we had calculated previously. If we take that formula a step further, to divide by the total number of orders, we can get the actual average cost per order. =sumproduct(B2:B3,C2:C3)/sum(B2:B3)

So, the next time you need to get the average of values that have already been averaged, make sure you take the volume into account.