Statisticians cringe when people toss around the term “**average**“. “Average” has a narrow application: predicting the sum-total, in the long run, with many repeated instances on the way. But people use the average to predict what the next event will be — and average just doesn’t work that way. Local and recent trends in data aren’t reflected in averages — e.g. predicting tomorrow’s weather based on the annual average; and tiny-seeming changes in averages can mean huge changes in the real world (think: global warming).

What people really need when trying to detect change in the short term is the **median**. Unfortunately, it is easier for software to compute “average” than “median”: add up the numbers, divide by the count. So common software tools have always provides handy AVERAGE functions. AVERAGE gives you a pretty number. MEDIAN gives you something you can make a decision on.

I love SQL for the same reason I like bit twiddling — inherent parallelism. But”median” is just not something basic SQL was built to do. Suppose I have a table *List*(*num *INTEGER). The SQL for MEDIAN looks like:

SELECT ( (SELECT MIN(num) FROM List X
WHERE (SELECT COUNT(*) FROM List Y WHERE Y.num <= X.num)
>= (SELECT COUNT(*) FROM List) / 2)
+ (SELECT MAX(num) FROM List X
WHERE (SELECT COUNT(*) FROM List Y WHERE Y.num >= X.num)
>= (SELECT COUNT(*) FROM List) / 2)
) / 2

Ew — not even considering execution time. The good news is, there is a common “special” case in the real world: small numbers are more common than large numbers. That’s typical for any population that divvies up a limited resource. For positive numbers, this leads to the situation that AVG(num) and 1/AVG(1/num) neatly bracket the median; it is very closely approximated by the geometric mean of the two. The SQL for this is somewhat smaller:

SELECT SQRT(SUM(num) / SUM(1.0/num))

### Like this:

Like Loading...

*Related*

## About mischasan

I've had the privilege to work in a field where abstract thinking has concrete value. That applies at the macro level --- optimizing actions on terabyte database --- or the micro level --- fast parallel string searches in memory. You can find my documents on production-system radix sort (NOT just for academics!) and some neat little tricks for developers, on my blog https://mischasan.wordpress.com
My e-mail sig (since 1976):
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.