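What is the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.
Sample data: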
id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3
Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, whereas SELECT AVG(val) returns 5.
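As a quick cross-check of the arithmetic on the sample data, here is a minimal Python sketch. It is not MySQL: the list literal simply copies the val column above, and the standard library's statistics module stands in for the query.

```python
import statistics

# val column from the sample table, in id order
vals = [4, 7, 2, 2, 9, 8, 3]

sorted_vals = sorted(vals)            # the ordering a sort on val produces
median_val = statistics.median(vals)  # middle element of the sorted list
mean_val = statistics.mean(vals)      # what AVG(val) computes

print(sorted_vals, median_val, mean_val)
```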
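I have not compared the performance of this solution with the other answers posted here, but I found it the most straightforward to understand, and it covers the full mathematical definition of the median. In other words, it is robust for both even- and odd-sized data sets: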
SELECT CASE
         -- odd number of rows: take the single middle row
         WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
                 FROM
                 (SELECT t1.<value>
                  FROM (SELECT <value>,
                               ROW_NUMBER() OVER (ORDER BY <value>) AS rownum
                        FROM <data>) t1,
                       (SELECT COUNT(*) AS num_records FROM <data>) t2
                  WHERE t1.rownum = (t2.num_records + 1) / 2) AS median)
         -- even number of rows: average the two middle rows
         ELSE (SELECT (low_bound.<value> + up_bound.<value>) / 2 AS median
               FROM
               (SELECT t1.<value>
                FROM (SELECT <value>,
                             ROW_NUMBER() OVER (ORDER BY <value>) AS rownum
                      FROM <data>) t1,
                     (SELECT COUNT(*) AS num_records FROM <data>) t2
                WHERE t1.rownum = t2.num_records / 2) AS low_bound,
               (SELECT t1.<value>
                FROM (SELECT <value>,
                             ROW_NUMBER() OVER (ORDER BY <value>) AS rownum
                      FROM <data>) t1,
                     (SELECT COUNT(*) AS num_records FROM <data>) t2
                WHERE t1.rownum = t2.num_records / 2 + 1) AS up_bound)
       END AS median
FROM <data>
If MySQL has ROW_NUMBER, then the MEDIAN is (inspired by this SQL Server query):
WITH Numbered AS
(
SELECT *, COUNT(*) OVER () AS Cnt,
ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
-- DIV is MySQL integer division; a plain / would yield a decimal and miss a row
WHERE RowNum IN ((Cnt + 1) DIV 2, (Cnt + 2) DIV 2)
;
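The IN is there in case you have an even number of entries, in which case the two middle rows are returned.
If you want to find the median per group, just PARTITION BY the group in your OVER clauses.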
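The rank arithmetic in that WHERE clause can be sketched in Python, assuming the two divisions are integer divisions. The function names middle_rows and median_from_ranks are hypothetical and only serve the illustration; this mirrors the selection logic, not MySQL itself.

```python
def middle_rows(vals):
    """Return the values whose 1-based rank is (n+1)//2 or (n+2)//2."""
    ordered = sorted(vals)                 # ORDER BY val
    n = len(ordered)                       # COUNT(*) OVER ()
    wanted = {(n + 1) // 2, (n + 2) // 2}  # the two ranks listed in the IN clause
    return [v for rank, v in enumerate(ordered, start=1) if rank in wanted]


def median_from_ranks(vals):
    """Average the selected rows; assumes vals is non-empty."""
    rows = middle_rows(vals)               # one row if n is odd, two if n is even
    return sum(rows) / len(rows)


print(middle_rows([4, 7, 2, 2, 9, 8, 3]))     # odd count: a single middle row
print(median_from_ranks([4, 7, 2, 2, 9, 8]))  # even count: mean of two middle rows
```

For an odd count both expressions give the same rank, so only one row is selected. For an even count they give two adjacent ranks, and the caller still has to average the two returned values.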
Rob
A single query to achieve the perfect median:
SELECT
COUNT(*) as total_rows,
IF(COUNT(*) % 2 = 1,
   CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL),
   ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
        + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) AS median,
AVG(val) as average
FROM
data
I just found another answer online in the comments:
For medians in almost any SQL:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
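Make sure your columns are well indexed and that the index is used for filtering and sorting. Verify with the explain plans.
select count(*) from table -- find the number of rows
Calculate the "median" row number, perhaps using: median_row = floor(count / 2).
Then pick it out of the list:
select val from table order by val asc limit median_row,1
This should return one row containing just the value you want.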
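The three steps above need a client to carry median_row from the first query into the second. Here is a minimal sketch of that flow in Python, using the standard library's sqlite3 as a stand-in for MySQL so that it runs without a server. The table name data, the function name median_via_limit, and the in-memory setup are assumptions made for the illustration. LIMIT 1 OFFSET n is the equivalent of MySQL's LIMIT n,1.

```python
import sqlite3


def median_via_limit(conn):
    # Step 1: find the number of rows.
    (count,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()
    if count == 0:
        return None  # no rows, so no median
    # Step 2: compute the 0-based offset of the "median" row.
    median_row = count // 2
    # Step 3: pick that row out of the sorted list.
    row = conn.execute(
        "SELECT val FROM data ORDER BY val ASC LIMIT 1 OFFSET ?",
        (median_row,),
    ).fetchone()
    return row[0]


# Hypothetical in-memory table holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany(
    "INSERT INTO data (id, val) VALUES (?, ?)",
    [(1, 4), (2, 7), (3, 2), (4, 2), (5, 9), (6, 8), (7, 3)],
)
print(median_via_limit(conn))
```

With an even number of rows, floor(count / 2) points at the upper of the two middle rows, so this variant returns that value rather than the average of the two.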