用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

来自: http://mdb-blog.blogspot.com/2015/06/mysql-find-median-nth-element-without.html

我建议另一种方法,不用连接, 但是处理字符串

我没有用大数据表来检验, 但是中小型的桌子就可以了。

这里的好处是,它也可以通过GROUPING工作,所以它可以返回几个项目的中值。

下面是测试表的测试代码:

DROP TABLE test.test_median
CREATE TABLE test.test_median AS
SELECT 'book' AS grp, 4 AS val UNION ALL
SELECT 'book', 7 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 9 UNION ALL
SELECT 'book', 8 UNION ALL
SELECT 'book', 3 UNION ALL

SELECT 'note', 11 UNION ALL

SELECT 'bike', 22 UNION ALL
SELECT 'bike', 26 

求每组中位数的代码:

SELECT grp,
         SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val), ',', COUNT(*)/2 ), ',', -1) as the_median,
         GROUP_CONCAT(val ORDER BY val) as all_vals_for_debug
FROM test.test_median
GROUP BY grp

输出:

grp | the_median| all_vals_for_debug
bike| 22        | 22,26
book| 4         | 2,2,3,4,7,8,9
note| 11        | 11

其他回答

关心奇数的计数-给出中间两个值的平均值。

SELECT AVG(val) FROM
  ( SELECT x.id, x.val from data x, data y
      GROUP BY x.id, x.val
      HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
  ) sq

我没有将这个解决方案的性能与这里发布的其他答案进行比较,但我发现这个解决方案是最容易理解的,并且涵盖了计算中位数的全部数学公式。换句话说,这个解决方案对于偶数和奇数数据集足够健壮:

SELECT CASE 
-- odd-numbered data sets:
WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records) / 2) as median)
-- even-numbered data sets:
ELSE (select (low_bound.<value> + up_bound.<value>) / 2 AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records - 1) / 2) as low_bound,
 (SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM station) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.rownum =(t2.num_records + 1) / 2) as up_bound)
END
FROM <data>

我发现接受的解决方案在我的MySQL安装上不起作用,返回一个空集,但这个查询在我测试的所有情况下都适用:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1

如果MySQL有ROW_NUMBER,那么MEDIAN是(受SQL Server查询的启发):

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS Cnt,
    ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
;

如果您有偶数个条目,则使用IN。

如果你想找到每个组的中位数,那么只需要在你的OVER子句中PARTITION BY组。

Rob

下面的查询对于奇数行和偶数行都非常有效。在子查询中,我们正在寻找前后行数相同的值。对于奇数行的情况,having子句的值将为0(前后相同的行数将抵消符号)。

类似地,对于偶数行,having子句对于两行(中间的两行)的计算结果为1,因为它们(总的来说)前后的行数相同。

在外层查询中,我们将平均出单个值(奇数行)或(偶数行2个值)。

select avg(val) as median
from
(
    select d1.val
    from data d1 cross join data d2
    group by d1.val
    having abs(sum(sign(d1.val-d2.val))) in (0,1)
) sub

注意:如果你的表有重复的值,上面的having子句应该更改为下面的条件。在这种情况下,可能有一些值超出了原来的可能性(0,1)下面的条件将使这个条件动态,并在重复的情况下工作。

having sum(case when d1.val=d2.val then 1 else 0 end)>=
abs(sum(sign(d1.val-d2.val)))