用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

对于一个表站和列lat_n,下面是MySQL代码来获得中位数:

set @rows := (select count(1) from station);
set @v1 := 0;
set @sql1 := concat('select lat_n into @v1 from station order by lat_n asc limit 1 offset ', ceil(@rows/2) - 1);
prepare statement1 from @sql1;
execute statement1;
set @v2 := 0;
set @sql2 := concat('select lat_n into @v2 from station order by lat_n asc limit 1 offset ', ceil((@rows + 1)/2) - 1);
prepare statement2 from @sql2;
execute statement2;
select (@v1 + @v2)/2;

其他回答

您可以使用窗口函数row_number()来回答查询以找到介质

select val 
from (select val, row_number() over (order by val) as rownumber, x.cnt 
from data, (select count(*) as cnt from data) x) abc
where rownumber=ceil(cnt/2);

MariaDB / MySQL:

SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
  FROM data d, (SELECT @rownum:=0) r
  WHERE d.val is NOT NULL
  -- put some where clause here
  ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

Steve Cohen指出,在第一次传递之后,@rownum将包含总行数。这可用于确定中值,因此不需要第二次传递或连接。

此外,AVG(dd.val)和dd.row_number IN(…)用于在有偶数条记录时正确地产生中位数。推理:

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3

最后,MariaDB 10.3.3+包含一个MEDIAN函数

我建议一个更快的方法。

获取行数:

SELECT CEIL(COUNT(*)/2);

然后取排序子查询的中间值:

SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue);

我用5x10e6的随机数数据集进行了测试,它将在10秒内找到中位数。

我没有将这个解决方案的性能与这里发布的其他答案进行比较,但我发现这个解决方案是最容易理解的,并且涵盖了计算中位数的全部数学公式。换句话说,这个解决方案对于偶数和奇数数据集足够健壮:

SELECT CASE 
-- odd-numbered data sets:
WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records) / 2) as median)
-- even-numbered data sets:
ELSE (select (low_bound.<value> + up_bound.<value>) / 2 AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records - 1) / 2) as low_bound,
 (SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM station) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.rownum =(t2.num_records + 1) / 2) as up_bound)
END
FROM <data>

知道确切的行数,你可以使用这个查询:

SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>

Where <half> = ceiling(<size> / 2.0) - 1