用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);

select (MIN(count)+MAX(count))/2 from 
(select case when (select count(*) from 
med A where A.id<B.id)=(select count(*)/2 from med) OR 
(select count(*) from med A where A.id>B.id)=(select count(*)/2 
from med) then cast(B.id as float)end as count from med B) C;

 ?column? 
----------
  3.5
(1 row)

OR

select cast(avg(id) as float) from 
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id 
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;

其他回答

安装和使用本mysql统计函数:http://www.xarg.org/2012/07/statistical-functions-in-mysql/

之后,计算中值就很简单了:

SELECT median(val) FROM data;

这些方法从同一个表中选择两次。如果源数据来自一个昂贵的查询,这是一种避免运行两次的方法:

select KEY_FIELD, AVG(VALUE_FIELD) MEDIAN_VALUE
from (
    select KEY_FIELD, VALUE_FIELD, RANKF
    , @rownumr := IF(@prevrowidr=KEY_FIELD,@rownumr+1,1) RANKR
    , @prevrowidr := KEY_FIELD
    FROM (
        SELECT KEY_FIELD, VALUE_FIELD, RANKF
        FROM (
            SELECT KEY_FIELD, VALUE_FIELD 
            , @rownumf := IF(@prevrowidf=KEY_FIELD,@rownumf+1,1) RANKF
            , @prevrowidf := KEY_FIELD     
            FROM (
                SELECT KEY_FIELD, VALUE_FIELD 
                FROM (
                    -- some expensive query
                )   B
                ORDER BY  KEY_FIELD, VALUE_FIELD
            ) C
            , (SELECT @rownumf := 1) t_rownum
            , (SELECT @prevrowidf := '*') t_previd
        ) D
        ORDER BY  KEY_FIELD, RANKF DESC
    ) E
    , (SELECT @rownumr := 1) t_rownum
    , (SELECT @prevrowidr := '*') t_previd
) F
WHERE RANKF-RANKR BETWEEN -1 and 1
GROUP BY KEY_FIELD

我没有将这个解决方案的性能与这里发布的其他答案进行比较,但我发现这个解决方案是最容易理解的,并且涵盖了计算中位数的全部数学公式。换句话说,这个解决方案对于偶数和奇数数据集足够健壮:

SELECT CASE 
-- odd-numbered data sets:
WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records) / 2) as median)
-- even-numbered data sets:
ELSE (select (low_bound.<value> + up_bound.<value>) / 2 AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records - 1) / 2) as low_bound,
 (SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM station) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.rownum =(t2.num_records + 1) / 2) as up_bound)
END
FROM <data>

按维度分组的中位数:

SELECT your_dimension, avg(t1.val) as median_val FROM (
SELECT @rownum:=@rownum+1 AS `row_number`,
   IF(@dim <> d.your_dimension, @rownum := 0, NULL),
   @dim := d.your_dimension AS your_dimension,
   d.val
   FROM data d,  (SELECT @rownum:=0) r, (SELECT @dim := 'something_unreal') d
  WHERE 1
  -- put some where clause here
  ORDER BY d.your_dimension, d.val
) as t1
INNER JOIN  
(
  SELECT d.your_dimension,
    count(*) as total_rows
  FROM data d
  WHERE 1
  -- put same where clause here
  GROUP BY d.your_dimension
) as t2 USING(your_dimension)
WHERE 1
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) )

GROUP BY your_dimension;

上面的大多数解决方案只适用于表中的一个字段,您可能需要获得查询中多个字段的中位数(第50百分位数)。

我用这个:

SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
 GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
  ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;

你可以将上面例子中的“50”替换为任何百分位数,这是非常有效的。

只要确保你有足够的内存给GROUP_CONCAT,你可以改变它:

SET group_concat_max_len = 10485760; #10MB max length

详情:http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/