用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

根据魔术贴的答案,对于那些必须根据另一个参数分组的东西做中位数的人来说

SELECT grp_field, t1。val FROM ( SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1,0) AS row_number, @s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val FROM data d, (SELECT @rownum:=0, @s:=0 ORDER BY grp_field, d.val )作为t1 JOIN ( SELECT grp_field, count(*)为total_rows 数据d GROUP BY grp_field )为t2 在t1。Grp_field = t2.grp_field 在t1.row_number =地板(total_rows / 2) + 1;

其他回答

不幸的是,无论是TheJacobTaylor还是velcrow的答案都不会返回当前版本MySQL的准确结果。

从上面来看,魔术贴的答案是接近的,但它不能正确计算具有偶数行数的结果集。中位数定义为1)奇数集上的中间数,或2)偶数集上两个中间数的平均值。

所以,这里是魔术贴的解决方案修补处理奇数和偶数集:

SELECT AVG(middle_values) AS 'median' FROM (
  SELECT t1.median_column AS 'middle_values' FROM
    (
      SELECT @row:=@row+1 as `row`, x.median_column
      FROM median_table AS x, (SELECT @row:=0) AS r
      WHERE 1
      -- put some where clause here
      ORDER BY x.median_column
    ) AS t1,
    (
      SELECT COUNT(*) as 'count'
      FROM median_table x
      WHERE 1
      -- put same where clause here
    ) AS t2
    -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
    WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

要使用它,请遵循以下3个简单步骤:

将上面代码中的“median_table”(出现2次)替换为您的表名 将“median_column”(3次)替换为您希望为其查找中位数的列名 如果你有一个WHERE条件,用WHERE条件替换“WHERE 1”(2次)

你可以使用这里的用户定义函数。

下面的查询对于奇数行和偶数行都非常有效。在子查询中,我们正在寻找前后行数相同的值。对于奇数行的情况,having子句的值将为0(前后相同的行数将抵消符号)。

类似地,对于偶数行,having子句对于两行(中间的两行)的计算结果为1,因为它们(总的来说)前后的行数相同。

在外层查询中,我们将平均出单个值(奇数行)或(偶数行2个值)。

select avg(val) as median
from
(
    select d1.val
    from data d1 cross join data d2
    group by d1.val
    having abs(sum(sign(d1.val-d2.val))) in (0,1)
) sub

注意:如果你的表有重复的值,上面的having子句应该更改为下面的条件。在这种情况下,可能有一些值超出了原来的可能性(0,1)下面的条件将使这个条件动态,并在重复的情况下工作。

having sum(case when d1.val=d2.val then 1 else 0 end)>=
abs(sum(sign(d1.val-d2.val)))

您可以使用窗口函数row_number()来回答查询以找到介质

select val 
from (select val, row_number() over (order by val) as rownumber, x.cnt 
from data, (select count(*) as cnt from data) x) abc
where rownumber=ceil(cnt/2);

MySQL文档中这一页的注释有以下建议:

-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:

DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;


INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);



-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,

-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;

-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1

-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4


-- from there we can select the n-th element on the position: count div 2 + 1