用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

按维度分组的中位数:

SELECT your_dimension, avg(t1.val) as median_val FROM (
SELECT @rownum:=@rownum+1 AS `row_number`,
   IF(@dim <> d.your_dimension, @rownum := 0, NULL),
   @dim := d.your_dimension AS your_dimension,
   d.val
   FROM data d,  (SELECT @rownum:=0) r, (SELECT @dim := 'something_unreal') d
  WHERE 1
  -- put some where clause here
  ORDER BY d.your_dimension, d.val
) as t1
INNER JOIN  
(
  SELECT d.your_dimension,
    count(*) as total_rows
  FROM data d
  WHERE 1
  -- put same where clause here
  GROUP BY d.your_dimension
) as t2 USING(your_dimension)
WHERE 1
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) )

GROUP BY your_dimension;

其他回答

MySQL文档中这一页的注释有以下建议:

-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:

DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;


INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);



-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,

-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;

-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1

-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4


-- from there we can select the n-th element on the position: count div 2 + 1 

不幸的是,无论是TheJacobTaylor还是velcrow的答案都不会返回当前版本MySQL的准确结果。

从上面来看,魔术贴的答案是接近的,但它不能正确计算具有偶数行数的结果集。中位数定义为1)奇数集上的中间数,或2)偶数集上两个中间数的平均值。

所以,这里是魔术贴的解决方案修补处理奇数和偶数集:

SELECT AVG(middle_values) AS 'median' FROM (
  SELECT t1.median_column AS 'middle_values' FROM
    (
      SELECT @row:=@row+1 as `row`, x.median_column
      FROM median_table AS x, (SELECT @row:=0) AS r
      WHERE 1
      -- put some where clause here
      ORDER BY x.median_column
    ) AS t1,
    (
      SELECT COUNT(*) as 'count'
      FROM median_table x
      WHERE 1
      -- put same where clause here
    ) AS t2
    -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
    WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

要使用它,请遵循以下3个简单步骤:

将上面代码中的“median_table”(出现2次)替换为您的表名 将“median_column”(3次)替换为您希望为其查找中位数的列名 如果你有一个WHERE条件,用WHERE条件替换“WHERE 1”(2次)

下面的SQL代码将帮助您使用用户定义的变量来计算MySQL中的中位数。

create table employees(salary int); insert into employees values(8); insert into employees values(23); insert into employees values(45); insert into employees values(123); insert into employees values(93); insert into employees values(2342); insert into employees values(2238); select * from employees; Select salary from employees order by salary; set @rowid=0; set @cnt=(select count(*) from employees); set @middle_no=ceil(@cnt/2); set @odd_even=null; select AVG(salary) from (select salary,@rowid:=@rowid+1 as rid, (CASE WHEN(mod(@cnt,2)=0) THEN @odd_even:=1 ELSE @odd_even:=0 END) as odd_even_status from employees order by salary) as tbl where tbl.rid=@middle_no or tbl.rid=(@middle_no+@odd_even);

如果你正在寻找详细的解释,请参考这个博客。

来自: http://mdb-blog.blogspot.com/2015/06/mysql-find-median-nth-element-without.html

我建议另一种方法,不用连接, 但是处理字符串

我没有用大数据表来检验, 但是中小型的桌子就可以了。

这里的好处是,它也可以通过GROUPING工作,所以它可以返回几个项目的中值。

下面是测试表的测试代码:

DROP TABLE test.test_median
CREATE TABLE test.test_median AS
SELECT 'book' AS grp, 4 AS val UNION ALL
SELECT 'book', 7 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 9 UNION ALL
SELECT 'book', 8 UNION ALL
SELECT 'book', 3 UNION ALL

SELECT 'note', 11 UNION ALL

SELECT 'bike', 22 UNION ALL
SELECT 'bike', 26 

求每组中位数的代码:

SELECT grp,
         SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val), ',', COUNT(*)/2 ), ',', -1) as the_median,
         GROUP_CONCAT(val ORDER BY val) as all_vals_for_debug
FROM test.test_median
GROUP BY grp

输出:

grp | the_median| all_vals_for_debug
bike| 22        | 22,26
book| 4         | 2,2,3,4,7,8,9
note| 11        | 11

在阅读了所有之前的内容后,它们与我的实际需求不匹配,所以我实现了自己的一个不需要任何过程或复杂的语句,只是我GROUP_CONCAT所有来自我想要获得MEDIAN的列的值,并应用COUNT DIV BY 2,我从列表中间提取值,就像下面的查询一样:

(POS是我想要获得其中位数的列的名称)

(query) SELECT
SUBSTRING_INDEX ( 
   SUBSTRING_INDEX ( 
       GROUP_CONCAT(pos ORDER BY CAST(pos AS SIGNED INTEGER) desc SEPARATOR ';') 
    , ';', COUNT(*)/2 ) 
, ';', -1 ) AS `pos_med`
FROM table_name
GROUP BY any_criterial

我希望这能对一些人有用,就像这个网站上的许多其他评论对我一样。