用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

MySQL文档中这一页的注释有以下建议:

-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:

DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;


INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);



-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,

-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;

-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1

-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4


-- from there we can select the n-th element on the position: count div 2 + 1 

其他回答

你也可以选择在存储过程中这样做:

DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
  -- Set default parameters
  IF where_clause IS NULL OR where_clause = '' THEN
    SET where_clause = 1;
  END IF;

  -- Prepare statement
  SET @sql = CONCAT(
    "SELECT AVG(middle_values) AS 'median' FROM (
      SELECT t1.", column_name, " AS 'middle_values' FROM
        (
          SELECT @row:=@row+1 as `row`, x.", column_name, "
          FROM ", table_name," AS x, (SELECT @row:=0) AS r
          WHERE ", where_clause, " ORDER BY x.", column_name, "
        ) AS t1,
        (
          SELECT COUNT(*) as 'count'
          FROM ", table_name, " x
          WHERE ", where_clause, "
        ) AS t2
        -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
        WHERE t1.row >= t2.count/2
          AND t1.row <= ((t2.count/2)+1)) AS t3
    ");

  -- Execute statement
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
END//
DELIMITER ;


-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);

关心奇数的计数-给出中间两个值的平均值。

SELECT AVG(val) FROM
  ( SELECT x.id, x.val from data x, data y
      GROUP BY x.id, x.val
      HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
  ) sq

我使用下表的解决方案在MySQL:

CREATE TABLE transactions (
  transaction_id int , user_id int , merchant_name varchar(255), transaction_date date , amount int
);

INSERT INTO transactions (transaction_id, user_id, merchant_name, transaction_date, amount)  
VALUES (1, 1 ,'abc', '2015-08-17', 100),(2, 2, 'ced', '2015-2-17', 100),(3, 1, 'def', '2015-2-16', 121),
(4, 1 ,'ced', '2015-3-17', 110),(5, 1, 'ced', '2015-3-17', 150),(6, 2 ,'abc', '2015-4-17', 130), 
(7, 3 ,'ced', '2015-12-17', 10),(8, 3 ,'abc', '2015-8-17', 100),(9, 2 ,'abc', '2015-12-17', 140),(10, 1,'abc', '2015-9-17', 100),
(11, 1 ,'abc', '2015-08-17', 121),(12, 2 ,'ced', '2015-12-23', 130),(13, 1 ,'def', '2015-12-23', 13),(3, 4, 'abc', '2015-2-16', 120),(3, 4, 'def', '2015-2-16', 121),(3, 4, 'ced', '2015-2-16', 121);

计算“金额”列的中位数:

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS TotatRecords,
    ROW_NUMBER() OVER (ORDER BY amount) AS RowNum
FROM transactions
)
SELECT Avg(amount)
FROM Numbered
WHERE RowNum IN ( FLOOR((TotatRecords+1)/2), FLOOR((TotatRecords+2)/2) )
;

TotalRecords = 16 and Median = 120.5000

此查询将适用于两种情况,即偶数和奇数记录。

MySQL从8.0版本开始支持窗口函数,您可以使用ROW_NUMBER或DENSE_RANK(不要使用RANK,因为它将相同的RANK分配给相同的值,就像在体育排名):

SELECT AVG(t1.val) AS median_val
  FROM (SELECT val, 
               ROW_NUMBER() OVER(ORDER BY val) AS rownum
          FROM data) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.row_num IN
       (FLOOR((t2.num_records + 1) / 2), 
        FLOOR((t2.num_records + 2) / 2));

我发现这个答案非常有用——https://www.eversql.com/how-to-calculate-median-value-in-mysql-using-a-simple-sql-query/

SET @rowindex := -1;

SELECT
   AVG(g.grade)
FROM
   (SELECT @rowindex:=@rowindex + 1 AS rowindex,
       grades.grade AS grade
    FROM grades
    ORDER BY grades.grade) AS g
WHERE
g.rowindex IN (FLOOR(@rowindex / 2) , CEIL(@rowindex / 2));