用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

在某些情况下,中位数的计算如下:

“中位数”是数字列表中按值排序时的“中间”值。对于偶数集,中位数是两个中间值的平均值。 我为此创建了一个简单的代码:

$midValue = 0;
$rowCount = "SELECT count(*) as count {$from} {$where}";

$even = FALSE;
$offset = 1;
$medianRow = floor($rowCount / 2);
if ($rowCount % 2 == 0 && !empty($medianRow)) {
  $even = TRUE;
  $offset++;
  $medianRow--;
}

$medianValue = "SELECT column as median 
               {$fromClause} {$whereClause} 
               ORDER BY median 
               LIMIT {$medianRow},{$offset}";

$medianValDAO = db_query($medianValue);
while ($medianValDAO->fetch()) {
  if ($even) {
    $midValue = $midValue + $medianValDAO->median;
  }
  else {
    $median = $medianValDAO->median;
  }
}
if ($even) {
  $median = $midValue / 2;
}
return $median;

返回的$中位数将是所需的结果:-)

其他回答

通常,我们不仅需要为整个表计算Median,还需要为与ID相关的聚合计算Median。换句话说,计算表中每个ID的中位数,其中每个ID有许多记录。(良好的性能和工作在许多SQL +修复偶数和赔率的问题,更多关于不同的中值方法的性能https://sqlperformance.com/2012/08/t-sql-queries/median)

SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val, 
  COUNT(*) OVER (PARTITION BY our_id) AS cnt,
  ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
  FROM our_table
) AS x
WHERE rn IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;

希望能有所帮助

你可以使用这里的用户定义函数。

我发现接受的解决方案在我的MySQL安装上不起作用,返回一个空集,但这个查询在我测试的所有情况下都适用:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1

我刚刚在网上的评论中找到了另一个答案:

对于几乎所有SQL中的中位数: SELECT x.val from data x, data y GROUP BY x.val 总和(符号(1-SIGN (y.val-x.val))) = (COUNT (*) + 1) / 2

确保列有良好的索引,并且索引用于筛选和排序。与解释计划核对。

select count(*) from table --find the number of rows

计算“中值”行号。可能使用:median_row = floor(count / 2)。

然后把它从列表中挑出来:

select val from table order by val asc limit median_row,1

这将返回您想要的值的一行。

MySQL文档中这一页的注释有以下建议:

-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:

DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;


INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);



-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,

-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;

-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1

-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4


-- from there we can select the n-th element on the position: count div 2 + 1