用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

如果这是MySQL,现在有窗口函数,你可以这样做(假设你想四舍五入到最接近的整数-否则只需将round替换为CEIL或FLOOR或其他什么)。下面的解决方案适用于表,无论表的行数是偶数还是奇数:


WITH CTE AS (
    SELECT val,
            ROW_NUMBER() OVER (ORDER BY val ASC) AS rn,
            COUNT(*) OVER () AS total_count
    FROM data
)
SELECT ROUND(AVG(val)) AS median
FROM CTE
WHERE
    rn BETWEEN
    total_count / 2.0 AND
    total_count / 2.0 + 1;

I think some of the more recent answers on this thread were already getting at this approach, but it also seemed like people were overthinking it, so consider this an improved version. Regardless of SQL flavor, there is no reason anyone should be writing a huge paragraph of code with multiple subqueries just to get the median in 2021. However, please note that the above query only works if you're asked to find the median for a continuous series. Of course, regardless of row number, sometimes people do make a distinction between what is referred to as the Discrete Median and what is referred to as the Interpolated Median for a continuous series.

如果你被要求为一个离散级数找到中位数,而表的行数是偶数,那么上面的解决方案就不适合你,你应该恢复使用其他解决方案之一,比如TheJacobTaylor的。

下面的第二个解决方案是对TheJacobTaylor的稍微修改的版本,其中我显式地声明了CROSS JOIN。这个方法也适用于行数为奇数的表,不管你是被要求求连续序列的中位数还是离散序列的中位数,但我特别会在被要求求离散序列的中位数时使用这个方法。否则,使用第一种解决方案。这样,您就永远不必考虑数据是包含“偶数”还是“奇数”个数的数据点。


SELECT x.val AS median
FROM data x
CROSS JOIN data y
GROUP BY x.val
HAVING SUM(SIGN(1 - SIGN(y.val - x.val))) = (COUNT(*) + 1) / 2;

最后,你可以在PostgreSQL中使用内置函数轻松做到这一点。这里有一个很好的解释,以及关于离散中位数和插值中位数的有效总结。

https://leafo.net/guides/postgresql-calculating-percentile.html#calculating-the-median

其他回答

来自: http://mdb-blog.blogspot.com/2015/06/mysql-find-median-nth-element-without.html

我建议另一种方法,不用连接, 但是处理字符串

我没有用大数据表来检验, 但是中小型的桌子就可以了。

这里的好处是,它也可以通过GROUPING工作,所以它可以返回几个项目的中值。

下面是测试表的测试代码:

DROP TABLE test.test_median
CREATE TABLE test.test_median AS
SELECT 'book' AS grp, 4 AS val UNION ALL
SELECT 'book', 7 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 9 UNION ALL
SELECT 'book', 8 UNION ALL
SELECT 'book', 3 UNION ALL

SELECT 'note', 11 UNION ALL

SELECT 'bike', 22 UNION ALL
SELECT 'bike', 26 

求每组中位数的代码:

SELECT grp,
         SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val), ',', COUNT(*)/2 ), ',', -1) as the_median,
         GROUP_CONCAT(val ORDER BY val) as all_vals_for_debug
FROM test.test_median
GROUP BY grp

输出:

grp | the_median| all_vals_for_debug
bike| 22        | 22,26
book| 4         | 2,2,3,4,7,8,9
note| 11        | 11

我刚刚在网上的评论中找到了另一个答案:

对于几乎所有SQL中的中位数: SELECT x.val from data x, data y GROUP BY x.val 总和(符号(1-SIGN (y.val-x.val))) = (COUNT (*) + 1) / 2

确保列有良好的索引,并且索引用于筛选和排序。与解释计划核对。

select count(*) from table --find the number of rows

计算“中值”行号。可能使用:median_row = floor(count / 2)。

然后把它从列表中挑出来:

select val from table order by val asc limit median_row,1

这将返回您想要的值的一行。

ORACLE的简单解决方案:

SELECT ROUND(MEDIAN(Lat_N), 4) FROM Station;

简单的解决方案,理解MySQL:

select case MOD(count(lat_n),2) 
when 1 then (select round(S.LAT_N,4) from station S where (select count(Lat_N) from station where Lat_N < S.LAT_N ) = (select count(Lat_N) from station where Lat_N > S.LAT_N))
else (select round(AVG(S.LAT_N),4) from station S where 1 = (select count(Lat_N) from station where Lat_N < S.LAT_N ) - (select count(Lat_N) from station where Lat_N > S.LAT_N))
end from station;

解释

STATION是表名。LAT_N是具有数值的列名

假设站表中有101条记录(奇数)。这意味着如果表以asc或desc排序,则中位数是第51条记录。

In above query for every S.LAT_N of S table I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if they are matched then I am selecting that S.LAT_N value. When I check for 51st records there are 50 values less than 51st record and there 50 records greater than 51st record. As you see, there are 50 records in both tables. So this is our answer. For every other record there are different number of records in two tables created for comparison. So, only 51st record meets the condition.

现在假设站表中有100条记录(偶数)。这意味着如果表以asc或desc排序,则中位数是第50条和第51条记录的平均值。

Same as odd logic I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if their difference is equal to 1 then I am selecting that S.LAT_N value and find the average. When I check for 50th records there are 49 values less than 50th record and there 51 records greater than 50th record. As you see, there is difference of 1 record in both tables. So this(50th record) is our 1st record for average. Similarly, When I check for 51st records there are 50 values less than 51st record and there 49 records greater than 51st record. As you see, there is difference of 1 record in both tables. So this(51st record) is our 2nd record for average. For every other record there are different number of records in two tables created for comparison. So, only 50th and 51st records meet the condition.

在某些情况下,中位数的计算如下:

“中位数”是数字列表中按值排序时的“中间”值。对于偶数集,中位数是两个中间值的平均值。 我为此创建了一个简单的代码:

$midValue = 0;
$rowCount = "SELECT count(*) as count {$from} {$where}";

$even = FALSE;
$offset = 1;
$medianRow = floor($rowCount / 2);
if ($rowCount % 2 == 0 && !empty($medianRow)) {
  $even = TRUE;
  $offset++;
  $medianRow--;
}

$medianValue = "SELECT column as median 
               {$fromClause} {$whereClause} 
               ORDER BY median 
               LIMIT {$medianRow},{$offset}";

$medianValDAO = db_query($medianValue);
while ($medianValDAO->fetch()) {
  if ($even) {
    $midValue = $midValue + $medianValDAO->median;
  }
  else {
    $median = $medianValDAO->median;
  }
}
if ($even) {
  $median = $midValue / 2;
}
return $median;

返回的$中位数将是所需的结果:-)

如果MySQL有ROW_NUMBER,那么MEDIAN是(受SQL Server查询的启发):

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS Cnt,
    ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
;

如果您有偶数个条目,则使用IN。

如果你想找到每个组的中位数,那么只需要在你的OVER子句中PARTITION BY组。

Rob