用MySQL计算中位数最简单(希望不会太慢)的方法是什么?我已经使用AVG(x)来寻找平均值,但我很难找到一个简单的方法来计算中位数。现在,我将所有的行返回到PHP,进行排序,然后选择中间的行,但是肯定有一些简单的方法可以在一个MySQL查询中完成它。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

对val排序得到2 2 3 4 7 8 9,因此中位数应该是4,而SELECT AVG(val) == 5。


当前回答

通常,我们不仅需要为整个表计算Median,还需要为与ID相关的聚合计算Median。换句话说,计算表中每个ID的中位数,其中每个ID有许多记录。(良好的性能和工作在许多SQL +修复偶数和赔率的问题,更多关于不同的中值方法的性能https://sqlperformance.com/2012/08/t-sql-queries/median)

SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val, 
  COUNT(*) OVER (PARTITION BY our_id) AS cnt,
  ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
  FROM our_table
) AS x
WHERE rn IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;

希望能有所帮助

其他回答

ORACLE的简单解决方案:

SELECT ROUND(MEDIAN(Lat_N), 4) FROM Station;

简单的解决方案,理解MySQL:

select case MOD(count(lat_n),2) 
when 1 then (select round(S.LAT_N,4) from station S where (select count(Lat_N) from station where Lat_N < S.LAT_N ) = (select count(Lat_N) from station where Lat_N > S.LAT_N))
else (select round(AVG(S.LAT_N),4) from station S where 1 = (select count(Lat_N) from station where Lat_N < S.LAT_N ) - (select count(Lat_N) from station where Lat_N > S.LAT_N))
end from station;

解释

STATION是表名。LAT_N是具有数值的列名

假设站表中有101条记录(奇数)。这意味着如果表以asc或desc排序,则中位数是第51条记录。

In above query for every S.LAT_N of S table I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if they are matched then I am selecting that S.LAT_N value. When I check for 51st records there are 50 values less than 51st record and there 50 records greater than 51st record. As you see, there are 50 records in both tables. So this is our answer. For every other record there are different number of records in two tables created for comparison. So, only 51st record meets the condition.

现在假设站表中有100条记录(偶数)。这意味着如果表以asc或desc排序,则中位数是第50条和第51条记录的平均值。

Same as odd logic I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if their difference is equal to 1 then I am selecting that S.LAT_N value and find the average. When I check for 50th records there are 49 values less than 50th record and there 51 records greater than 50th record. As you see, there is difference of 1 record in both tables. So this(50th record) is our 1st record for average. Similarly, When I check for 51st records there are 50 values less than 51st record and there 49 records greater than 51st record. As you see, there is difference of 1 record in both tables. So this(51st record) is our 2nd record for average. For every other record there are different number of records in two tables created for comparison. So, only 50th and 51st records meet the condition.

我发现接受的解决方案在我的MySQL安装上不起作用,返回一个空集,但这个查询在我测试的所有情况下都适用:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1

这是我的办法。当然,你可以把它放到一个过程中:-)

SET @median_counter = (SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`);

SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');

PREPARE median FROM @median;

EXECUTE median;

你可以避免变量@median_counter,如果你替换它:

SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
                      (SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`),
                      ', 1'
                    );

PREPARE median FROM @median;

EXECUTE median;

安装和使用本mysql统计函数:http://www.xarg.org/2012/07/statistical-functions-in-mysql/

之后,计算中值就很简单了:

SELECT median(val) FROM data;

因为我只需要一个中位数和百分位数的解决方案,我根据这个线程中的发现做了一个简单而相当灵活的函数。我知道,如果我发现“现成的”功能很容易包含在我的项目中,我自己会很高兴,所以我决定快速分享:

function mysql_percentile($table, $column, $where, $percentile = 0.5) {

    $sql = "
            SELECT `t1`.`".$column."` as `percentile` FROM (
            SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
              FROM `".$table."` `d`,  (SELECT @rownum:=0) `r`
              ".$where."
              ORDER BY `d`.`".$column."`
            ) as `t1`, 
            (
              SELECT count(*) as `total_rows`
              FROM `".$table."` `d`
              ".$where."
            ) as `t2`
            WHERE 1
            AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
        ";

    $result = sql($sql, 1);

    if (!empty($result)) {
        return $result['percentile'];       
    } else {
        return 0;
    }

}

使用非常简单,例子来自我目前的项目:

...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";

    $reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
    $reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
    $reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...