根据MSDN, Median在Transact-SQL中不能作为聚合函数使用。但是,我想知道是否可以创建此功能(使用create Aggregate函数、用户定义函数或其他方法)。

最好的方法(如果可能的话)是什么——允许在聚合查询中计算中值(假设是数值数据类型)?


当前回答

如果你使用的是SQL 2005或更好的版本,这是一个很好的,简单的中位数计算表中的单列:

SELECT
(
 (SELECT MAX(Score) FROM
   (SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score) AS BottomHalf)
 +
 (SELECT MIN(Score) FROM
   (SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC) AS TopHalf)
) / 2 AS Median

其他回答

使用COUNT聚合, 首先可以计算有多少行,并存储在一个名为@cnt的变量中。然后 你可以计算OFFSET-FETCH过滤器的参数来指定,基于数量排序, 要跳过多少行(偏移值)和筛选多少行(获取值)。

行数 跳过是(@cnt - 1) / 2。很明显,对于奇数,这个计算是正确的,因为 首先对单个中间值减去1,然后再除以2。

这也适用于偶数计数,因为表达式中使用的除法是 整数除法;所以,当一个偶数减去1时,你得到的是一个奇数。

When dividing that odd value by 2, the fraction part of the result (.5) is truncated. The number of rows to fetch is 2 - (@cnt % 2). The idea is that when the count is odd the result of the modulo operation is 1, and you need to fetch 1 row. When the count is even the result of the modulo operation is 0, and you need to fetch 2 rows. By subtracting the 1 or 0 result of the modulo operation from 2, you get the desired 1 or 2, respectively. Finally, to compute the median quantity, take the one or two result quantities, and apply an average after converting the input integer value to a numeric one as follows:

DECLARE @cnt AS INT = (SELECT COUNT(*) FROM [Sales].[production].[stocks]);
SELECT AVG(1.0 * quantity) AS median
FROM ( SELECT quantity
FROM [Sales].[production].[stocks]
ORDER BY quantity
OFFSET (@cnt - 1) / 2 ROWS FETCH NEXT 2 - @cnt % 2 ROWS ONLY ) AS D;

我最初的回答是:

select  max(my_column) as [my_column], quartile
from    (select my_column, ntile(4) over (order by my_column) as [quartile]
         from   my_table) i
--where quartile = 2
group by quartile

这将使您一举获得中位数和四分位范围。如果你真的只想要一行作为中值,那么取消注释where子句。

当你把它放入解释计划时,60%的工作是对数据进行排序,这在计算像这样的位置依赖统计数据时是不可避免的。

我修改了答案,以遵循Robert Ševčík-Robajz在下面的评论中提出的优秀建议:

;with PartitionedData as
  (select my_column, ntile(10) over (order by my_column) as [percentile]
   from   my_table),
MinimaAndMaxima as
  (select  min(my_column) as [low], max(my_column) as [high], percentile
   from    PartitionedData
   group by percentile)
select
  case
    when b.percentile = 10 then cast(b.high as decimal(18,2))
    else cast((a.low + b.high)  as decimal(18,2)) / 2
  end as [value], --b.high, a.low,
  b.percentile
from    MinimaAndMaxima a
  join  MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5

当您有偶数个数据项时,这应该计算正确的中位数和百分比值。同样,如果您只想要中位数而不是整个百分位数分布,请取消最后的where子句的注释。

以下是我的解决方案:

with tempa as

 (

    select value,row_number() over (order by value) as Rn,/* Assigning a 
                                                           row_number */
           count(value) over () as Cnt /*Taking total count of the values */
    from numbers
    where value is not null /* Excluding the null values */
 ),

tempb as

  (

    /* Since we don't know whether the number of rows is odd or even, we shall 
     consider both the scenarios */

    select round(cnt/2) as Ref from tempa where mod(cnt,2)=1
    union all
    select round(cnt/2) a Ref from tempa where mod(cnt,2)=0
     union all
    select round(cnt/2) + 1 as Ref from tempa where mod(cnt,2)=0
   )
  select avg(value) as Median_Value

  from tempa where rn in

    ( select Ref from tempb);

犹斯丁上面的例子很好。但是主键的需求应该非常清楚地说明。我曾在野外见过没有密钥的代码,结果很糟糕。

我对Percentile_Cont的抱怨是它不会从数据集中给你一个实际的值。 要从数据集中获得一个实际值的“中值”,请使用Percentile_Disc。

SELECT SalesOrderID, OrderQty,
    PERCENTILE_DISC(0.5) 
        WITHIN GROUP (ORDER BY OrderQty)
        OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC

MS SQL Server 2012(及以后版本)有PERCENTILE_DISC函数,计算排序值的特定百分比。PERCENTILE_DISC(0.5)将计算中位数- https://msdn.microsoft.com/en-us/library/hh231327.aspx