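According to MSDN, Median is not available as an aggregate function in Transact-SQL. However, I would like to find out whether it is possible to create this functionality (using the CREATE AGGREGATE function, a user-defined function, or some other method).
What would be the best way (if possible) to do this, so that a median value (assuming a numeric data type) can be calculated in an aggregate query?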
Current answer
If you're using SQL 2005 or better, this is a nice, simple median calculation for a single column in a table:
SELECT
(
(SELECT MAX(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score) AS BottomHalf)
+
(SELECT MIN(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC) AS TopHalf)
) / 2.0 AS Median -- 2.0 avoids integer division when Score is an integer column
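To see why the two halves meet at the median, here is a small Python sketch (not T-SQL) that mirrors the query on a plain list. The function name is hypothetical. It assumes that `TOP 50 PERCENT` rounds a fractional row count up, so with an odd number of rows both halves contain the middle row.

```python
import math

def median_top_50_percent(scores):
    """Mirror the two TOP 50 PERCENT subqueries on a non-empty list."""
    ordered = sorted(scores)
    # Assumption: TOP 50 PERCENT rounds a fractional row count up.
    half = math.ceil(len(ordered) / 2)
    bottom_half = ordered[:half]                # ORDER BY Score
    top_half = ordered[len(ordered) - half:]    # ORDER BY Score DESC
    # MAX of the bottom half plus MIN of the top half, averaged.
    return (max(bottom_half) + min(top_half)) / 2

print(median_top_50_percent([1, 2, 3, 4]))  # two middle rows are averaged
print(median_top_50_percent([5, 1, 3]))     # middle row is counted twice
```

With an odd count the middle value appears in both halves, so adding it to itself and halving returns it unchanged.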
Other answers
Using a COUNT aggregate, you can first count how many rows there are and store the result in a variable called @cnt. Then you can compute the parameters for the OFFSET-FETCH filter (available from SQL Server 2012) to specify, based on quantity ordering, how many rows to skip (the offset value) and how many to return (the fetch value).
The number of rows to skip is (@cnt - 1) / 2. For an odd count this calculation is clearly correct, because you first subtract 1 for the single middle value and then divide by 2.
It also works for an even count, because the division used in the expression is integer division: subtracting 1 from an even count leaves an odd value, and when that odd value is divided by 2 the fractional part of the result (.5) is truncated.
The number of rows to fetch is 2 - (@cnt % 2). When the count is odd, the modulo operation returns 1 and you need to fetch 1 row. When the count is even, the modulo operation returns 0 and you need to fetch 2 rows. Subtracting the 1 or 0 from 2 gives the desired 1 or 2, respectively.
Finally, to compute the median quantity, take the one or two resulting quantities and apply an average after converting the input integer value to a numeric one, as follows:
DECLARE @cnt AS INT = (SELECT COUNT(*) FROM [Sales].[production].[stocks]);
SELECT AVG(1.0 * quantity) AS median
FROM ( SELECT quantity
FROM [Sales].[production].[stocks]
ORDER BY quantity
OFFSET (@cnt - 1) / 2 ROWS FETCH NEXT 2 - @cnt % 2 ROWS ONLY ) AS D;
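The offset and fetch arithmetic described above can be checked on small lists with a Python sketch (not T-SQL). The function name is hypothetical, and returning `None` for an empty input is an assumption chosen to mimic AVG over zero rows.

```python
def median_offset_fetch(quantities):
    """Mirror OFFSET (@cnt - 1) / 2 ROWS FETCH NEXT 2 - @cnt % 2 ROWS ONLY."""
    ordered = sorted(quantities)
    cnt = len(ordered)
    if cnt == 0:
        return None  # assumption: stands in for AVG over no rows
    offset = (cnt - 1) // 2   # integer division, like (@cnt - 1) / 2 on INT
    fetch = 2 - cnt % 2       # 1 row for an odd count, 2 rows for an even count
    middle = ordered[offset:offset + fetch]
    # 1.0 * q mirrors the conversion to a numeric type before averaging.
    return sum(1.0 * q for q in middle) / len(middle)

print(median_offset_fetch([1, 2, 3, 4, 5]))  # offset 2, fetch 1
print(median_offset_fetch([1, 2, 3, 4]))     # offset 1, fetch 2
```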
My original answer was:
select max(my_column) as [my_column], quartile
from (select my_column, ntile(4) over (order by my_column) as [quartile]
from my_table) i
--where quartile = 2
group by quartile
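This gives you the median and the interquartile range in one fell swoop. If you really only want the one row that is the median, uncomment the where clause.
When you put it into an explain plan, 60% of the work is sorting the data, which is unavoidable when calculating position-dependent statistics like this.
I've amended the answer to follow the excellent suggestion from Robert Ševčík-Robajz in the comments below: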
;with PartitionedData as
(select my_column, ntile(10) over (order by my_column) as [percentile]
from my_table),
MinimaAndMaxima as
(select min(my_column) as [low], max(my_column) as [high], percentile
from PartitionedData
group by percentile)
select
case
when b.percentile = 10 then cast(b.high as decimal(18,2))
else cast((a.low + b.high) as decimal(18,2)) / 2
end as [value], --b.high, a.low,
b.percentile
from MinimaAndMaxima a
join MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5
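This should calculate the correct median and percentile values when you have an even number of data items. Again, if you only want the median rather than the whole percentile distribution, uncomment the final where clause.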
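As a rough check of the decile query, the Python sketch below (not T-SQL) models NTILE on a list and averages the high of bucket 5 with the low of bucket 6. The function names are hypothetical. It assumes that NTILE gives the earlier buckets the extra rows when the row count does not divide evenly, and that there are at least ten values so no bucket is empty.

```python
def ntile(values, buckets):
    """Split sorted values into buckets whose sizes differ by at most one,
    with the larger buckets first (assumed NTILE behaviour)."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), buckets)
    groups, start = [], 0
    for i in range(buckets):
        size = base + (1 if i < extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups

def decile_boundary(values, percentile):
    """Average of bucket `percentile`'s high and the next bucket's low."""
    groups = ntile(values, 10)
    if percentile == 10:
        return float(groups[9][-1])
    return (groups[percentile][0] + groups[percentile - 1][-1]) / 2

print(decile_boundary(range(1, 21), 5))  # 20 rows: buckets divide evenly
print(decile_boundary(range(1, 13), 5))  # 12 rows: buckets are uneven
```

In this sketch the boundary equals the median when the rows divide evenly into the ten buckets (20 rows give 10.5). With 12 rows it returns 7.5 while the median of 1..12 is 6.5, so it may be worth checking the query on a row count that is not a multiple of ten.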
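Here is my solution (it uses mod() and single-argument round(), so it is written for a dialect such as Oracle or MySQL rather than T-SQL):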
with tempa as
(
select value,
row_number() over (order by value) as Rn, /* Assigning a row_number */
count(value) over () as Cnt /* Taking total count of the values */
from numbers
where value is not null /* Excluding the null values */
),
tempb as
(
/* Since we don't know whether the number of rows is odd or even, we shall
consider both the scenarios */
select round(cnt/2) as Ref from tempa where mod(cnt,2)=1
union all
select round(cnt/2) as Ref from tempa where mod(cnt,2)=0
union all
select round(cnt/2) + 1 as Ref from tempa where mod(cnt,2)=0
)
select avg(value) as Median_Value
from tempa where rn in
( select Ref from tempb);
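The row-number logic of this query can be sketched in Python (not SQL) to make the odd and even cases explicit. The function name is hypothetical. The sketch uses `(cnt + 1) // 2` in place of `round(cnt/2)` for the odd case, because Python's `round` rounds halves to the nearest even number and would pick the wrong row.

```python
def median_by_row_number(values):
    """Pick the middle row number(s) and average the values found there."""
    rows = sorted(v for v in values if v is not None)  # where value is not null
    cnt = len(rows)
    if cnt == 0:
        return None  # assumption: stands in for AVG over no rows
    if cnt % 2 == 1:
        refs = {(cnt + 1) // 2}           # single middle row
    else:
        refs = {cnt // 2, cnt // 2 + 1}   # the two middle rows
    # Row numbers start at 1, as with row_number().
    picked = [v for rn, v in enumerate(rows, start=1) if rn in refs]
    return sum(picked) / len(picked)

print(median_by_row_number([3, None, 1, 2]))  # odd count after dropping None
print(median_by_row_number([4, 1, 3, 2]))     # even count
```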
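Justin's example above is very good, but the need for a primary key should be stated very clearly. I have seen that code in the wild without the key, and the results were bad.
My complaint about Percentile_Cont is that it won't give you an actual value from the dataset. To get a "median" that is an actual value from the dataset, use Percentile_Disc.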
SELECT SalesOrderID, OrderQty,
PERCENTILE_DISC(0.5)
WITHIN GROUP (ORDER BY OrderQty)
OVER (PARTITION BY SalesOrderID) AS MedianDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
MS SQL Server 2012 (and later) has the PERCENTILE_DISC function, which computes a specific percentile for sorted values. PERCENTILE_DISC(0.5) will compute the median - https://msdn.microsoft.com/en-us/library/hh231327.aspx
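To illustrate the difference between the two functions discussed in these answers, here is a Python sketch (not T-SQL) over a plain non-empty list. The function names are hypothetical. It assumes the discrete version returns the first sorted value whose cumulative fraction of rows reaches p, and that the continuous version interpolates linearly at position p * (n - 1); treat both as approximations of the documented behaviour rather than a reference implementation.

```python
import math

def percentile_disc(values, p):
    """Walk the sorted rows and stop at the first position whose
    cumulative fraction of rows reaches p (always an actual value)."""
    ordered = sorted(values)
    n = len(ordered)
    for i, v in enumerate(ordered, start=1):
        if i / n >= p:
            return v

def percentile_cont(values, p):
    """Linear interpolation between the two rows around position p * (n - 1)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)          # zero-based fractional position
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

print(percentile_disc([1, 2, 3, 4], 0.5))  # lower of the two middle values
print(percentile_cont([1, 2, 3, 4], 0.5))  # average of the two middle values
```

For an odd number of rows both sketches return the same middle value. For an even number, the discrete one returns the lower middle value, while the continuous one returns a number that may not occur in the data.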