根据MSDN, Median在Transact-SQL中不能作为聚合函数使用。但是,我想知道是否可以创建此功能(使用create Aggregate函数、用户定义函数或其他方法)。

最好的方法(如果可能的话)是什么——允许在聚合查询中计算中值(假设是数值数据类型)?


当前回答

我最初的回答是:

select  max(my_column) as [my_column], quartile
from    (select my_column, ntile(4) over (order by my_column) as [quartile]
         from   my_table) i
--where quartile = 2
group by quartile

这将使您一举获得中位数和四分位范围。如果你真的只想要一行作为中值,那么取消注释where子句。

当你把它放入解释计划时,60%的工作是对数据进行排序,这在计算像这样的位置依赖统计数据时是不可避免的。

我修改了答案,以遵循Robert Ševčík-Robajz在下面的评论中提出的优秀建议:

;with PartitionedData as
  (select my_column, ntile(10) over (order by my_column) as [percentile]
   from   my_table),
MinimaAndMaxima as
  (select  min(my_column) as [low], max(my_column) as [high], percentile
   from    PartitionedData
   group by percentile)
select
  case
    when b.percentile = 10 then cast(b.high as decimal(18,2))
    else cast((a.low + b.high)  as decimal(18,2)) / 2
  end as [value], --b.high, a.low,
  b.percentile
from    MinimaAndMaxima a
  join  MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5

当您有偶数个数据项时,这应该计算正确的中位数和百分比值。同样,如果您只想要中位数而不是整个百分位数分布,请取消最后的where子句的注释。

其他回答

我尝试了几种替代方案,但由于我的数据记录有重复的值,ROW_NUMBER版本似乎不是我的选择。这里是我使用的查询(NTILE版本):

SELECT distinct
   CustomerId,
   (
       MAX(CASE WHEN Percent50_Asc=1 THEN TotalDue END) OVER (PARTITION BY CustomerId)  +
       MIN(CASE WHEN Percent50_desc=1 THEN TotalDue END) OVER (PARTITION BY CustomerId) 
   )/2 MEDIAN
FROM
(
   SELECT
      CustomerId,
      TotalDue,
     NTILE(2) OVER (
         PARTITION BY CustomerId
         ORDER BY TotalDue ASC) AS Percent50_Asc,
     NTILE(2) OVER (
         PARTITION BY CustomerId
         ORDER BY TotalDue DESC) AS Percent50_desc
   FROM Sales.SalesOrderHeader SOH
) x
ORDER BY CustomerId;
--Create Temp Table to Store Results in
DECLARE @results AS TABLE 
(
    [Month] datetime not null
 ,[Median] int not null
);

--This variable will determine the date
DECLARE @IntDate as int 
set @IntDate = -13


WHILE (@IntDate < 0) 
BEGIN

--Create Temp Table
DECLARE @table AS TABLE 
(
    [Rank] int not null
 ,[Days Open] int not null
);

--Insert records into Temp Table
insert into @table 

SELECT 
    rank() OVER (ORDER BY DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0), DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')),[SVR].[ref_num]) as [Rank]
 ,DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')) as [Days Open]
FROM
 mdbrpt.dbo.View_Request SVR
 LEFT OUTER JOIN dbo.dtv_apps_systems vapp 
 on SVR.category = vapp.persid
 LEFT OUTER JOIN dbo.prob_ctg pctg 
 on SVR.category = pctg.persid
 Left Outer Join [mdbrpt].[dbo].[rootcause] as [Root Cause] 
 on [SVR].[rootcause]=[Root Cause].[id]
 Left Outer Join [mdbrpt].[dbo].[cr_stat] as [Status]
 on [SVR].[status]=[Status].[code]
 LEFT OUTER JOIN [mdbrpt].[dbo].[net_res] as [net] 
 on [net].[id]=SVR.[affected_rc]
WHERE
 SVR.Type IN ('P') 
 AND
 SVR.close_date IS NOT NULL 
 AND
 [Status].[SYM] = 'Closed'
 AND
 SVR.parent is null
 AND
 [Root Cause].[sym] in ( 'RC - Application','RC - Hardware', 'RC - Operational', 'RC - Unknown')
 AND
 (
  [vapp].[appl_name] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
 OR
  pctg.sym in ('Systems.Release Health Dashboard.Problem','DTV QA Test.Enterprise Release.Deferred Defect Log')
 AND  
  [Net].[nr_desc] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
 )
 AND
 DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0) = DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0)
ORDER BY [Days Open]



DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;

WITH MyResults(RowNo, [Days Open]) AS
(
    SELECT RowNo, [Days Open] FROM
        (SELECT ROW_NUMBER() OVER (ORDER BY [Days Open]) AS RowNo, [Days Open] FROM @table) AS Foo
)


insert into @results
SELECT 
 DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0) as [Month]
 ,AVG([Days Open])as [Median] FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2) 


set @IntDate = @IntDate+1
DELETE FROM @table
END

select *
from @results
order by [Month]

我最初的回答是:

select  max(my_column) as [my_column], quartile
from    (select my_column, ntile(4) over (order by my_column) as [quartile]
         from   my_table) i
--where quartile = 2
group by quartile

这将使您一举获得中位数和四分位范围。如果你真的只想要一行作为中值,那么取消注释where子句。

当你把它放入解释计划时,60%的工作是对数据进行排序,这在计算像这样的位置依赖统计数据时是不可避免的。

我修改了答案,以遵循Robert Ševčík-Robajz在下面的评论中提出的优秀建议:

;with PartitionedData as
  (select my_column, ntile(10) over (order by my_column) as [percentile]
   from   my_table),
MinimaAndMaxima as
  (select  min(my_column) as [low], max(my_column) as [high], percentile
   from    PartitionedData
   group by percentile)
select
  case
    when b.percentile = 10 then cast(b.high as decimal(18,2))
    else cast((a.low + b.high)  as decimal(18,2)) / 2
  end as [value], --b.high, a.low,
  b.percentile
from    MinimaAndMaxima a
  join  MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5

当您有偶数个数据项时,这应该计算正确的中位数和百分比值。同样,如果您只想要中位数而不是整个百分位数分布,请取消最后的where子句的注释。

这段代码有点长,但很容易理解

medii是有列val的表,它有数据集, Smedi是一个cte,它将列idx作为行号,val作为medi表中的'val',该表是升序排序的。 这是基本的数学,如果行号是奇数,那么它的中值来自smedi。 当它是偶数时,它是中间两个值的平均值。

with smedi(idx,vals) as(
                select ROW_NUMBER() over(order by val),val from medi
                )
select (case
            when (select count(*) from medi)%2!=0 then (select vals from smedi where (((select count(*) from medi)/2))=idx)
            else (select avg(vals) from smedi where idx in ((select count(*)/2 from medi),(select (count(*)/2)+1 from medi)))
            end)

试试下面的逻辑来找出中位数:

考虑一个包含以下数字的表格: 1、1、2、3、4、5所示

中位数是2.5

with tempa as 
(
    select num,count(num) over() as Cnt,
        row_number() over (order by num) as Rnum
    from temp),
tempb as
    (
        select round(cnt/2) as ref_value
        from tempa where mod(cnt,2)<>0
        union all
        select round(cnt/2) from tempa where mod(cnt,2)=0
        union all
        select round(cnt/2+1)
        from tempa where mod(cnt,2)=0
    )
select avg(num) from tempa
where rnum in (select * from tempb);