根据MSDN, Median在Transact-SQL中不能作为聚合函数使用。但是,我想知道是否可以创建此功能(使用create Aggregate函数、用户定义函数或其他方法)。

最好的方法(如果可能的话)是什么——允许在聚合查询中计算中值(假设是数值数据类型)?


当前回答

这适用于SQL 2000:

DECLARE @testTable TABLE 
( 
    VALUE   INT
)
--INSERT INTO @testTable -- Even Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56

--
--INSERT INTO @testTable -- Odd Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 39 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56


DECLARE @RowAsc TABLE
(
    ID      INT IDENTITY,
    Amount  INT
)

INSERT INTO @RowAsc
SELECT  VALUE 
FROM    @testTable 
ORDER BY VALUE ASC

SELECT  AVG(amount)
FROM @RowAsc ra
WHERE ra.id IN
(
    SELECT  ID 
    FROM    @RowAsc
    WHERE   ra.id -
    (
        SELECT  MAX(id) / 2.0 
        FROM    @RowAsc
    ) BETWEEN 0 AND 1

)

其他回答

--Create Temp Table to Store Results in
DECLARE @results AS TABLE 
(
    [Month] datetime not null
 ,[Median] int not null
);

--This variable will determine the date
DECLARE @IntDate as int 
set @IntDate = -13


WHILE (@IntDate < 0) 
BEGIN

--Create Temp Table
DECLARE @table AS TABLE 
(
    [Rank] int not null
 ,[Days Open] int not null
);

--Insert records into Temp Table
insert into @table 

SELECT 
    rank() OVER (ORDER BY DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0), DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')),[SVR].[ref_num]) as [Rank]
 ,DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')) as [Days Open]
FROM
 mdbrpt.dbo.View_Request SVR
 LEFT OUTER JOIN dbo.dtv_apps_systems vapp 
 on SVR.category = vapp.persid
 LEFT OUTER JOIN dbo.prob_ctg pctg 
 on SVR.category = pctg.persid
 Left Outer Join [mdbrpt].[dbo].[rootcause] as [Root Cause] 
 on [SVR].[rootcause]=[Root Cause].[id]
 Left Outer Join [mdbrpt].[dbo].[cr_stat] as [Status]
 on [SVR].[status]=[Status].[code]
 LEFT OUTER JOIN [mdbrpt].[dbo].[net_res] as [net] 
 on [net].[id]=SVR.[affected_rc]
WHERE
 SVR.Type IN ('P') 
 AND
 SVR.close_date IS NOT NULL 
 AND
 [Status].[SYM] = 'Closed'
 AND
 SVR.parent is null
 AND
 [Root Cause].[sym] in ( 'RC - Application','RC - Hardware', 'RC - Operational', 'RC - Unknown')
 AND
 (
  [vapp].[appl_name] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
 OR
  pctg.sym in ('Systems.Release Health Dashboard.Problem','DTV QA Test.Enterprise Release.Deferred Defect Log')
 AND  
  [Net].[nr_desc] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
 )
 AND
 DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0) = DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0)
ORDER BY [Days Open]



DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;

WITH MyResults(RowNo, [Days Open]) AS
(
    SELECT RowNo, [Days Open] FROM
        (SELECT ROW_NUMBER() OVER (ORDER BY [Days Open]) AS RowNo, [Days Open] FROM @table) AS Foo
)


insert into @results
SELECT 
 DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0) as [Month]
 ,AVG([Days Open])as [Median] FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2) 


set @IntDate = @IntDate+1
DELETE FROM @table
END

select *
from @results
order by [Month]

2019 UPDATE: In the 10 years since I wrote this answer, more solutions have been uncovered that may yield better results. Also, SQL Server releases since then (especially SQL 2012) have introduced new T-SQL features that can be used to calculate medians. SQL Server releases have also improved its query optimizer which may affect perf of various median solutions. Net-net, my original 2009 post is still OK but there may be better solutions on for modern SQL Server apps. Take a look at this article from 2012 which is a great resource: https://sqlperformance.com/2012/08/t-sql-queries/median

本文发现,以下模式比所有其他选择都要快得多,至少在他们测试的简单模式上是这样。该解决方案比测试的最慢解决方案(PERCENTILE_CONT)快373x (!!)注意,这个技巧需要两个独立的查询,这可能不是在所有情况下都可行。它还需要SQL 2012或更高版本。

DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);

SELECT AVG(1.0 * val)
FROM (
    SELECT val FROM dbo.EvenRows
     ORDER BY val
     OFFSET (@c - 1) / 2 ROWS
     FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS x;

当然,仅仅因为2012年对一个模式的一次测试产生了很好的结果,您的实际情况可能会有所不同,特别是如果您使用的是SQL Server 2014或更高版本。如果性能对中值计算很重要,我强烈建议尝试并测试那篇文章中推荐的几个选项,以确保您找到了最适合您的模式的选项。

我还会特别小心地使用(SQL Server 2012新增的)函数PERCENTILE_CONT,这是这个问题的其他答案之一中推荐的,因为上面链接的文章发现这个内置函数比最快的解决方案慢373x。在过去的7年里,这种差异可能已经得到了改善,但就我个人而言,在验证它与其他解决方案的性能之前,我不会在大型表上使用这个函数。

2009年的原始帖子如下:

有很多方法可以做到这一点,它们的性能差别很大。下面是一个优化得特别好的解决方案,包括median、ROW_NUMBERs和性能。当涉及到执行期间生成的实际I/ o时,这是一个特别优的解决方案——它看起来比其他解决方案成本更高,但实际上要快得多。

该页还包含对其他解决方案和性能测试细节的讨论。请注意,如果有多行具有相同的中位数列值,则使用唯一列作为消歧器。

就像所有的数据库性能场景一样,总是尝试在真实的硬件上用真实的数据测试解决方案——你永远不知道什么时候对SQL Server优化器的更改或环境中的某个特性会使正常快速的解决方案变慢。

SELECT
   CustomerId,
   AVG(TotalDue)
FROM
(
   SELECT
      CustomerId,
      TotalDue,
      -- SalesOrderId in the ORDER BY is a disambiguator to break ties
      ROW_NUMBER() OVER (
         PARTITION BY CustomerId
         ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,
      ROW_NUMBER() OVER (
         PARTITION BY CustomerId
         ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc
   FROM Sales.SalesOrderHeader SOH
) x
WHERE
   RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CustomerId
ORDER BY CustomerId;

关于你的问题,杰夫·阿特伍德已经给出了简单有效的解决方案。但是,如果您正在寻找一些计算中位数的替代方法,下面的SQL代码将帮助您。

create table employees(salary int); insert into employees values(8); insert into employees values(23); insert into employees values(45); insert into employees values(123); insert into employees values(93); insert into employees values(2342); insert into employees values(2238); select * from employees; declare @odd_even int; declare @cnt int; declare @middle_no int; set @cnt=(select count(*) from employees); set @middle_no=(@cnt/2)+1; select @odd_even=case when (@cnt%2=0) THEN -1 ELse 0 END ; select AVG(tbl.salary) from (select salary,ROW_NUMBER() over (order by salary) as rno from employees group by salary) tbl where tbl.rno=@middle_no or tbl.rno=@middle_no+@odd_even;

如果你想在MySQL中计算中位数,这个github链接会很有用。

犹斯丁上面的例子很好。但是主键的需求应该非常清楚地说明。我曾在野外见过没有密钥的代码,结果很糟糕。

我对Percentile_Cont的抱怨是它不会从数据集中给你一个实际的值。 要从数据集中获得一个实际值的“中值”,请使用Percentile_Disc。

SELECT SalesOrderID, OrderQty,
    PERCENTILE_DISC(0.5) 
        WITHIN GROUP (ORDER BY OrderQty)
        OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC

从员工表中得到工资的中位数

with cte as (select salary, ROW_NUMBER() over (order by salary asc) as num from employees)

select avg(salary) from cte where num in ((select (count(*)+1)/2 from employees), (select (count(*)+2)/2 from employees));