根据MSDN, Median在Transact-SQL中不能作为聚合函数使用。但是,我想知道是否可以创建此功能(使用create Aggregate函数、用户定义函数或其他方法)。
最好的方法(如果可能的话)是什么——允许在聚合查询中计算中值(假设是数值数据类型)?
根据MSDN, Median在Transact-SQL中不能作为聚合函数使用。但是,我想知道是否可以创建此功能(使用create Aggregate函数、用户定义函数或其他方法)。
最好的方法(如果可能的话)是什么——允许在聚合查询中计算中值(假设是数值数据类型)?
当前回答
对于连续变量/测量'table1'中的'col1'
select col1
from
(select top 50 percent col1,
ROW_NUMBER() OVER(ORDER BY col1 ASC) AS Rowa,
ROW_NUMBER() OVER(ORDER BY col1 DESC) AS Rowd
from table1 ) tmp
where tmp.Rowa = tmp.Rowd
其他回答
--Create Temp Table to Store Results in
DECLARE @results AS TABLE
(
[Month] datetime not null
,[Median] int not null
);
--This variable will determine the date
DECLARE @IntDate as int
set @IntDate = -13
WHILE (@IntDate < 0)
BEGIN
--Create Temp Table
DECLARE @table AS TABLE
(
[Rank] int not null
,[Days Open] int not null
);
--Insert records into Temp Table
insert into @table
SELECT
rank() OVER (ORDER BY DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0), DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')),[SVR].[ref_num]) as [Rank]
,DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')) as [Days Open]
FROM
mdbrpt.dbo.View_Request SVR
LEFT OUTER JOIN dbo.dtv_apps_systems vapp
on SVR.category = vapp.persid
LEFT OUTER JOIN dbo.prob_ctg pctg
on SVR.category = pctg.persid
Left Outer Join [mdbrpt].[dbo].[rootcause] as [Root Cause]
on [SVR].[rootcause]=[Root Cause].[id]
Left Outer Join [mdbrpt].[dbo].[cr_stat] as [Status]
on [SVR].[status]=[Status].[code]
LEFT OUTER JOIN [mdbrpt].[dbo].[net_res] as [net]
on [net].[id]=SVR.[affected_rc]
WHERE
SVR.Type IN ('P')
AND
SVR.close_date IS NOT NULL
AND
[Status].[SYM] = 'Closed'
AND
SVR.parent is null
AND
[Root Cause].[sym] in ( 'RC - Application','RC - Hardware', 'RC - Operational', 'RC - Unknown')
AND
(
[vapp].[appl_name] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
OR
pctg.sym in ('Systems.Release Health Dashboard.Problem','DTV QA Test.Enterprise Release.Deferred Defect Log')
AND
[Net].[nr_desc] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
)
AND
DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0) = DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0)
ORDER BY [Days Open]
DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;
WITH MyResults(RowNo, [Days Open]) AS
(
SELECT RowNo, [Days Open] FROM
(SELECT ROW_NUMBER() OVER (ORDER BY [Days Open]) AS RowNo, [Days Open] FROM @table) AS Foo
)
insert into @results
SELECT
DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0) as [Month]
,AVG([Days Open])as [Median] FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)
set @IntDate = @IntDate+1
DELETE FROM @table
END
select *
from @results
order by [Month]
如果你使用的是SQL 2005或更好的版本,这是一个很好的,简单的中位数计算表中的单列:
SELECT
(
(SELECT MAX(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score) AS BottomHalf)
+
(SELECT MIN(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC) AS TopHalf)
) / 2 AS Median
在SQL Server 2012中,您应该使用PERCENTILE_CONT:
SELECT SalesOrderID, OrderQty,
PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY OrderQty)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
参见:http://blog.sqlauthority.com/2011/11/20/sql-server-introduction-to-percentile_cont-analytic-functions-introduced-in-sql-server-2012/
在我的解决方案表中是一个只有分数列的学生表,我正在计算分数的中位数,这个解决方案是基于SQL server 2019的
with total_c as ( --Total_c CTE counts total number of rows in a table
select count(*) as n from student
),
even as ( --Even CTE extract two middle rows if the number of rows are even
select marks from student
order by marks
offset (select n from total_c)/2 -1 rows
fetch next 2 rows only
),
odd as ( --Odd CTE extract middle row if the number of rows are odd
select marks from student
order by marks
offset (select n + 1 from total_c)/2 -1 rows
fetch next 1 rows only
)
--Case statement helps to select odd or even CTE based on number of rows
select
case when n%2 = 0 then (select avg(cast(marks as float)) from even)
else (select marks from odd)
end as med_marks
from total_c
这是我能想到的求中位数的最优解。示例中的名称基于Justin示例。确保表有索引 销售。SalesOrderHeader以索引列CustomerId和TotalDue的顺序存在。
SELECT
sohCount.CustomerId,
AVG(sohMid.TotalDue) as TotalDueMedian
FROM
(SELECT
soh.CustomerId,
COUNT(*) as NumberOfRows
FROM
Sales.SalesOrderHeader soh
GROUP BY soh.CustomerId) As sohCount
CROSS APPLY
(Select
soh.TotalDue
FROM
Sales.SalesOrderHeader soh
WHERE soh.CustomerId = sohCount.CustomerId
ORDER BY soh.TotalDue
OFFSET sohCount.NumberOfRows / 2 - ((sohCount.NumberOfRows + 1) % 2) ROWS
FETCH NEXT 1 + ((sohCount.NumberOfRows + 1) % 2) ROWS ONLY
) As sohMid
GROUP BY sohCount.CustomerId
更新
我有点不确定哪种方法性能最好,所以我比较了我的方法Justin Grants和Jeff Atwoods,在一个批量中运行基于这三种方法的查询,每个查询的批量成本为:
没有指数:
我的30% Justin Grants 13% Jeff Atwoods 58%
还有index
我的3%。 Justin Grants 10% Jeff Atwoods 87%
I tried to see how well the queries scale if you have index by creating more data from around 14 000 rows by a factor of 2 up to 512 which means in the end around 7,2 millions rows. Note I made sure CustomeId field where unique for each time I did a single copy, so the proportion of rows compared to unique instance of CustomerId was kept constant. While I was doing this I ran executions where I rebuilt index afterwards, and I noticed the results stabilized at around a factor of 128 with the data I had to these values:
我的3%。 贾斯汀·格兰特5% Jeff Atwoods 92%
我想知道,在保持惟一CustomerId不变的情况下,扩展行数会如何影响性能,因此我设置了一个新的测试,在其中执行了上述操作。现在,批成本比率并没有稳定下来,而是不断分化,每个CustomerId平均大约有20行,最后每个这样唯一的Id大约有10000行。数字如下:
我的4% 贾斯汀60% 杰夫斯35%
通过比较结果,我确保我正确地实现了每个方法。 我的结论是,只要索引存在,我使用的方法通常更快。还要注意,本文针对这个特定问题推荐使用这种方法https://www.microsoftpressstore.com/articles/article.aspx?p=2314819&seqNum=5
进一步提高对该查询的后续调用的性能的一种方法是在辅助表中持久化计数信息。您甚至可以通过一个触发器来维护它,该触发器更新并保存有关依赖于CustomerId的SalesOrderHeader行计数的信息,当然您也可以简单地存储中值。