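According to MSDN, Median is not available as an aggregate function in Transact-SQL. However, I would like to find out whether it is possible to create this functionality (using the Create Aggregate function, a user-defined function, or some other method).
What would be the best way (if possible) to do this: allow for the calculation of a median value (assuming a numeric data type) in an aggregate query?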
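Current answer
In a single statement: one way is to use the ROW_NUMBER() and COUNT() window functions and filter the subquery. Here is how to find the median salary: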
SELECT AVG(e_salary)
FROM
(SELECT
ROW_NUMBER() OVER(ORDER BY e_salary) as row_no,
e_salary,
(COUNT(*) OVER()+1)*0.5 AS row_half
FROM Employee) t
WHERE row_no IN (FLOOR(row_half),CEILING(row_half))
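I've seen similar solutions online that use FLOOR and CEILING, but I tried to do it in a single statement. (Edited)
Other answers
This is the simplest way to find the median of an attribute.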
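To make the FLOOR/CEILING arithmetic concrete, here is a minimal sketch that runs the same query against a small table variable. The `@Employee` table and its four values are hypothetical sample data, not part of the original answer. With an even row count, `row_half` falls between two row numbers, so both middle rows are averaged; with an odd count, FLOOR and CEILING select the same row.

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @Employee TABLE (e_salary DECIMAL(10, 2) NOT NULL);
INSERT INTO @Employee (e_salary) VALUES (100), (200), (300), (400);

-- 4 rows: row_half = (4 + 1) * 0.5 = 2.5, so rows 2 and 3 (200 and 300) are kept.
-- 5 rows would give row_half = 3.0, and FLOOR and CEILING would both pick row 3.
SELECT AVG(e_salary) AS median_salary
FROM (
    SELECT
        ROW_NUMBER() OVER (ORDER BY e_salary) AS row_no,
        e_salary,
        (COUNT(*) OVER () + 1) * 0.5 AS row_half
    FROM @Employee
) AS t
WHERE row_no IN (FLOOR(row_half), CEILING(row_half));
```

The sample column is declared as DECIMAL on purpose: with an INT column, AVG in SQL Server performs integer division, so the average of the two middle rows could be truncated.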
-- Returns a row only when as many salaries lie below a value as above it
Select round(S.salary,4) median from employee S
where (select count(salary) from employee
where salary < S.salary ) = (select count(salary) from employee
where salary > S.salary)
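I wanted to work out a solution by myself, but my brain tripped along the way. I think it works, but don't ask me to explain it in the morning. :P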
DECLARE @table AS TABLE
(
Number int not null
);
insert into @table select 2;
insert into @table select 4;
insert into @table select 9;
insert into @table select 15;
insert into @table select 22;
insert into @table select 26;
insert into @table select 37;
insert into @table select 49;
DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;
WITH MyResults(RowNo, Number) AS
(
SELECT RowNo, Number FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Number) AS RowNo, Number FROM @table) AS Foo
)
-- 1.0 * avoids integer truncation when the two middle values are averaged
SELECT AVG(1.0 * Number) FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)
2019 UPDATE: In the 10 years since I wrote this answer, more solutions have been uncovered that may yield better results. Also, SQL Server releases since then (especially SQL 2012) have introduced new T-SQL features that can be used to calculate medians. SQL Server releases have also improved the query optimizer, which may affect the performance of various median solutions. Net-net, my original 2009 post is still OK, but there may be better solutions for modern SQL Server apps. Take a look at this article from 2012, which is a great resource: https://sqlperformance.com/2012/08/t-sql-queries/median
This article found the following pattern to be much, much faster than all other alternatives, at least on the simple schema they tested. This solution was 373x faster (!!!) than the slowest solution tested (PERCENTILE_CONT). Note that this trick requires two separate queries, which may not be practical in all cases. It also requires SQL 2012 or later.
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);
SELECT AVG(1.0 * val)
FROM (
SELECT val FROM dbo.EvenRows
ORDER BY val
OFFSET (@c - 1) / 2 ROWS
FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS x;
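Of course, just because one test on one schema in 2012 yielded great results, your mileage may vary, especially if you're on SQL Server 2014 or later. If performance is important for your median calculation, I'd strongly suggest trying and performance-testing several of the options recommended in that article to make sure you've found the best one for your schema.
I'd also be especially careful with the (new in SQL Server 2012) function PERCENTILE_CONT, which is recommended in one of the other answers to this question, because the article linked above found this built-in function to be 373x slower than the fastest solution. It's possible that this disparity has been reduced in the 7 years since, but personally I wouldn't use this function on a large table until I had verified its performance against other solutions.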
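For reference, a PERCENTILE_CONT query typically looks like the sketch below. It assumes SQL Server 2012 or later and reuses the Sales.SalesOrderHeader table from the 2009 query further down; the MedianTotalDue alias is only illustrative. This is the pattern to benchmark against the alternatives, not a recommendation.

```sql
-- PERCENTILE_CONT is an analytic (window) function, not an aggregate:
-- it returns a value on every input row, so DISTINCT is used here
-- to collapse the result to one row per customer.
SELECT DISTINCT
    CustomerId,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TotalDue)
        OVER (PARTITION BY CustomerId) AS MedianTotalDue
FROM Sales.SalesOrderHeader
ORDER BY CustomerId;
```

Because the function cannot be combined with GROUP BY directly, the DISTINCT (or an outer grouping) is part of the cost you would be measuring.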
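The original post from 2009 is below:
There are lots of ways to do this, with dramatically varying performance. Here's one particularly well-optimized solution, from the article "Medians, ROW_NUMBERs, and performance". It is particularly efficient in terms of the actual I/O generated during execution: it looks more costly than other solutions, but it is actually much faster.
That page also contains a discussion of other solutions and details of the performance testing. Note the use of a unique column as a disambiguator in case multiple rows have the same value in the median column.
As with all database performance scenarios, always try to test a solution with real data on real hardware; you never know when a change to SQL Server's optimizer or a peculiarity in your environment will make a normally speedy solution slower.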
SELECT
CustomerId,
AVG(TotalDue)
FROM
(
SELECT
CustomerId,
TotalDue,
-- SalesOrderId in the ORDER BY is a disambiguator to break ties
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc
FROM Sales.SalesOrderHeader SOH
) x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CustomerId
ORDER BY CustomerId;
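To see why the filter `RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)` isolates the middle rows, note that within a partition of n rows, RowAsc + RowDesc = n + 1. The sketch below runs the inner part of the query against a hypothetical `@Orders` table variable (the table, column names, and values are made up for illustration) and shows the surviving rows before they are averaged.

```sql
-- Hypothetical sample data: customer 1 has 3 orders, customer 2 has 4.
DECLARE @Orders TABLE (
    OrderId    INT   NOT NULL PRIMARY KEY,
    CustomerId INT   NOT NULL,
    TotalDue   MONEY NOT NULL
);
INSERT INTO @Orders (OrderId, CustomerId, TotalDue)
VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30),
       (4, 2, 10), (5, 2, 20), (6, 2, 30), (7, 2, 40);

-- Odd count (customer 1): only the middle row has RowAsc = RowDesc.
-- Even count (customer 2): the two middle rows have RowAsc = RowDesc - 1 or RowDesc + 1.
SELECT CustomerId, TotalDue, RowAsc, RowDesc
FROM (
    SELECT
        CustomerId,
        TotalDue,
        ROW_NUMBER() OVER (PARTITION BY CustomerId
                           ORDER BY TotalDue ASC, OrderId ASC) AS RowAsc,
        ROW_NUMBER() OVER (PARTITION BY CustomerId
                           ORDER BY TotalDue DESC, OrderId DESC) AS RowDesc
    FROM @Orders
) AS x
WHERE RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
ORDER BY CustomerId, RowAsc;
```

The tie-breaking column matters here: without it, rows with equal TotalDue could be numbered in unrelated orders by the two ROW_NUMBER calls, and the ascending and descending numbers would no longer mirror each other.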
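As for your question, Jeff Atwood has already given a simple and effective solution. But if you are looking for an alternative approach to calculating the median, the SQL code below will help you.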
create table employees(salary int);

insert into employees values(8);
insert into employees values(23);
insert into employees values(45);
insert into employees values(123);
insert into employees values(93);
insert into employees values(2342);
insert into employees values(2238);

select * from employees;

declare @odd_even int;
declare @cnt int;
declare @middle_no int;

set @cnt = (select count(*) from employees);
set @middle_no = (@cnt / 2) + 1;
-- For an even count, also include the row just before the middle one
select @odd_even = case when (@cnt % 2 = 0) then -1 else 0 end;

-- 1.0 * avoids integer truncation; no GROUP BY, so duplicate salaries keep their own row numbers
select AVG(1.0 * tbl.salary)
from (select salary, ROW_NUMBER() over (order by salary) as rno
      from employees) tbl
where tbl.rno = @middle_no or tbl.rno = @middle_no + @odd_even;
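If you are looking to calculate the median in MySQL, this github link will be useful.
I came across this page while looking for a set-based solution for the median. After looking at some of the solutions here, I came up with the following. I hope it helps.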
DECLARE @test TABLE(
i int identity(1,1),
id int,
score float
)
INSERT INTO @test (id,score) VALUES (1,10)
INSERT INTO @test (id,score) VALUES (1,11)
INSERT INTO @test (id,score) VALUES (1,15)
INSERT INTO @test (id,score) VALUES (1,19)
INSERT INTO @test (id,score) VALUES (1,20)
INSERT INTO @test (id,score) VALUES (2,20)
INSERT INTO @test (id,score) VALUES (2,21)
INSERT INTO @test (id,score) VALUES (2,25)
INSERT INTO @test (id,score) VALUES (2,29)
INSERT INTO @test (id,score) VALUES (2,30)
INSERT INTO @test (id,score) VALUES (3,20)
INSERT INTO @test (id,score) VALUES (3,21)
INSERT INTO @test (id,score) VALUES (3,25)
INSERT INTO @test (id,score) VALUES (3,29)
DECLARE @counts TABLE(
id int,
cnt int
)
INSERT INTO @counts (
id,
cnt
)
SELECT
id,
COUNT(*)
FROM
@test
GROUP BY
id
SELECT
drv.id,
drv.start,
AVG(t.score)
FROM
(
SELECT
MIN(t.i)-1 AS start,
t.id
FROM
@test t
GROUP BY
t.id
) drv
INNER JOIN @test t ON drv.id = t.id
INNER JOIN @counts c ON t.id = c.id
WHERE
t.i = ((c.cnt+1)/2)+drv.start
OR (
t.i = (((c.cnt+1)%2) * ((c.cnt+2)/2))+drv.start
AND ((c.cnt+1)%2) * ((c.cnt+2)/2) <> 0
)
GROUP BY
drv.id,
drv.start