根据MSDN, Median在Transact-SQL中不能作为聚合函数使用。但是,我想知道是否可以创建此功能(使用create Aggregate函数、用户定义函数或其他方法)。

最好的方法(如果可能的话)是什么——允许在聚合查询中计算中值(假设是数值数据类型)?


当前回答

这适用于SQL 2000:

DECLARE @testTable TABLE 
( 
    VALUE   INT
)
--INSERT INTO @testTable -- Even Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56

--
--INSERT INTO @testTable -- Odd Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 39 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56


DECLARE @RowAsc TABLE
(
    ID      INT IDENTITY,
    Amount  INT
)

INSERT INTO @RowAsc
SELECT  VALUE 
FROM    @testTable 
ORDER BY VALUE ASC

SELECT  AVG(amount)
FROM @RowAsc ra
WHERE ra.id IN
(
    SELECT  ID 
    FROM    @RowAsc
    WHERE   ra.id -
    (
        SELECT  MAX(id) / 2.0 
        FROM    @RowAsc
    ) BETWEEN 0 AND 1

)

其他回答

中找到

这是查找属性中值的最简单方法。

Select round(S.salary,4) median from employee S 
where (select count(salary) from station 
where salary < S.salary ) = (select count(salary) from station
where salary > S.salary)

在UDF中,写:

 Select Top 1 medianSortColumn from Table T
  Where (Select Count(*) from Table
         Where MedianSortColumn <
           (Select Count(*) From Table) / 2)
  Order By medianSortColumn

如果你想在SQL Server中使用创建聚合函数,这是如何做到的。这样做的好处是能够编写干净的查询。注意,这个过程可以很容易地计算一个百分位值。

创建一个新的Visual Studio项目,并将目标框架设置为。net 3.5(这是针对SQL 2008的,在SQL 2012中可能有所不同)。然后创建一个类文件,并放入以下代码或c#等效代码:

Imports Microsoft.SqlServer.Server
Imports System.Data.SqlTypes
Imports System.IO

<Serializable>
<SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToNulls:=True, IsInvariantToDuplicates:=False, _
  IsInvariantToOrder:=True, MaxByteSize:=-1, IsNullIfEmpty:=True)>
Public Class Median
  Implements IBinarySerialize
  Private _items As List(Of Decimal)

  Public Sub Init()
    _items = New List(Of Decimal)()
  End Sub

  Public Sub Accumulate(value As SqlDecimal)
    If Not value.IsNull Then
      _items.Add(value.Value)
    End If
  End Sub

  Public Sub Merge(other As Median)
    If other._items IsNot Nothing Then
      _items.AddRange(other._items)
    End If
  End Sub

  Public Function Terminate() As SqlDecimal
    If _items.Count <> 0 Then
      Dim result As Decimal
      _items = _items.OrderBy(Function(i) i).ToList()
      If _items.Count Mod 2 = 0 Then
        result = ((_items((_items.Count / 2) - 1)) + (_items(_items.Count / 2))) / 2@
      Else
        result = _items((_items.Count - 1) / 2)
      End If

      Return New SqlDecimal(result)
    Else
      Return New SqlDecimal()
    End If
  End Function

  Public Sub Read(r As BinaryReader) Implements IBinarySerialize.Read
    'deserialize it from a string
    Dim list = r.ReadString()
    _items = New List(Of Decimal)

    For Each value In list.Split(","c)
      Dim number As Decimal
      If Decimal.TryParse(value, number) Then
        _items.Add(number)
      End If
    Next

  End Sub

  Public Sub Write(w As BinaryWriter) Implements IBinarySerialize.Write
    'serialize the list to a string
    Dim list = ""

    For Each item In _items
      If list <> "" Then
        list += ","
      End If      
      list += item.ToString()
    Next
    w.Write(list)
  End Sub
End Class

然后编译它,并将DLL和PDB文件复制到您的SQL Server机器,在SQL Server中运行以下命令:

CREATE ASSEMBLY CustomAggregate FROM '{path to your DLL}'
WITH PERMISSION_SET=SAFE;
GO

CREATE AGGREGATE Median(@value decimal(9, 3))
RETURNS decimal(9, 3) 
EXTERNAL NAME [CustomAggregate].[{namespace of your DLL}.Median];
GO

然后你可以写一个查询来计算中位数,就像这样: SELECT dbo.Median(Field) FROM Table

对于连续变量/测量'table1'中的'col1'

select col1  
from
    (select top 50 percent col1, 
    ROW_NUMBER() OVER(ORDER BY col1 ASC) AS Rowa,
    ROW_NUMBER() OVER(ORDER BY col1 DESC) AS Rowd
    from table1 ) tmp
where tmp.Rowa = tmp.Rowd

我最初的回答是:

select  max(my_column) as [my_column], quartile
from    (select my_column, ntile(4) over (order by my_column) as [quartile]
         from   my_table) i
--where quartile = 2
group by quartile

这将使您一举获得中位数和四分位范围。如果你真的只想要一行作为中值,那么取消注释where子句。

当你把它放入解释计划时,60%的工作是对数据进行排序,这在计算像这样的位置依赖统计数据时是不可避免的。

我修改了答案,以遵循Robert Ševčík-Robajz在下面的评论中提出的优秀建议:

;with PartitionedData as
  (select my_column, ntile(10) over (order by my_column) as [percentile]
   from   my_table),
MinimaAndMaxima as
  (select  min(my_column) as [low], max(my_column) as [high], percentile
   from    PartitionedData
   group by percentile)
select
  case
    when b.percentile = 10 then cast(b.high as decimal(18,2))
    else cast((a.low + b.high)  as decimal(18,2)) / 2
  end as [value], --b.high, a.low,
  b.percentile
from    MinimaAndMaxima a
  join  MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5

当您有偶数个数据项时,这应该计算正确的中位数和百分比值。同样,如果您只想要中位数而不是整个百分位数分布,请取消最后的where子句的注释。