USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail 
WHERE SalesOrderID IN(43659,43664);

I've read about that clause and I don't understand why I need it. What does the OVER function do? What does PARTITION BY do? Why can't I write the query with GROUP BY SalesOrderID?


The power of the OVER clause is that you can aggregate over different ranges ("windowing"), whether or not you use a GROUP BY.

Example: get a count per group together with a count over the whole grouped result (the window is applied after the GROUP BY):

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) AS 'Count'
    ,COUNT(*) OVER () AS 'CountAll'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
GROUP BY
     SalesOrderID, ProductID, OrderQty

Get different counts, with no GROUP BY:

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder'
    ,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct'
    ,COUNT(*) OVER () AS 'CountAllAgain'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
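To try the windowed counts above without AdventureWorks, here is a minimal sketch that runs the same SELECT against an in-memory SQLite database from Python. The sample rows are hypothetical stand-ins for Sales.SalesOrderDetail, and SQLite 3.25 or later (the first version with window functions) replaces SQL Server. Only the windowing behaviour carries over.

```python
import sqlite3

# Hypothetical sample rows: (SalesOrderID, ProductID, OrderQty)
ROWS = [
    (43659, 776, 1),
    (43659, 777, 3),
    (43659, 778, 1),
    (43664, 776, 2),
    (43664, 779, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesOrderDetail (SalesOrderID INT, ProductID INT, OrderQty INT)")
conn.executemany("INSERT INTO SalesOrderDetail VALUES (?, ?, ?)", ROWS)

# Every detail row is kept; each COUNT uses its own window.
windowed = conn.execute("""
    SELECT SalesOrderID, ProductID, OrderQty,
           COUNT(OrderQty) OVER (PARTITION BY SalesOrderID) AS CountQtyPerOrder,
           COUNT(OrderQty) OVER (PARTITION BY ProductID)    AS CountQtyPerProduct,
           COUNT(*)        OVER ()                          AS CountAllAgain
    FROM SalesOrderDetail
    ORDER BY SalesOrderID, ProductID
""").fetchall()

for row in windowed:
    print(row)
```

Product 776 appears in both orders, so its per-product count is 2 on both of its rows. Each per-order count repeats the number of lines in that order.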

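When the OVER clause is combined with PARTITION BY, it states that the preceding function call is computed analytically over the rows returned by the query. Think of it as an inline GROUP BY.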

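OVER (PARTITION BY SalesOrderID) states that for SUM, AVG, etc., the function returns a value OVER a subset of the records returned by the query, and that subset is partitioned by the foreign key SalesOrderID.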

So we SUM every OrderQty record for EACH UNIQUE SalesOrderID, and that column is named 'Total'. Every row of a given order carries the same total.

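This is generally a more efficient approach than using multiple inline views to find the same information. You can put this query in an inline view and then filter on Total: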

SELECT ...
FROM (your query) inlineview
WHERE Total < 200

If you grouped only by SalesOrderID, you would not be able to include the ProductID and OrderQty columns in the SELECT clause.
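The contrast between the two, and the "filter on Total" trick, can be sketched in SQLite from Python. The rows are hypothetical and SQLite 3.25+ stands in for SQL Server. The outer SELECT is needed because a window function cannot be referenced in the WHERE clause of the query that computes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesOrderDetail (SalesOrderID INT, ProductID INT, OrderQty INT)")
# Hypothetical sample rows: (SalesOrderID, ProductID, OrderQty)
conn.executemany(
    "INSERT INTO SalesOrderDetail VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 11, 7), (2, 10, 300), (2, 12, 1)],
)

# GROUP BY collapses each order to one row; ProductID and OrderQty are gone.
grouped = conn.execute("""
    SELECT SalesOrderID, SUM(OrderQty) AS Total
    FROM SalesOrderDetail
    GROUP BY SalesOrderID
    ORDER BY SalesOrderID
""").fetchall()

# The windowed SUM keeps every detail row; wrapping it in an inline view
# lets the outer query filter on the computed Total.
small_orders = conn.execute("""
    SELECT SalesOrderID, ProductID, OrderQty, Total
    FROM (
        SELECT SalesOrderID, ProductID, OrderQty,
               SUM(OrderQty) OVER (PARTITION BY SalesOrderID) AS Total
        FROM SalesOrderDetail
    ) AS inlineview
    WHERE Total < 200
    ORDER BY ProductID
""").fetchall()

print(grouped)       # [(1, 12), (2, 301)]
print(small_orders)  # [(1, 10, 5, 12), (1, 11, 7, 12)]
```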

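The PARTITION BY clause lets you break up aggregate functions by group. An obvious and useful example is generating line numbers for the lines within each order: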

SELECT
    O.order_id,
    O.order_date,
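    -- SQL Server requires an ORDER BY for ROW_NUMBER(); (SELECT NULL) numbers the lines in no particular order
    ROW_NUMBER() OVER(PARTITION BY O.order_id ORDER BY (SELECT NULL)) AS line_item_no,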
    OL.product_id
FROM
    Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id

(My syntax might be slightly off.)

Then you would get something like this:

order_id    order_date    line_item_no    product_id
--------    ----------    ------------    ----------
    1       2011-05-02         1              5
    1       2011-05-02         2              4
    1       2011-05-02         3              7
    2       2011-05-12         1              8
    2       2011-05-12         2              1
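Here is a runnable sketch of the line-numbering query in SQLite, called from Python. The `line_id` column is hypothetical and exists only to give the lines a stable order, so the numbering matches the output above. SQLite 3.25+ is assumed in place of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (order_id INT, order_date TEXT);
    -- line_id is a hypothetical column used only to order the lines
    CREATE TABLE Order_Lines (line_id INT, order_id INT, product_id INT);
    INSERT INTO Orders VALUES (1, '2011-05-02'), (2, '2011-05-12');
    INSERT INTO Order_Lines VALUES (1, 1, 5), (2, 1, 4), (3, 1, 7), (4, 2, 8), (5, 2, 1);
""")

# Numbering restarts at 1 for every order_id partition.
rows = conn.execute("""
    SELECT O.order_id,
           O.order_date,
           ROW_NUMBER() OVER (PARTITION BY O.order_id ORDER BY OL.line_id) AS line_item_no,
           OL.product_id
    FROM Orders O
    INNER JOIN Order_Lines OL ON OL.order_id = O.order_id
    ORDER BY O.order_id, OL.line_id
""").fetchall()

for r in rows:
    print(r)
```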

You could use GROUP BY SalesOrderID. The difference is that with GROUP BY you can only get aggregated values for the columns that are not included in the GROUP BY.

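In contrast, by using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing that in your example query, you could retrieve both the individual OrderQty values and their sums, counts, averages etc. over groups of the same SalesOrderID.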

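Here is a practical example of why windowed aggregates are great. Suppose you need to calculate what percentage of the total each value is. Without windowed aggregates you would first have to derive a list of aggregated values and then join it back to the original rowset, like this: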

SELECT
  orig.[Partition],
  orig.Value,
  orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT
      [Partition],
      SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
  ) agg ON orig.[Partition] = agg.[Partition]

Now look at how you can do the same with a windowed aggregate:

SELECT
  [Partition],
  Value,
  Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig

Much simpler and cleaner, isn't it?
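To check that the two queries really are equivalent, here is a minimal sketch that runs both in SQLite from Python. The contents of `OriginalRowset` are hypothetical, and SQLite 3.25+ is assumed. SQLite accepts the `[Partition]` bracket quoting, so the SQL is unchanged apart from an added ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data for the OriginalRowset table used in the queries above.
conn.execute("CREATE TABLE OriginalRowset ([Partition] TEXT, Value INT)")
conn.executemany(
    "INSERT INTO OriginalRowset VALUES (?, ?)",
    [("A", 10), ("A", 30), ("B", 5), ("B", 15), ("B", 30)],
)

# Without window functions: aggregate per partition, then join back.
joined = conn.execute("""
    SELECT orig.[Partition], orig.Value,
           orig.Value * 100.0 / agg.TotalValue AS ValuePercent
    FROM OriginalRowset orig
    INNER JOIN (
        SELECT [Partition], SUM(Value) AS TotalValue
        FROM OriginalRowset
        GROUP BY [Partition]
    ) agg ON orig.[Partition] = agg.[Partition]
    ORDER BY orig.[Partition], orig.Value
""").fetchall()

# With a windowed aggregate: the per-partition total is computed in place.
windowed = conn.execute("""
    SELECT [Partition], Value,
           Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
    FROM OriginalRowset orig
    ORDER BY [Partition], Value
""").fetchall()

print(windowed)  # [('A', 10, 25.0), ('A', 30, 75.0), ('B', 5, 10.0), ('B', 15, 30.0), ('B', 30, 60.0)]
```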


Let me explain with an example, and you will see how it works.

Assume you have the following table, DIM_EQUIPMENT:

VIN         MAKE    MODEL   YEAR    COLOR
-----------------------------------------
1234ASDF    Ford    Taurus  2008    White
1234JKLM    Chevy   Truck   2005    Green
5678ASDF    Ford    Mustang 2008    Yellow

Run the following SQL:

SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR ,
  COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT

The result would be as follows:

VIN         MAKE    MODEL   YEAR    COLOR     COUNT2
------------------------------------------------------
1234JKLM    Chevy   Truck   2005    Green     1
5678ASDF    Ford    Mustang 2008    Yellow    2
1234ASDF    Ford    Taurus  2008    White     2

See what happened?

You were able to count the rows per YEAR without a GROUP BY, and match that count to each row.
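The difference from a plain GROUP BY can be seen side by side in this sketch, which loads the DIM_EQUIPMENT rows above into SQLite (3.25+ assumed) from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DIM_EQUIPMENT (VIN TEXT, MAKE TEXT, MODEL TEXT, YEAR INT, COLOR TEXT)")
conn.executemany(
    "INSERT INTO DIM_EQUIPMENT VALUES (?, ?, ?, ?, ?)",
    [
        ("1234ASDF", "Ford", "Taurus", 2008, "White"),
        ("1234JKLM", "Chevy", "Truck", 2005, "Green"),
        ("5678ASDF", "Ford", "Mustang", 2008, "Yellow"),
    ],
)

# GROUP BY YEAR: one row per year; the individual vehicles are gone.
per_year = conn.execute(
    "SELECT YEAR, COUNT(*) FROM DIM_EQUIPMENT GROUP BY YEAR ORDER BY YEAR"
).fetchall()

# Window: every vehicle row is kept and carries the count for its year.
per_row = conn.execute("""
    SELECT VIN, YEAR, COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
    FROM DIM_EQUIPMENT
    ORDER BY YEAR, VIN
""").fetchall()

print(per_year)  # [(2005, 1), (2008, 2)]
print(per_row)   # [('1234JKLM', 2005, 1), ('1234ASDF', 2008, 2), ('5678ASDF', 2008, 2)]
```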

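Another interesting way to get the same result is to use a WITH clause, as below. WITH works as an inline view and can simplify queries, especially complex ones. That is not the case here, since I am only trying to show the usage: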

 WITH EQ AS
  ( SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2 FROM DIM_EQUIPMENT GROUP BY YEAR
  )
SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR,
  COUNT2
FROM DIM_EQUIPMENT,
  EQ
WHERE EQ.YEAR2=DIM_EQUIPMENT.YEAR;
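To confirm that the WITH version and the windowed version return the same rows, this sketch runs both in SQLite (3.25+ assumed) from Python. An ORDER BY is added to each so the results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DIM_EQUIPMENT (VIN TEXT, MAKE TEXT, MODEL TEXT, YEAR INT, COLOR TEXT)")
conn.executemany(
    "INSERT INTO DIM_EQUIPMENT VALUES (?, ?, ?, ?, ?)",
    [
        ("1234ASDF", "Ford", "Taurus", 2008, "White"),
        ("1234JKLM", "Chevy", "Truck", 2005, "Green"),
        ("5678ASDF", "Ford", "Mustang", 2008, "Yellow"),
    ],
)

window_version = conn.execute("""
    SELECT VIN, MAKE, MODEL, YEAR, COLOR,
           COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
    FROM DIM_EQUIPMENT
    ORDER BY VIN
""").fetchall()

# Same result by aggregating in a CTE and joining it back on YEAR.
cte_version = conn.execute("""
    WITH EQ AS (
        SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2 FROM DIM_EQUIPMENT GROUP BY YEAR
    )
    SELECT VIN, MAKE, MODEL, YEAR, COLOR, COUNT2
    FROM DIM_EQUIPMENT, EQ
    WHERE EQ.YEAR2 = DIM_EQUIPMENT.YEAR
    ORDER BY VIN
""").fetchall()

print(window_version == cte_version)  # True
```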

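Also known as the query partition clause:

- Similar to the GROUP BY clause
- Breaks the data into chunks (or partitions)
- The partitions are separated by partition boundaries
- The function operates within each partition
- The function is re-initialized when it crosses a partition boundary

Syntax: function(…) OVER (PARTITION BY col1, col3, …)

Functions: familiar functions such as COUNT(), SUM(), MIN(), MAX(), etc., as well as newer ones (e.g. ROW_NUMBER(), and Oracle's RATIO_TO_REPORT()).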

More info and examples: http://msdn.microsoft.com/en-us/library/ms189461.aspx
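The points above (split into partitions, operate within each one, reset at every boundary) can be modelled in a few lines of plain Python. This is a conceptual sketch, not how any database engine is implemented. The function and the sample rows are hypothetical, and it models a running SUM with ROWS semantics, where each row accumulates only the rows before it in its partition.

```python
from itertools import groupby
from operator import itemgetter


def running_sum_by_partition(rows, part_key, order_key, value_key):
    """Conceptual model of
    SUM(value) OVER (PARTITION BY part ORDER BY order ROWS UNBOUNDED PRECEDING).
    """
    result = []
    # Sort so each partition's rows are contiguous and in window order.
    ordered = sorted(rows, key=itemgetter(part_key, order_key))
    for _, partition in groupby(ordered, key=itemgetter(part_key)):
        running = 0  # re-initialized each time a partition boundary is crossed
        for row in partition:
            running += row[value_key]
            result.append({**row, "running": running})
    return result


# Hypothetical sample rows.
orders = [
    {"order_id": 2, "line": 1, "qty": 4},
    {"order_id": 1, "line": 2, "qty": 3},
    {"order_id": 1, "line": 1, "qty": 5},
    {"order_id": 2, "line": 2, "qty": 1},
]

out = running_sum_by_partition(orders, "order_id", "line", "qty")
for r in out:
    print(r)
```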


prkey   whatsthat               cash   sum
890    "abb                "   32  32
43     "abbz               "   2   34
4      "bttu               "   1   35
45     "gasstuff           "   2   37
545    "gasz               "   5   42
80009  "hoo                "   9   51
2321   "ibm                "   1   52
998    "krk                "   2   54
42     "kx-5010            "   2   56
32     "lto                "   4   60
543    "mp                 "   5   65
465    "multipower         "   2   67
455    "O.N.               "   1   68
7887   "prem               "   7   75
434    "puma               "   3   78
23     "retractble         "   3   81
242    "Trujillo's stuff   "   4   85

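This is the result of the query. The source table is the same except that it lacks the last column, which is a running (cumulative) sum of the third column.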

Query:

SELECT prkey,whatsthat,cash,SUM(cash) over (order by whatsthat)
    FROM public.iuk order by whatsthat,prkey
    ;

(The table is public.iuk.)
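The running sum can be reproduced with SQLite (3.25+ assumed) from Python, using the rows from the table above with the padding spaces trimmed. One assumption: the original output orders "O.N." and "Trujillo's stuff" case-insensitively, so this sketch uses `COLLATE NOCASE`. SQLite's default binary collation would sort the uppercase names first.

```python
import sqlite3

# Rows from the table above (prkey, whatsthat, cash), padding trimmed.
IUK = [
    (890, "abb", 32), (43, "abbz", 2), (4, "bttu", 1), (45, "gasstuff", 2),
    (545, "gasz", 5), (80009, "hoo", 9), (2321, "ibm", 1), (998, "krk", 2),
    (42, "kx-5010", 2), (32, "lto", 4), (543, "mp", 5), (465, "multipower", 2),
    (455, "O.N.", 1), (7887, "prem", 7), (434, "puma", 3), (23, "retractble", 3),
    (242, "Trujillo's stuff", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iuk (prkey INT, whatsthat TEXT, cash INT)")
conn.executemany("INSERT INTO iuk VALUES (?, ?, ?)", IUK)

# ORDER BY inside OVER turns SUM into a running sum.
rows = conn.execute("""
    SELECT prkey, whatsthat, cash,
           SUM(cash) OVER (ORDER BY whatsthat COLLATE NOCASE) AS running_sum
    FROM iuk
    ORDER BY whatsthat COLLATE NOCASE, prkey
""").fetchall()

for r in rows:
    print(r)
```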

sql version:  2012

This is a bit beyond dBase (1986) level; I don't know why it took more than 25 years to get there.


Simply put: the OVER clause lets you select non-aggregated and aggregated values in the same query.

PARTITION BY, ORDER BY, and ROWS or RANGE are all parts of the OVER() clause.

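PARTITION BY is used to divide the data into partitions, and the window and aggregate functions are then applied within each one. If we have no PARTITION BY, the entire result set is treated as a single partition.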

The OVER clause can be used with ranking functions (RANK, ROW_NUMBER, DENSE_RANK, ...), aggregate functions (AVG, MAX, MIN, SUM, etc.) and analytic functions (FIRST_VALUE, LAST_VALUE, and a few others).

Let's look at the basic syntax of the OVER clause:

OVER (   
       [ <PARTITION BY clause> ]  
       [ <ORDER BY clause> ]   
       [ <ROWS or RANGE clause> ]
      )  

PARTITION BY: used to partition the data and perform the operation on groups of rows that share the same values.

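ORDER BY: used to define the logical order of the data within a partition. (When we don't specify PARTITION BY, the entire result set is treated as a single partition.)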

ROWS or RANGE: used to specify which rows within the partition should be considered when performing the operation.

Let's take an example:

Here is my dataset:

Id          Name                                               Gender     Salary
----------- -------------------------------------------------- ---------- -----------
1           Mark                                               Male       5000
2           John                                               Male       4500
3           Pavan                                              Male       5000
4           Pam                                                Female     5500
5           Sara                                               Female     4000
6           Aradhya                                            Female     3500
7           Tom                                                Male       5500
8           Mary                                               Female     5000
9           Ben                                                Male       6500
10          Jodi                                               Female     7000
11          Tom                                                Male       5500
12          Ron                                                Male       5000

So let me run different scenarios and see how the data is affected. I'll go from the complex syntax to the simple one.

Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

Just observe the sum_sal column. Here I am using ORDER BY Salary with "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". We are not using PARTITION BY, so the entire data set is treated as one partition, ordered by salary. The important part is UNBOUNDED PRECEDING AND CURRENT ROW: for each row, the sum is calculated from the first row of the partition up to the current row. But look at the rows with salary 5000: for Name = Pavan you might expect 17000, and for Name = Mark 22000. Because we are using RANGE, rows that have the same ORDER BY value (here, the same salary) are treated as one logical group of peers. The operation covers all of them, and every row in that group gets the same value. That is why all the salary = 5000 rows show 32000: the engine summed up to the last salary = 5000 row (Ron) and assigned that sum to every row with salary = 5000.
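The difference between the two frame types can be modelled in plain Python. This is a conceptual sketch that assumes an ascending ORDER BY on non-NULL numbers. Under that assumption, the RANGE frame for a row holds every row whose value is less than or equal to the current one, which includes all of its peers. The ROWS frame is a plain prefix sum. The salaries are taken from the dataset above.

```python
from itertools import accumulate

# Salaries from the employees table, in ORDER BY salary order.
salaries = [3500, 4000, 4500, 5000, 5000, 5000, 5000, 5500, 5500, 5500, 6500, 7000]


def rows_running_sum(values):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: a plain prefix sum."""
    return list(accumulate(values))


def range_running_sum(values):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (ascending ORDER BY):
    the frame includes all peers, i.e. every value <= the current one."""
    return [sum(v for v in values if v <= current) for current in values]


print(rows_running_sum(salaries))
print(range_running_sum(salaries))
```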

Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees


Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        17000
1           Mark                                               Male       5000        22000
8           Mary                                               Female     5000        27000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        37500
7           Tom                                                Male       5500        43000
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

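So with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, rows with the same value are not grouped together. The SUM is calculated from the first row up to the current row only, so rows with equal salaries are not treated as a group the way RANGE treats them. (Which of the tied rows comes first is not guaranteed unless the ORDER BY includes a tie-breaker.)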
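Both frames can be run against the same employee data in SQLite (3.25+ assumed) from Python. The sums are sorted before comparison because the order of tied salaries is not guaranteed. The `sums` helper is a hypothetical convenience and only interpolates one of two fixed frame strings.

```python
import sqlite3

EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500), (3, "Pavan", "Male", 5000),
    (4, "Pam", "Female", 5500), (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000), (9, "Ben", "Male", 6500),
    (10, "Jodi", "Female", 7000), (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INT, Name TEXT, Gender TEXT, Salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)


def sums(frame):
    """Return the sum_sal values for the given frame clause, sorted ascending."""
    query = f"SELECT SUM(Salary) OVER (ORDER BY Salary {frame}) FROM employees"
    return sorted(r[0] for r in conn.execute(query))


range_sums = sums("RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")
rows_sums = sums("ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")

print(range_sums)  # peers share one value: 32000 four times, 48500 three times
print(rows_sums)   # strictly increasing prefix sums
```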

Select *,SUM(salary) Over(order by salary) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

These results are the same as those of:

Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

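This is because OVER(ORDER BY salary) is just shorthand for OVER(ORDER BY salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). So when we specify ORDER BY without ROWS or RANGE, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used as the default.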

Note: this only applies to functions that actually accept RANGE/ROWS. For example, ROW_NUMBER and a few others don't accept RANGE/ROWS, so this doesn't apply to them.
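The claim that the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW can be checked with SQLite (3.25+ assumed) from Python. SQLite uses the same default when ORDER BY is present. The results are keyed by Id because RANGE gives every row a deterministic value, even among ties. `sum_by_id` is a hypothetical helper.

```python
import sqlite3

EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500), (3, "Pavan", "Male", 5000),
    (4, "Pam", "Female", 5500), (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000), (9, "Ben", "Male", 6500),
    (10, "Jodi", "Female", 7000), (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INT, Name TEXT, Gender TEXT, Salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)


def sum_by_id(over_clause):
    """Map Id -> SUM(Salary) OVER (<over_clause>)."""
    query = f"SELECT Id, SUM(Salary) OVER ({over_clause}) FROM employees"
    return dict(conn.execute(query).fetchall())


default_frame = sum_by_id("ORDER BY Salary")
explicit_range = sum_by_id("ORDER BY Salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")

print(default_frame == explicit_range)  # True
print(default_frame[3])                 # Pavan: 32000
```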

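So far we have seen the OVER clause with ORDER BY using RANGE/ROWS, with syntax like RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which computes from the first row up to the current row. But what if we want to compute the value over the whole partition, from the first row to the last, and return it for every row? Here is the query for that: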

Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

Instead of CURRENT ROW, I specified UNBOUNDED FOLLOWING, which tells the engine to compute, for each row, up to the last record of the partition.

Now back to your question: what does OVER() with empty parentheses mean?

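It effectively acts as a shortcut for OVER(ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING): without an ORDER BY, every row of the partition is in the frame.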

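Here we are implicitly saying that the whole result set is a single partition, and the calculation runs from the first record of that partition to the last.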

Select *,Sum(salary) Over() as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000
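Both forms can be checked with SQLite (3.25+ assumed) from Python. The full ROWS frame and the empty OVER() should both give every employee the grand total. `sum_by_id` is a hypothetical helper. Passing an empty string produces `OVER ()`.

```python
import sqlite3

EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500), (3, "Pavan", "Male", 5000),
    (4, "Pam", "Female", 5500), (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000), (9, "Ben", "Male", 6500),
    (10, "Jodi", "Female", 7000), (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INT, Name TEXT, Gender TEXT, Salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)


def sum_by_id(over_clause):
    """Map Id -> SUM(Salary) OVER (<over_clause>)."""
    query = f"SELECT Id, SUM(Salary) OVER ({over_clause}) FROM employees"
    return dict(conn.execute(query).fetchall())


full_frame = sum_by_id("ORDER BY Salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING")
empty_over = sum_by_id("")  # becomes OVER ()

print(set(full_frame.values()), set(empty_over.values()))  # {62000} {62000}
```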

I also made a video about this; if you are interested, you can watch it here: https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s

Thanks, Pavan Kumar Aryasomayajulu HTTP://xyzcoder.github.io