简单来说:
Over子句可用于选择非聚合值和聚合值。
Partition BY, ORDER BY within, ROWS or RANGE是OVER() BY子句的一部分。
Partition by是用来划分数据,然后执行这些窗口,聚合函数,如果我们没有分区,那么整个结果集被认为是一个单独的分区。
OVER子句可以用于排名函数(Rank, Row_Number, Dense_Rank..),聚合函数(AVG, Max, Min, SUM…等)和分析函数(First_Value, Last_Value和其他一些)。
让我们看看OVER子句的基本语法
OVER (
[ <PARTITION BY clause> ]
[ <ORDER BY clause> ]
[ <ROW or RANGE clause> ]
)
分区:
用于对数据进行分区,并对具有相同数据的组进行操作。
命令:
它用于定义分区中数据的逻辑顺序。当我们不指定Partition时,整个结果集被认为是一个单独的分区
:
这可用于指定在执行操作时应该在分区中考虑哪些行。
让我们举个例子:
这是我的数据集:
Id Name Gender Salary
----------- -------------------------------------------------- ---------- -----------
1 Mark Male 5000
2 John Male 4500
3 Pavan Male 5000
4 Pam Female 5500
5 Sara Female 4000
6 Aradhya Female 3500
7 Tom Male 5500
8 Mary Female 5000
9 Ben Male 6500
10 Jodi Female 7000
11 Tom Male 5500
12 Ron Male 5000
所以让我执行不同的场景,看看数据是如何受到影响的,我会从复杂的语法到简单的语法
Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 32000
1 Mark Male 5000 32000
8 Mary Female 5000 32000
12 Ron Male 5000 32000
11 Tom Male 5500 48500
7 Tom Male 5500 48500
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
Just observe the sum_sal part. Here I am using order by Salary and using "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
In this case, we are not using partition so entire data will be treated as one partition and we are ordering on salary.
And the important thing here is UNBOUNDED PRECEDING AND CURRENT ROW. This means when we are calculating the sum, from starting row to the current row for each row.
But if we see rows with salary 5000 and name="Pavan", ideally it should be 17000 and for salary=5000 and name=Mark, it should be 22000. But as we are using RANGE and in this case, if it finds any similar elements then it considers them as the same logical group and performs an operation on them and assigns value to each item in that group. That is the reason why we have the same value for salary=5000. The engine went up to salary=5000 and Name=Ron and calculated sum and then assigned it to all salary=5000.
Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 17000
1 Mark Male 5000 22000
8 Mary Female 5000 27000
12 Ron Male 5000 32000
11 Tom Male 5500 37500
7 Tom Male 5500 43000
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
因此,对于ROWS BETWEEN UNBOUNDED precedand CURRENT ROW,区别是对于相同值的项而不是将它们分组在一起,它从起始行到当前行计算SUM,并且它不像RANGE那样不同地对待具有相同值的项
Select *,SUM(salary) Over(order by salary) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 32000
1 Mark Male 5000 32000
8 Mary Female 5000 32000
12 Ron Male 5000 32000
11 Tom Male 5500 48500
7 Tom Male 5500 48500
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
这些结果与
Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
这是因为Over(按工资顺序)只是Over(按工资范围BETWEEN UNBOUNDED previous ROW AND CURRENT ROW)的一个简写。
因此,当我们简单地指定没有行或范围的顺序时,它将在UNBOUNDED previous和CURRENT ROW之间的范围作为默认值。
注意:这只适用于实际接受RANGE/ROW的函数。例如,ROW_NUMBER和其他一些不接受RANGE/ROW,在这种情况下,这就不存在了。
到目前为止,我们看到order by的Over子句采用的是Range/ROWS,语法看起来像这样的Range BETWEEN UNBOUNDED previous and CURRENT ROW
它实际上是从第一行开始一直计算到当前行。但是如果它想要计算整个数据分区的值,并为每一列(即从第一行到最后一行)计算值。这是对它的查询
Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1 Mark Male 5000 62000
2 John Male 4500 62000
3 Pavan Male 5000 62000
4 Pam Female 5500 62000
5 Sara Female 4000 62000
6 Aradhya Female 3500 62000
7 Tom Male 5500 62000
8 Mary Female 5000 62000
9 Ben Male 6500 62000
10 Jodi Female 7000 62000
11 Tom Male 5500 62000
12 Ron Male 5000 62000
我指定的不是CURRENT ROW,而是UNBOUNDED FOLLOWING,它指示引擎计算到每一行的最后一个分区记录。
现在回到你的观点,什么是OVER()空括号?
这只是Over的一个捷径(按无界前和无界后之间的工资行顺序)
在这里,我们间接指定将所有结果集视为单个分区,然后从每个分区的第一个记录到最后一个记录执行计算。
Select *,Sum(salary) Over() as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1 Mark Male 5000 62000
2 John Male 4500 62000
3 Pavan Male 5000 62000
4 Pam Female 5500 62000
5 Sara Female 4000 62000
6 Aradhya Female 3500 62000
7 Tom Male 5500 62000
8 Mary Female 5000 62000
9 Ben Male 6500 62000
10 Jodi Female 7000 62000
11 Tom Male 5500 62000
12 Ron Male 5000 62000
我确实做了一个关于这个的视频,如果你感兴趣,你可以访问它。
https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s
谢谢,
Pavan Kumar Aryasomayajulu
HTTP://xyzcoder.github.io