本文将介绍ROW_NUMBER()和DENSE_RANK()之间的有趣关系(RANK()函数没有特别处理)。当您需要在SELECT DISTINCT语句上生成ROW_NUMBER()时,ROW_NUMBER()将在被DISTINCT关键字删除之前生成不同的值。例如这个查询
SELECT DISTINCT
v,
ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... 可能会产生这种结果(DISTINCT没有影响):
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| a | 2 |
| a | 3 |
| b | 4 |
| c | 5 |
| c | 6 |
| d | 7 |
| e | 8 |
+---+------------+
鉴于此查询:
SELECT DISTINCT
v,
DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... 在这种情况下产生你可能想要的:
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
+---+------------+
注意,DENSE_RANK()函数的ORDER BY子句需要SELECT DISTINCT子句中的所有其他列才能正常工作。
这样做的原因是,从逻辑上讲,窗口函数是在应用DISTINCT之前计算的。
这三种功能的比较
使用PostgreSQL / Sybase / SQL标准语法(WINDOW子句):
SELECT
v,
ROW_NUMBER() OVER (window) row_number,
RANK() OVER (window) rank,
DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v
... 你会得到:
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+