Get top 1 row of each group

I have a table and I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs table

|ID| DocumentID | Status | DateCreated |
| 2| 1          | S1     | 7/29/2011   |
| 3| 1          | S2     | 7/30/2011   |
| 6| 1          | S1     | 8/02/2011   |
| 1| 2          | S1     | 7/28/2011   |
| 4| 2          | S2     | 7/30/2011   |
| 5| 2          | S3     | 8/01/2011   |
| 6| 3          | S1     | 8/02/2011   |

The table should be grouped by DocumentID and sorted by DateCreated in descending order. For each DocumentID, I want to get the latest status.

My preferred output:

| DocumentID | Status | DateCreated |
| 1          | S1     | 8/02/2011   |
| 2          | S3     | 8/01/2011   |
| 3          | S1     | 8/02/2011   |

Is there any aggregate function to get only the top row from each group? See the pseudo-code GetOnlyTheTop below:

SELECT
  DocumentID,
  GetOnlyTheTop(Status),
  GetOnlyTheTop(DateCreated)
FROM DocumentStatusLogs
GROUP BY DocumentID
ORDER BY DateCreated DESC

If such a function doesn't exist, is there any way I can achieve the output I want? Or, in the first place, could this be caused by an unnormalized database? I'm thinking that, since what I'm looking for is just one row, should that status also be located in the parent table?
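You can see why no ordinary aggregate can play the role of GetOnlyTheTop by running the naive GROUP BY. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption, since the dialects differ). The sample dates are rewritten as ISO-8601 text so they sort chronologically. Taking MAX() of each column independently returns S2 for DocumentID 1, because the largest status value and the latest date come from different rows.

```python
import sqlite3

# Sample rows from the question: (ID, DocumentID, Status, DateCreated).
# Dates are rewritten as ISO-8601 text (assumption) so that SQLite's
# text comparison orders them chronologically.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs "
             "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

# Naive attempt: each MAX() is evaluated independently per group,
# so the values in one output row can come from different input rows.
naive = conn.execute("""
    SELECT DocumentID, MAX(Status), MAX(DateCreated)
    FROM DocumentStatusLogs
    GROUP BY DocumentID
    ORDER BY DocumentID
""").fetchall()

print(naive)  # [(1, 'S2', '2011-08-02'), (2, 'S3', '2011-08-01'), (3, 'S1', '2011-08-02')]
```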

For more information, see the parent table:

Current Document table

| DocumentID | Title  | Content  | DateCreated |
| 1          | TitleA | ...      | ...         |
| 2          | TitleB | ...      | ...         |
| 3          | TitleC | ...      | ...         |

Should the parent table look like this, so that I can easily access its status?

| DocumentID | Title  | Content  | DateCreated | CurrentStatus |
| 1          | TitleA | ...      | ...         | S1            |
| 2          | TitleB | ...      | ...         | S3            |
| 3          | TitleC | ...      | ...         | S1            |

Update: I just learned how to use "APPLY", which makes this kind of problem easier to solve.

Answer

;WITH cte AS
(
   SELECT *,
         ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
   FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1
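The ROW_NUMBER() query runs almost unchanged on SQLite 3.25 or later, which added window functions. Here is a sketch that uses Python's sqlite3 module and an in-memory database as a stand-in for SQL Server (an assumption). The sample dates are rewritten as ISO-8601 text. The leading semicolon is a T-SQL statement-terminator habit, so it is dropped here.

```python
import sqlite3

# Sample rows: (ID, DocumentID, Status, DateCreated), dates as ISO-8601 text.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs "
             "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

# Number the rows within each DocumentID, newest first, and keep rn = 1.
latest = conn.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY DocumentID
                                  ORDER BY DateCreated DESC) AS rn
        FROM DocumentStatusLogs
    )
    SELECT DocumentID, Status, DateCreated
    FROM cte
    WHERE rn = 1
    ORDER BY DocumentID
""").fetchall()

print(latest)  # [(1, 'S1', '2011-08-02'), (2, 'S3', '2011-08-01'), (3, 'S1', '2011-08-02')]
```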

If you expect 2 entries per day, this will pick one of them arbitrarily. To get both entries for that day, use DENSE_RANK instead.
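To see the difference between the two ranking functions, here is a sketch in SQLite through Python's sqlite3 module (a stand-in for SQL Server, assumed here). It adds one hypothetical row, ID 7 with status S4, that gives DocumentID 2 a second entry on its latest day. The helper name first_rank_rows is also hypothetical. ROW_NUMBER() still yields one row per document, while DENSE_RANK() keeps both same-day rows.

```python
import sqlite3

# Sample rows (dates as ISO-8601 text) plus a hypothetical extra row, ID 7,
# that gives DocumentID 2 a second entry on 2011-08-01.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"), (7, 2, "S4", "2011-08-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs "
             "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

def first_rank_rows(conn, ranking):
    """Rows ranked 1 per DocumentID; ranking is 'ROW_NUMBER' or 'DENSE_RANK'."""
    return conn.execute(f"""
        WITH cte AS (
            SELECT *,
                   {ranking}() OVER (PARTITION BY DocumentID
                                     ORDER BY DateCreated DESC) AS rn
            FROM DocumentStatusLogs
        )
        SELECT DocumentID, Status, DateCreated
        FROM cte
        WHERE rn = 1
        ORDER BY DocumentID, Status
    """).fetchall()

by_row_number = first_rank_rows(conn, "ROW_NUMBER")  # one row per document
by_dense_rank = first_rank_rows(conn, "DENSE_RANK")  # both same-day rows kept

print(len(by_row_number), len(by_dense_rank))  # 3 4
```

RANK() would also work for the rn = 1 filter. The two functions only differ in how they number the rows after a tie.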

As for whether to normalize, it depends on whether you want to:

- maintain status in 2 places
- preserve status history
- ...

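As it stands, you preserve status history. If you also want the latest status in the parent table (which is denormalization), you'd need a trigger to maintain "status" in the parent. Or drop this status history table.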
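If you do keep a CurrentStatus column in the parent, the trigger could look roughly like the sketch below. It uses SQLite through Python's sqlite3 module, not T-SQL. SQLite triggers fire once per row and expose the inserted row as NEW, whereas a SQL Server trigger fires once per statement and reads the inserted pseudo-table. The table name Document, its column layout and the trigger name are hypothetical. The question's sample reuses ID 6, so the DocumentID 3 row gets ID 7 here to satisfy the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical parent table with the denormalized CurrentStatus column.
    CREATE TABLE Document (
        DocumentID    INTEGER PRIMARY KEY,
        Title         TEXT,
        CurrentStatus TEXT
    );
    CREATE TABLE DocumentStatusLogs (
        ID          INTEGER PRIMARY KEY,
        DocumentID  INTEGER REFERENCES Document(DocumentID),
        Status      TEXT,
        DateCreated TEXT
    );
    -- After each new log row, copy its status to the parent, but only if no
    -- newer log row exists for that document (a back-dated insert is ignored;
    -- a same-day insert wins).
    CREATE TRIGGER trg_DocumentStatusLogs_CurrentStatus
    AFTER INSERT ON DocumentStatusLogs
    BEGIN
        UPDATE Document
        SET CurrentStatus = NEW.Status
        WHERE DocumentID = NEW.DocumentID
          AND NOT EXISTS (
              SELECT 1 FROM DocumentStatusLogs AS l
              WHERE l.DocumentID = NEW.DocumentID
                AND l.DateCreated > NEW.DateCreated
          );
    END;
""")

conn.executemany("INSERT INTO Document (DocumentID, Title) VALUES (?, ?)",
                 [(1, "TitleA"), (2, "TitleB"), (3, "TitleC")])

# Sample log rows with ISO-8601 dates; the DocumentID 3 row uses ID 7.
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (7, 3, "S1", "2011-08-02"),
])

# A back-dated entry must not overwrite the newer S3 status.
conn.execute("INSERT INTO DocumentStatusLogs VALUES (8, 2, 'S0', '2011-07-01')")

current = conn.execute(
    "SELECT DocumentID, CurrentStatus FROM Document ORDER BY DocumentID"
).fetchall()
print(current)  # [(1, 'S1'), (2, 'S3'), (3, 'S1')]
```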

2011-07-27 08:44:10

Other answers

I know this is an old thread, but the TOP 1 WITH TIES solution is quite nice and might be helpful to some people reading through the solutions.

select top 1 with ties
   DocumentID
  ,Status
  ,DateCreated
from DocumentStatusLogs
order by row_number() over (partition by DocumentID order by DateCreated desc)

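The select top 1 with ties clause tells SQL Server that you want to return the first row per group. But how does SQL Server know how to group the data? That is where order by row_number() over (partition by DocumentID order by DateCreated desc) comes in. The column(s) after partition by define how SQL Server groups the data. Within each group, the rows are sorted by the order by columns and numbered 1, 2, 3, and so on, so the newest row in every group gets number 1. TOP 1 WITH TIES then returns the first row of the sorted result plus every row that ties with it on the order by value. Those are exactly the rows numbered 1, which gives one row per group.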
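SQLite has no TOP ... WITH TIES, so the tie rule described above is modelled here in plain Python instead. This is an illustration of the semantics, not of how SQL Server executes the query. The helper names row_number and top_with_ties are hypothetical, and the sample dates are rewritten as ISO-8601 text.

```python
from itertools import groupby

# Sample rows: (ID, DocumentID, Status, DateCreated), dates as ISO-8601 text.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

def row_number(rows, partition_key, order_key):
    """Pair each row with ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC)."""
    numbered = []
    # groupby needs its input sorted by the partition key.
    for _, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        newest_first = sorted(group, key=order_key, reverse=True)
        numbered.extend((rn, row) for rn, row in enumerate(newest_first, start=1))
    return numbered

def top_with_ties(items, key, n=1):
    """TOP n WITH TIES: the first n items by key, plus later items tied with the n-th."""
    ordered = sorted(items, key=key)
    if len(ordered) <= n:
        return ordered
    cutoff = key(ordered[n - 1])
    return ordered[:n] + [item for item in ordered[n:] if key(item) == cutoff]

numbered = row_number(ROWS, partition_key=lambda r: r[1], order_key=lambda r: r[3])
result = [row for _, row in top_with_ties(numbered, key=lambda pair: pair[0])]
for row in result:
    print(row)
```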

More information about the TOP clause can be found in the SQL Server documentation.

2018-01-24 00:14:52


In scenarios where you want to avoid using ROW_NUMBER(), you can also use a left join:

select ds.DocumentID, ds.Status, ds.DateCreated 
from DocumentStatusLogs ds
left join DocumentStatusLogs filter 
    ON ds.DocumentID = filter.DocumentID
    -- Match any row that has another row that was created after it.
    AND ds.DateCreated < filter.DateCreated
-- then filter out any rows that matched 
where filter.DocumentID is null
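Here is the same anti-join run on SQLite through Python's sqlite3 module, as a stand-in for SQL Server (an assumption), with the sample dates as ISO-8601 text. The alias newer replaces filter from the answer to avoid any clash with the FILTER keyword in newer SQLite versions. A row survives only if the join finds no newer row for the same document.

```python
import sqlite3

# Sample rows: (ID, DocumentID, Status, DateCreated), dates as ISO-8601 text.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs "
             "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

# Join each row to every newer row of the same document; rows with no match
# (newer.DocumentID IS NULL) are the latest ones.
latest = conn.execute("""
    SELECT ds.DocumentID, ds.Status, ds.DateCreated
    FROM DocumentStatusLogs ds
    LEFT JOIN DocumentStatusLogs newer
        ON ds.DocumentID = newer.DocumentID
       AND ds.DateCreated < newer.DateCreated
    WHERE newer.DocumentID IS NULL
    ORDER BY ds.DocumentID
""").fetchall()

print(latest)  # [(1, 'S1', '2011-08-02'), (2, 'S3', '2011-08-01'), (3, 'S1', '2011-08-02')]
```

Like DENSE_RANK, this pattern returns every row that shares a document's latest DateCreated.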

For the example schema, you could also use a "not in" subquery, which usually compiles to the same query plan as the left join:

select ds.DocumentID, ds.Status, ds.DateCreated
from DocumentStatusLogs ds
WHERE ds.ID NOT IN (
    -- IDs of rows that have a newer row for the same document
    SELECT filter.ID
    FROM DocumentStatusLogs filter
    JOIN DocumentStatusLogs newer
        ON filter.DocumentID = newer.DocumentID
        AND filter.DateCreated < newer.DateCreated)

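Note that the subquery pattern won't work if the table doesn't have at least one single-column unique key/constraint/index (in this case the primary key "ID").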
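The NOT IN pattern depends on ID being unique, but the sample data lists ID 6 twice. This SQLite sketch therefore declares ID as the primary key and gives the DocumentID 3 row ID 7 (an assumption). It runs through Python's sqlite3 module as a stand-in for SQL Server. The subquery collects the IDs of rows that have a newer row for the same document, and the outer query keeps everything else.

```python
import sqlite3

# Sample rows with ISO-8601 dates; the DocumentID 3 row is renumbered to ID 7
# so that ID can be a primary key.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (7, 3, "S1", "2011-08-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs (ID INTEGER PRIMARY KEY, "
             "DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

latest = conn.execute("""
    SELECT ds.DocumentID, ds.Status, ds.DateCreated
    FROM DocumentStatusLogs ds
    WHERE ds.ID NOT IN (
        -- IDs of superseded rows: some row of the same document is newer.
        SELECT older.ID
        FROM DocumentStatusLogs older
        JOIN DocumentStatusLogs newer
          ON older.DocumentID = newer.DocumentID
         AND older.DateCreated < newer.DateCreated
    )
    ORDER BY ds.DocumentID
""").fetchall()

print(latest)  # [(1, 'S1', '2011-08-02'), (2, 'S3', '2011-08-01'), (3, 'S1', '2011-08-02')]
```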

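Both of these queries tend to be more "expensive" than the ROW_NUMBER() query (as measured by Query Analyzer). However, you may encounter scenarios where they return results faster or enable other optimizations.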

2012-09-04 20:47:23

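Here are 3 different approaches to the problem, along with the best indexing choice for each query. (Please try out the indexes yourself and check the logical reads, elapsed time, and execution plan. The suggestions come from my experience with this kind of query; I have not run them for this specific problem.)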

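Approach 1: Using ROW_NUMBER(). If a rowstore index doesn't improve performance, try a nonclustered or clustered columnstore index. For queries with aggregation and grouping, and for tables that are ordered by different columns at different times, a columnstore index is usually the best choice.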

;WITH CTE AS
    (
       SELECT   *,
                RN = ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
       FROM     DocumentStatusLogs
    )
    SELECT  ID      
        ,DocumentID 
        ,Status     
        ,DateCreated
    FROM    CTE
    WHERE   RN = 1;

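Approach 2: Using FIRST_VALUE. The indexing advice is the same as for approach 1. If a rowstore index doesn't improve performance, try a nonclustered or clustered columnstore index, which is usually the best choice for queries with aggregation and grouping and for tables ordered by different columns at different times.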

SELECT  DISTINCT
    ID      = FIRST_VALUE(ID) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
    ,DocumentID
    ,Status     = FIRST_VALUE(Status) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
    ,DateCreated    = FIRST_VALUE(DateCreated) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
FROM    DocumentStatusLogs;
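Here is the FIRST_VALUE approach run on SQLite 3.25 or later through Python's sqlite3 module, as a stand-in for SQL Server (an assumption), with the sample dates as ISO-8601 text. The window function returns a value for every input row, and DISTINCT collapses the identical rows of each document into one. The output aliases LatestStatus and LatestDate are chosen here so they don't shadow the column names used inside OVER.

```python
import sqlite3

# Sample rows: (ID, DocumentID, Status, DateCreated), dates as ISO-8601 text.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs "
             "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

# Every row of a document gets the newest row's values; DISTINCT dedupes them.
latest = conn.execute("""
    SELECT DISTINCT
        DocumentID,
        FIRST_VALUE(Status) OVER (PARTITION BY DocumentID
                                  ORDER BY DateCreated DESC) AS LatestStatus,
        FIRST_VALUE(DateCreated) OVER (PARTITION BY DocumentID
                                       ORDER BY DateCreated DESC) AS LatestDate
    FROM DocumentStatusLogs
    ORDER BY DocumentID
""").fetchall()

print(latest)  # [(1, 'S1', '2011-08-02'), (2, 'S3', '2011-08-01'), (3, 'S1', '2011-08-02')]
```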

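Approach 3: Using CROSS APPLY. A rowstore index on the DocumentStatusLogs table that covers the columns used in the query should be enough to cover the query, without the need for a columnstore index.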

SELECT  ID          = CA.ID
    ,DocumentID     = D.DocumentID
    ,Status         = CA.Status
    ,DateCreated    = CA.DateCreated
FROM    (SELECT DISTINCT DocumentID FROM DocumentStatusLogs) D
    CROSS APPLY (
            -- Newest log row for this document
            SELECT  TOP 1 I.*
            FROM    DocumentStatusLogs I
            WHERE   I.DocumentID = D.DocumentID
            ORDER   BY I.DateCreated DESC
            ) CA;
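SQLite supports neither CROSS APPLY nor TOP, so this sketch swaps in a different technique with the same shape. It runs through Python's sqlite3 module as a stand-in for SQL Server (an assumption). For each distinct DocumentID, a correlated subquery with ORDER BY ... LIMIT 1 picks the newest row's rowid, and the query joins back on it. The covering index follows the advice above, and its name is hypothetical. Dates are ISO-8601 text.

```python
import sqlite3

# Sample rows: (ID, DocumentID, Status, DateCreated), dates as ISO-8601 text.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs "
             "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

# Hypothetical covering index: seek by DocumentID, newest DateCreated first.
conn.execute("CREATE INDEX IX_DocumentStatusLogs_Doc_Date "
             "ON DocumentStatusLogs (DocumentID, DateCreated DESC, Status)")

# For each document, the correlated subquery returns the rowid of its newest
# row (SQLite's implicit rowid is unique even though ID 6 repeats).
latest = conn.execute("""
    SELECT l.DocumentID, l.Status, l.DateCreated
    FROM (SELECT DISTINCT DocumentID FROM DocumentStatusLogs) AS d
    JOIN DocumentStatusLogs AS l
      ON l.rowid = (
          SELECT i.rowid
          FROM DocumentStatusLogs AS i
          WHERE i.DocumentID = d.DocumentID
          ORDER BY i.DateCreated DESC
          LIMIT 1
      )
    ORDER BY l.DocumentID
""").fetchall()

print(latest)  # [(1, 'S1', '2011-08-02'), (2, 'S3', '2011-08-01'), (3, 'S1', '2011-08-02')]
```

On PostgreSQL or MySQL 8.0.14+, a LATERAL join is the closer analogue to CROSS APPLY.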

2019-06-17 05:18:18

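I believe this can be done more simply. It may need some tweaking, but you can select the maximum DateCreated from each group and join back to the table to get the matching status. Taking MAX(Status) directly would not work, because the largest status value is not necessarily the latest one.

These answers are overkill.

SELECT
  d.DocumentID,
  d.Status,
  d.DateCreated
FROM DocumentStatusLogs d
JOIN (
  -- Latest DateCreated per document
  SELECT DocumentID, MAX(DateCreated) AS MaxDate
  FROM DocumentStatusLogs
  GROUP BY DocumentID
) m
  ON m.DocumentID = d.DocumentID
 AND m.MaxDate = d.DateCreated
ORDER BY d.DateCreated DESC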
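Here is the max-per-group idea sketched in SQLite through Python's sqlite3 module, as a stand-in for SQL Server (an assumption), with the sample dates as ISO-8601 text. A derived table computes MAX(DateCreated) per DocumentID, and joining it back to the log table recovers the status from that same row. This needs no window functions, so it also works on older database versions.

```python
import sqlite3

# Sample rows: (ID, DocumentID, Status, DateCreated), dates as ISO-8601 text.
ROWS = [
    (2, 1, "S1", "2011-07-29"), (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"), (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"), (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs "
             "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

# Group-wise maximum first, then join back so Status comes from the same row.
latest = conn.execute("""
    SELECT d.DocumentID, d.Status, d.DateCreated
    FROM DocumentStatusLogs d
    JOIN (
        SELECT DocumentID, MAX(DateCreated) AS MaxDate
        FROM DocumentStatusLogs
        GROUP BY DocumentID
    ) m
      ON m.DocumentID = d.DocumentID
     AND m.MaxDate = d.DateCreated
    ORDER BY d.DocumentID
""").fetchall()

print(latest)  # [(1, 'S1', '2011-08-02'), (2, 'S3', '2011-08-01'), (3, 'S1', '2011-08-02')]
```

If two rows share a document's latest DateCreated, both are returned, as with DENSE_RANK.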

2020-03-25 13:12:39
