我有一个表,我想获得每组的最新条目。下面是表格:

DocumentStatusLogs表

|ID| DocumentID | Status | DateCreated |
| 2| 1          | S1     | 7/29/2011   |
| 3| 1          | S2     | 7/30/2011   |
| 6| 1          | S1     | 8/02/2011   |
| 1| 2          | S1     | 7/28/2011   |
| 4| 2          | S2     | 7/30/2011   |
| 5| 2          | S3     | 8/01/2011   |
| 6| 3          | S1     | 8/02/2011   |

该表将按documententid分组,并按DateCreated降序排序。对于每个documententid,我希望获得最新的状态。

我的首选输出:

| DocumentID | Status | DateCreated |
| 1          | S1     | 8/02/2011   |
| 2          | S3     | 8/01/2011   |
| 3          | S1     | 8/02/2011   |

Is there any aggregate function to get only the top from each group? See pseudo-code GetOnlyTheTop below: SELECT DocumentID, GetOnlyTheTop(Status), GetOnlyTheTop(DateCreated) FROM DocumentStatusLogs GROUP BY DocumentID ORDER BY DateCreated DESC If such function doesn't exist, is there any way I can achieve the output I want? Or at the first place, could this be caused by unnormalized database? I'm thinking, since what I'm looking for is just one row, should that status also be located in the parent table?

更多信息请参见父表:

当前文档表

| DocumentID | Title  | Content  | DateCreated |
| 1          | TitleA | ...      | ...         |
| 2          | TitleB | ...      | ...         |
| 3          | TitleC | ...      | ...         |

父表应该是这样的,以便我可以轻松地访问它的状态吗?

| DocumentID | Title  | Content  | DateCreated | CurrentStatus |
| 1          | TitleA | ...      | ...         | s1            |
| 2          | TitleB | ...      | ...         | s3            |
| 3          | TitleC | ...      | ...         | s1            |

更新 我刚刚学会了如何使用“apply”,它可以更容易地解决这类问题。


当前回答

我的代码从每组中选择top 1

select a.* from #DocumentStatusLogs a where 
 datecreated in( select top 1 datecreated from #DocumentStatusLogs b
where 
a.documentid = b.documentid
order by datecreated desc
)

其他回答

我相信这是可以做到的。这可能需要一些调整,但你可以从组中选择最大值。

这些答案太夸张了。

SELECT
  d.DocumentID,
  MAX(d.Status),
  MAX(d1.DateCreated)
FROM DocumentStatusLogs d, DocumentStatusLogs d1
USING DocumentID
GROUP BY 1
ORDER BY 3 DESC

试试这个:

SELECT [DocumentID]
    ,[tmpRez].value('/x[2]', 'varchar(20)') AS [Status]
    ,[tmpRez].value('/x[3]', 'datetime') AS [DateCreated]
FROM (
    SELECT [DocumentID]
        ,cast('<x>' + max(cast([ID] AS VARCHAR(10)) + '</x><x>' + [Status] + '</x><x>' + cast([DateCreated] AS VARCHAR(20))) + '</x>' AS XML) AS [tmpRez]
    FROM DocumentStatusLogs
    GROUP BY DocumentID
    ) AS [tmpQry]

如果你担心性能问题,你也可以用MAX()这样做:

SELECT *
FROM DocumentStatusLogs D
WHERE DateCreated = (SELECT MAX(DateCreated) FROM DocumentStatusLogs WHERE ID = D.ID)

ROW_NUMBER()要求对SELECT语句中的所有行进行排序,而MAX则不需要。应该会大大加快你的查询速度。

我刚学会如何使用交叉应用。下面是如何在这种情况下使用它:

 select d.DocumentID, ds.Status, ds.DateCreated 
 from Documents as d 
 cross apply 
     (select top 1 Status, DateCreated
      from DocumentStatusLogs 
      where DocumentID = d.DocumentId
      order by DateCreated desc) as ds

一些数据库引擎*开始支持允许过滤窗口函数结果的qualifier子句(接受的答案使用该子句)。

所以公认的答案可以变成

SELECT *, ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
QUALIFY rn = 1

查看这篇文章以获得更深入的解释:https://jrandrews.net/the-joy-of-qualify

您可以使用此工具查看哪个数据库支持此子句:https://www.jooq.org/translate/ 当目标方言不支持qualifier子句时,可以选择转换它。

*Teradata, BigQuery, H2, Snowflake…