Retrieving the last record in each group - MySQL

There is a table messages that contains data as shown below:

Id   Name   Other_Columns
-------------------------
1    A       A_data_1
2    A       A_data_2
3    A       A_data_3
4    B       B_data_1
5    B       B_data_2
6    C       C_data_1

If I run the query select * from messages group by name, I get the following result:

1    A       A_data_1
4    B       B_data_1
6    C       C_data_1

Which query will return the following result?

3    A       A_data_3
5    B       B_data_2
6    C       C_data_1

That is, the last record in each group should be returned.

At present, this is the query that I use:

SELECT
  *
FROM (SELECT
  *
FROM messages
ORDER BY id DESC) AS x
GROUP BY name

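But this looks highly inefficient. Are there other ways to achieve the same result?

Current answer

UPD: 2017-03-31, MySQL 5.7.5 enabled the ONLY_FULL_GROUP_BY switch by default (hence, non-deterministic GROUP BY queries are rejected). Moreover, the GROUP BY implementation was updated, so the solutions might no longer work as expected even with the switch disabled. This needs to be checked.

Bill Karwin's solution above works fine when the item count within groups is rather small, but query performance degrades when the groups are rather large, since the solution requires about n*n/2 + n/2 IS NULL comparisons per group of n rows.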
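
For readers who do not have the referenced answer at hand: the approach under discussion is presumably the self-join that keeps a row only when no row with a larger id exists in the same group. That reading is an assumption, since the referenced answer is not reproduced here. A minimal sketch against the question's messages table shows where the IS NULL comparisons come from:

```sql
-- Sketch only: assumes the referenced solution is the LEFT JOIN / IS NULL anti-join.
-- Each row m1 is paired with every later row m2 of the same group;
-- m1 is the last row of its group exactly when no such m2 exists.
SELECT m1.*
FROM messages AS m1
LEFT JOIN messages AS m2
       ON m2.name = m1.name
      AND m2.id > m1.id
WHERE m2.id IS NULL;
```

For a group of n rows the join has to consider on the order of n*n/2 row pairs before the filter discards them, which is why very large groups hurt.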

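I tested on an InnoDB table with 18,684,446 rows and 1,182 groups. The table contains test results for functional tests and has (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.

Bill's solution has already been running for several hours on my Dell E4310 and I do not know when it is going to finish, even though it operates on a covering index (hence Using index in EXPLAIN).

I have a couple of other solutions that are based on the same ideas: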

- If the underlying index is a BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is, the first one for each group_id if we walk through the index in descending order.
- If we read values that are covered by an index, the values are read in the order of the index.
- Each index implicitly contains the primary key columns appended to it (that is, the primary key is part of the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result.
- In many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since MySQL needs a single fetch based on the primary key for each row in the subquery result, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit an explicit ORDER BY for the join).

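"3 ways MySQL uses indexes" is a great article for understanding some of the details.

Solution 1

This one is incredibly fast; it takes about 0.8 seconds on my 18M+ rows: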

SELECT test_id, MAX(request_id) AS request_id
FROM testresults
GROUP BY test_id DESC;

If you want to change the order to ASC, put it in a subquery that returns only the ids, and use that subquery to join to the rest of the columns:

SELECT test_id, request_id
FROM (
    SELECT test_id, MAX(request_id) AS request_id
    FROM testresults
    GROUP BY test_id DESC) as ids
ORDER BY test_id;

This one takes about 1.2 seconds on my data.
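
A note for newer servers: as far as I know, the ASC/DESC qualifier on GROUP BY was deprecated in MySQL 5.7 and removed in MySQL 8.0 (8.0.13), and GROUP BY no longer sorts implicitly there. A minimal sketch of how Solution 1 could be written for such a server, asking for the order explicitly:

```sql
-- Sketch for MySQL 8.0+: GROUP BY takes no ASC/DESC there,
-- so the ordering is requested with an explicit ORDER BY.
SELECT test_id, MAX(request_id) AS request_id
FROM testresults
GROUP BY test_id
ORDER BY test_id DESC;
```

The timings quoted in this answer were measured with the original form, so they do not necessarily carry over to this variant.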

Solution 2

Here is another solution, which takes about 19 seconds on my table:

SELECT test_id, request_id
FROM testresults, (SELECT @group:=NULL) as init
WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
ORDER BY test_id DESC, request_id DESC

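It returns the tests in descending order as well. It is much slower since it does a full index scan, but it is here to give you an idea of how to output at most N rows for each group.

The disadvantage of the query is that its result cannot be cached by the query cache.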
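
On MySQL 8.0 and later, the "at most N rows per group" idea can also be expressed with a window function instead of a user variable. The following is only a sketch under that version assumption; the alias rn, the derived-table name ranked, and the limit of 2 are illustrative choices, not part of the original solution:

```sql
-- Sketch for MySQL 8.0+: keep at most 2 rows (the 2 largest request_id) per test_id.
SELECT test_id, request_id
FROM (
    SELECT test_id,
           request_id,
           -- Number the rows of each test_id from the largest request_id downwards.
           ROW_NUMBER() OVER (PARTITION BY test_id ORDER BY request_id DESC) AS rn
    FROM testresults
) AS ranked
WHERE rn <= 2
ORDER BY test_id DESC, request_id DESC;
```

With rn <= 1 this returns only the last request_id of each test_id. I have not timed it on the table described above.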

2012-01-06 11:21:09

Other answers

Use your subquery to return the correct grouping, because you're halfway there.

Try this:

select
    a.*
from
    messages a
    inner join 
        (select name, max(id) as maxid from messages group by name) as b on
        a.id = b.maxid

If it is not the id that you want the max of:

select
    a.*
from
    messages a
    inner join 
        (select name, max(other_col) as other_col 
         from messages group by name) as b on
        a.name = b.name
        and a.other_col = b.other_col

This way, you avoid correlated subqueries and/or ordering in your subqueries, which tend to be very slow/inefficient.
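
To try the first query against the sample data from the question, something like the following should do. The column types are assumptions, since the question does not give the table definition:

```sql
-- Sample data copied from the question; column types are assumed.
CREATE TABLE messages (
    id            INT PRIMARY KEY,
    name          VARCHAR(10) NOT NULL,
    other_columns VARCHAR(20)
);

INSERT INTO messages (id, name, other_columns) VALUES
    (1, 'A', 'A_data_1'),
    (2, 'A', 'A_data_2'),
    (3, 'A', 'A_data_3'),
    (4, 'B', 'B_data_1'),
    (5, 'B', 'B_data_2'),
    (6, 'C', 'C_data_1');

-- Join every row to the per-name maximum id; only the last row of each group matches.
SELECT a.*
FROM messages AS a
INNER JOIN (
    SELECT name, MAX(id) AS maxid
    FROM messages
    GROUP BY name
) AS b ON a.id = b.maxid
ORDER BY a.id;
-- Rows returned: (3, A, A_data_3), (5, B, B_data_2), (6, C, C_data_1)
```

On a large table, an index on (name, id) would likely help the inner GROUP BY, though that depends on your data.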

2009-08-21 17:06:42

You can group with a count and also get the last item of each group, like this:

SELECT 
    user,
    COUNT(user) AS count,
    MAX(id) as last
FROM request 
GROUP BY user

2019-04-07 06:26:53

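Hi @Vijay Dev, if your table messages contains Id, which is an auto-increment primary key, then to fetch the latest record based on the primary key your query should read as below: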

SELECT m1.* FROM messages m1 INNER JOIN (SELECT max(Id) as lastmsgId FROM messages GROUP BY Name) m2 ON m1.Id=m2.lastmsgId

2014-10-21 14:08:16

Hope the Oracle query below can help:

WITH Temp_table AS
(
    SELECT id, name, othercolumns,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rank
    FROM messages
)
SELECT id, name, othercolumns FROM Temp_table WHERE rank = 1
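
Since the question is about MySQL: the same idea should carry over to MySQL 8.0 and later, which supports common table expressions and window functions. In this sketch the alias is renamed to rn because RANK is a reserved word in MySQL 8.0, and the column name other_columns is assumed from the question's sample table:

```sql
-- Sketch for MySQL 8.0+: number the rows of each name from the highest id down,
-- then keep the first row of every group.
WITH ranked AS (
    SELECT id, name, other_columns,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
    FROM messages
)
SELECT id, name, other_columns
FROM ranked
WHERE rn = 1;
```

Older MySQL versions have neither WITH nor ROW_NUMBER(), so there one of the join-based queries is the way to go.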

2020-01-15 06:07:02

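If performance is really your concern, you can introduce a new column on the table called IsLastInGroup of type BIT.

Set it to true on the rows that are last in their group and maintain it with every row insert/update/delete. Writes will be slower, but reads will benefit. It depends on your use case, and I recommend it only if you are read-focused.

So your query will look like: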

SELECT * FROM Messages WHERE IsLastInGroup = 1
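
How the flag is maintained is up to you. One possible sketch for the insert case is below; it assumes Id is auto-increment, and the column names Name and Other_Columns as well as the literal values are only for illustration:

```sql
-- Sketch only: keeps the flag correct when a new row is appended to group 'A'.
START TRANSACTION;

-- Clear the flag on the row that was last in the group until now.
UPDATE Messages
SET IsLastInGroup = 0
WHERE Name = 'A' AND IsLastInGroup = 1;

-- Insert the new row as the last one of its group.
INSERT INTO Messages (Name, Other_Columns, IsLastInGroup)
VALUES ('A', 'A_data_4', 1);

COMMIT;
```

Deletes and updates that move a row to another group need similar handling, and concurrent writers to the same group have to be serialized somehow, so this is a starting point rather than a complete scheme.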

2018-05-02 15:05:59
