查询:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

收益率:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

我想要的是每个id只有前5个结果:

2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7

是否有一种方法来做到这一点,使用一些限制之类的修饰符,在GROUP BY中工作?


当前回答

对于那些像我一样有查询超时的人。我做了下面的限制和任何其他由特定的组。

DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
    DECLARE a INT Default 0;
    DECLARE stop_loop INT Default 0;
    DECLARE domain_val VARCHAR(250);
    DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

    OPEN domain_list;

    SELECT COUNT(DISTINCT(domain)) INTO stop_loop 
    FROM db.one;
    -- BEGIN LOOP
    loop_thru_domains: LOOP
        FETCH domain_list INTO domain_val;
        SET a=a+1;

        INSERT INTO db.two(book,artist,title,title_count,last_updated) 
        SELECT * FROM 
        (
            SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW() 
            FROM db.one 
            WHERE book = domain_val
            GROUP BY artist,title
            ORDER BY book,titleCount DESC
            LIMIT 200
        ) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

        IF a = stop_loop THEN
            LEAVE loop_thru_domain;
        END IF;
    END LOOP loop_thru_domain;
END $$

它循环遍历一个域列表,然后每个域只插入200个限制

其他回答

试试这个:

SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
    @num := if(@type = `id`, @num + 1, 1) AS `row_number`,
    @type := `id` AS `dummy`
FROM (
    SELECT *
    FROM `h`
    WHERE (
        `year` BETWEEN '2000' AND '2009'
        AND `id` IN (SELECT `rid` FROM `table2`) AS `temp_rid`
    )
    ORDER BY `id`
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number`<='5'
ORDER BY `id`, `rate DESC;

你可以使用GROUP_CONCAT聚合函数将所有年份放到一个列中,按id分组,按速率排序:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM     yourtable
GROUP BY id

结果:

-----------------------------------------------------------
|  ID | GROUPED_YEAR                                      |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007                |
-----------------------------------------------------------

然后你可以使用FIND_IN_SET,它返回第一个参数在第二个参数中的位置,例如。

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1

SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6

使用GROUP_CONCAT和FIND_IN_SET的组合,并通过FIND_IN_SET返回的位置进行过滤,然后您可以使用这个查询,它只返回每个id的前5年:

SELECT
  yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT
      id,
      GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM
      yourtable
    GROUP BY id) group_max
  ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
  yourtable.id, yourtable.year DESC;

请看这里的小提琴。

请注意,如果多个行可以具有相同的速率,则应该考虑在速率列上使用GROUP_CONCAT(DISTINCT速率ORDER BY速率),而不是在年列上使用GROUP_CONCAT。

GROUP_CONCAT返回的字符串的最大长度是有限的,因此如果您需要为每个组选择一些记录,那么这种方法可以很好地工作。

对于那些像我一样有查询超时的人。我做了下面的限制和任何其他由特定的组。

DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
    DECLARE a INT Default 0;
    DECLARE stop_loop INT Default 0;
    DECLARE domain_val VARCHAR(250);
    DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

    OPEN domain_list;

    SELECT COUNT(DISTINCT(domain)) INTO stop_loop 
    FROM db.one;
    -- BEGIN LOOP
    loop_thru_domains: LOOP
        FETCH domain_list INTO domain_val;
        SET a=a+1;

        INSERT INTO db.two(book,artist,title,title_count,last_updated) 
        SELECT * FROM 
        (
            SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW() 
            FROM db.one 
            WHERE book = domain_val
            GROUP BY artist,title
            ORDER BY book,titleCount DESC
            LIMIT 200
        ) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

        IF a = stop_loop THEN
            LEAVE loop_thru_domain;
        END IF;
    END LOOP loop_thru_domain;
END $$

它循环遍历一个域列表,然后每个域只插入200个限制

不,你不能任意地限制子查询(你可以在较新的mysql中有限地这样做,但不能每组5个结果)。

这是一个分组最大类型查询,在SQL中执行起来并不简单。在某些情况下,有很多更有效的方法来解决这个问题,但对于一般的top-n,你会想看看Bill对前面一个类似问题的答案。

与此问题的大多数解决方案一样,如果有多行具有相同的速率值,它可以返回超过5行,因此您可能仍然需要大量的后处理来检查这一点。

试试这个:

SELECT h.year, h.id, h.rate 
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx 
      FROM (SELECT h.year, h.id, h.rate 
            FROM h
            WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
            GROUP BY id, h.year
            ORDER BY id, rate DESC
            ) h, (SELECT @lastid:='', @index:=0) AS a
    ) h 
WHERE h.indx <= 5;