我如何才能最好地编写一个查询,从总共600k中随机选择10行?


当前回答

一个伟大的职位处理几个情况,从简单,到差距,到不均匀与差距。

http://jan.kneschke.de/projects/mysql/order-by-rand/

对于大多数一般情况,你可以这样做:

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

这假设id的分布是相等的,并且id列表中可能存在间隙。有关更高级的示例,请参阅本文

其他回答

一个伟大的职位处理几个情况,从简单,到差距,到不均匀与差距。

http://jan.kneschke.de/projects/mysql/order-by-rand/

对于大多数一般情况,你可以这样做:

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

这假设id的分布是相等的,并且id列表中可能存在间隙。有关更高级的示例,请参阅本文

它是非常简单的单行查询。

SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;

我改进了@Riedsio的答案。这是我在一个有间隙的大型均匀分布表上能找到的最有效的查询(测试从一个有> 2.6B行的表中获得1000个随机行)。

(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)

让我来解释一下发生了什么。

(SELECT MAX(id) FROM table) 我在计算并保存最大值。对于非常大的表,每次需要一行时计算MAX(id)都会有轻微的开销 SELECT FLOOR(rand() * @max) + 1 as rand) 获取一个随机id SELECT id FROM table INNER JOIN(… 这就填补了空白。基本上,如果你在间隙中随机选择一个数字,它就会选择下一个id。假设间隙是均匀分布的,这应该不是问题。

进行联合可以帮助您将所有内容放入一个查询中,从而避免进行多个查询。它还可以节省计算MAX(id)的开销。根据您的应用程序,这可能非常重要,也可能无关紧要。

注意,这只获取id,并以随机顺序获取它们。如果你想做更高级的事情,我建议你这样做:

SELECT t.id, t.name -- etc, etc
FROM table t
INNER JOIN (
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
) x ON x.id = t.id
ORDER BY t.id

所有最好的答案都已经贴出来了(主要是那些引用了http://jan.kneschke.de/projects/mysql/order-by-rand/的链接)。

I want to pinpoint another speed-up possibility - caching. Think of why you need to get random rows. Probably you want display some random post or random ad on a website. If you are getting 100 req/s, is it really needed that each visitor gets random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same 1 second get the same random posts, because the next second another 100 visitors will get different set of posts.

当使用这种缓存时,你也可以使用一些较慢的解决方案来获取随机数据,因为不管你的req/s如何,它每秒只会从MySQL中获取一次。

我认为这是一个简单但更快的方法,我在现场服务器上测试了它,与上面的几个答案相比,它更快。

 SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30; 

//对一个130行的表花费0.0014秒

SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30

//对130行的表花费0.0042秒

 SELECT name
FROM random AS r1 JOIN
   (SELECT CEIL(RAND() *
                 (SELECT MAX(id)
                    FROM random)) AS id)
    AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30

//对130行的表花费0.0040秒