我如何才能最好地编写一个查询,从总共600k中随机选择10行?


当前回答

我认为这是一个简单但更快的方法,我在现场服务器上测试了它,与上面的几个答案相比,它更快。

 SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30; 

//对一个130行的表花费0.0014秒

SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30

//对130行的表花费0.0042秒

 SELECT name
FROM random AS r1 JOIN
   (SELECT CEIL(RAND() *
                 (SELECT MAX(id)
                    FROM random)) AS id)
    AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30

//对130行的表花费0.0040秒

其他回答

我是这样做的:

select * 
from table_with_600k_rows
where rand() < 10/600000
limit 10

我喜欢它,因为它不需要其他表,写起来很简单,执行起来非常快。

我使用了Riedsio发布的http://jan.kneschke.de/projects/mysql/order-by-rand/(我使用了返回一个或多个随机值的存储过程的情况):

   DROP TEMPORARY TABLE IF EXISTS rands;
   CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        INSERT INTO rands
           SELECT r1.id
             FROM random AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT MAX(id)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.id >= r2.id
            ORDER BY r1.id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

在这篇文章中,他通过维护一个表(使用触发器等)解决了id中的间隙导致不那么随机的结果的问题。参见文章); 我通过向表中添加另一列来解决这个问题,用连续的数字填充,从1开始(编辑:此列添加到运行时由子查询创建的临时表中,不影响永久表):

   DROP TEMPORARY TABLE IF EXISTS rands;
   CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        SET @no_gaps_id := 0;

        INSERT INTO rands
           SELECT r1.id
             FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT COUNT(*)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.no_gaps_id >= r2.id
            ORDER BY r1.no_gaps_id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

在文章中,我可以看到他花了很大的精力来优化代码;我不知道我的改变是否/有多大影响性能,但对我来说非常好。

所有最好的答案都已经贴出来了(主要是那些引用了http://jan.kneschke.de/projects/mysql/order-by-rand/的链接)。

I want to pinpoint another speed-up possibility - caching. Think of why you need to get random rows. Probably you want display some random post or random ad on a website. If you are getting 100 req/s, is it really needed that each visitor gets random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same 1 second get the same random posts, because the next second another 100 visitors will get different set of posts.

当使用这种缓存时,你也可以使用一些较慢的解决方案来获取随机数据,因为不管你的req/s如何,它每秒只会从MySQL中获取一次。

如果你想要一个随机记录(不管id之间是否有空隙):

PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';
SET @count = (SELECT
        FLOOR(RAND() * COUNT(*))
    FROM `table_name`);

EXECUTE stmt USING @count;

来源:https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/评论- 1266

SELECT
  * 
FROM
  table_with_600k_rows
WHERE
  RAND( ) 
ORDER BY
  id DESC 
LIMIT 30;

Id是主键,按Id排序, 解释table_with_600k_rows,发现该行不扫描整个表