MySQL快速从600K行中随机选择10行

我如何才能最好地编写一个查询，从总共600k中随机选择10行?

当前回答

SELECT column FROM table
ORDER BY RAND()
LIMIT 10

这不是有效的解决方案，但确实有效

2012-10-13 06:43:40

其他回答

这里有一个改变游戏规则的方法，可能对许多人有帮助;

我有一个有200k行的表，有连续的id，我需要选择N个随机行，所以我选择根据表中最大的id生成随机值，我创建了这个脚本来找出哪个是最快的操作:

logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();

结果如下:

计数:36.8418693542479毫秒 Max: 0.241041183472 ms 订单:0.216960906982毫秒

根据这个结果，order desc是得到最大id的最快操作，以下是我对这个问题的回答:

SELECT GROUP_CONCAT(n SEPARATOR ',') g FROM (
    SELECT FLOOR(RAND() * (
        SELECT id FROM tbl ORDER BY id DESC LIMIT 1
    )) n FROM tbl LIMIT 10) a

...
SELECT * FROM tbl WHERE id IN ($result);

供您参考:从一个200k表中随机获得10行，我花了1.78 ms(包括php方面的所有操作)

2015-05-15 11:05:58

所有最好的答案都已经贴出来了(主要是那些引用了http://jan.kneschke.de/projects/mysql/order-by-rand/的链接)。

I want to pinpoint another speed-up possibility - caching. Think of why you need to get random rows. Probably you want display some random post or random ad on a website. If you are getting 100 req/s, is it really needed that each visitor gets random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same 1 second get the same random posts, because the next second another 100 visitors will get different set of posts.

当使用这种缓存时，你也可以使用一些较慢的解决方案来获取随机数据，因为不管你的req/s如何，它每秒只会从MySQL中获取一次。

2015-07-07 13:52:36

SELECT
  * 
FROM
  table_with_600k_rows
WHERE
  RAND( ) 
ORDER BY
  id DESC 
LIMIT 30;

Id是主键，按Id排序，解释table_with_600k_rows，发现该行不扫描整个表

2021-08-02 08:30:37

我需要一个查询从一个相当大的表中返回大量随机行。这是我想到的。首先获取最大记录id:

SELECT MAX(id) FROM table_name;

然后将该值代入:

SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;

Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record id's although I doubt it would affect the result if there were (haven't tried it though). I also created this stored procedure to be more generic; pass in the table name and number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008, 32GB, dual 3GHz E5450, and on a table with 17,361,264 rows it's fairly consistent at ~.03 sec / ~11 sec to return 1,000,000 rows. (times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the 2nd select statement depending on your preference)

DELIMITER $$

USE [schema name] $$

DROP PROCEDURE IF EXISTS `random_rows` $$

CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN

SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

SET @t = CONCAT(
    'SELECT * FROM ',
    tab_name,
    ' WHERE id>FLOOR(RAND()*@max) LIMIT ',
    num_rows);

PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$

then

CALL [schema name].random_rows([table name], n);

2014-09-24 13:47:28

如果有一个自动生成的id，我发现一个很好的方法是使用模运算符'%'。例如，如果您需要70,000条随机记录中的10,000条，您可以简化为每7行中需要1行。这可以在这个查询中简化:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0;

如果目标行除以total available的结果不是一个整数，那么你将得到比你要求的更多的行，所以你应该添加一个LIMIT子句来帮助你像这样修剪结果集:

SELECT * FROM 
    table 
WHERE 
    id % 
    FLOOR(
        (SELECT count(1) FROM table) 
        / 10000
    ) = 0
LIMIT 10000;

这确实需要一个完整的扫描，但它比ORDER BY RAND更快，在我看来，比本文中提到的其他选项更容易理解。另外，如果写入数据库的系统批量创建了一组行，你可能不会得到你所期望的随机结果。

2016-06-22 13:22:41

MySQL快速从600K行中随机选择10行

推荐文章

最新文章

标签