我如何才能最好地编写一个查询,从总共600k中随机选择10行?


当前回答

您可以轻松地使用带限制的随机偏移量

PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

您还可以像这样应用where子句

PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;

在600,000行(700MB)表查询执行上的测试花费了大约0.016秒的硬盘驱动器时间。

EDIT:偏移量可能取接近表末尾的值,这将导致select语句返回更少的行(或者可能只有一行),为了避免这种情况,我们可以在声明偏移量后再次检查,如下所示

SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count,@_offset-@rows_count,@_offset));
SET @_offset = (SELECT IF(@_offset<0,0,@_offset));
EXECUTE stm using @rows_count,@_offset;

其他回答

一个伟大的职位处理几个情况,从简单,到差距,到不均匀与差距。

http://jan.kneschke.de/projects/mysql/order-by-rand/

对于大多数一般情况,你可以这样做:

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

这假设id的分布是相等的,并且id列表中可能存在间隙。有关更高级的示例,请参阅本文

SELECT column FROM table
ORDER BY RAND()
LIMIT 10

这不是有效的解决方案,但确实有效

我是这样做的:

select * 
from table_with_600k_rows
where rand() < 10/600000
limit 10

我喜欢它,因为它不需要其他表,写起来很简单,执行起来非常快。

如果你只有一个读请求

将@redsio的答案与一个临时表结合起来(600K并不是很多):

DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;

然后用一个@redsios的版本回答:

SELECT dt.*
FROM
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM tmp_randorder)) AS id)
        AS rnd
 INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
 INNER JOIN datatable AS dt on dt.id = rndo.data_id
 ORDER BY abs(rndo.id - rnd.id)
 LIMIT 1;

如果表比较大,可以先筛选第一部分:

INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;

如果你有很多读请求

Version: You could keep the table tmp_randorder persistent, call it datatable_idlist. Recreate that table in certain intervals (day, hour), since it also will get holes. If your table gets really big, you could also refill holes select l.data_id as whole from datatable_idlist l left join datatable dt on dt.id = l.data_id where dt.id is null; Version: Give your Dataset a random_sortorder column either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a Random-Value in your Application (I'll call it $rand). select l.* from datatable l order by abs(random_sortorder - $rand) desc limit 1;

这个解决方案用最高和最低的random_sortorder来区分“边缘行”,所以在间隔中重新排列它们(一天一次)。

如果你想要一个随机记录(不管id之间是否有空隙):

PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';
SET @count = (SELECT
        FLOOR(RAND() * COUNT(*))
    FROM `table_name`);

EXECUTE stmt USING @count;

来源:https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/评论- 1266