How can I best write a query that selects 10 random rows out of a total of 600k?


Current answer

It's a very simple one-line query:

SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;
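A minimal sketch of the same idea, run against an in-memory SQLite database through Python's standard `sqlite3` module so it can be executed without a MySQL server. SQLite spells the function `RANDOM()` instead of `RAND()`, and the `items` table with its 600 rows is hypothetical. The database still has to read every row and generate a random value for each one before it can keep the top 10, which is why this approach gets slower as the table grows.

```python
import sqlite3


def make_table(conn: sqlite3.Connection, n_rows: int) -> None:
    """Create a hypothetical `items` table with ids 1..n_rows."""
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, n_rows + 1)],
    )


def random_rows_order_by(conn: sqlite3.Connection, n: int) -> list[tuple]:
    """Give every row a random sort key and keep the first n rows."""
    return conn.execute(
        "SELECT id, name FROM items ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
make_table(conn, 600)
sample = random_rows_order_by(conn, 10)
print(len(sample), len({row[0] for row in sample}))  # prints: 10 10
```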

Other answers

You can easily use a random offset together with LIMIT:

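-- table_name is a placeholder; `table` itself is a reserved word in MySQL
PREPARE stm FROM 'SELECT * FROM table_name LIMIT 10 OFFSET ?';
SET @total = (SELECT COUNT(*) FROM table_name);
-- random offset in the range [0, @total - 1]
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm USING @_offset;
DEALLOCATE PREPARE stm;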

You can also apply a WHERE clause, like this:

PREPARE stm FROM 'SELECT * FROM table_name WHERE available = TRUE LIMIT 10 OFFSET ?';
-- count only the rows that match the same WHERE clause
SET @total = (SELECT COUNT(*) FROM table_name WHERE available = TRUE);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm USING @_offset;
DEALLOCATE PREPARE stm;

In a test on a 600,000-row (700 MB) table stored on a hard disk drive, the query took about 0.016 seconds to execute.
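To make the behaviour of the random-offset approach concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL (the `items` table, its `available` column values and the row counts are hypothetical). It shows that the 10 rows come back as one consecutive run starting at the random offset, not as 10 independently chosen rows, and that an offset near the end of the table yields fewer than 10 rows. An `ORDER BY id` is added so the run is well defined; without it, which rows get skipped is up to the database.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, available INTEGER)")
# hypothetical data: 1000 rows, every third one unavailable
conn.executemany(
    "INSERT INTO items (id, available) VALUES (?, ?)",
    [(i, int(i % 3 != 0)) for i in range(1, 1001)],
)


def fetch_block(offset: int, n: int) -> list[int]:
    """Return the ids of up to n available rows, skipping the first `offset`."""
    rows = conn.execute(
        "SELECT id FROM items WHERE available = 1 ORDER BY id LIMIT ? OFFSET ?",
        (n, offset),
    ).fetchall()
    return [r[0] for r in rows]


# same steps as the MySQL version: count, pick FLOOR(RAND() * total), query
total = conn.execute("SELECT COUNT(*) FROM items WHERE available = 1").fetchone()[0]
offset = random.randrange(total)  # uniform in [0, total - 1]
block = fetch_block(offset, 10)

print(total)                       # prints: 667
print(fetch_block(total - 3, 10))  # offset near the end: prints [997, 998, 1000]
```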

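EDIT: The offset can take a value close to the end of the table, in which case the SELECT statement returns fewer rows (possibly only one). To avoid this, we can check the offset again after computing it, as follows: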

SET @rows_count = 10;
PREPARE stm FROM 'SELECT * FROM table_name WHERE available = TRUE LIMIT ? OFFSET ?';
SET @total = (SELECT COUNT(*) FROM table_name WHERE available = TRUE);
SET @_offset = FLOOR(RAND() * @total);
-- if fewer than @rows_count rows remain after the offset, move it back
SET @_offset = IF(@total - @_offset < @rows_count, @_offset - @rows_count, @_offset);
-- tables with fewer than @rows_count rows: never go below 0
SET @_offset = IF(@_offset < 0, 0, @_offset);
EXECUTE stm USING @rows_count, @_offset;
DEALLOCATE PREPARE stm;
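The two `IF` adjustments can be checked in isolation. The Python function below mirrors them step by step (the names `clamp_offset` and `rows_returned` are illustrative) and exhaustively confirms, for small table sizes, that every offset `FLOOR(RAND() * @total)` can produce leads to `min(@rows_count, @total)` rows being returned.

```python
def clamp_offset(offset: int, total: int, rows_count: int) -> int:
    """Mirror of the two SET @_offset = IF(...) statements."""
    # move the offset back if fewer than rows_count rows remain after it
    if total - offset < rows_count:
        offset = offset - rows_count
    # tables smaller than rows_count would otherwise give a negative offset
    if offset < 0:
        offset = 0
    return offset


def rows_returned(offset: int, total: int, rows_count: int) -> int:
    """How many rows LIMIT rows_count OFFSET offset yields from `total` rows."""
    return max(0, min(rows_count, total - offset))


# exhaustive check: every table size 1..49 and every raw offset 0..total-1
for total in range(1, 50):
    for raw in range(total):
        off = clamp_offset(raw, total, 10)
        assert rows_returned(off, total, 10) == min(10, total)

print(clamp_offset(95, 100, 10), clamp_offset(50, 100, 10))  # prints: 85 50
```

One side effect: offsets in the last few positions are shifted back by exactly `@rows_count`, so some starting positions end up twice as likely as others. Drawing the offset as `FLOOR(RAND() * (@total - @rows_count + 1))` (when `@total >= @rows_count`) would make every valid starting position equally likely, if that matters for your use case.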

If your keys are all numeric and have no gaps, you can compute random numbers and select those rows. But that is probably not the case.

So one solution is:

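SELECT t.* FROM table_name AS t JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM table_name)) AS rid) AS r
ON t.id >= r.rid ORDER BY t.id LIMIT 1;  -- rid is computed once, not once per row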

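This ensures that you get a random number within the range of your keys, and then you select the next larger existing key. You have to do this 10 times.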

However, this is not truly random, because your keys are most likely not evenly distributed.
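How uneven the result gets can be worked out exactly. The Python sketch below uses a made-up set of ids and assumes the random key is drawn as `FLOOR(RAND() * MAX(id))`, i.e. uniformly from `0 .. MAX(id) - 1`, and that the query returns the smallest id greater than or equal to it. Under those assumptions, each row's chance of being picked is proportional to the size of the gap just below it.

```python
from fractions import Fraction


def pick_probabilities(ids: list[int]) -> dict[int, Fraction]:
    """Exact chance that 'smallest id >= FLOOR(RAND() * MAX(id))' returns each id."""
    ids = sorted(ids)
    max_id = ids[-1]
    counts = {i: 0 for i in ids}
    for rid in range(max_id):  # every value FLOOR(RAND() * max_id) can take
        chosen = next(i for i in ids if i >= rid)  # the next larger existing id
        counts[chosen] += 1
    return {i: Fraction(c, max_id) for i, c in counts.items()}


# hypothetical table: three adjacent ids, then a large gap before id 100
probs = pick_probabilities([1, 2, 3, 100])
for row_id, p in probs.items():
    print(row_id, p)  # 1 1/50, 2 1/100, 3 1/100, 100 24/25
```

Here the row right after the gap is returned 96% of the time, even though it is only one of four rows.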

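This is genuinely a hard problem, and it is not easy to satisfy all requirements at once; if you really want 10 random rows, MySQL's RAND() is your best option.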

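However, there is another solution that is fast but involves a trade-off in randomness, and it may suit you better. Read about it here: How can I optimize MySQL's ORDER BY RAND() function?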

The question is how random you need it to be.

Can you explain a bit more, so that I can give you a good solution?

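For example, a company I worked with had a solution for a case where they needed absolute randomness extremely fast. They ended up pre-populating the database with random values; rows were selected in descending order of those values, and the selected rows were then set to different random values again.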
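One way such a pre-populated random-value scheme might look, sketched with Python's `sqlite3` module. The column `rnd`, its index and the table `items` are assumptions for illustration, not the company's actual schema. Each read takes the rows with the highest stored random values (an index scan rather than a full sort), then gives those rows fresh random values so they move to a random position before the next read.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, rnd REAL NOT NULL)")
conn.execute("CREATE INDEX items_rnd ON items (rnd)")  # makes ORDER BY rnd DESC cheap
conn.executemany(
    "INSERT INTO items (id, rnd) VALUES (?, ?)",
    [(i, random.random()) for i in range(1, 10_001)],  # pre-populate random values
)


def take_random(n: int) -> list[int]:
    """Return n ids with the highest rnd values, then re-randomize those rows."""
    with conn:  # commit the updates when the block exits
        rows = conn.execute(
            "SELECT id FROM items ORDER BY rnd DESC LIMIT ?", (n,)
        ).fetchall()
        ids = [r[0] for r in rows]
        conn.executemany(
            "UPDATE items SET rnd = ? WHERE id = ?",
            [(random.random(), i) for i in ids],
        )
    return ids


first = take_random(10)
second = take_random(10)
print(first)
print(second)
```

The trade-off is a write on every read, in exchange for never sorting the whole table.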

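If you hardly ever update, you could also fill in an incrementing id so that there are no gaps, and then simply compute random keys before selecting… It depends on the use case!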
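If the ids really are gapless (1 to N), the random keys can be computed up front and the rows fetched by primary key. A sketch with Python's `sqlite3`, where the `items` table is hypothetical; `random.sample` draws 10 distinct ids, so no row comes back twice.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 600_001)],  # gapless ids 1..600000
)

# with no gaps, MAX(id) is also the row count
n_max = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
wanted = random.sample(range(1, n_max + 1), 10)  # 10 distinct random ids

placeholders = ",".join("?" * len(wanted))
rows = conn.execute(
    f"SELECT id, name FROM items WHERE id IN ({placeholders})", wanted
).fetchall()
print(len(rows))  # prints: 10
```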

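I needed a query to return a large number of random rows from a fairly large table. This is what I came up with. First, get the maximum record id: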

SELECT MAX(id) FROM table_name;

Then substitute that value into:

SET @start = FLOOR(RAND() * max);
SELECT * FROM table_name WHERE id > @start ORDER BY id LIMIT n;

Here max is the maximum record id in the table and n is the number of rows you want in your result set. The random starting id goes into a variable first because MySQL evaluates RAND() in a WHERE clause once per row, so writing FLOOR(RAND() * max) inline would not give a single starting point. The assumption is that there are no gaps in the record ids, although I doubt gaps would affect the result much if there were (I haven't tried it, though). You could also use CEIL instead of FLOOR when computing the starting id, depending on your preference.

I also created the stored procedure below to make this more generic: pass in the table name and the number of rows to return. I'm running MySQL 5.5.38 on Windows 2008 (32 GB, dual 3 GHz E5450), and on a table with 17,361,264 rows it is fairly consistent at ~0.03 s / ~11 s to return 1,000,000 rows (times are from MySQL Workbench 6.1).

DELIMITER $$

USE [schema name] $$

DROP PROCEDURE IF EXISTS `random_rows` $$

CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN

SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

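-- draw the random starting id once; RAND() inside WHERE would be re-evaluated per row
SET @start = FLOOR(RAND() * @max);

SET @t = CONCAT(
    'SELECT * FROM ',
    tab_name,
    ' WHERE id > @start ORDER BY id LIMIT ',
    num_rows);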

PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$

Then call it:

CALL [schema name].random_rows('[table name]', n);
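To try the idea without a stored procedure, here is a rough Python/`sqlite3` equivalent (the helper name `random_rows`, the whitelist and the `items` table are illustrative). It follows the same steps: read `MAX(id)` once, draw one random starting id in application code, then fetch the next `num_rows` rows by primary key. Table names cannot be bound as query parameters, so the sketch checks the name against a whitelist before formatting it into the SQL, which also avoids the injection risk of concatenating an arbitrary `tab_name`.

```python
import random
import sqlite3

ALLOWED_TABLES = {"items"}  # hypothetical whitelist of table names


def random_rows(conn: sqlite3.Connection, tab_name: str, num_rows: int) -> list[tuple]:
    """Pick a random starting id once, then return the next num_rows rows by id."""
    if tab_name not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {tab_name!r}")
    max_id = conn.execute(f"SELECT MAX(id) FROM {tab_name}").fetchone()[0]
    start = random.randrange(max_id)  # like FLOOR(RAND() * @max): 0 .. max_id - 1
    return conn.execute(
        f"SELECT * FROM {tab_name} WHERE id > ? ORDER BY id LIMIT ?",
        (start, num_rows),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 1001)],
)
rows = random_rows(conn, "items", 5)
print([r[0] for r in rows])  # consecutive ids; fewer than 5 if the start lands near the end
```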

Use the simple query below to get random data from the table.

SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails 
GROUP BY usr_fk_id 
ORDER BY cnt ASC  
LIMIT 10

If you want a single random record (regardless of whether there are gaps between the ids):

PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';
SET @count = (SELECT
        FLOOR(RAND() * COUNT(*))
    FROM `table_name`);

EXECUTE stmt USING @count;
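The same OFFSET trick can be repeated to collect 10 random rows that are not contiguous. A Python/`sqlite3` sketch with a hypothetical `items` table whose ids deliberately have gaps: it draws 10 distinct offsets with `random.sample` and runs one `LIMIT 1 OFFSET ?` query per offset. Each query is simple, but the database still has to step over `offset` rows, so large offsets cost more than small ones.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# hypothetical data with gaps: only even ids exist
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(2, 2001, 2)],
)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
offsets = random.sample(range(count), 10)  # 10 distinct offsets in [0, count - 1]

picked = [
    conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT 1 OFFSET ?", (off,)
    ).fetchone()
    for off in offsets
]
print(count, len({row[0] for row in picked}))  # prints: 1000 10
```

Because the offset counts existing rows rather than id values, gaps in the ids do not bias which rows are chosen.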

Source: https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/ (comment 1266)