我如何才能最好地编写一个查询,从总共600k中随机选择10行?
当前回答
我得到了快速查询(大约0.5秒),cpu很慢,在一个400K寄存器的MySQL数据库中随机选择10行,非缓存2Gb大小。在MySQL中快速选择随机行
$time= microtime_float();
$sql='SELECT COUNT(*) FROM pages';
$rquery= BD_Ejecutar($sql);
list($num_records)=mysql_fetch_row($rquery);
mysql_free_result($rquery);
$sql="SELECT id FROM pages WHERE RAND()*$num_records<20
ORDER BY RAND() LIMIT 0,10";
$rquery= BD_Ejecutar($sql);
while(list($id)=mysql_fetch_row($rquery)){
if($id_in) $id_in.=",$id";
else $id_in="$id";
}
mysql_free_result($rquery);
$sql="SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery= BD_Ejecutar($sql);
while(list($id,$url)=mysql_fetch_row($rquery)){
logger("$id, $url",1);
}
mysql_free_result($rquery);
$time= microtime_float()-$time;
logger("num_records=$num_records",1);
logger("$id_in",1);
logger("Time elapsed: <b>$time segundos</b>",1);
其他回答
我看了所有的答案,我认为没有人提到这种可能性,我也不知道为什么。
如果你想要最大限度的简单和速度,在一个较小的成本,那么对我来说,它似乎是有意义的存储在DB中的每一行的随机数。只需要创建一个额外的列random_number,并将其默认值设置为RAND()。在此列上创建索引。
然后,当您想检索一行时,在代码(PHP、Perl等)中生成一个随机数,并将其与列进行比较。
SELECT FROM tbl WHERE random_number >= :random LIMIT 1
我想虽然它对于单行来说非常整洁,但是对于像OP要求的十行,你必须分别调用它十次(或者想出一个我立即逃脱的聪明的调整)
所有最好的答案都已经贴出来了(主要是那些引用了http://jan.kneschke.de/projects/mysql/order-by-rand/的链接)。
I want to pinpoint another speed-up possibility - caching. Think of why you need to get random rows. Probably you want display some random post or random ad on a website. If you are getting 100 req/s, is it really needed that each visitor gets random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same 1 second get the same random posts, because the next second another 100 visitors will get different set of posts.
当使用这种缓存时,你也可以使用一些较慢的解决方案来获取随机数据,因为不管你的req/s如何,它每秒只会从MySQL中获取一次。
我需要一个查询从一个相当大的表中返回大量随机行。这是我想到的。首先获取最大记录id:
SELECT MAX(id) FROM table_name;
然后将该值代入:
SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;
Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record id's although I doubt it would affect the result if there were (haven't tried it though). I also created this stored procedure to be more generic; pass in the table name and number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008, 32GB, dual 3GHz E5450, and on a table with 17,361,264 rows it's fairly consistent at ~.03 sec / ~11 sec to return 1,000,000 rows. (times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the 2nd select statement depending on your preference)
DELIMITER $$
USE [schema name] $$
DROP PROCEDURE IF EXISTS `random_rows` $$
CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN
SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET @t = CONCAT(
'SELECT * FROM ',
tab_name,
' WHERE id>FLOOR(RAND()*@max) LIMIT ',
num_rows);
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$
then
CALL [schema name].random_rows([table name], n);
我认为这是一个简单但更快的方法,我在现场服务器上测试了它,与上面的几个答案相比,它更快。
SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30;
//对一个130行的表花费0.0014秒
SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30
//对130行的表花费0.0042秒
SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30
//对130行的表花费0.0040秒
我改进了@Riedsio的答案。这是我在一个有间隙的大型均匀分布表上能找到的最有效的查询(测试从一个有> 2.6B行的表中获得1000个随机行)。
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
让我来解释一下发生了什么。
(SELECT MAX(id) FROM table) 我在计算并保存最大值。对于非常大的表,每次需要一行时计算MAX(id)都会有轻微的开销 SELECT FLOOR(rand() * @max) + 1 as rand) 获取一个随机id SELECT id FROM table INNER JOIN(… 这就填补了空白。基本上,如果你在间隙中随机选择一个数字,它就会选择下一个id。假设间隙是均匀分布的,这应该不是问题。
进行联合可以帮助您将所有内容放入一个查询中,从而避免进行多个查询。它还可以节省计算MAX(id)的开销。根据您的应用程序,这可能非常重要,也可能无关紧要。
注意,这只获取id,并以随机顺序获取它们。如果你想做更高级的事情,我建议你这样做:
SELECT t.id, t.name -- etc, etc
FROM table t
INNER JOIN (
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
) x ON x.id = t.id
ORDER BY t.id
推荐文章
- 如何关闭mysql密码验证?
- 如何在Ruby On Rails中使用NuoDB手动执行SQL命令
- 查询JSON类型内的数组元素
- 确定记录是否存在的最快方法
- MySQL区分大小写查询
- 获得PostgreSQL数据库中当前连接数的正确查询
- 如何在Ruby中生成a和b之间的随机数?
- 在SQL选择语句Order By 1的目的是什么?
- MySQL数据库表中的最大记录数
- 原则-如何打印出真正的sql,而不仅仅是准备好的语句?
- PHP/MySQL插入一行然后获取id
- 我如何循环通过一组记录在SQL Server?
- 如何从命令行通过mysql运行一个查询?
- 外键约束可能导致循环或多条级联路径?
- java.util.Random真的那么随机吗?我怎么能生成52!(阶乘)可能的序列?