我如何才能最好地编写一个查询,从总共600k中随机选择10行?
当前回答
我得到了快速查询(大约0.5秒),cpu很慢,在一个400K寄存器的MySQL数据库中随机选择10行,非缓存2Gb大小。在MySQL中快速选择随机行
$time= microtime_float();
$sql='SELECT COUNT(*) FROM pages';
$rquery= BD_Ejecutar($sql);
list($num_records)=mysql_fetch_row($rquery);
mysql_free_result($rquery);
$sql="SELECT id FROM pages WHERE RAND()*$num_records<20
ORDER BY RAND() LIMIT 0,10";
$rquery= BD_Ejecutar($sql);
while(list($id)=mysql_fetch_row($rquery)){
if($id_in) $id_in.=",$id";
else $id_in="$id";
}
mysql_free_result($rquery);
$sql="SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery= BD_Ejecutar($sql);
while(list($id,$url)=mysql_fetch_row($rquery)){
logger("$id, $url",1);
}
mysql_free_result($rquery);
$time= microtime_float()-$time;
logger("num_records=$num_records",1);
logger("$id_in",1);
logger("Time elapsed: <b>$time segundos</b>",1);
其他回答
SELECT column FROM table
ORDER BY RAND()
LIMIT 10
这不是有效的解决方案,但确实有效
我认为这是一个简单但更快的方法,我在现场服务器上测试了它,与上面的几个答案相比,它更快。
SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30;
//对一个130行的表花费0.0014秒
SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30
//对130行的表花费0.0042秒
SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30
//对130行的表花费0.0040秒
SELECT
*
FROM
table_with_600k_rows
WHERE
RAND( )
ORDER BY
id DESC
LIMIT 30;
Id是主键,按Id排序, 解释table_with_600k_rows,发现该行不扫描整个表
我使用了Riedsio发布的http://jan.kneschke.de/projects/mysql/order-by-rand/(我使用了返回一个或多个随机值的存储过程的情况):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
INSERT INTO rands
SELECT r1.id
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
在这篇文章中,他通过维护一个表(使用触发器等)解决了id中的间隙导致不那么随机的结果的问题。参见文章); 我通过向表中添加另一列来解决这个问题,用连续的数字填充,从1开始(编辑:此列添加到运行时由子查询创建的临时表中,不影响永久表):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
SET @no_gaps_id := 0;
INSERT INTO rands
SELECT r1.id
FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
(SELECT (RAND() *
(SELECT COUNT(*)
FROM random)) AS id)
AS r2
WHERE r1.no_gaps_id >= r2.id
ORDER BY r1.no_gaps_id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
在文章中,我可以看到他花了很大的精力来优化代码;我不知道我的改变是否/有多大影响性能,但对我来说非常好。
您可以轻松地使用带限制的随机偏移量
PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
您还可以像这样应用where子句
PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
在600,000行(700MB)表查询执行上的测试花费了大约0.016秒的硬盘驱动器时间。
EDIT:偏移量可能取接近表末尾的值,这将导致select语句返回更少的行(或者可能只有一行),为了避免这种情况,我们可以在声明偏移量后再次检查,如下所示
SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count,@_offset-@rows_count,@_offset));
SET @_offset = (SELECT IF(@_offset<0,0,@_offset));
EXECUTE stm using @rows_count,@_offset;
推荐文章
- LEFT OUTER JOIN如何返回比左表中存在的记录更多的记录?
- 如何用SQL语句计算百分比
- Postgres唯一约束与索引
- SQL Server动态PIVOT查询?
- MySQL对重复键更新在一个查询中插入多行
- 向现有表添加主键
- mysql_connect():[2002]没有这样的文件或目录(试图通过unix:///tmp/mysql.sock连接)在
- 使用电子邮件地址为主键?
- MySQL:如何复制行,但改变几个字段?
- 不能删除或更新父行:外键约束失败
- MongoDB在v4之前不兼容ACID意味着什么?
- SQL WHERE ID IN (id1, id2,…idn)
- Mysql错误1452:不能添加或更新子行:外键约束失败
- 最常见的SQL反模式是什么?
- 错误:没有唯一的约束匹配给定的键引用表"bar"