我如何才能最好地编写一个查询,从总共600k中随机选择10行?
当前回答
如何从表中随机选择行:
从这里开始: 在MySQL中随机选择行
对“表扫描”的快速改进是使用索引来获取随机id。
SELECT *
FROM random, (
SELECT id AS sid
FROM random
ORDER BY RAND( )
LIMIT 10
) tmp
WHERE random.id = tmp.sid;
其他回答
SELECT column FROM table
ORDER BY RAND()
LIMIT 10
这不是有效的解决方案,但确实有效
它是非常简单的单行查询。
SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;
简单的查询,具有出色的性能和工作的差距:
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id
在一个200K表上的这个查询需要0.08秒,而在我的机器上,正常版本(SELECT * FROM tbl ORDER BY RAND() LIMIT 10)需要0.35秒。
这是快速的,因为排序阶段只使用索引ID列。你可以在解释中看到这种行为:
SELECT * FROM tbl ORDER BY RAND() LIMIT 10:
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) AS t2 ON t1.id=t2.id
加权版:https://stackoverflow.com/a/41577458/893432
如果你只有一个读请求
将@redsio的答案与一个临时表结合起来(600K并不是很多):
DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;
然后用一个@redsios的版本回答:
SELECT dt.*
FROM
(SELECT (RAND() *
(SELECT MAX(id)
FROM tmp_randorder)) AS id)
AS rnd
INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
INNER JOIN datatable AS dt on dt.id = rndo.data_id
ORDER BY abs(rndo.id - rnd.id)
LIMIT 1;
如果表比较大,可以先筛选第一部分:
INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;
如果你有很多读请求
Version: You could keep the table tmp_randorder persistent, call it datatable_idlist. Recreate that table in certain intervals (day, hour), since it also will get holes. If your table gets really big, you could also refill holes select l.data_id as whole from datatable_idlist l left join datatable dt on dt.id = l.data_id where dt.id is null; Version: Give your Dataset a random_sortorder column either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a Random-Value in your Application (I'll call it $rand). select l.* from datatable l order by abs(random_sortorder - $rand) desc limit 1;
这个解决方案用最高和最低的random_sortorder来区分“边缘行”,所以在间隔中重新排列它们(一天一次)。
我使用了Riedsio发布的http://jan.kneschke.de/projects/mysql/order-by-rand/(我使用了返回一个或多个随机值的存储过程的情况):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
INSERT INTO rands
SELECT r1.id
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
在这篇文章中,他通过维护一个表(使用触发器等)解决了id中的间隙导致不那么随机的结果的问题。参见文章); 我通过向表中添加另一列来解决这个问题,用连续的数字填充,从1开始(编辑:此列添加到运行时由子查询创建的临时表中,不影响永久表):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
SET @no_gaps_id := 0;
INSERT INTO rands
SELECT r1.id
FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
(SELECT (RAND() *
(SELECT COUNT(*)
FROM random)) AS id)
AS r2
WHERE r1.no_gaps_id >= r2.id
ORDER BY r1.no_gaps_id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
在文章中,我可以看到他花了很大的精力来优化代码;我不知道我的改变是否/有多大影响性能,但对我来说非常好。
推荐文章
- 将值从同一表中的一列复制到另一列
- GROUP BY with MAX(DATE)
- 删除id与其他表不匹配的sql行
- 等价的限制和偏移SQL Server?
- MySQL CPU使用率高
- INT和VARCHAR主键之间有真正的性能差异吗?
- 拒绝访问;您需要(至少一个)SUPER特权来执行此操作
- 为什么我不能在DELETE语句中使用别名?
- 在SQL Server Management Studio中保存带有标题的结果
- 从存储引擎得到错误28
- "where 1=1"语句
- 如何选择一个记录和更新它,与一个单一的查询集在Django?
- 多语句表值函数vs内联表值函数
- 如何从Oracle的表中获取列名?
- 可能做MySQL外键的两个可能的表之一?