我如何才能最好地编写一个查询,从总共600k中随机选择10行?
当前回答
我需要一个查询从一个相当大的表中返回大量随机行。这是我想到的。首先获取最大记录id:
SELECT MAX(id) FROM table_name;
然后将该值代入:
SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;
Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record id's although I doubt it would affect the result if there were (haven't tried it though). I also created this stored procedure to be more generic; pass in the table name and number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008, 32GB, dual 3GHz E5450, and on a table with 17,361,264 rows it's fairly consistent at ~.03 sec / ~11 sec to return 1,000,000 rows. (times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the 2nd select statement depending on your preference)
DELIMITER $$
USE [schema name] $$
DROP PROCEDURE IF EXISTS `random_rows` $$
CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN
SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET @t = CONCAT(
'SELECT * FROM ',
tab_name,
' WHERE id>FLOOR(RAND()*@max) LIMIT ',
num_rows);
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$
then
CALL [schema name].random_rows([table name], n);
其他回答
我认为这是一个简单但更快的方法,我在现场服务器上测试了它,与上面的几个答案相比,它更快。
SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30;
//对一个130行的表花费0.0014秒
SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30
//对130行的表花费0.0042秒
SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30
//对130行的表花费0.0040秒
使用下面的简单查询从表中获取随机数据。
SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 10
您可以轻松地使用带限制的随机偏移量
PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
您还可以像这样应用where子句
PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
在600,000行(700MB)表查询执行上的测试花费了大约0.016秒的硬盘驱动器时间。
EDIT:偏移量可能取接近表末尾的值,这将导致select语句返回更少的行(或者可能只有一行),为了避免这种情况,我们可以在声明偏移量后再次检查,如下所示
SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count,@_offset-@rows_count,@_offset));
SET @_offset = (SELECT IF(@_offset<0,0,@_offset));
EXECUTE stm using @rows_count,@_offset;
我使用了Riedsio发布的http://jan.kneschke.de/projects/mysql/order-by-rand/(我使用了返回一个或多个随机值的存储过程的情况):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
INSERT INTO rands
SELECT r1.id
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
在这篇文章中,他通过维护一个表(使用触发器等)解决了id中的间隙导致不那么随机的结果的问题。参见文章); 我通过向表中添加另一列来解决这个问题,用连续的数字填充,从1开始(编辑:此列添加到运行时由子查询创建的临时表中,不影响永久表):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
SET @no_gaps_id := 0;
INSERT INTO rands
SELECT r1.id
FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
(SELECT (RAND() *
(SELECT COUNT(*)
FROM random)) AS id)
AS r2
WHERE r1.no_gaps_id >= r2.id
ORDER BY r1.no_gaps_id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
在文章中,我可以看到他花了很大的精力来优化代码;我不知道我的改变是否/有多大影响性能,但对我来说非常好。
如果你想要一个随机记录(不管id之间是否有空隙):
PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';
SET @count = (SELECT
FLOOR(RAND() * COUNT(*))
FROM `table_name`);
EXECUTE stmt USING @count;
来源:https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/评论- 1266
推荐文章
- 如何关闭mysql密码验证?
- 如何在Ruby On Rails中使用NuoDB手动执行SQL命令
- 查询JSON类型内的数组元素
- 确定记录是否存在的最快方法
- MySQL区分大小写查询
- 获得PostgreSQL数据库中当前连接数的正确查询
- 如何在Ruby中生成a和b之间的随机数?
- 在SQL选择语句Order By 1的目的是什么?
- MySQL数据库表中的最大记录数
- 原则-如何打印出真正的sql,而不仅仅是准备好的语句?
- PHP/MySQL插入一行然后获取id
- 我如何循环通过一组记录在SQL Server?
- 如何从命令行通过mysql运行一个查询?
- 外键约束可能导致循环或多条级联路径?
- java.util.Random真的那么随机吗?我怎么能生成52!(阶乘)可能的序列?