我如何才能最好地编写一个查询,从总共600k中随机选择10行?


当前回答

使用下面的简单查询从表中获取随机数据。

SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails 
GROUP BY usr_fk_id 
ORDER BY cnt ASC  
LIMIT 10

其他回答

另一个简单的解决方案是对行进行排名,并随机获取其中之一,有了这个解决方案,你将不需要在表中有任何基于“Id”的列。

SELECT d.* FROM (
SELECT  t.*,  @rownum := @rownum + 1 AS rank
FROM mytable AS t,
    (SELECT @rownum := 0) AS r,
    (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS n
) d WHERE rank >= @cnt LIMIT 10;

您可以根据需要更改限制值,以便访问尽可能多的行,但大多数情况下是连续的值。

然而,如果你不想要连续的随机值,那么你可以获取一个更大的样本并从中随机选择。就像……

SELECT * FROM (
SELECT d.* FROM (
    SELECT  c.*,  @rownum := @rownum + 1 AS rank
    FROM buildbrain.`commits` AS c,
        (SELECT @rownum := 0) AS r,
        (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM buildbrain.`commits`))) AS rnd
) d 
WHERE rank >= @cnt LIMIT 10000 
) t ORDER BY RAND() LIMIT 10;

如何从表中随机选择行:

从这里开始: 在MySQL中随机选择行

对“表扫描”的快速改进是使用索引来获取随机id。

SELECT *
FROM random, (
        SELECT id AS sid
        FROM random
        ORDER BY RAND( )
        LIMIT 10
    ) tmp
WHERE random.id = tmp.sid;

我得到了快速查询(大约0.5秒),cpu很慢,在一个400K寄存器的MySQL数据库中随机选择10行,非缓存2Gb大小。在MySQL中快速选择随机行

$time= microtime_float();

$sql='SELECT COUNT(*) FROM pages';
$rquery= BD_Ejecutar($sql);
list($num_records)=mysql_fetch_row($rquery);
mysql_free_result($rquery);

$sql="SELECT id FROM pages WHERE RAND()*$num_records<20
   ORDER BY RAND() LIMIT 0,10";
$rquery= BD_Ejecutar($sql);
while(list($id)=mysql_fetch_row($rquery)){
    if($id_in) $id_in.=",$id";
    else $id_in="$id";
}
mysql_free_result($rquery);

$sql="SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery= BD_Ejecutar($sql);
while(list($id,$url)=mysql_fetch_row($rquery)){
    logger("$id, $url",1);
}
mysql_free_result($rquery);

$time= microtime_float()-$time;

logger("num_records=$num_records",1);
logger("$id_in",1);
logger("Time elapsed: <b>$time segundos</b>",1);
SELECT column FROM table
ORDER BY RAND()
LIMIT 10

这不是有效的解决方案,但确实有效

我是这样做的:

select * 
from table_with_600k_rows
where rand() < 10/600000
limit 10

我喜欢它,因为它不需要其他表,写起来很简单,执行起来非常快。