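How do I get a random record from MongoDB?

I want to get a random record from a huge collection (100 million records).

What is the fastest and most efficient way to do so?

The data is already there, and no field exists from which I could generate a random number and fetch a random row.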

Current answer

Here is an approach that uses the default ObjectId values of _id plus a little math and logic.

// Get the "min" and "max" timestamp values from the _id in the collection and the 
// diff between.
// 4-bytes from a hex string is 8 characters

var min = parseInt(db.collection.find()
        .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    max = parseInt(db.collection.find()
        .sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    diff = max - min;

// Get a random value from diff and divide/multiply by 1000 for the "_id" precision:
var random = Math.floor(Math.floor(Math.random()*diff)/1000)*1000;

// Use "random" in the range and pad the hex string to a valid ObjectId
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")

// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
   .sort({ "_id": 1 }).limit(1).toArray()[0];

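That is the general logic in shell notation, and it is easy to adapt.

So, in points:

1. Find the minimum and maximum primary key values in the collection.
2. Generate a random number that falls between the timestamps of those two documents.
3. Add the random number to the minimum value and find the first document whose _id is greater than or equal to that value.

This "pads" the hex form of the timestamp with zeros to form a valid ObjectId value, since that is what we are querying for. Using integers as the _id value is essentially simpler, but the basic idea is the same as in the points above.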
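The padding step can be isolated into a small sketch. The helper names `objectIdHexFromSeconds` and `randomSecondBetween` are hypothetical and not part of the answer. The sketch assumes the usual ObjectId layout of a 4-byte timestamp followed by 8 further bytes, and it left-pads the timestamp so the result is always 24 hex characters.

```javascript
// Hypothetical helper (not from the answer): builds the 24-character hex string
// for an ObjectId whose timestamp is `seconds` since the Unix epoch and whose
// remaining 8 bytes are all zero.
function objectIdHexFromSeconds(seconds) {
    if (!Number.isInteger(seconds) || seconds < 0 || seconds > 0xffffffff) {
        throw new RangeError("seconds must be an integer that fits in 4 bytes");
    }
    // 4 bytes = 8 hex characters; left-pad so small values stay 8 characters wide.
    var prefix = seconds.toString(16).padStart(8, "0");
    // The other 8 bytes (16 hex characters) are zeroed.
    return prefix + "0000000000000000";
}

// Hypothetical helper: picks a whole second in [minSeconds, maxSeconds].
// `rng` defaults to Math.random and can be replaced for testing.
function randomSecondBetween(minSeconds, maxSeconds, rng) {
    rng = rng || Math.random;
    return minSeconds + Math.floor(rng() * (maxSeconds - minSeconds + 1));
}

// In the shell, with min and max expressed in seconds, this would be used as:
//   var _id = new ObjectId(objectIdHexFromSeconds(randomSecondBetween(min, max)));
```

Because the trailing bytes are zero, a `$gte` query on this value matches the first document created at or after the chosen second.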

2015-06-26 11:06:04

Other answers

Count all the records, generate a random number from 0 up to (but not including) the count, and then run:

db.yourCollection.find().limit(-1).skip(yourRandomNumber).next()
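A minimal sketch of how the random number for this approach could be produced. `randomOffset` is a hypothetical helper, not something from the answer. It keeps the value in the range 0 to count - 1 so that `skip` never moves past the last document.

```javascript
// Hypothetical helper: returns an integer offset in [0, count - 1].
// `rng` defaults to Math.random and can be replaced for testing.
function randomOffset(count, rng) {
    if (!Number.isInteger(count) || count <= 0) {
        throw new RangeError("count must be a positive integer");
    }
    rng = rng || Math.random;
    return Math.floor(rng() * count);
}

// Shell usage (yourCollection is the placeholder name used in the answer):
//   var n = db.yourCollection.count();
//   db.yourCollection.find().limit(-1).skip(randomOffset(n)).next()
```

Note that `skip` still has to walk past the skipped documents, so on a very large collection the cost is likely to grow with the chosen offset.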

2010-05-13 02:48:12

None of the solutions worked well for me, especially when there are many gaps and the set is small. This worked very well for me (in PHP):

$count = $collection->count($search);
$skip = mt_rand(0, $count - 1);
$result = $collection->find($search)->skip($skip)->limit(1)->getNext();

2014-01-21 18:07:44

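You can pick a random timestamp and search for the first object that was created afterwards. It will only scan a single document, though it does not necessarily give you a uniform distribution.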

var randRec = function() {
    // replace with your collection
    var coll = db.collection
    // get unixtime of first and last record
    var min = coll.find().sort({_id: 1}).limit(1)[0]._id.getTimestamp() - 0;
    var max = coll.find().sort({_id: -1}).limit(1)[0]._id.getTimestamp() - 0;

    // allow to pass additional query params
    return function(query) {
        if (typeof query === 'undefined') query = {}
        var randTime = Math.round(Math.random() * (max - min)) + min;
        var hexSeconds = Math.floor(randTime / 1000).toString(16);
        var id = ObjectId(hexSeconds + "0000000000000000");
        query._id = {$gte: id}
        return coll.find(query).limit(1)
    };
}();
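
Why the distribution is not uniform can be shown with a small calculation. This is an illustrative sketch only; `pickProbabilities` is a hypothetical helper. It assumes at least two distinct creation times, sorted ascending, and treats the random time as continuous, so the chance of landing exactly on the first document is taken as 0.

```javascript
// Hypothetical illustration: given creation times sorted ascending, return the
// probability that each document is the first one at or after a time chosen
// uniformly between the first and last creation time.
function pickProbabilities(times) {
    var span = times[times.length - 1] - times[0];
    return times.map(function (t, i) {
        // A document is picked when the random time lands in the gap before it.
        return i === 0 ? 0 : (t - times[i - 1]) / span;
    });
}

// Four documents created at times 0, 10, 11 and 100.
var probs = pickProbabilities([0, 10, 11, 100]);
// probs is [0, 0.1, 0.01, 0.89]
```

Documents that follow a long gap are picked far more often than documents created in quick succession.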

2014-12-04 23:37:40

MongoDB now has $rand.

To pick n non-repeating items, aggregate with {$addFields: {_f: {$rand: {}}}}, then $sort by _f and $limit to n.
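
The stages described above could be assembled as follows. `randomSamplePipeline` is a hypothetical helper, and the final `$project` stage that removes the temporary `_f` field is an addition that the answer does not mention.

```javascript
// Hypothetical helper: builds the aggregation pipeline described above.
function randomSamplePipeline(n) {
    return [
        { $addFields: { _f: { $rand: {} } } }, // attach a random float to each document
        { $sort: { _f: 1 } },                  // order documents by that random key
        { $limit: n },                         // keep the first n documents
        { $project: { _f: 0 } }                // assumption: drop the helper field again
    ];
}

// Shell usage (the collection name is an assumption):
//   db.collection.aggregate(randomSamplePipeline(5))
```

Since every document receives a random key before sorting, this is likely to be expensive on a collection of the size mentioned in the question.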

2021-02-23 15:38:46

My simplest solution to this is:

db.coll.find()
    .limit(1)
    .skip(Math.floor(Math.random() * 500))
    .next()

This assumes the collection holds at least 500 documents.

2022-09-22 03:26:04
