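How do I get a random record from MongoDB?

I want to get one random record from a huge collection (100 million records).

What is the fastest and most efficient way to do this?

The data already exists, and there is no field on which I could generate a random number and fetch a random row.

Current answer

Using Python (pymongo), the aggregate function also works: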

collection.aggregate([{'$sample': {'size': sample_size }}])

This approach is much faster than running a query against a random number (e.g. collection.find([random_int])), especially for large collections.
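
For the single-record case asked about in the question, the same $sample stage can be wrapped in a small helper. This is a minimal sketch, assuming collection is a pymongo Collection (or any object with a compatible aggregate method); the name random_document is hypothetical and not part of pymongo.

```python
def random_document(collection):
    """Return one randomly sampled document, or None if nothing comes back.

    Assumption: `collection` behaves like a pymongo Collection, i.e. its
    `aggregate` method accepts a pipeline list and returns an iterable.
    """
    # $sample with size 1 asks the server for a single random document.
    cursor = collection.aggregate([{"$sample": {"size": 1}}])
    # The cursor yields at most one document; fall back to None when empty.
    return next(iter(cursor), None)
```

With a real connection this would be called as random_document(db.my_collection), where db.my_collection is a placeholder name. The $sample stage requires MongoDB 3.2 or later.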

2018-04-17 14:37:24

Other answers

Here is an approach that uses the default ObjectId values of _id plus a little math and logic.

// Get the "min" and "max" timestamp values from the _id in the collection and the 
// diff between.
// 4-bytes from a hex string is 8 characters

var min = parseInt(db.collection.find()
        .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    max = parseInt(db.collection.find()
        .sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    diff = max - min;

// Get a random value from diff and divide/multiply by 1000 for the "_id" precision:
var random = Math.floor(Math.random()*diff/1000)*1000;

// Use "random" in the range and pad the hex string to a valid ObjectId
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")

// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
   .sort({ "_id": 1 }).limit(1).toArray()[0];

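This is the general logic in shell notation, and it is easy to adapt.

In summary:

Find the minimum and maximum primary key values in the collection. Generate a random number that falls between the timestamps of those documents. Add the random number to the minimum value, then find the first document whose _id is greater than or equal to that value.

This "pads" the hex timestamp value to form a valid ObjectId, since that is what we are searching for. Using integers as the _id value is inherently simpler, but the basic idea is the same.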

2015-06-26 11:06:04

Count all the records, generate a random number between 0 and count - 1, then run:

db.yourCollection.find().limit(-1).skip(yourRandomNumber).next()
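
The count-then-skip idea can be sketched in Python as follows. This is only an illustration, assuming collection is a pymongo Collection; the function name random_document_by_skip and the rng parameter are hypothetical and were added so the random source can be swapped out.

```python
import random


def random_document_by_skip(collection, rng=random):
    """Pick a random offset and return the document at that position.

    Assumption: `collection` behaves like a pymongo Collection
    (count_documents, find, and a cursor with skip/limit).
    """
    count = collection.count_documents({})
    if count == 0:
        return None
    # randrange(count) yields 0 <= offset < count, so the last
    # document can be chosen and the offset never runs past the end.
    offset = rng.randrange(count)
    cursor = collection.find().skip(offset).limit(1)
    return next(iter(cursor), None)
```

Note that the server still has to walk past the skipped documents, so the cost grows with the offset.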

2010-05-13 02:48:12

That is hard to do without data to support it. What are the _id fields? Are they MongoDB ObjectIds? If so, you can get the lowest and highest values:

lowest = db.coll.find().sort({_id:1}).limit(1).next()._id;
highest = db.coll.find().sort({_id:-1}).limit(1).next()._id;

Then, if you assume the ids are uniformly distributed (they are not, but it is at least a start):

unsigned long long L = first_8_bytes_of(lowest)
unsigned long long H = first_8_bytes_of(highest)

V = (H - L) * random_from_0_to_1();
N = L + V;
oid = N concat random_4_bytes();

randomobj = db.coll.find({_id:{$gte:oid}}).sort({_id:1}).limit(1);
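
The pseudocode above can be made concrete in Python, whose integers are not limited to 64 bits. This is a rough sketch under the same uniform-distribution assumption; the function name random_boundary_hex is hypothetical, and the ids are passed in as 24-character hex strings (what str(ObjectId) returns) so that the block needs no MongoDB driver.

```python
import os
import random


def random_boundary_hex(lowest_hex, highest_hex, rng=random):
    """Return a 24-char hex string lying between two ObjectId hex strings.

    The first 8 bytes (16 hex characters) are drawn between the two ids'
    8-byte prefixes; the last 4 bytes are random, as in the pseudocode.
    """
    low = int(lowest_hex[:16], 16)
    high = int(highest_hex[:16], 16)
    # randint is inclusive on both ends and exact for large integers.
    prefix = rng.randint(low, high)
    candidate = f"{prefix:016x}{os.urandom(4).hex()}"
    # Clamp to the highest id so a $gte query always has a match.
    # Equal-length lowercase hex strings compare in numeric order.
    return min(candidate, highest_hex.lower())
```

With pymongo, the result would then be used roughly as collection.find_one({"_id": {"$gte": ObjectId(boundary)}}, sort=[("_id", 1)]), with ObjectId imported from bson.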

2010-05-13 13:48:41

If you are using Mongoid (an object-document mapper for Ruby), you can do the following, assuming your model is User:

User.all.to_a[rand(User.count)]

In my .irbrc, I have:

def rando klass
    klass.all.to_a[rand(klass.count)]
end

So in the Rails console I can do, for example:

rando User
rando Article

to get a random document from any collection.

2013-12-06 12:22:06

You can pick a random _id and return the corresponding object:

db.collection.count(function (err, count) {
    if (err)
        return res.send(err)
    db.collection.distinct("_id", function (err, result) {
        if (err)
            return res.send(err)
        // Multiply by count (not count - 1) so the last _id can also be picked
        var randomId = result[Math.floor(Math.random() * count)]
        db.collection.findOne({ _id: randomId }, function (err, result) {
            if (err)
                return res.send(err)
            console.log(result)
        })
    })
})

This way, you do not need to spend space storing random numbers in the collection.

2015-04-30 04:24:13
