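How do I get a random record from MongoDB?

I want to get a random record from a huge collection (100 million records).

What is the fastest and most efficient way to do so?

The data is already there, and there is no field in which I could generate a random number to obtain a random row.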

Current answer

Starting with MongoDB 3.2, you can get N random documents from a collection using the $sample aggregation pipeline stage:

// Get one random document from the mycoll collection.
db.mycoll.aggregate([{ $sample: { size: 1 } }])

If you want to select the random document(s) from a filtered subset of the collection, prepend a $match stage to the pipeline:

// Get one random document matching {a: 10} from the mycoll collection.
db.mycoll.aggregate([
    { $match: { a: 10 } },
    { $sample: { size: 1 } }
])

As noted in the comments, when size is greater than 1, the returned sample may contain duplicate documents.
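
If duplicates matter when size is greater than 1, one option is to filter them out on the client after the aggregation returns. The following is only a minimal sketch in plain JavaScript: the helper name dedupeById is hypothetical (it is not part of MongoDB), and it assumes that String(doc._id) uniquely identifies a document and that all ids are of the same type.

```javascript
// Hypothetical helper: drop repeated documents from a $sample result.
// Assumes String(doc._id) uniquely identifies a document.
function dedupeById(docs) {
  const seen = new Set();
  const unique = [];
  for (const doc of docs) {
    const key = String(doc._id);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(doc);
    }
  }
  return unique;
}

// Plain objects standing in for sampled documents.
const sampled = [{ _id: 1, a: 10 }, { _id: 2, a: 10 }, { _id: 1, a: 10 }];
console.log(dedupeById(sampled).map(d => d._id)); // [ 1, 2 ]
```

Note that the deduplicated array can be shorter than the requested size.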

2015-11-07 02:28:27

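Other answers

The best way in Mongoose is to make an aggregation call with $sample. However, aggregate() returns plain JavaScript objects rather than Mongoose documents, which matters especially if populate() is to be applied as well.

To get a "lean" array from the database: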

/*
Sample model should be init first
const Sample = mongoose …
*/

const samples = await Sample.aggregate([
  { $match: {} },
  { $sample: { size: 33 } },
]).exec();
console.log(samples); //a lean Array

To get an array of Mongoose documents:

const samples = (
  await Sample.aggregate([
    { $match: {} },
    { $sample: { size: 27 } },
    { $project: { _id: 1 } },
  ]).exec()
).map(v => v._id);

const mongooseSamples = await Sample.find({ _id: { $in: samples } });

console.log(mongooseSamples); //an Array of mongoose documents
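
A find() with $in does not promise to return documents in the order of the id array, so the second query may lose the random ordering of the sample. If that order matters, it could be restored on the client. This is a rough sketch only: orderByIds is a hypothetical helper, and it assumes String() gives a stable key for the ids involved.

```javascript
// Hypothetical helper: arrange fetched documents in the order of the sampled ids.
// Ids with no matching document are skipped.
function orderByIds(ids, docs) {
  const byId = new Map(docs.map(doc => [String(doc._id), doc]));
  return ids
    .map(id => byId.get(String(id)))
    .filter(doc => doc !== undefined);
}

// Plain objects standing in for the result of Sample.find(...).
const sampledIds = [3, 1, 2];
const fetched = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
console.log(orderByIds(sampledIds, fetched).map(d => d._id)); // [ 3, 1, 2 ]
```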

2021-04-06 09:21:52

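An efficient and reliable approach:

Add a field named "random" to each document, assign it a random value, and add an index on that field.

Let's assume we have a collection of web links called "links" and we want a random link from it: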

link = db.links.find().sort({random: 1}).limit(1)[0]

To make sure the same link is unlikely to pop up a second time, update its random field with a new random number:

db.links.update({_id: link._id}, {$set: {random: Math.random()}})
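
The pick-then-re-randomize cycle described above can be illustrated without a database. The sketch below is a plain JavaScript stand-in, not MongoDB code: an array plays the role of the "links" collection, the function names are hypothetical, and the collection is assumed to be non-empty.

```javascript
// In-memory stand-in for the "links" collection; every document gets a "random" field.
function makeLinks(urls, rng = Math.random) {
  return urls.map((url, i) => ({ _id: i + 1, url, random: rng() }));
}

// Mirrors find().sort({random: 1}).limit(1)[0]: the document with the smallest "random".
// Assumes the array is not empty.
function pickLink(links) {
  return links.reduce((best, doc) => (doc.random < best.random ? doc : best));
}

// Mirrors the update step: give the picked document a fresh random value.
function rerandomize(doc, rng = Math.random) {
  doc.random = rng();
  return doc;
}

const links = makeLinks(["https://a.example", "https://b.example", "https://c.example"]);
const first = pickLink(links);
rerandomize(first);
console.log(first.url); // one of the three URLs
```

Because the new value is itself random, the same link can still come up again; it is just no longer guaranteed to be first.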

2011-03-25 13:56:27

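I'd suggest adding a random int field to each object. Then you can just do

findOne({random_field: {$gte: rand()}})

to pick a random document. Just make sure you call ensureIndex({random_field:1}) (createIndex in current MongoDB versions).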
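
One edge case with this query is that it finds nothing when the random number is larger than every stored random_field, so a fallback in the other direction is useful. The sketch below simulates that logic in plain JavaScript over an array. Several things here are assumptions for illustration: the function names are hypothetical, random_field holds floats in [0, 1) instead of ints, and the index-backed findOne is taken to return the nearest match in index order.

```javascript
// Stand-in for findOne({random_field: {$gte: r}}): the match with the smallest random_field.
function findOneGte(docs, r) {
  const matches = docs.filter(d => d.random_field >= r);
  if (matches.length === 0) return null;
  return matches.reduce((best, d) => (d.random_field < best.random_field ? d : best));
}

// Stand-in for the fallback query with $lte: the match with the largest random_field.
function findOneLte(docs, r) {
  const matches = docs.filter(d => d.random_field <= r);
  if (matches.length === 0) return null;
  return matches.reduce((best, d) => (d.random_field > best.random_field ? d : best));
}

// Try $gte first; if nothing is at or above r, fall back to $lte.
function pickRandomDoc(docs, r = Math.random()) {
  return findOneGte(docs, r) || findOneLte(docs, r);
}

const docs = [
  { _id: "a", random_field: 0.2 },
  { _id: "b", random_field: 0.5 },
  { _id: "c", random_field: 0.8 },
];
console.log(pickRandomDoc(docs, 0.3)._id); // b
console.log(pickRandomDoc(docs, 0.9)._id); // c (fallback)
```

With this scheme a document's chance of being picked depends on the gap below its random_field value, so the selection is only approximately uniform.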

2010-05-17 18:47:23

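With Map/Reduce you can certainly get a random record, just not necessarily very efficiently, depending on the size of the filtered collection you end up working with.

I've tested this method with 50,000 documents (the filter reduces it to about 30,000), and it executes in approximately 400 ms on an Intel i3 with 16 GB RAM and a SATA3 HDD...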

db.toc_content.mapReduce(
    /* map function */
    function() { emit( 1, this._id ); },

    /* reduce function */
    function(k,v) {
        var r = Math.floor((Math.random()*v.length));
        return v[r];
    },

    /* options */
    {
        out: { inline: 1 },
        /* Filter the collection to "A"ctive documents */
        query: { status: "A" }
    }
);

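The Map function simply creates an array of the ids of all documents that match the query. In my case I tested this with approximately 30,000 out of the 50,000 possible documents.

The Reduce function simply picks a random integer between 0 and the number of items in the array minus 1, and then returns that _id from the array.

400 ms sounds like a long time, and it really is. If you had fifty million records instead of fifty thousand, this could increase the overhead to the point where it becomes unusable in multi-user situations.

There is an open issue for MongoDB to include this feature in the core: https://jira.mongodb.org/browse/SERVER-533

If this "random" selection were built into an index lookup instead of collecting ids into an array and then selecting one, it would help enormously. (Go vote it up!)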
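
To make the roles of the query, map, and reduce steps concrete, here is a local emulation in plain JavaScript over an in-memory array. It is only an illustration under stated assumptions: randomActiveId is a hypothetical name, the rng parameter is added for testability, and the sketch does not reproduce how the server batches or re-invokes the reduce function.

```javascript
// Local emulation of the mapReduce call above, for illustration only.
function randomActiveId(docs, rng = Math.random) {
  // query: keep only documents with status "A"
  const active = docs.filter(doc => doc.status === "A");
  // map: emit(1, this._id) for every match, so all ids end up under the single key 1
  const values = active.map(doc => doc._id);
  if (values.length === 0) return null;
  // reduce: pick a random index between 0 and values.length - 1
  const r = Math.floor(rng() * values.length);
  return values[r];
}

const content = [
  { _id: 101, status: "A" },
  { _id: 102, status: "D" },
  { _id: 103, status: "A" },
];
console.log(randomActiveId(content)); // 101 or 103
```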

2014-01-29 23:26:46
