如何从MongoDB获得随机记录?

我想从一个巨大的集合(1亿条记录)中获得一个随机记录。

最快最有效的方法是什么?

数据已经在那里，没有字段可以生成随机数并获得随机行。

当前回答

如果您使用的是mongoid(文档到对象的包装器)，您可以执行以下操作 Ruby。(假设你的模型是User)

User.all.to_a[rand(User.count)]

在我的。irbrc，我有

def rando klass
    klass.all.to_a[rand(klass.count)]
end

所以在rails控制台，我可以做，例如，

rando User
rando Article

从任何集合中随机获取文件。

2013-12-06 12:22:06

其他回答

下面是一种使用_id的默认ObjectId值和一些数学和逻辑的方法。

// Get the "min" and "max" timestamp values from the _id in the collection and the 
// diff between.
// 4-bytes from a hex string is 8 characters

var min = parseInt(db.collection.find()
        .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    max = parseInt(db.collection.find()
        .sort({ "_id": -1 })limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
    diff = max - min;

// Get a random value from diff and divide/multiply be 1000 for The "_id" precision:
var random = Math.floor(Math.floor(Math.random(diff)*diff)/1000)*1000;

// Use "random" in the range and pad the hex string to a valid ObjectId
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")

// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
   .sort({ "_id": 1 }).limit(1).toArray()[0];

这是shell表示法的一般逻辑，很容易适应。

所以在点上:

查找集合中的最小和最大主键值生成一个位于这些文档的时间戳之间的随机数。将随机数与最小值相加，然后找到大于或等于该值的第一个文档。

这使用了从“十六进制”的时间戳值中“填充”来形成有效的ObjectId值，因为这就是我们正在寻找的。使用整数作为_id值本质上更简单，但在点中基本思想相同。

2015-06-26 11:06:04

您可以选择一个随机时间戳，然后搜索随后创建的第一个对象。它将只扫描单个文档，尽管它不一定会给您一个统一的分布。

var randRec = function() {
    // replace with your collection
    var coll = db.collection
    // get unixtime of first and last record
    var min = coll.find().sort({_id: 1}).limit(1)[0]._id.getTimestamp() - 0;
    var max = coll.find().sort({_id: -1}).limit(1)[0]._id.getTimestamp() - 0;

    // allow to pass additional query params
    return function(query) {
        if (typeof query === 'undefined') query = {}
        var randTime = Math.round(Math.random() * (max - min)) + min;
        var hexSeconds = Math.floor(randTime / 1000).toString(16);
        var id = ObjectId(hexSeconds + "0000000000000000");
        query._id = {$gte: id}
        return coll.find(query).limit(1)
    };
}();

2014-12-04 23:37:40

在Mongoose中最好的方法是使用$sample进行聚合调用。然而，Mongoose并不会将Mongoose文档应用到Aggregation上——尤其是当populate()也被应用的时候。

从数据库中获取一个“精益”数组:

/*
Sample model should be init first
const Sample = mongoose …
*/

const samples = await Sample.aggregate([
  { $match: {} },
  { $sample: { size: 33 } },
]).exec();
console.log(samples); //a lean Array

获取mongoose文档数组:

const samples = (
  await Sample.aggregate([
    { $match: {} },
    { $sample: { size: 27 } },
    { $project: { _id: 1 } },
  ]).exec()
).map(v => v._id);

const mongooseSamples = await Sample.find({ _id: { $in: samples } });

console.log(mongooseSamples); //an Array of mongoose documents

2021-04-06 09:21:52

您还可以在执行查询后使用shuffle-array

Var shuffle = require('shuffle-array');

Accounts.find (qry函数(呃,results_array) { newIndexArr = shuffle (results_array);

2019-05-12 05:43:50

当我面对类似的解决方案时，我回溯并发现业务请求实际上是为了创建所呈现的库存的某种形式的轮换。在这种情况下，有更好的选择，它们有来自Solr这样的搜索引擎的答案，而不是MongoDB这样的数据存储。

In short, with the requirement to "intelligently rotate" content, what we should do instead of a random number across all of the documents is to include a personal q score modifier. To implement this yourself, assuming a small population of users, you can store a document per user that has the productId, impression count, click-through count, last seen date, and whatever other factors the business finds as being meaningful to compute a q score modifier. When retrieving the set to display, typically you request more documents from the data store than requested by the end user, then apply the q score modifier, take the number of records requested by the end user, then randomize the page of results, a tiny set, so simply sort the documents in the application layer (in memory).

如果用户的范围太大，可以将用户划分为行为组，按行为组而不是按用户进行索引。

如果产品范围足够小，您可以为每个用户创建一个索引。

我发现这种技术效率更高，但更重要的是在创建相关的、有价值的软件解决方案使用体验方面更有效。

2013-09-11 16:32:43

如何从MongoDB获得随机记录?

推荐文章

最新文章

标签