如何从MongoDB获得随机记录?

我想从一个巨大的集合(1亿条记录)中获得一个随机记录。

最快最有效的方法是什么?

数据已经在那里，没有字段可以生成随机数并获得随机行。

当前回答

现在可以使用聚合了。例子:

db.users.aggregate(
   [ { $sample: { size: 3 } } ]
)

去看医生。

2017-02-06 17:00:45

其他回答

这工作得很好，它是快速的，适用于多个文档，不需要填充rand字段，它最终会填充自己:

向集合上的.rand字段添加索引使用查找和刷新，如下所示:

// Install packages:
//   npm install mongodb async
// Add index in mongo:
//   db.ensureIndex('mycollection', { rand: 1 })

var mongodb = require('mongodb')
var async = require('async')

// Find n random documents by using "rand" field.
function findAndRefreshRand (collection, n, fields, done) {
  var result = []
  var rand = Math.random()

  // Append documents to the result based on criteria and options, if options.limit is 0 skip the call.
  var appender = function (criteria, options, done) {
    return function (done) {
      if (options.limit > 0) {
        collection.find(criteria, fields, options).toArray(
          function (err, docs) {
            if (!err && Array.isArray(docs)) {
              Array.prototype.push.apply(result, docs)
            }
            done(err)
          }
        )
      } else {
        async.nextTick(done)
      }
    }
  }

  async.series([

    // Fetch docs with unitialized .rand.
    // NOTE: You can comment out this step if all docs have initialized .rand = Math.random()
    appender({ rand: { $exists: false } }, { limit: n - result.length }),

    // Fetch on one side of random number.
    appender({ rand: { $gte: rand } }, { sort: { rand: 1 }, limit: n - result.length }),

    // Continue fetch on the other side.
    appender({ rand: { $lt: rand } }, { sort: { rand: -1 }, limit: n - result.length }),

    // Refresh fetched docs, if any.
    function (done) {
      if (result.length > 0) {
        var batch = collection.initializeUnorderedBulkOp({ w: 0 })
        for (var i = 0; i < result.length; ++i) {
          batch.find({ _id: result[i]._id }).updateOne({ rand: Math.random() })
        }
        batch.execute(done)
      } else {
        async.nextTick(done)
      }
    }

  ], function (err) {
    done(err, result)
  })
}

// Example usage
mongodb.MongoClient.connect('mongodb://localhost:27017/core-development', function (err, db) {
  if (!err) {
    findAndRefreshRand(db.collection('profiles'), 1024, { _id: true, rand: true }, function (err, result) {
      if (!err) {
        console.log(result)
      } else {
        console.error(err)
      }
      db.close()
    })
  } else {
    console.error(err)
  }
})

ps.如何在mongodb问题中找到随机记录被标记为此问题的副本。不同之处在于，这个问题明确地询问单个记录，而另一个问题明确地询问随机文档。

2014-11-19 22:08:23

对所有记录进行计数，生成一个0到计数之间的随机数，然后执行:

db.yourCollection.find().limit(-1).skip(yourRandomNumber).next()

2010-05-13 02:48:12

从MongoDB 3.2版本开始，你可以使用$sample聚合管道操作符从集合中随机获得N个文档:

// Get one random document from the mycoll collection.
db.mycoll.aggregate([{ $sample: { size: 1 } }])

如果你想从集合的筛选子集中选择随机文档，在管道中预先添加$match阶段:

// Get one random document matching {a: 10} from the mycoll collection.
db.mycoll.aggregate([
    { $match: { a: 10 } },
    { $sample: { size: 1 } }
])

正如注释中所指出的，当size大于1时，返回的文档样例中可能有重复项。

2015-11-07 02:28:27

如果你有一个简单的id键，你可以将所有的id存储在一个数组中，然后随机选择一个id。(Ruby回答):

ids = @coll.find({},fields:{_id:1}).to_a
@coll.find(ids.sample).first

2013-03-19 14:10:47

当我面对类似的解决方案时，我回溯并发现业务请求实际上是为了创建所呈现的库存的某种形式的轮换。在这种情况下，有更好的选择，它们有来自Solr这样的搜索引擎的答案，而不是MongoDB这样的数据存储。

In short, with the requirement to "intelligently rotate" content, what we should do instead of a random number across all of the documents is to include a personal q score modifier. To implement this yourself, assuming a small population of users, you can store a document per user that has the productId, impression count, click-through count, last seen date, and whatever other factors the business finds as being meaningful to compute a q score modifier. When retrieving the set to display, typically you request more documents from the data store than requested by the end user, then apply the q score modifier, take the number of records requested by the end user, then randomize the page of results, a tiny set, so simply sort the documents in the application layer (in memory).

如果用户的范围太大，可以将用户划分为行为组，按行为组而不是按用户进行索引。

如果产品范围足够小，您可以为每个用户创建一个索引。

我发现这种技术效率更高，但更重要的是在创建相关的、有价值的软件解决方案使用体验方面更有效。

2013-09-11 16:32:43

如何从MongoDB获得随机记录?

推荐文章

最新文章

标签