我想设计一个带有一些评论的问题结构。注释应该使用哪种关系:嵌入还是引用?

一个带有注释的问题,比如stackoverflow,会有这样的结构:

Question
    title = 'aaa'
    content = 'bbb'
    comments = ???

一开始,我想使用嵌入式注释(我认为MongoDB中推荐使用embed),像这样:

Question
    title = 'aaa'
    content = 'bbb'
    comments = [ { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'} ]

这很清楚,但我担心这种情况:如果我想编辑一个指定的评论,我如何获得它的内容和它的问题?没有_id让我找到一个,也没有question_ref让我找到它的问题。(也许有一种方法可以做到这一点没有_id和question_ref?)

我必须使用ref而不是embed吗?然后我必须为评论创建一个新的集合吗?


当前回答

一般来说,如果实体之间有一对一或一对多的关系,则嵌入是很好的选择;如果实体之间有多对多的关系,则引用是很好的选择。

其他回答

好吧,我有点晚了,但仍然想分享我的模式创建方法。

我有可以用一个词描述的所有事物的模式,就像在经典的OOP中那样。

E.G.

评论 账户 用户 博客 ...

每个模式都可以保存为Document或Subdocument,因此我对每个模式都声明了这一点。

文档:

可以作为参考。(例如,用户做了一个评论->评论有一个“由”的参考用户) 是应用程序中的“根”。(例如,博客文章->有一个关于博客文章的页面)

子文档:

只能使用一次/绝不是参考。(例如,评论保存在博客文章中) 在应用程序中从来都不是“根”。(评论只在博客页面中显示,但页面仍然是关于博客的)

是的,我们可以使用文件中的参考资料。就像SQL i连接一样填充另一个文档。在MongoDB中,它们没有连接来将一个关系文档映射到多个关系文档。相反,我们可以使用populate来实现我们的场景。

var mongoose = require('mongoose')
  , Schema = mongoose.Schema
  
var personSchema = Schema({
  _id     : Number,
  name    : String,
  age     : Number,
  stories : [{ type: Schema.Types.ObjectId, ref: 'Story' }]
});

var storySchema = Schema({
  _creator : { type: Number, ref: 'Person' },
  title    : String,
  fans     : [{ type: Number, ref: 'Person' }]
});

填充是自动用其他集合中的文档替换文档中的指定路径的过程。我们可以填充单个文档、多个文档、普通对象、多个普通对象或从查询返回的所有对象。让我们来看一些例子。

更多信息请访问:http://mongoosejs.com/docs/populate.html

MongoDB提供了无模式的自由,如果没有考虑或计划好,这个特性可能会导致长期的痛苦,

有2个选项,嵌入或引用。我不会详细解释定义,因为上面的答案已经很好地定义了它们。

当嵌入时,你应该回答一个问题,你嵌入的文档是否会增长,如果是,那么有多少(记住每个文档有16mb的限制)所以,如果你有一个帖子的评论,什么是评论计数的限制,如果这个帖子病毒式传播,人们开始添加评论。在这种情况下,引用可能是更好的选择(但甚至引用也会增长,达到16mb的限制)。

因此,如何平衡它,答案是不同模式的组合,检查这些链接,并根据您的用例创建自己的混合和匹配。

https://www.mongodb.com/blog/post/building-with-patterns-a-summary

https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

这与其说是科学,不如说是一门艺术。schema的Mongo文档是一个很好的参考,但是这里有一些事情需要考虑:

Put as much in as possible The joy of a Document database is that it eliminates lots of Joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure (this means that you can take the part of the document that you need, so document size shouldn't worry you much) there is no immediate need to normalize data like you would in SQL. In particular any data that is not useful apart from its parent document should be part of the same document. Separate data that can be referred to from multiple places into its own collection. This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places. Document size considerations MongoDB imposes a 4MB (16MB with 1.8) size limit on a single document. In a world of GB of data this sounds small, but it is also 30 thousand tweets or 250 typical Stack Overflow answers or 20 flicker photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases concern about document sizes will be premature optimization. Complex data structures: MongoDB can store arbitrary deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well) It has also been pointed out than it is impossible to return a subset of elements in a document. If you need to pick-and-choose a few bits of each document, it will be easier to separate them out. Data Consistency MongoDB makes a trade off between efficiency and consistency. The rule is changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic using for example a "lock" field). When you design your schema consider how you will keep your data consistent. Generally, the more that you keep in a document the better.

对于您所描述的内容,我将嵌入注释,并为每个注释提供一个ObjectID id字段。ObjectID有一个嵌入的时间戳,所以你可以使用它而不是在你喜欢的时候创建。

实际上,我很好奇为什么没有人谈论UML规范。经验法则是,如果您有一个聚合,那么您应该使用引用。但如果它是一个组合,那么耦合更强,您应该使用嵌入式文档。

你很快就会明白为什么这是合乎逻辑的。如果一个对象可以独立于父对象而存在,那么即使父对象不存在,您也会希望访问它。因为不能将它嵌入到不存在的父节点中,所以必须让它活在自己的数据结构中。如果存在父对象,只需通过在父对象中添加对象的引用将它们链接在一起。

不知道这两种关系有什么区别? 下面是一个解释它们的链接: UML中的聚合与组合