我想设计一个带有一些评论的问题结构。注释应该使用哪种关系:嵌入还是引用?
一个带有注释的问题,比如stackoverflow,会有这样的结构:
Question
title = 'aaa'
content = 'bbb'
comments = ???
一开始,我想使用嵌入式注释(我认为MongoDB中推荐使用embed),像这样:
Question
title = 'aaa'
content = 'bbb'
comments = [ { content = 'xxx', createdAt = 'yyy'},
{ content = 'xxx', createdAt = 'yyy'},
{ content = 'xxx', createdAt = 'yyy'} ]
这很清楚,但我担心这种情况:如果我想编辑一个指定的评论,我如何获得它的内容和它的问题?没有_id让我找到一个,也没有question_ref让我找到它的问题。(也许有一种方法可以做到这一点没有_id和question_ref?)
我必须使用ref而不是embed吗?然后我必须为评论创建一个新的集合吗?
这与其说是科学,不如说是一门艺术。schema的Mongo文档是一个很好的参考,但是这里有一些事情需要考虑:
Put as much in as possible
The joy of a Document database is that it eliminates lots of Joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure (this means that you can take the part of the document that you need, so document size shouldn't worry you much) there is no immediate need to normalize data like you would in SQL. In particular any data that is not useful apart from its parent document should be part of the same document.
Separate data that can be referred to from multiple places into its own collection.
This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places.
Document size considerations
MongoDB imposes a 4MB (16MB with 1.8) size limit on a single document. In a world of GB of data this sounds small, but it is also 30 thousand tweets or 250 typical Stack Overflow answers or 20 flicker photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases concern about document sizes will be premature optimization.
Complex data structures:
MongoDB can store arbitrary deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well)
It has also been pointed out than it is impossible to return a subset of elements in a document. If you need to pick-and-choose a few bits of each document, it will be easier to separate them out.
Data Consistency
MongoDB makes a trade off between efficiency and consistency. The rule is changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic using for example a "lock" field). When you design your schema consider how you will keep your data consistent. Generally, the more that you keep in a document the better.
对于您所描述的内容,我将嵌入注释,并为每个注释提供一个ObjectID id字段。ObjectID有一个嵌入的时间戳,所以你可以使用它而不是在你喜欢的时候创建。
MongoDB提供了无模式的自由,如果没有考虑或计划好,这个特性可能会导致长期的痛苦,
有2个选项,嵌入或引用。我不会详细解释定义,因为上面的答案已经很好地定义了它们。
当嵌入时,你应该回答一个问题,你嵌入的文档是否会增长,如果是,那么有多少(记住每个文档有16mb的限制)所以,如果你有一个帖子的评论,什么是评论计数的限制,如果这个帖子病毒式传播,人们开始添加评论。在这种情况下,引用可能是更好的选择(但甚至引用也会增长,达到16mb的限制)。
因此,如何平衡它,答案是不同模式的组合,检查这些链接,并根据您的用例创建自己的混合和匹配。
https://www.mongodb.com/blog/post/building-with-patterns-a-summary
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1