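I want to design a question structure that has some comments. Which relationship should I use for the comments: embed or reference?

A question with comments, like on Stack Overflow, would have a structure like this: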

Question
    title = 'aaa'
    content = 'bbb'
    comments = ???

At first, I wanted to use embedded comments (I think embedding is recommended in MongoDB), like this:

Question
    title = 'aaa'
    content = 'bbb'
    comments = [ { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'} ]

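That is clear, but I am worried about this case: if I want to edit a specific comment, how do I get its content and its question? There is no _id to look the comment up by, and no question_ref to find its question. (Perhaps there is a way to do this without _id and question_ref?)

Do I have to use ref instead of embed? Would I then have to create a new collection for comments?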


Current answer

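MongoDB gives you the freedom of being schema-less, and this feature can cause pain in the long run if the schema is not thought through or planned well.

There are two options: embed or reference. I will not go over the definitions, since the other answers already define them well.

When embedding, you should answer one question: will the embedded document grow, and if so, by how much (remember there is a 16 MB limit per document)? If you have something like comments on a post, what is the upper bound on the comment count if the post goes viral and people keep adding comments? In such cases a reference may be the better option (but even an array of references can grow until it hits the 16 MB limit).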
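To put a rough number on that growth question, here is a minimal sketch that estimates how many embedded comments fit under the 16 MB limit. It assumes the JSON text length is an acceptable stand-in for the real BSON size (it is only an approximation), and the helper name `estimateMaxComments` and the sample comment are made up for illustration.

```javascript
// Rough capacity estimate for an embedded comments array.
// Assumption: JSON byte length is only a proxy for the actual BSON size.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's per-document limit

// Hypothetical helper: how many comments of this size fit in one document?
function estimateMaxComments(sampleComment, baseDocBytes = 0) {
  const bytesPerComment = Buffer.byteLength(JSON.stringify(sampleComment), "utf8");
  return Math.floor((MAX_DOC_BYTES - baseDocBytes) / bytesPerComment);
}

// A made-up comment with 200 characters of text.
const sample = { content: "x".repeat(200), createdAt: "2024-01-01T00:00:00Z" };
console.log(estimateMaxComments(sample));
```

If the estimate is far above anything a single post could realistically collect, embedding is probably safe; if it is within reach of a viral post, a separate collection or a hybrid pattern is worth considering.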

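So how do you balance it? The answer is a combination of different patterns. Check these links and create your own mix and match based on your use case.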

https://www.mongodb.com/blog/post/building-with-patterns-a-summary

https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

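Other answers

If I want to edit a specific comment, how do I get its content and its question?

You can query by sub-document: db.question.find({'comments.content': 'xxx'}).

This will return the whole Question document. To edit the specified comment, you then have to find the comment on the client, make the edit, and save it back to the DB.

In general, if your document contains an array of objects, you will find that those sub-objects need to be modified on the client side.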
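The client-side edit described in this answer can be sketched as follows. This is an in-memory illustration only: `editComment` is a hypothetical helper, and the sample document mirrors the structure from the question.

```javascript
// Hypothetical helper: edit one embedded comment on the client.
// Returns true if a comment with the given content was found and changed.
function editComment(question, oldContent, newContent) {
  const comment = question.comments.find((c) => c.content === oldContent);
  if (!comment) return false;
  comment.content = newContent;
  return true;
}

// Stand-in for the document returned by the sub-document query.
const question = {
  title: "aaa",
  content: "bbb",
  comments: [
    { content: "xxx", createdAt: "yyy" },
    { content: "zzz", createdAt: "yyy" },
  ],
};

editComment(question, "zzz", "edited");

// With a live connection (not run here), the whole document could then be
// written back, e.g. db.question.replaceOne({ _id: question._id }, question)
```

Note that this rewrites the entire document, so two clients editing different comments at the same time can overwrite each other's changes.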

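I came across this small presentation while researching this question myself. I was surprised at how well it was laid out, both the information and the presentation.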

http://openmymind.net/Multiple-Collections-Versus-Embedded-Documents

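It summarized:

As a general rule, if you have a lot of [child documents] or if they are large, a separate collection might be best. Smaller and/or fewer documents tend to be a natural fit for embedding.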

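Well, I am a bit late, but I would still like to share my way of creating schemas.

I have a schema for everything that can be described by a single word, as you would in classical OOP.

E.g.

Comment, Account, User, Blog post, ...

Every schema can be saved as either a Document or a Subdocument, so I declare this for each schema.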

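Document:

Can be used as a reference. (E.g. the user made a comment -> the comment has a "made by" reference to the user.)
Is a "root" in your application. (E.g. the blog post -> there is a page about the blog post.)

Subdocument:

Can only be used once / is never a reference. (E.g. the comment is saved in the blog post.)
Is never a "root" in your application. (The comment just shows up on the blog post page, but the page is still about the blog post.)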
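As an illustration of that Document/Subdocument split, here is a small sketch using plain objects. All ids and field names (`by`, `usersById`, `authorOf`) are made up for the example, and the lookup is done in memory rather than against a database.

```javascript
// Illustrative shapes only; all field names and ids are invented.

// Document: can be referenced from elsewhere and is a "root" in the app.
const user = { _id: "user-1", name: "alice" };

// Document: a "root" with its own page; its comments are Subdocuments.
const blogPost = {
  _id: "post-1",
  title: "Schema design",
  comments: [
    // Subdocument: stored only inside the post, holds a reference to its author.
    { content: "Nice post", by: user._id },
  ],
};

// Resolving the "made by" reference is a second lookup (in memory here).
const usersById = new Map([[user._id, user]]);
function authorOf(comment) {
  return usersById.get(comment.by);
}

console.log(authorOf(blogPost.comments[0]).name); // prints "alice"
```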

This is more an art than a science. The Mongo documentation on schemas is a good reference, but here are some things to consider:

Put as much in as possible: The joy of a document database is that it eliminates lots of joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure (this means that you can take the part of the document that you need, so document size shouldn't worry you much), there is no immediate need to normalize data like you would in SQL. In particular, any data that is not useful apart from its parent document should be part of the same document.

Separate data that can be referred to from multiple places into its own collection: This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data, it is more efficient and less error prone to update a single record and keep references to it in other places.

Document size considerations: MongoDB imposes a 4MB (16MB with 1.8) size limit on a single document. In a world of GB of data this sounds small, but it is also 30 thousand tweets or 250 typical Stack Overflow answers or 20 Flickr photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases concern about document sizes will be premature optimization.

Complex data structures: MongoDB can store arbitrarily deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well.) It has also been pointed out that it is impossible to return a subset of elements in a document. If you need to pick and choose a few bits of each document, it will be easier to separate them out.

Data consistency: MongoDB makes a trade-off between efficiency and consistency. The rule is that changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic using, for example, a "lock" field). When you design your schema, consider how you will keep your data consistent. Generally, the more that you keep in a document the better.
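For the tree case mentioned under complex data structures, one possible layout is a document per node with a reference to its parent. The sketch below is an assumption about how that could look, not a prescribed schema: the `parent` field, the node ids, and both helpers are invented, and the array stands in for a collection.

```javascript
// One document per node; `parent` holds the parent's _id (null for the root).
// The array is an in-memory stand-in for a hypothetical `nodes` collection.
const nodes = [
  { _id: "root", parent: null },
  { _id: "a", parent: "root" },
  { _id: "b", parent: "root" },
  { _id: "a1", parent: "a" },
];

// In-memory equivalent of a query such as db.nodes.find({ parent: id }).
function childrenOf(id) {
  return nodes.filter((n) => n.parent === id).map((n) => n._id);
}

// Follow parent references from a node up to the root.
function pathToRoot(id) {
  const byId = new Map(nodes.map((n) => [n._id, n]));
  const path = [];
  for (let n = byId.get(id); n; n = byId.get(n.parent)) {
    path.push(n._id);
  }
  return path;
}

console.log(childrenOf("root")); // [ 'a', 'b' ]
console.log(pathToRoot("a1"));   // [ 'a1', 'a', 'root' ]
```

Each hop up or down the tree is a separate lookup, which is the cost of keeping every node individually searchable.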

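For what you are describing, I would embed the comments and give each comment an id field with an ObjectID. The ObjectID has a timestamp embedded in it, so you can use that instead of createdAt if you like.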
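To show what the embedded timestamp looks like, here is a minimal sketch that decodes it by hand from the hex form of an ObjectID: the first 4 bytes are the creation time in seconds since the Unix epoch. The helper name `objectIdToDate` and the sample id are illustrative; with a driver or the shell you would more likely call the ObjectId's own `getTimestamp()` method.

```javascript
// Read the creation time out of a 24-character ObjectID hex string.
// The first 8 hex digits (4 bytes) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// An embedded comment carrying its own id (sample value for illustration).
const comment = { _id: "507f1f77bcf86cd799439011", content: "xxx" };
console.log(objectIdToDate(comment._id).toISOString());
```

Because the timestamp has one-second resolution, it is fine for display and rough ordering but not for distinguishing comments created within the same second.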

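I know this is quite old, but if you are looking for the answer to the OP's question on how to return only the specified comment, you can use the $ (query) operator in a projection like this:

db.question.find({'comments.content': 'xxx'}, {'comments.$': true})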
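The same positional `$` can also be used to edit the matched comment on the server with `$set`. The sketch below only builds the filter, projection, and update documents as plain objects so it runs without a database; the helper names are made up, and the shell calls in the comments assume a `question` collection like the one in the question.

```javascript
// Build the pieces needed to read or edit a single embedded comment.
// Helper names are hypothetical; only the operator syntax is MongoDB's.

// Matches questions that contain a comment with the given content.
function commentFilter(content) {
  return { "comments.content": content };
}

// Projection: return only the first comment that matched the filter.
const onlyMatchedComment = { "comments.$": 1 };

// Update: change the content of the comment that matched the filter.
function setCommentContent(newContent) {
  return { $set: { "comments.$.content": newContent } };
}

// With a live connection (not run here):
//   db.question.find(commentFilter("xxx"), onlyMatchedComment)
//   db.question.updateOne(commentFilter("xxx"), setCommentContent("edited"))
console.log(JSON.stringify(setCommentContent("edited")));
```

Matching on content is fragile when two comments share the same text, which is one more reason to give each comment its own id and filter on that instead.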