我想设计一个带有一些评论的问题结构。注释应该使用哪种关系:嵌入还是引用?

一个带有注释的问题,比如stackoverflow,会有这样的结构:

Question
    title = 'aaa'
    content = 'bbb'
    comments = ???

一开始,我想使用嵌入式注释(我认为MongoDB中推荐使用embed),像这样:

Question
    title = 'aaa'
    content = 'bbb'
    comments = [ { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'} ]

这很清楚,但我担心这种情况:如果我想编辑一个指定的评论,我如何获得它的内容和它的问题?没有_id让我找到一个,也没有question_ref让我找到它的问题。(也许有一种方法可以做到这一点没有_id和question_ref?)

我必须使用ref而不是embed吗?然后我必须为评论创建一个新的集合吗?


当前回答

这与其说是科学,不如说是一门艺术。schema的Mongo文档是一个很好的参考,但是这里有一些事情需要考虑:

Put as much in as possible The joy of a Document database is that it eliminates lots of Joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure (this means that you can take the part of the document that you need, so document size shouldn't worry you much) there is no immediate need to normalize data like you would in SQL. In particular any data that is not useful apart from its parent document should be part of the same document. Separate data that can be referred to from multiple places into its own collection. This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places. Document size considerations MongoDB imposes a 4MB (16MB with 1.8) size limit on a single document. In a world of GB of data this sounds small, but it is also 30 thousand tweets or 250 typical Stack Overflow answers or 20 flicker photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases concern about document sizes will be premature optimization. Complex data structures: MongoDB can store arbitrary deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well) It has also been pointed out than it is impossible to return a subset of elements in a document. If you need to pick-and-choose a few bits of each document, it will be easier to separate them out. Data Consistency MongoDB makes a trade off between efficiency and consistency. The rule is changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic using for example a "lock" field). When you design your schema consider how you will keep your data consistent. Generally, the more that you keep in a document the better.

对于您所描述的内容,我将嵌入注释,并为每个注释提供一个ObjectID id字段。ObjectID有一个嵌入的时间戳,所以你可以使用它而不是在你喜欢的时候创建。

其他回答

这与其说是科学,不如说是一门艺术。schema的Mongo文档是一个很好的参考,但是这里有一些事情需要考虑:

Put as much in as possible The joy of a Document database is that it eliminates lots of Joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure (this means that you can take the part of the document that you need, so document size shouldn't worry you much) there is no immediate need to normalize data like you would in SQL. In particular any data that is not useful apart from its parent document should be part of the same document. Separate data that can be referred to from multiple places into its own collection. This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places. Document size considerations MongoDB imposes a 4MB (16MB with 1.8) size limit on a single document. In a world of GB of data this sounds small, but it is also 30 thousand tweets or 250 typical Stack Overflow answers or 20 flicker photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases concern about document sizes will be premature optimization. Complex data structures: MongoDB can store arbitrary deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well) It has also been pointed out than it is impossible to return a subset of elements in a document. If you need to pick-and-choose a few bits of each document, it will be easier to separate them out. Data Consistency MongoDB makes a trade off between efficiency and consistency. The rule is changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic using for example a "lock" field). When you design your schema consider how you will keep your data consistent. Generally, the more that you keep in a document the better.

对于您所描述的内容,我将嵌入注释,并为每个注释提供一个ObjectID id字段。ObjectID有一个嵌入的时间戳,所以你可以使用它而不是在你喜欢的时候创建。

如果我想编辑一个指定的评论,如何获得它的内容和它的问题?

您可以通过子文档进行查询:Content ': 'xxx'})。

这将返回整个Question文档。要编辑指定的评论,您必须在客户端上找到该评论,进行编辑并将其保存回DB。

一般来说,如果您的文档包含一个对象数组,您会发现需要在客户端修改这些子对象。

如果我想编辑一个指定的评论,我如何获得它的内容和 它的问题吗?

如果您已经记录了注释的数量和想要修改的注释的索引,那么您可以使用点操作符(SO示例)。

你可以用f.ex。

db.questions.update(
    {
        "title": "aaa"       
    }, 
    { 
        "comments.0.contents": "new text"
    }
)

(作为另一种编辑问题内评论的方式)

实际上,我很好奇为什么没有人谈论UML规范。经验法则是,如果您有一个聚合,那么您应该使用引用。但如果它是一个组合,那么耦合更强,您应该使用嵌入式文档。

你很快就会明白为什么这是合乎逻辑的。如果一个对象可以独立于父对象而存在,那么即使父对象不存在,您也会希望访问它。因为不能将它嵌入到不存在的父节点中,所以必须让它活在自己的数据结构中。如果存在父对象,只需通过在父对象中添加对象的引用将它们链接在一起。

不知道这两种关系有什么区别? 下面是一个解释它们的链接: UML中的聚合与组合

一般来说,如果实体之间有一对一或一对多的关系,则嵌入是很好的选择;如果实体之间有多对多的关系,则引用是很好的选择。