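This question is about making an architectural choice before digging into the details of experimentation and implementation. It concerns the suitability, in terms of scalability and performance, of Elasticsearch versus MongoDB for a somewhat specific purpose.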

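Hypothetically, both store data objects that have fields and values, and both allow querying against the object body. So both are suitable for filtering out a subset of the objects according to selected fields.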

My application will revolve around selecting objects according to criteria. It would select objects by filtering on more than one field at once; put differently, its query filtering criteria would typically comprise anywhere between 1 and 5 fields, maybe more in some cases. The fields chosen as filters would be a subset of a much larger set of fields. Picture some 20 field names existing, with each query filtering the objects by a few of those 20 fields (there may be fewer or more than 20 field names overall; I just use this number to illustrate the ratio of existing fields to the fields used as filters in any single query). The filtering can be by the existence of the chosen fields as well as by the field values, e.g. filtering out objects that have field A, whose field B is between x and y, and whose field C is equal to w.
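For concreteness, here is a minimal sketch that turns the example criterion (field A exists, B between x and y, C equal to w) into both a MongoDB query filter document and an Elasticsearch `bool` query body. The field names and the `build_filters` helper are hypothetical; the code only builds the plain dictionaries that would be passed to something like pymongo's `find()` or Elasticsearch's `_search` endpoint, so it runs without either server.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_filters(
    exists: List[str],
    ranges: Dict[str, Tuple[Optional[Any], Optional[Any]]],
    equals: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (mongo_filter, es_query_body) expressing the same criteria (hypothetical helper)."""
    mongo: Dict[str, Any] = {}
    es_clauses: List[Dict[str, Any]] = []

    # Field existence: $exists in MongoDB, an `exists` query in Elasticsearch.
    for field in exists:
        mongo.setdefault(field, {})["$exists"] = True
        es_clauses.append({"exists": {"field": field}})

    # Inclusive range bounds; None means "unbounded on that side".
    for field, (low, high) in ranges.items():
        es_range: Dict[str, Any] = {}
        if low is not None:
            mongo.setdefault(field, {})["$gte"] = low
            es_range["gte"] = low
        if high is not None:
            mongo.setdefault(field, {})["$lte"] = high
            es_range["lte"] = high
        es_clauses.append({"range": {field: es_range}})

    # Exact equality; `term` assumes the field is mapped as keyword or numeric.
    for field, value in equals.items():
        mongo.setdefault(field, {})["$eq"] = value
        es_clauses.append({"term": {field: value}})

    # Clauses in the bool `filter` context must all match and are not scored.
    es_body = {"query": {"bool": {"filter": es_clauses}}}
    return mongo, es_body


if __name__ == "__main__":
    mongo_filter, es_body = build_filters(
        exists=["A"], ranges={"B": (10, 20)}, equals={"C": "w"}
    )
    print(mongo_filter)
    print(es_body)
```

Operators for the same field are merged into one sub-document, so a field can be both required to exist and range-restricted in a single query.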

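My application will be doing this kind of filtering constantly, and there will be little or no consistency in which fields are used for filtering at any given time. Perhaps indexes need to be defined in Elasticsearch, but perhaps even without them its speed would be on par with MongoDB.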
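As background on what "defining indexes" means in each store, here is a minimal sketch built on a hypothetical four-field schema (the names `A` to `D` and their types are assumptions). Elasticsearch indexes every mapped field by default, and with dynamic mapping it infers the mapping from incoming documents, so an explicit mapping mainly pins down field types. MongoDB indexes only `_id` until you create further indexes yourself. The code just builds the request body and index specs as plain Python data.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: field name -> Elasticsearch field type.
SCHEMA: Dict[str, str] = {"A": "keyword", "B": "long", "C": "keyword", "D": "date"}


def es_mapping(schema: Dict[str, str]) -> Dict[str, object]:
    """Explicit mapping body for index creation (ES 7+ layout, no mapping type).

    Each mapped field is indexed by default, so any combination of these
    fields can be filtered on without per-query index design.
    """
    return {"mappings": {"properties": {f: {"type": t} for f, t in schema.items()}}}


def mongo_single_field_indexes(schema: Dict[str, str]) -> List[List[Tuple[str, int]]]:
    """One ascending single-field index per field, as lists of (key, direction)
    pairs in the form pymongo's create_index accepts; 1 means ascending."""
    return [[(f, 1)] for f in schema]


if __name__ == "__main__":
    print(es_mapping(SCHEMA))
    print(mongo_single_field_indexes(SCHEMA))
```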

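As for data going into the store, there is nothing special to note. Objects will hardly ever change after insertion. Old objects may need to be deleted; I'd like to assume both data stores support expiring and deleting data, either internally or through an application query. (Less commonly, objects that match a certain query will also need to be deleted.)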

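What do you think? Have you done any experiments along these lines?

For this kind of task, I'm interested in the performance and scalability of each of the two data stores. This is an architectural design question, so details of store-specific options or query fundamentals that demonstrate a well-considered recommendation are welcome.

Thanks!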


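First off, there is an important distinction to make here: MongoDB is a general-purpose database, while Elasticsearch is a distributed text search engine backed by Lucene. People have been talking about using Elasticsearch as a general-purpose database, but be aware that this was not its original design. I think general-purpose NoSQL databases and search engines are heading toward consolidation, but as things stand, the two come from very different camps.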

We are using both MongoDB and Elasticsearch in my company. We store our data in MongoDB and use Elasticsearch exclusively for its full-text search capabilities. We only send elastic the subset of mongo data fields that we need to query. Our use case differs from yours in that our Mongo data changes all the time: a record, or a subset of the fields of a record, can be updated several times a day, and each such update can require re-indexing that record in elastic. For that reason alone, using elastic as the sole data store is not a good option for us, as we can't update select fields; we would need to re-index a document in its entirety. This is not an elastic limitation; it is how Lucene, the search engine underlying elastic, works. In your case, the fact that records won't change once stored spares you from having to make that choice. Having said that, if data safety is a concern, I would think twice about using Elasticsearch as the only storage mechanism for your data. It may get there at some point, but I'm not sure it's there yet.
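To illustrate why a one-field change means re-indexing the whole document, here is a toy append-only store that mimics Lucene's update model: documents are never modified in place, so an update marks the old copy deleted and writes a complete new copy. (Elasticsearch's partial `_update` API also works this way internally: it fetches the stored source, applies the change, and reindexes the full document.) `ToySegmentStore` is a hypothetical class for illustration, not a real API; MongoDB, by contrast, can apply a `$set` to individual fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class ToySegmentStore:
    """Append-only toy store: updates tombstone the old copy and append a new one."""

    docs: List[Dict[str, Any]] = field(default_factory=list)
    deleted: Set[int] = field(default_factory=set)   # positions of tombstoned copies
    live: Dict[str, int] = field(default_factory=dict)  # doc id -> position of live copy

    def index(self, doc_id: str, doc: Dict[str, Any]) -> None:
        if doc_id in self.live:
            self.deleted.add(self.live[doc_id])  # mark the previous copy deleted
        self.docs.append(dict(doc))  # the whole document is written again
        self.live[doc_id] = len(self.docs) - 1

    def update_field(self, doc_id: str, name: str, value: Any) -> None:
        # Even a single-field change produces a full new copy of the document.
        new_doc = dict(self.docs[self.live[doc_id]])
        new_doc[name] = value
        self.index(doc_id, new_doc)

    def get(self, doc_id: str) -> Dict[str, Any]:
        return self.docs[self.live[doc_id]]


if __name__ == "__main__":
    store = ToySegmentStore()
    store.index("1", {"title": "x", "views": 1})
    store.update_field("1", "views", 2)
    print(store.get("1"), len(store.docs), store.deleted)
```

With frequently updated records, every change costs a full document write plus a tombstone, which is the overhead described above; write-once data like yours avoids it.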

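In terms of speed, not only is Elastic/Lucene on par with Mongo's query speed, but in your case, where there is "little or no constancy in which fields are used for filtering at any given time", it could be orders of magnitude faster, especially as the dataset grows. The difference lies in the underlying query implementation: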

Elastic/Lucene use the Vector Space Model and inverted indexes for Information Retrieval, which are highly efficient ways of comparing record similarity against a query. When you query Elastic/Lucene, it effectively already knows the answer, because every term was indexed in advance; most of its work lies in ranking the results for you by how likely they are to match your query terms. This is an important point: search engines, as opposed to databases, can't guarantee you exact results; they rank results by how close they get to your query. It just so happens that most of the time, the results are close to exact.

Mongo's approach is that of a more general-purpose data store; it matches JSON documents against your query. You can certainly get great performance out of it, but you need to carefully craft your indexes to match the queries you will be running. Specifically, if you have multiple fields by which you will query, you need to design your compound keys so that they narrow down the dataset being queried as quickly as possible. E.g. your first key should filter out the majority of your dataset, your second should further filter down what is left, and so on. If your queries don't match the keys, and the order of those keys, in the defined indexes, your performance will drop quite a bit. On the other hand, Mongo is a true database, so if accuracy is what you need, the answers it gives will be spot on.
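The contrast above can be sketched with two toy models. The first is a minimal inverted index for exact-value fields: each (field, value) pair maps to the set of document ids containing it, and a multi-field filter is just an intersection of those sets, whatever combination of fields the query uses (Lucene handles numeric ranges with other structures, which this sketch omits). The second is a deliberately simplified model of a compound B-tree index: only the leading index keys that the query constrains, up to the first gap, narrow the scan. MongoDB's real planner is more nuanced (it can still examine later index keys, and equality-versus-range order matters), so treat `usable_prefix` as an illustration, not its actual rule. Both names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set, Tuple


class ToyInvertedIndex:
    """Maps each (field, value) pair to the set of document ids containing it."""

    def __init__(self) -> None:
        self.postings: Dict[Tuple[str, Any], Set[int]] = defaultdict(set)

    def add(self, doc_id: int, doc: Dict[str, Any]) -> None:
        for name, value in doc.items():
            self.postings[(name, value)].add(doc_id)

    def match_all(self, criteria: Dict[str, Any]) -> Set[int]:
        """Ids matching every (field, value) pair; empty set if no criteria."""
        lists = [self.postings.get((f, v), set()) for f, v in criteria.items()]
        if not lists:
            return set()
        # Intersect starting from the smallest postings list to keep work low.
        lists.sort(key=len)
        result = set(lists[0])
        for postings in lists[1:]:
            result &= postings
        return result


def usable_prefix(index_keys: List[str], query_fields: Iterable[str]) -> List[str]:
    """Simplified compound-index model: leading keys the query constrains, up to the first gap."""
    wanted = set(query_fields)
    prefix: List[str] = []
    for key in index_keys:
        if key not in wanted:
            break
        prefix.append(key)
    return prefix


if __name__ == "__main__":
    idx = ToyInvertedIndex()
    idx.add(1, {"A": "x", "C": "w"})
    idx.add(2, {"A": "x", "C": "v"})
    idx.add(3, {"A": "y", "C": "w"})
    print(idx.match_all({"A": "x", "C": "w"}))
    print(usable_prefix(["A", "B", "C"], ["B", "C"]))
```

The inverted index answers `{"C": "w"}` as cheaply as `{"A": "x", "C": "w"}`, while an index on (A, B, C) gives no leading prefix at all for a query on B and C, which is the ad hoc filtering situation described in the question.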

For expiring old records, Elastic has a built-in TTL feature. I believe Mongo only just introduced one, in version 2.2.
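As a sketch of both expiry styles, assuming a hypothetical date field named `created_at`: a MongoDB TTL index is a single-field index on a date field with an `expireAfterSeconds` option (in pymongo, passed as a keyword to `create_index`). Elasticsearch's `_ttl` field was later deprecated in 2.0 and removed in 5.0, so this sketch instead builds a body for the `_delete_by_query` endpoint using date math (`now-30d`); time-based indices with index lifecycle management are the other common approach today. Delete-by-query, like MongoDB's `delete_many`, also covers the rarer case of deleting whatever matches an arbitrary query. The functions only build the request data and do not contact a server.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Tuple


def mongo_ttl_index_spec(field: str, seconds: int) -> Tuple[List[Tuple[str, int]], Dict[str, Any]]:
    """Keys and options for a TTL index, usable as collection.create_index(keys, **options).

    The indexed field must hold BSON dates; documents expire `seconds` after that date.
    """
    return [(field, 1)], {"expireAfterSeconds": seconds}


def es_delete_older_than(field: str, days: int) -> Dict[str, Any]:
    """Body for Elasticsearch's _delete_by_query endpoint; "now-<n>d" is date math."""
    return {"query": {"range": {field: {"lt": f"now-{days}d"}}}}


def mongo_delete_older_than(field: str, days: int, now: datetime) -> Dict[str, Any]:
    """Filter for collection.delete_many(): application-driven expiry instead of a TTL index."""
    return {field: {"$lt": now - timedelta(days=days)}}


if __name__ == "__main__":
    print(mongo_ttl_index_spec("created_at", 30 * 24 * 3600))
    print(es_delete_older_than("created_at", 30))
    print(mongo_delete_older_than("created_at", 30, datetime(2024, 1, 31)))
```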

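Since I don't know your other requirements, such as expected data size, transactions, accuracy, or what your filters will look like, it's hard to make any specific recommendations. Hopefully there is enough here to get you started.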