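When NOT to use Cassandra?

There has been a lot of talk about Cassandra lately.

Twitter, Digg, Facebook, and others are using it.

When does it make sense to:

use Cassandra, not use Cassandra, or use an RDBMS instead of Cassandra?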

Current answer

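Here I'll highlight some important aspects that can help you decide whether you really need Cassandra. The list is not exhaustive; these are just the points I consider most important.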

Don't consider Cassandra as the first choice when you have strict requirements on relationships across your dataset.

By default Cassandra is an AP system (in CAP terms), but it supports tunable consistency, which means it can also be configured to behave as CP. So don't rule it out just because you read somewhere that it is AP and you are looking for a CP system. Cassandra is more accurately termed "tuneably consistent": it lets you easily choose the level of consistency you require, balanced against the level of availability.

Don't use Cassandra if your scale is modest or if a non-distributed DB will do the job.

Think harder if your team believes all your problems will be solved by a distributed DB like Cassandra. Getting started with these DBs is very simple because they come with many defaults, but optimizing and mastering one to solve a specific problem requires a good (if not large) amount of engineering effort.

Cassandra is often called column-oriented (a "wide-column" store), but each row also has a unique key, so it may be more helpful to think of it as an indexed, row-oriented store. You can even use it as a document store.

Cassandra's schema is easy to evolve (new columns can be added later without rewriting existing data), so if you are in startup mode or your features are evolving (as in agile), Cassandra accommodates that. Even so, think about the queries first, and then about the data needed to answer them.

Cassandra is optimized for very high write throughput. If your use case is read-heavy (like a cache), Cassandra might not be an ideal choice.
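The "think about queries first" advice can be illustrated with a toy, in-memory model. It assumes a hypothetical sensor application whose two queries are decided before any table exists. Each query gets its own table whose partition key is exactly what that query filters on, and every reading is written to both tables (denormalization). The class and function names are made up for illustration, and plain Python dicts stand in for Cassandra partitions; this is not the Cassandra driver API.

```python
from collections import defaultdict

# Hypothetical application queries, decided BEFORE designing tables:
#   Q1: latest N readings for one sensor on one day
#   Q2: which sensors reported in a region on one day


class ReadingsBySensorDay:
    """Q1 table: partition key (sensor_id, day), clustering key ts."""

    def __init__(self):
        # (sensor_id, day) -> {ts: value}; dicts stand in for partitions
        self.partitions = defaultdict(dict)

    def insert(self, sensor_id, day, ts, value):
        # Same full primary key overwrites the old value (upsert semantics).
        self.partitions[(sensor_id, day)][ts] = value

    def latest(self, sensor_id, day, limit):
        # The designed query touches exactly one partition, newest first.
        rows = self.partitions.get((sensor_id, day), {})
        return sorted(rows.items(), reverse=True)[:limit]


class SensorsByRegionDay:
    """Q2 table: partition key (region, day), clustering key sensor_id."""

    def __init__(self):
        self.partitions = defaultdict(set)

    def insert(self, region, day, sensor_id):
        self.partitions[(region, day)].add(sensor_id)

    def sensors(self, region, day):
        return sorted(self.partitions.get((region, day), set()))


def record_reading(by_sensor, by_region, region, sensor_id, day, ts, value):
    # Denormalized write: the same fact goes to every table that serves a query.
    by_sensor.insert(sensor_id, day, ts, value)
    by_region.insert(region, day, sensor_id)


by_sensor = ReadingsBySensorDay()
by_region = SensorsByRegionDay()
record_reading(by_sensor, by_region, "eu", "s1", "2019-08-06", 100, 21.5)
record_reading(by_sensor, by_region, "eu", "s1", "2019-08-06", 200, 22.0)
record_reading(by_sensor, by_region, "eu", "s2", "2019-08-06", 150, 19.0)
print(by_sensor.latest("s1", "2019-08-06", 1))  # [(200, 22.0)]
print(by_region.sensors("eu", "2019-08-06"))    # ['s1', 's2']
```

In real Cassandra these would be two CQL tables with primary keys along the lines of ((sensor_id, day), ts) and ((region, day), sensor_id). Each read touches a single partition, and the extra cost moves to the write path, which is where Cassandra is strongest.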

2019-08-06 10:21:05

Other answers

Let's read some real use cases:

http://planetcassandra.org/apache-cassandra-use-cases/

Article: http://planetcassandra.org/blog/post/agentis-energy-stores-over-15-billion-records-of-time-series-usage-data-in-apache-cassandra

They explain in detail why they did not choose MySQL: database synchronization was too slow.

(Also because of two-phase commit, foreign keys, and primary keys.)

Cassandra is based on the Amazon Dynamo paper.

Features:

Stable

High availability

Good backup performance

Better read/write performance than HBase (a BigTable clone in Java)

Wikipedia: http://en.wikipedia.org/wiki/Apache_Cassandra

Their conclusion:

We looked at HBase, Dynamo, Mongo and Cassandra. 

Cassandra was simply the best storage solution for the majority of our data.

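As of 2018:

If you need support, I recommend using ScyllaDB instead of classic Cassandra.

A Postgres key-value plugin is also faster than Cassandra, although you don't get multi-instance scalability with it in any case.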

2014-10-07 03:59:00

You should ask yourself the following questions:

(Volume, Velocity) Will you be writing and reading TONS of information, so much that no single computer could handle the writes?

(Global) Will you need this read and write capability around the world, so that writes in one part of the world are accessible in another?

(Reliability) Do you need this database to be up and running all the time and never go down, regardless of which cloud or country it runs in and whether it's on VMs, containers, or bare metal?

(Scalability) Do you need this database to keep growing easily and scale linearly?

(Consistency) Do you need TUNABLE consistency, where some writes can happen asynchronously whereas others must be confirmed?

(Skill) Are you willing to do what it takes to learn this technology, and the data modeling that goes with it, to build a globally distributed database that is fast for everyone, everywhere?

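If your answer to any of these questions is "maybe" or "no", you should use something else. If your answer to all of them is "absolutely", then you should use Cassandra.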
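The (Consistency) question refers to Cassandra's tunable consistency levels. A common back-of-the-envelope rule from Dynamo-style systems: if the number of replicas a read waits for (R) plus the number a write waits for (W) exceeds the replication factor (RF), the read's replica set must overlap at least one replica that acknowledged the latest write. The sketch below models three real Cassandra consistency levels (ONE, QUORUM, ALL) in a single data center; it deliberately ignores multi-DC levels such as LOCAL_QUORUM, clock skew, and failed writes, so treat it as arithmetic for reasoning, not as a guarantee.

```python
# Single-data-center replica counts for three Cassandra consistency levels.
# Simplified model: ignores multi-DC levels, clock skew, and failed writes.


def quorum(replication_factor: int) -> int:
    """QUORUM = floor(RF / 2) + 1 replicas."""
    return replication_factor // 2 + 1


REPLICAS_FOR_LEVEL = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def replicas_needed(level: str, rf: int) -> int:
    return REPLICAS_FOR_LEVEL[level](rf)


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    r = replicas_needed(read_level, rf)
    w = replicas_needed(write_level, rf)
    return r + w > rf


def tolerated_down_replicas(level: str, rf: int) -> int:
    """How many replicas can be down while a request at `level` still succeeds."""
    return rf - replicas_needed(level, rf)


for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, read_sees_latest_write(read_cl, write_cl, rf=3))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) give the overlap while still tolerating one replica being down, which is why that pairing is a common choice; ONE/ONE (1 + 1 is not greater than 3) is the fast, eventually consistent end of the dial.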

Use an RDBMS when you can do everything on a single box. It is probably simpler than most alternatives, and anyone can use it.

2019-03-15 13:44:49

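In addition to the answers above about when to use and when not to use Cassandra: if you do decide to use Cassandra, you might consider using not Cassandra itself but one of its many cousins.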

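Some of the answers above have already pointed out various "NoSQL" systems that share many properties with Cassandra, with some larger or smaller differences, and that may suit your particular needs better than Cassandra itself.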

Additionally, recently (several years after this question was originally asked), a Cassandra clone called Scylla (see https://en.wikipedia.org/wiki/Scylla_(database)) was released. Scylla is an open-source re-implementation of Cassandra in C++, which claims to have significantly higher throughput and lower latencies than the original Java Cassandra, while being mostly compatible with it (in features, APIs, and file formats). So if you're already considering Cassandra, you may want to consider Scylla as well.

2017-11-07 09:51:11

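Besides the other answers here, a single heavy query versus a load of countless light queries is another thing to consider. Automatically optimizing a single query is inherently harder in a NoSQL-style DB. I have used MongoDB and ran into performance problems when trying to compute complex queries. I haven't used Cassandra, but I expect it to have the same problem.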

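On the other hand, if your load is expected to be many small queries and you want to be able to scale out easily, you can take advantage of the eventual consistency that most NoSQL databases offer. Note that eventual consistency is not really a property of non-relational data models, but it is much easier to implement and configure in NoSQL-based systems.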

For a single, very heavy query, any modern RDBMS engine can do a decent job of parallelizing parts of the query and taking advantage of as much CPU and memory as you throw at it (on a single machine). NoSQL databases don't have enough information about the structure of the data to make the assumptions that would allow truly intelligent parallelization of a big query. They do let you easily scale out to more servers (or cores), but once the query reaches a certain level of complexity, you are basically forced to split it manually into parts that the NoSQL engine knows how to handle intelligently.

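In my experience with MongoDB, the query's complexity meant that MongoDB ultimately could neither optimize it nor run parts of it across multiple pieces of the data. Mongo can parallelize multiple queries, but it is not very good at optimizing a single query.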
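The "split it manually" approach described in this answer usually takes the form of a scatter-gather: run one cheap query per partition, in parallel, then merge and rank the partial results in the client. A minimal sketch, assuming hypothetical in-memory partitions keyed by day; query_partition stands in for whatever single-partition query the store handles well, and nothing here is specific to MongoDB's or Cassandra's APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: partition key (a day) -> list of (customer, amount) rows.
PARTITIONS = {
    "2013-04-07": [("alice", 30), ("bob", 5)],
    "2013-04-08": [("alice", 10), ("carol", 40)],
    "2013-04-09": [("bob", 25)],
}


def query_partition(day):
    """Stand-in for one cheap, single-partition query: per-customer totals."""
    totals = {}
    for customer, amount in PARTITIONS.get(day, []):
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def top_customers(days, n):
    """The 'heavy' query, split by hand: scatter per partition, then gather."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(query_partition, days))
    merged = {}
    for part in partials:
        for customer, total in part.items():
            merged[customer] = merged.get(customer, 0) + total
    # Final ranking happens in the client: highest total first, ties by name.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


print(top_customers(list(PARTITIONS), 2))  # [('alice', 40), ('carol', 40)]
```

The merge and ranking step that an RDBMS planner would have parallelized for you now lives in application code, which is exactly the trade-off this answer describes.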

2013-04-09 14:36:09

Cassandra is a good choice if:

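You don't need ACID properties from your DB.

There will be a large volume of writes to the DB.

You need integration with Big Data, Hadoop, Hive, and Spark.

You need real-time data analytics and report generation.

You have a requirement for strong fault tolerance.

You have a requirement for a homogeneous system.

Tuning requires a lot of customization.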

2018-03-21 16:53:52
