Secondly, users can create race conditions through concurrent requests against the same entity. Saving conflicting events and resolving them after the fact may be quite undesirable, so it is important to be able to prevent them. To scale request load, it is common to use stateless services and prevent write conflicts with conditional writes (only write if the last entity event was #x), a.k.a. optimistic concurrency. Kafka does not support optimistic concurrency. Even if it supported it at the topic level, it would need to go all the way down to the entity level to be effective. To use Kafka and prevent conflicting events, you would need a stateful, serialized writer (per "shard" or whatever Kafka's equivalent is) at the application level. This is a significant architectural requirement/restriction.
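To make the conditional write concrete, here is a minimal in-memory sketch in Python. The `InMemoryEventStore` class, its `read`/`append` methods, and `ConcurrencyError` are hypothetical names made up for illustration, not any real library's API. A real store would have to perform the version check and the append atomically.

```python
class ConcurrencyError(Exception):
    """The stream has moved past the version the writer expected."""


class InMemoryEventStore:
    """Hypothetical store for illustration only; not thread-safe."""

    def __init__(self):
        self._streams = {}  # stream id -> list of events, in append order

    def read(self, stream_id):
        """Return (events, version); version is the number of events stored."""
        events = list(self._streams.get(stream_id, []))
        return events, len(events)

    def append(self, stream_id, expected_version, event):
        """Append only if nobody else has written since `expected_version`."""
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, found {len(stream)}"
            )
        stream.append(event)
        return len(stream)


store = InMemoryEventStore()
store.append("account-1", 0, "AccountOpened")

# Two stateless request handlers read the same version, then both try to write.
_, seen_by_a = store.read("account-1")
_, seen_by_b = store.read("account-1")
store.append("account-1", seen_by_a, "MoneyDeposited")  # first writer wins
try:
    store.append("account-1", seen_by_b, "MoneyWithdrawn")  # stale version
except ConcurrencyError as error:
    print("rejected:", error)
```

The losing writer can reload the stream and decide again, so the conflicting event is never saved in the first place.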



Kafka is designed to solve giant-scale data problems. An app-controlled source of truth is a smaller-scale, in-depth solution. Using event sourcing to good effect requires crafting events and streams to match the business processes. This usually involves a much higher level of detail than would be generally useful to at-scale consumers. Consider what would happen if your bank statement contained an entry for every step of the bank's internal transaction processes. A single deposit or withdrawal could have many entries before it is confirmed to your account. The bank needs that level of detail to process transactions. But to you it's mostly inscrutable bank jargon (domain-specific language), unusable for reconciling your account. Instead, the bank publishes separate events for consumers. These are coarse-grained summaries of each completed transaction. These summary events are what consumers know as "transactions" on their bank statement.

When I asked myself the same question as the OP, I wanted to know if Kafka was a scaling option for event sourcing. But perhaps a better question is whether it makes sense for my event-sourced solution to operate at a giant scale. I can't speak to every case, but I think often it does not. When this scale enters the picture, as in the bank statement example, the granularity of events tends to be different. My event-sourced system should probably publish coarse-grained events to the Kafka cluster to feed at-scale consumers rather than use Kafka as internal storage.

Scale can still be needed for event sourcing. Strategies differ depending on why. Often event streams have a "done" or "no-longer-useful" state. Archiving those streams is a good answer if event size/volume is a problem. Sharding is another option, and a perfect fit for regional- or tenant-isolated scenarios. In less siloed scenarios, when streams are arbitrarily related in a way that can cross shard boundaries, sharding is still the move (partition by stream ID). But there are no order guarantees across streams, which can make the event consumer's job harder. For example, the consumer may receive transaction events before it receives events describing the accounts involved. The first instinct is to "just use timestamps" to order received events. But it is still not possible to guarantee perfect occurrence order. There are too many uncontrollable factors: network hiccups, clock drift, cosmic rays, etc. Ideally you design the consumer to not require cross-stream dependencies. Have a strategy for temporarily missing data, like progressive enhancement for data. If you really need the data to be unavailable instead of incomplete, use the same tactic, but keep the incomplete data in a separate area or marked unavailable until it's all filled in. You can also just attempt to process each event, knowing it may fail due to missing prerequisites. Put failed events in a retry queue, keep processing the next events, and retry the failed ones later. But watch out for poison messages (events).
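The retry-queue tactic at the end of the paragraph above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `process_with_retries` helper, the convention that a handler raises `LookupError` when a prerequisite is missing, and the `max_attempts` cut-off that sets aside poison events instead of retrying them forever.

```python
from collections import deque


def process_with_retries(events, handle, max_attempts=3):
    """Process events in arrival order; park failures and retry them later.

    `handle(event)` is a hypothetical handler that raises LookupError when
    data the event depends on has not arrived yet. Events that still fail
    after `max_attempts` tries are returned as poison events.
    """
    pending = deque((event, 0) for event in events)
    poison = []
    while pending:
        event, attempts = pending.popleft()
        try:
            handle(event)
        except LookupError:
            if attempts + 1 >= max_attempts:
                poison.append(event)  # give up; needs manual attention
            else:
                pending.append((event, attempts + 1))  # retry after the rest
    return poison


known_accounts = set()
applied = []


def handle(event):
    kind, account_id = event
    if kind == "account":
        known_accounts.add(account_id)
    elif account_id not in known_accounts:
        raise LookupError(account_id)  # the account event has not arrived yet
    applied.append(event)


# The transaction for acct-1 arrives before the account itself;
# the account for acct-9 never arrives at all.
arrived = [("transaction", "acct-1"), ("account", "acct-1"), ("transaction", "acct-9")]
poison_events = process_with_retries(arrived, handle)
```

In this run the early transaction for `acct-1` succeeds on its second attempt, while the transaction for `acct-9` ends up in `poison_events` rather than blocking the queue.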






In distributed scenarios, I've seen a couple of different implementations. Jet's Panther project uses Azure CosmosDB, with the Change Feed feature to notify listeners. Another similar implementation I've heard about on AWS uses DynamoDB with its Streams feature to notify listeners. The partition key should probably be the stream id for best data distribution (to lessen the amount of over-provisioning). However, a full replay across streams in Dynamo is expensive (in both reads and cost). So this implementation was also set up for Dynamo Streams to dump events to S3. When a new listener comes online, or an existing listener wants a full replay, it reads S3 to catch up first.






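However, if you look at Greg Young's paper from 2010, it summarizes the idea quite nicely, starting from page 32. But it doesn't contain an ultimate definition, so I dare to formulate it myself.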




Save the new entity state to the database
Retrieve the entity state from the database

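This is where Greg talks about the concept of entity streams, where each entity has its own stream of events, uniquely identified by the entity id. When you have a database that can read all the events of an entity by its entity id (read the stream), using Event Sourcing is not a hard problem.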

As Greg's paper discusses Event Sourcing in the context of CQRS, he explains why those two concepts play nicely with each other. Although you have a database full of atomic state mutations for a bunch of entities, querying across the current state of multiple entities is hard work. The issue is solved by separating the transactional (event-sourced) store, which is used as the source of truth, from the reporting (query, read) store, which is used for reports and queries of the current system state across multiple entities. The query store doesn't contain any events; it contains the projected state of multiple entities, composed according to the needs of querying the data. It doesn't necessarily need to contain snapshots of each entity. You are free to choose the shape and form of the query model, as long as you can project your events to that model.
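As a rough illustration of projecting events into a query model, the Python sketch below folds events from several account streams into a single "total balance per owner" view. The event shapes (`AccountOpened`, `MoneyDeposited`, `MoneyWithdrawn`) and the `project` function are invented for this example and are not taken from the paper.

```python
from collections import defaultdict


def new_read_model():
    """Query model: total balance per owner, across all of that owner's accounts."""
    return {"owner_of": {}, "balance_by_owner": defaultdict(int)}


def project(read_model, event):
    """Apply one event to the query model; events must arrive in stream order."""
    kind = event["type"]
    if kind == "AccountOpened":
        read_model["owner_of"][event["account_id"]] = event["owner"]
    elif kind in ("MoneyDeposited", "MoneyWithdrawn"):
        owner = read_model["owner_of"][event["account_id"]]
        sign = 1 if kind == "MoneyDeposited" else -1
        read_model["balance_by_owner"][owner] += sign * event["amount"]
    return read_model


history = [
    {"type": "AccountOpened", "account_id": "a1", "owner": "alice"},
    {"type": "AccountOpened", "account_id": "a2", "owner": "alice"},
    {"type": "MoneyDeposited", "account_id": "a1", "amount": 100},
    {"type": "MoneyDeposited", "account_id": "a2", "amount": 50},
    {"type": "MoneyWithdrawn", "account_id": "a1", "amount": 30},
]

model = new_read_model()
for event in history:
    project(model, event)
print(model["balance_by_owner"]["alice"])
```

The query model here holds no events and no per-account snapshot, only the shape the query needs. It can be thrown away and rebuilt from the events at any time.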


We also know that we need the entity state in hand when making decisions about its allowed state transitions. For example, a money transfer that has already been executed should not be executed twice. As the query model is by definition stale (even if only by milliseconds), making decisions based on it is dangerous. Therefore, we use the most recent, fully consistent state from the transactional (event) store to reconstruct the entity state when executing operations on the entity.
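The money transfer example can be sketched as follows: the entity state is folded from its own event stream, and the decision is made against that state rather than against a query model. `TransferState`, `evolve`, `load` and `execute_transfer` are hypothetical names, and events are plain strings to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferState:
    requested: bool = False
    executed: bool = False


def evolve(state, event):
    """Fold one event into the entity state."""
    if event == "TransferRequested":
        return TransferState(requested=True, executed=state.executed)
    if event == "TransferExecuted":
        return TransferState(requested=state.requested, executed=True)
    return state  # unknown events leave the state unchanged


def load(events):
    """Rebuild the current state from the entity's full event stream."""
    state = TransferState()
    for event in events:
        state = evolve(state, event)
    return state


def execute_transfer(events):
    """Decide against the rebuilt state; return the new events to append."""
    state = load(events)
    if not state.requested:
        raise ValueError("transfer was never requested")
    if state.executed:
        raise ValueError("transfer already executed")
    return ["TransferExecuted"]


stream = ["TransferRequested"]
stream += execute_transfer(stream)
try:
    execute_transfer(stream)  # second attempt sees the executed state
except ValueError as error:
    print("rejected:", error)
```

This only protects against double execution if the stream read is consistent and the append is conditional on the version that was read.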



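Append events to the ordered, append-only log, using the entity id as a key
Load all the events for a single entity, in order, using the entity id as a key
Delete all the events for a given entity, using the entity id as a key
Support real-time subscriptions to project events to query models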
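The four requirements listed above can be read as a small interface. The following Python sketch is a toy in-memory version; `TinyEventStore` and its method names are assumptions made for this example and do not correspond to any real event store product.

```python
from collections import defaultdict


class TinyEventStore:
    """Toy in-memory store exposing the four operations; for illustration only."""

    def __init__(self):
        self._streams = defaultdict(list)  # entity id -> ordered events
        self._subscribers = []

    def append(self, entity_id, event):
        """Append an event to the entity's ordered, append-only stream."""
        self._streams[entity_id].append(event)
        for notify in self._subscribers:
            notify(entity_id, event)  # push to live subscribers

    def load(self, entity_id):
        """Return all events of one entity, in the order they were appended."""
        return list(self._streams.get(entity_id, []))

    def delete(self, entity_id):
        """Remove every event of the given entity."""
        self._streams.pop(entity_id, None)

    def subscribe(self, callback):
        """Register a callback(entity_id, event) for newly appended events."""
        self._subscribers.append(callback)


store = TinyEventStore()
seen = []
store.subscribe(lambda entity_id, event: seen.append((entity_id, event)))

store.append("order-1", "OrderPlaced")
store.append("order-2", "OrderPlaced")
store.append("order-1", "OrderShipped")
order_1_events = store.load("order-1")
store.delete("order-2")
```

Each operation is addressed by entity id, which is the property the rest of the answer checks Kafka against.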


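Kafka is a highly scalable message broker, based on an append-only log. Messages in Kafka are produced to topics, and nowadays a topic often contains a single message type, so that it plays nicely with the schema registry. A topic could be something like cpu-load, to which we produce time-series measurements of CPU load for many servers.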



Can you append events to Kafka? Yes, it's called produce.

Can you append events with the entity id as a key? Not really. The message key is used to distribute messages across partitions, so it is really just a partition key. One thing mentioned in another answer is optimistic concurrency. If you have worked with a relational database, you have probably used a Version column. For NoSQL databases you might have used the document eTag. Both let you ensure that you update the entity in the state you know about, and that it hasn't been mutated during your operation. Kafka does not provide anything to support optimistic concurrency for such state transitions.

Can you read all the events for a single entity from a Kafka topic, using the entity id as a key? No, you can't. As Kafka is not a database, it has no index on its topics, so the only way to retrieve messages from a topic is to consume them.

Can you delete events from Kafka using the entity id as a key? No, not from a topic that keeps the full history. There, messages are removed only after their retention period expires. Log compaction with tombstone messages can remove records by key, but a compacted topic retains only the latest record per key, so it cannot hold an entity's full event history.

Can you subscribe to a Kafka topic to receive live (and historical) events in order, so you can project them to your query models? Yes, and because topics are partitioned, you can scale out your projections to increase performance.
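To see why a message key is "just a partition key", here is a small Python simulation; it does not use a Kafka client. The hash is a stand-in: Kafka's default partitioner in the Java client hashes the key bytes with murmur2 and takes the result modulo the partition count, while this sketch uses `zlib.crc32` for the same purpose. `produce`, `read_entity` and the partition count are made up for the example.

```python
import zlib

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]  # each partition is an ordered log


def partition_for(key):
    """Stand-in for the default partitioner: hash of the key modulo partition count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def produce(key, value):
    """Append to whichever partition the key hashes to."""
    partitions[partition_for(key)].append((key, value))


def read_entity(key):
    """There is no index by key, so scan the whole partition and filter."""
    events, scanned = [], 0
    for message_key, value in partitions[partition_for(key)]:
        scanned += 1
        if message_key == key:
            events.append(value)
    return events, scanned


for i in range(10):
    produce(f"entity-{i}", "Created")
for i in range(10):
    produce(f"entity-{i}", "Updated")

for i in range(10):
    events, scanned = read_entity(f"entity-{i}")
    print(f"entity-{i}: {len(events)} events found after scanning {scanned} messages")
```

With ten entities spread over three partitions, at least one partition holds the events of four or more entities, so reading a single entity means consuming messages that belong to others. In a real topic that scan covers the partition's whole retained history.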


I believe the reason why a lot of people claim that Kafka is a good choice as an event store for event-sourced systems is that they confuse Event Sourcing with simple pub-sub (you can use the hype word "EDA", or Event-Driven Architecture, instead). Using message brokers to fan out events to other system components is a pattern that has been known for decades. The issue with "classic" brokers is that messages are gone as soon as they are consumed, so you cannot build something like a query model that would be built from history. Another issue is that when projecting events, you want them to be consumed in the same order as they were produced, and "classic" brokers normally aim to support the competing consumers pattern, which by definition doesn't support ordered message processing. Make no mistake, Kafka does not support competing consumers: within a consumer group, one consumer can read one or more partitions, but a partition cannot be shared by several consumers. Kafka solved the ordering issue and the historical message retention issue quite nicely. So, you can now build query models from events you push through Kafka. But that's not what the original idea of Event Sourcing is about; it's what we today call EDA. Once this separation is clear, we will hopefully stop seeing claims that any append-only event log is a good candidate to be an event store database for event-sourced systems.


Kafka only guarantees at-least-once delivery, so there can be duplicates in the event store that cannot be removed. Update: Here you can read why it is so hard with Kafka, and some more recent news about how to finally achieve this behavior: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/

Due to immutability, there is no way to manipulate the event store when the application evolves and events need to be transformed (there are of course methods like upcasting, but...). One might say you never need to transform events, but that is not a correct assumption: there can be situations where you keep a backup of the originals but upgrade the events to the latest versions. That is a valid requirement in event-driven architectures.

There is no place to persist snapshots of entities/aggregates, so replay becomes slower and slower. Creating snapshots is a must-have feature for an event store from a long-term perspective.

Kafka partitions are distributed, and they are hard to manage and back up compared with databases. Databases are simply simpler :-)
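To illustrate the snapshot point above, the Python sketch below replays an account balance either from the start of the stream or from a stored snapshot. The `(version, balance)` snapshot shape and the `replay` function are assumptions made for this example; events are reduced to signed amounts.

```python
def replay(events, snapshot=None):
    """Fold signed amounts into a balance.

    `snapshot` is an optional (version, balance) pair, where version is the
    number of events already folded in. Only events after it are replayed.
    """
    version, balance = snapshot if snapshot is not None else (0, 0)
    for amount in events[version:]:
        balance += amount
        version += 1
    return version, balance


events = [10, 20, -5, 40]

full = replay(events)                      # folds all four events
snapshot = replay(events[:3])              # state saved after three events
from_snapshot = replay(events, snapshot)   # folds only the fourth event
```

Both paths end in the same state, but the second one touches a single event. Without somewhere to keep the snapshot, every load pays for the full history.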







