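Suppose we have a user REST microservice, a wallet REST microservice, and an API gateway that glues them together. When Bob registers on our website, our API gateway needs to create a user through the user microservice and a wallet through the wallet microservice.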

Here are a few scenarios where things could go wrong:

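- Creation of user Bob fails: that's OK, we just return an error message to Bob. We're using SQL transactions, so no one ever saw Bob in the system. Everything's good :)
- User Bob is created, but before our wallet can be created, our API gateway hard crashes. We now have a user with no wallet (inconsistent data).
- User Bob is created, and while we are creating the wallet, the HTTP connection drops. The wallet creation might have succeeded or it might have failed.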

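What solutions are available to prevent this kind of data inconsistency? Are there patterns that allow transactions to span multiple REST requests? I've read the Wikipedia page on two-phase commit, which seems to touch on this issue, but I'm not sure how to apply it in practice. The paper "Atomic Distributed Transactions: a RESTful Design" also looks interesting, although I haven't read it yet.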

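Alternatively, I understand that REST might just not be suited to this use case. Would the correct way to handle this situation be to drop REST entirely and use a different communication protocol, such as a message queue system? Or should I enforce consistency in my application code (for example, by having a background job that detects inconsistencies and fixes them, or by having a "state" attribute on my user model with values such as "creating" and "created")?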


Current answer

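All distributed systems have trouble with transactional consistency. The best way to handle this is, as you said, a two-phase commit. Create the wallet and the user in a pending state. Once both have been created, make a separate call to activate the user.

This last call should be safely repeatable (in case your connection drops).

This requires that the last call knows about both tables (so that it can be done in a single JDBC transaction).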
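The pending-then-activate idea can be sketched as follows. This is only a minimal illustration, not the answer author's implementation: it uses Python's `sqlite3` in place of JDBC, and the table names, the `state` column, and the function names are all assumptions made for the example. The point it tries to show is that the activation step touches both tables in one transaction and can be repeated without harm.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # One in-memory database stands in for the store that the activation
    # call can see (hypothetical schema, chosen for this sketch).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    db.execute("CREATE TABLE wallets (user_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    return db


def create_pending_user(db: sqlite3.Connection, user_id: str) -> None:
    # INSERT OR IGNORE makes a retried call a no-op.
    with db:
        db.execute("INSERT OR IGNORE INTO users VALUES (?, 'pending')", (user_id,))


def create_pending_wallet(db: sqlite3.Connection, user_id: str) -> None:
    with db:
        db.execute("INSERT OR IGNORE INTO wallets VALUES (?, 'pending')", (user_id,))


def activate(db: sqlite3.Connection, user_id: str) -> bool:
    """Flip both rows to 'active' in one transaction; safe to repeat."""
    with db:  # commits on success, rolls back if an exception is raised
        have_user = db.execute(
            "SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone()
        have_wallet = db.execute(
            "SELECT 1 FROM wallets WHERE user_id = ?", (user_id,)).fetchone()
        if not (have_user and have_wallet):
            return False  # something is still missing; the caller may retry later
        db.execute("UPDATE users SET state = 'active' WHERE id = ?", (user_id,))
        db.execute("UPDATE wallets SET state = 'active' WHERE user_id = ?", (user_id,))
        return True
```

Readers of the data would then ignore rows that are still `pending`, so a crash between the two creation calls leaves nothing visible to other users.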

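Alternatively, you might want to think about why you are so worried about a user without a wallet. Do you believe this will cause a problem? If so, maybe having these as separate REST calls is a bad idea. If a user shouldn't exist without a wallet, then you should probably add the wallet to the user (in the original POST call that creates the user).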

Other answers

This is a classic question that I was asked during an interview recently: how do you call multiple web services and still preserve some kind of error handling in the middle of the task? Today, in high-performance computing, we avoid two-phase commits. Many years ago I read a paper about what was called the "Starbucks model" for transactions. Think about the process of ordering, paying for, preparing, and receiving the coffee you order at Starbucks.

I am oversimplifying, but a two-phase commit model would suggest that the whole process is a single transaction wrapping all the steps involved until you receive your coffee. With that model, however, all employees would wait and stop working until you get your coffee. You see the picture?

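Instead, the "Starbucks model" is more productive because it follows a "best effort" model and compensates for errors in the process. First, they make sure that you pay! Then there are message queues, with your order attached to the cup. If something goes wrong in the process, such as you not getting your coffee or it not being what you ordered, we enter the compensation process: we make sure you get what you want or we refund you. This is the most efficient model for increased productivity.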

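Sometimes Starbucks wastes a coffee, but the overall process is efficient. There are other tricks to consider when you build your web services, such as designing them so that they can be called any number of times and still produce the same end result. So my recommendations are: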

- Don't be too fine-grained when defining your web services (I am not convinced by the microservice hype happening these days: there are too many risks of going too far).
- Async increases performance, so prefer being async; send notifications by email whenever possible.
- Build more intelligent services so that they are "recallable" any number of times, processing with a UID or task ID that follows the order from start to finish and validating business rules at each step.
- Use message queues (JMS or others) and divert failures to error-handling processors that "roll back" by applying the opposite operations. Note that working asynchronously will require some sort of queue to validate the current state of the process, so take that into account.
- As a last resort (since it should not happen often), put the case in a queue for manual processing of errors.

Let's go back to the original problem: create an account, create a wallet, and make sure everything was done.

Let's say a web service is called to orchestrate the whole operation.

Pseudocode for the web service would look like this:

1. Call the Account creation microservice, passing it some information and a unique task ID.
   1.1 The Account creation microservice first checks whether that account was already created. A task ID is associated with the account's record. The microservice detects that the account does not exist, so it creates it and stores the task ID. NOTE: this service can be called 2000 times and it will always produce the same result. The service answers with a "receipt that contains minimal information to perform an undo operation if required".
2. Call Wallet creation, giving it the account ID and the task ID. Let's say a condition is not valid and the wallet creation cannot be performed. The call returns with an error, but nothing was created.
3. The orchestrator is informed of the error. It knows it needs to abort the Account creation, but it will not do so itself. It asks the Account service to do it by passing the "minimal undo receipt" received at the end of step 1.
4. The Account service reads the undo receipt and knows how to undo the operation; the undo receipt may even include information about another microservice it could have called itself to do part of the job. In this situation, the undo receipt could contain the account ID and possibly some extra information required to perform the opposite operation. In our case, to simplify things, let's say it simply deletes the account using its account ID.
5. Now, let's say the web service never receives the success or failure response (in this case) for the Account creation's undo. It simply calls the Account's undo service again. This service should normally never fail, because its goal is for the account to no longer exist. So it checks whether the account exists, sees that nothing remains to be undone, and returns that the operation succeeded.
6. The web service returns to the user that the account could not be created.
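The steps above can be sketched in a few lines. This is a toy, in-process illustration under stated assumptions, not a real service: `AccountService`, `WalletService`, `UndoReceipt`, and `register` are hypothetical names, the "microservices" are plain Python objects, and network failures are simulated with an exception.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UndoReceipt:
    """Minimal information needed to undo an account creation."""
    account_id: str


class AccountService:
    def __init__(self) -> None:
        self.accounts: dict[str, str] = {}  # task_id -> account_id

    def create(self, task_id: str, name: str) -> UndoReceipt:
        # Idempotent: repeating the call with the same task_id creates nothing new.
        if task_id not in self.accounts:
            self.accounts[task_id] = f"acct-{name}"
        return UndoReceipt(self.accounts[task_id])

    def undo(self, receipt: UndoReceipt) -> bool:
        # Idempotent: the goal is "the account no longer exists", so an
        # already-deleted account still counts as success.
        for task_id, account_id in list(self.accounts.items()):
            if account_id == receipt.account_id:
                del self.accounts[task_id]
        return True


class WalletService:
    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.wallets: dict[str, str] = {}  # task_id -> account_id

    def create(self, task_id: str, account_id: str) -> None:
        if self.fail:
            raise RuntimeError("wallet creation rejected")
        self.wallets.setdefault(task_id, account_id)


def register(accounts: AccountService, wallets: WalletService,
             task_id: str, name: str) -> bool:
    """Orchestrator: create the account, then the wallet; compensate on failure."""
    receipt = accounts.create(task_id, name)             # step 1
    try:
        wallets.create(task_id, receipt.account_id)      # step 2
    except RuntimeError:
        accounts.undo(receipt)                           # steps 3 to 5, repeatable
        return False                                     # step 6
    return True
```

In a real deployment the orchestrator would keep retrying `undo` until it gets an answer, which is only safe because the undo operation is idempotent.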

This is a synchronous example. We could have managed it differently and put the case into a message queue targeted at the help desk, if we don't want the system to recover from the error completely on its own. I've seen this done in a company where the back-end system could not provide enough hooks to correct such situations. The help desk received messages describing what had been performed successfully, with enough information to fix things, just as our undo receipt could be used in a fully automated way.

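I did some searching, and the Microsoft website has a pattern description for this approach. It is called the Compensating Transaction pattern: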

Compensating Transaction pattern

What solutions are available to prevent this kind of data inconsistency?

Traditionally, distributed transaction managers are used. A few years ago in the Java EE world you might have created these services as EJBs which were deployed to different nodes and your API gateway would have made remote calls to those EJBs. The application server (if configured correctly) automatically ensures, using two phase commit, that the transaction is either committed or rolled back on each node, so that consistency is guaranteed. But that requires that all the services be deployed on the same type of application server (so that they are compatible) and in reality only ever worked with services deployed by a single company.

Are there patterns that allow transactions to span multiple REST requests?

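For SOAP (OK, so not REST), there is the WS-AT specification, but no service I have ever had to integrate supported it. For REST, JBoss has something in the works. Otherwise, the "pattern" is either to find a product that you can plug into your architecture or to build your own solution (not recommended).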

I have published such a product for Java EE: https://github.com/maxant/genericconnector

According to the paper you referenced, there is also the Try-Cancel/Confirm pattern and an associated product from Atomikos.

BPEL engines handle consistency between remotely deployed services using compensation.

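Alternatively, I understand that REST might just not be suited to this use case. Would the correct way to handle this situation be to drop REST entirely and use a different communication protocol, such as a message queue system?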

There are many ways of "binding" non-transactional resources into a transaction:

- As you suggest, you could use a transactional message queue, but it will be asynchronous, so if you depend on the response it becomes messy.
- You could write the fact that you need to call the back-end services into your database, and then call the back-end services using a batch. Again, this is async, so it can get messy.
- You could use a business process engine as your API gateway to orchestrate the back-end microservices.
- You could use remote EJB, as mentioned at the start, since that supports distributed transactions out of the box.
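The second option (record the pending back-end call in your own database, then process it in a batch) might look roughly like this. It is a sketch under assumptions: the `pending_calls` table, the function names, and the use of `sqlite3` are all invented for illustration, and `create_wallet` stands in for the real call to the wallet service, which would need to be idempotent because a call can be retried.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE pending_calls ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def register_user(db: sqlite3.Connection, user_id: str) -> None:
    # The user row and the "wallet still needs creating" row are written in
    # the same local transaction, so they cannot get out of sync.
    with db:
        db.execute("INSERT INTO users (id) VALUES (?)", (user_id,))
        db.execute("INSERT INTO pending_calls (user_id) VALUES (?)", (user_id,))


def run_batch(db: sqlite3.Connection, create_wallet: Callable[[str], object]) -> int:
    """Try every outstanding call once; return how many completed."""
    rows = db.execute(
        "SELECT id, user_id FROM pending_calls WHERE done = 0 ORDER BY id"
    ).fetchall()
    completed = 0
    for call_id, user_id in rows:
        try:
            create_wallet(user_id)  # remote call in a real system
        except Exception:
            continue  # leave the row in place; the next batch run retries it
        with db:
            db.execute("UPDATE pending_calls SET done = 1 WHERE id = ?", (call_id,))
        completed += 1
    return completed
```

If the batch crashes after the remote call but before marking the row as done, the call is repeated on the next run, which is why the wallet creation itself has to tolerate duplicates.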

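Or should I enforce consistency in my application code (for example, by having a background job that detects inconsistencies and fixes them, or by having a "state" attribute on my user model with values such as "creating" and "created")?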

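Playing devil's advocate: why build something like that when there are products that do it for you (see above), and probably do it better than you can, because they are tried and tested?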

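Personally, I like the idea of microservices as modules defined by use cases, but as your question points out, they have trouble adapting to classical businesses such as banking, insurance, and telecoms.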

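Distributed transactions, as many have mentioned, are not a good choice. People now lean more toward eventually consistent systems, but I am not sure that this works for banks, insurance companies, and the like.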

I wrote a blog post about my proposed solution; maybe it can help you:

https://mehmetsalgar.wordpress.com/2016/11/05/micro-services-fan-out-transaction-problems-and-solutions-with-spring-bootjboss-and-netflix-eureka/

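In the microservice world, communication between services should go through either a REST client or a message queue. There are two ways to handle transactions across services, depending on how the services communicate. I personally prefer a message-driven architecture, so that a long transaction is a non-blocking operation for the user. Let's take an example to explain it: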

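1. Create user BOB with the event CREATE USER and push the message to a message bus.
2. The wallet service, which subscribes to this event, can then create the wallet corresponding to the user.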

The one thing you have to take care of is selecting a robust, reliable message backbone that can persist state in case of failure. You can use Kafka or RabbitMQ as the messaging backbone. There will be a delay in execution because of eventual consistency, but the UI can easily be kept up to date through socket notifications. A notification service or task manager framework can be a service that updates the state of the transactions through an asynchronous mechanism such as sockets, and it can help the UI show the proper progress.
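To make the flow concrete, here is a toy sketch of the event-driven version. It does not use Kafka or RabbitMQ: `InMemoryBus` is a hypothetical stand-in that delivers messages synchronously and keeps nothing on failure, and the topic names (`user.created`, `wallet.created`) and the "creating"/"created" states are assumptions for the example. The part worth noticing is that the wallet handler tolerates duplicate deliveries, which a real broker with at-least-once delivery can produce.

```python
from collections import defaultdict
from typing import Callable


class InMemoryBus:
    """Stand-in for a message backbone; delivers synchronously, no persistence."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.subscribers[topic]:
            handler(event)


class WalletService:
    def __init__(self, bus: InMemoryBus) -> None:
        self.bus = bus
        self.wallets: dict[str, int] = {}  # user_id -> balance
        bus.subscribe("user.created", self.on_user_created)

    def on_user_created(self, event: dict) -> None:
        user_id = event["user_id"]
        # Deduplicate by user id so a redelivered event does not reset the wallet.
        if user_id not in self.wallets:
            self.wallets[user_id] = 0
        self.bus.publish("wallet.created", {"user_id": user_id})


class UserService:
    def __init__(self, bus: InMemoryBus) -> None:
        self.bus = bus
        self.users: dict[str, str] = {}  # user_id -> state
        bus.subscribe("wallet.created", self.on_wallet_created)

    def register(self, user_id: str) -> None:
        self.users[user_id] = "creating"  # visible as "in progress" to the UI
        self.bus.publish("user.created", {"user_id": user_id})

    def on_wallet_created(self, event: dict) -> None:
        self.users[event["user_id"]] = "created"
```

With a real broker the user would stay in the "creating" state until the `wallet.created` event arrives, and a notification service could push that state change to the UI.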