I have the feeling that there must be client-server synchronization patterns out there. But I have completely failed to find any.

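The situation is quite simple: the server is the central node, and multiple clients connect to it and manipulate the same data. The data can be split into atoms, and in case of conflict, whatever is on the server has priority (to avoid dragging the user into conflict resolution). Partial synchronization is preferred because of the potentially large amount of data.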

Are there any patterns or good practices for this situation? If you don't know of any, what approach would you take?

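Here is how I currently think of solving it: in parallel to the data, a modification journal is kept, with every transaction timestamped. When a client connects, it receives all changes since its last check in consolidated form (the server goes through the list and drops additions that are later followed by deletions, merges the updates for each atom, and so on). Et voilà, we are up to date.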
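The consolidation step described above could look roughly like the sketch below. The `Change` record, its field names, and the rule that a re-add simply replaces an earlier entry are assumptions made for illustration, not something the question specifies.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Change:
    timestamp: int                      # assumed: server-assigned, increasing
    action: str                         # "add", "update", or "delete"
    atom_id: str
    value: Optional[Dict[str, Any]] = None


def consolidate(log: List[Change], since: int) -> List[Change]:
    """Collapse every change newer than `since` into at most one per atom."""
    merged: Dict[str, Change] = {}
    for change in sorted(log, key=lambda c: c.timestamp):
        if change.timestamp <= since:
            continue
        prev = merged.get(change.atom_id)
        if prev is None:
            merged[change.atom_id] = change
        elif change.action == "delete" and prev.action == "add":
            # Added and deleted inside the window: the client never sees it.
            del merged[change.atom_id]
        elif change.action == "update" and prev.action in ("add", "update"):
            # Fold the update into the earlier entry, keeping its action.
            value = {**(prev.value or {}), **(change.value or {})}
            merged[change.atom_id] = Change(
                change.timestamp, prev.action, change.atom_id, value
            )
        else:
            # Any other sequence: the latest change replaces the earlier one.
            merged[change.atom_id] = change
    return sorted(merged.values(), key=lambda c: c.timestamp)
```

The client would then apply the returned list in order and remember the largest timestamp it has seen as its new `since` value.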

An alternative would be to keep a modification date on each record and, instead of actually deleting data, just mark records as deleted.

Any thoughts?


Current answer

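This page clearly describes most data synchronization scenarios, with patterns and example code: "Data Synchronization: Patterns, Tools, and Techniques"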

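It is the most comprehensive source I have found. It covers delta synchronization as a whole, how to handle deletions, and strategies for both server-to-client and client-to-server sync. It is a very good starting point and well worth a look.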

Other answers

The question is not crystal clear, but I'd look into optimistic locking if I were you. It can be implemented with a sequence number that the server returns for each record. When a client tries to save the record back, it will include the sequence number it received from the server. If the sequence number matches what's in the database at the time when the update is received, the update is allowed and the sequence number is incremented. If the sequence numbers don't match, the update is disallowed.
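As a minimal in-memory sketch of that sequence-number check (the `RecordStore` class and `StaleRecordError` are made-up names, and a real database would need the compare and the write to happen atomically, for example in a single conditional UPDATE):

```python
from typing import Any, Dict, Tuple


class StaleRecordError(Exception):
    """Raised when a client saves against an outdated sequence number."""


class RecordStore:
    def __init__(self) -> None:
        # record_id -> (sequence number, data)
        self._rows: Dict[str, Tuple[int, Any]] = {}

    def create(self, record_id: str, data: Any) -> int:
        self._rows[record_id] = (1, data)
        return 1

    def read(self, record_id: str) -> Tuple[int, Any]:
        # The client keeps the returned sequence number with its copy.
        return self._rows[record_id]

    def update(self, record_id: str, data: Any, expected_seq: int) -> int:
        current_seq, _ = self._rows[record_id]
        if expected_seq != current_seq:
            # Someone else saved first; reject instead of overwriting.
            raise StaleRecordError(
                f"{record_id}: expected seq {expected_seq}, found {current_seq}"
            )
        new_seq = current_seq + 1
        self._rows[record_id] = (new_seq, data)
        return new_seq
```

A client whose update is rejected would re-read the record, which fits the question's rule that the server's version wins.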

You should look at how distributed change management works. Look at SVN, CVS, and other repositories that manage deltas.

You have several use cases.

1. Synchronize changes. Your change-log (or delta history) approach looks good for this. Clients send their deltas to the server; server consolidates and distributes the deltas to the clients. This is the typical case. Databases call this "transaction replication".

2. Client has lost synchronization. Either through a backup/restore or because of a bug. In this case, the client needs to get the current state from the server without going through the deltas. This is a copy from master to detail, deltas and performance be damned. It's a one-time thing; the client is broken; don't try to optimize this, just implement a reliable copy.

3. Client is suspicious. In this case, you need to compare client against server to determine if the client is up-to-date and needs any deltas.

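You should follow the database (and SVN) design pattern of sequentially numbering every change. That way a client can make a trivial request ("What revision should I have?") before attempting to synchronize. And even then, the query ("All deltas since revision 2149") is delightfully simple for both the client and the server to process.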
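To make the two requests concrete, here is a small sketch of sequentially numbered deltas. `DeltaServer`, `Client`, and the `(key, value)` delta shape with `None` meaning "deleted" are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional, Tuple

Delta = Tuple[str, Optional[Any]]   # (key, new value); None means deleted


class DeltaServer:
    def __init__(self) -> None:
        self._deltas: List[Delta] = []  # index i holds revision i + 1

    def commit(self, delta: Delta) -> int:
        self._deltas.append(delta)
        return len(self._deltas)        # the revision just assigned

    def head_revision(self) -> int:
        """Answers: what revision should I have?"""
        return len(self._deltas)

    def deltas_since(self, revision: int) -> List[Tuple[int, Delta]]:
        """Answers: all deltas since revision N, paired with their numbers."""
        return list(enumerate(self._deltas[revision:], start=revision + 1))


class Client:
    def __init__(self) -> None:
        self.revision = 0
        self.state: Dict[str, Any] = {}

    def sync(self, server: DeltaServer) -> int:
        """Apply any missing deltas; return how many were applied."""
        if server.head_revision() == self.revision:
            return 0                    # cheap check, nothing to download
        applied = 0
        for rev, (key, value) in server.deltas_since(self.revision):
            if value is None:
                self.state.pop(key, None)
            else:
                self.state[key] = value
            self.revision = rev
            applied += 1
        return applied
```

Because the revision number is a plain counter, the "suspicious client" case reduces to comparing two integers.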

I built a system like this for an app about 8 years ago, and I can share a few of the ways it has evolved as usage of the app has grown.

I started by logging every change (insert, update or delete) from any device into a "history" table. So if, for example, someone changes their phone number in the "contact" table, the system will edit the contact.phone field, and also add a history record with action=update, table=contact, field=phone, record=[contact ID], value=[new phone number]. Then whenever a device syncs, it downloads the history items since the last sync and applies them to its local database. This sounds like the "transaction replication" pattern described above.

One issue is keeping IDs unique when items could be created on different devices. I didn't know about UUIDs when I started this, so I used auto-incrementing IDs and wrote some convoluted code that runs on the central server to check new IDs uploaded from devices, change them to a unique ID if there's a conflict, and tell the source device to change the ID in its local database. Just changing the IDs of new records wasn't that bad, but if I create, for example, a new item in the contact table, then create a new related item in the event table, now I have foreign keys that I also need to check and update.

Eventually I learned that UUIDs could avoid this, but by then my database was getting pretty large and I was afraid a full UUID implementation would create a performance issue. So instead of using full UUIDs, I started using randomly generated, 8 character alphanumeric keys as IDs, and I left my existing code in place to handle conflicts. Somewhere between my current 8-character keys and the 36 characters of a UUID there must be a sweet spot that would eliminate conflicts without unnecessary bloat, but since I already have the conflict resolution code, it hasn't been a priority to experiment with that.

The next problem was that the history table was about 10 times larger than the entire rest of the database. This makes storage expensive, and any maintenance on the history table can be painful. Keeping that entire table allows users to roll back any previous change, but that started to feel like overkill. So I added a routine to the sync process where if the history item that a device last downloaded no longer exists in the history table, the server doesn't give it the recent history items, but instead gives it a file containing all the data for that account. Then I added a cronjob to delete history items older than 90 days. This means users can still roll back changes less than 90 days old, and if they sync at least once every 90 days, the updates will be incremental as before. But if they wait longer than 90 days, the app will replace the entire database.
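A rough sketch of that pruning-plus-fallback rule follows. The `SyncServer` class, the dictionary layout of a history item, and the response format are all invented for illustration; only the 90-day window and the "marker no longer exists, send everything" rule come from the description above.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

RETENTION = timedelta(days=90)


class SyncServer:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.history: List[Dict[str, Any]] = []
        self._next_id = 1

    def record(self, key: str, value: Any, at: datetime) -> None:
        """Apply a change and log it as a history item."""
        self.data[key] = value
        self.history.append(
            {"id": self._next_id, "at": at, "key": key, "value": value}
        )
        self._next_id += 1

    def prune(self, now: datetime) -> None:
        """What the cron job would do: drop items older than the window."""
        cutoff = now - RETENTION
        self.history = [h for h in self.history if h["at"] >= cutoff]

    def sync(self, last_history_id: Optional[int]) -> Dict[str, Any]:
        known = any(h["id"] == last_history_id for h in self.history)
        if known:
            items = [h for h in self.history if h["id"] > last_history_id]
            return {"mode": "incremental", "items": items}
        # The device's marker was pruned (or it never synced): full copy.
        return {
            "mode": "full",
            "data": dict(self.data),
            "last_history_id": self._next_id - 1,
        }
```

The full response carries the latest history ID so that the device can go back to incremental syncs afterwards.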

That change reduced the size of the history table by almost 90%, so now maintaining the history table only makes the database twice as large instead of ten times as large. Another benefit of this system is that syncing could still work without the history table if needed -- like if I needed to do some maintenance that took it offline temporarily. Or I could offer different rollback time periods for accounts at different price points. And if there are more than 90 days of changes to download, the complete file is usually more efficient than the incremental format.

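If I were starting over today, I would skip the ID conflict checking and simply aim for a key length sufficient to eliminate conflicts, with some kind of error checking just in case. (YouTube appears to use 11-character random IDs.) The history table, combined with incremental downloads for recent updates and a full download when needed, has been working well.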
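For what it's worth, here is a sketch of how such keys might be generated and how the collision risk can be estimated. The 62-character alphabet, the retry limit, and both function names are assumptions; the check below only covers keys the caller already knows about, so a server-side uniqueness check would still be the "just in case" safety net.

```python
import math
import secrets
import string
from typing import Set

ALPHABET = string.ascii_letters + string.digits   # assumed: 62 symbols
KEY_LENGTH = 11


def new_key(existing: Set[str], length: int = KEY_LENGTH,
            max_attempts: int = 5) -> str:
    """Return a random key not in `existing`, and add it to that set."""
    for _ in range(max_attempts):
        key = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if key not in existing:
            existing.add(key)
            return key
    raise RuntimeError("could not generate a unique key")


def collision_probability(n: int, length: int = KEY_LENGTH) -> float:
    """Birthday-bound estimate of any collision among n random keys."""
    space = len(ALPHABET) ** length
    return -math.expm1(-n * (n - 1) / (2 * space))
```

Calling `collision_probability` with the expected number of records for a few candidate lengths is a quick way to look for the sweet spot between 8 and 36 characters.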

What you really need is Operational Transformation (OT). In many cases it can even take care of the conflicts.

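This is still an active research area, but implementations of various OT algorithms already exist. I have been involved in this kind of research for several years, so if this route interests you, let me know and I will be happy to point you to relevant resources.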
