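I have a feeling that client-server synchronization patterns must exist, but I have completely failed to find one.

The situation is quite simple: the server is the central node, and multiple clients connect to it and manipulate the same data. The data can be split into atoms, and in case of conflict, whatever is on the server takes priority (to avoid dragging the user into conflict resolution). Partial synchronization is preferred because of the potentially large amount of data.

Are there any patterns or good practices for this situation? If you don't know of any, what would your approach be?

Here is how I currently think of solving it: in parallel to the data, a modification journal is kept, with every transaction timestamped. When a client connects, it receives all changes since its last check in consolidated form (the server walks the list and drops additions that are later followed by deletions, merges the updates for each atom, and so on). Et voilà, we are up to date.

An alternative would be to keep a modification date on each record and, instead of deleting data, just mark records as deleted.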
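As a rough illustration of the second idea (a modification stamp per record plus a deleted flag), here is a minimal sketch. The record layout (`id`, `value`, `modified`, `deleted`) and both function names are assumptions made for the example, and `modified` is treated as a server-assigned counter rather than a wall-clock time.

```python
def modified_since(server_records, last_sync):
    """Return every server record changed after the client's last sync point."""
    return [r for r in server_records.values() if r["modified"] > last_sync]


def apply_server_records(local_records, changes):
    """Apply server changes to the client's copy; the server always wins."""
    for record in changes:
        if record["deleted"]:
            # A soft-deleted record on the server is removed locally.
            local_records.pop(record["id"], None)
        else:
            # Overwrite unconditionally: no client-side conflict resolution.
            local_records[record["id"]] = dict(record)
    # The highest stamp seen becomes the next sync point (None if nothing changed).
    return max((r["modified"] for r in changes), default=None)
```

Because deleted rows stay on the server as markers, a client that was offline during the deletion still learns about it on its next sync.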

Any thoughts?


Current answer

I built a system like this for an app about 8 years ago, and I can share a few ways it has evolved as the app's usage has grown.

I started by logging every change (insert, update or delete) from any device into a "history" table. So if, for example, someone changes their phone number in the "contact" table, the system will edit the contact.phone field, and also add a history record with action=update, table=contact, field=phone, record=[contact ID], value=[new phone number]. Then whenever a device syncs, it downloads the history items since the last sync and applies them to its local database. This sounds like the "transaction replication" pattern described above.
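The history-table idea can be sketched roughly as follows. This is not the author's code: the SQLite schema, the `seq` column used as the sync cursor, and the function names are all assumptions for illustration, and only the phone-update case from the example is handled.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a contact table and a history table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE contact (id TEXT PRIMARY KEY, phone TEXT);
        CREATE TABLE history (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            action TEXT, tbl TEXT, field TEXT, record TEXT, value TEXT
        );
    """)
    return db


def update_phone(db, contact_id, phone):
    """Change the field and log the change in the same transaction."""
    with db:
        db.execute("UPDATE contact SET phone = ? WHERE id = ?", (phone, contact_id))
        db.execute(
            "INSERT INTO history (action, tbl, field, record, value) "
            "VALUES ('update', 'contact', 'phone', ?, ?)",
            (contact_id, phone),
        )


def history_since(db, last_seq):
    """History items a device has not downloaded yet, oldest first."""
    return db.execute(
        "SELECT seq, action, tbl, field, record, value FROM history "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


def apply_history(local_db, items):
    """Replay downloaded items locally; returns the new sync cursor."""
    last_seq = None
    with local_db:
        for seq, action, tbl, field, record, value in items:
            if (action, tbl, field) == ("update", "contact", "phone"):
                local_db.execute(
                    "UPDATE contact SET phone = ? WHERE id = ?", (value, record)
                )
            last_seq = seq
    return last_seq
```

A device would remember the returned cursor and pass it to `history_since` on its next sync.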

One issue is keeping IDs unique when items could be created on different devices. I didn't know about UUIDs when I started this, so I used auto-incrementing IDs and wrote some convoluted code that runs on the central server to check new IDs uploaded from devices, change them to a unique ID if there's a conflict, and tell the source device to change the ID in its local database. Just changing the IDs of new records wasn't that bad, but if I create, for example, a new item in the contact table, then create a new related item in the event table, now I have foreign keys that I also need to check and update.

Eventually I learned that UUIDs could avoid this, but by then my database was getting pretty large and I was afraid a full UUID implementation would create a performance issue. So instead of using full UUIDs, I started using randomly generated, 8-character alphanumeric keys as IDs, and I left my existing code in place to handle conflicts. Somewhere between my current 8-character keys and the 36 characters of a UUID there must be a sweet spot that would eliminate conflicts without unnecessary bloat, but since I already have the conflict resolution code, it hasn't been a priority to experiment with that.
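One way to look for that sweet spot is the standard birthday-problem approximation. The sketch below assumes a 36-symbol alphabet (lowercase letters and digits), which the answer does not specify, and the function names are made up for the example.

```python
import math
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols (assumed)


def random_key(length):
    """Generate a random alphanumeric ID of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def collision_probability(n_ids, key_length, alphabet_size=len(ALPHABET)):
    """Approximate chance that at least two of n_ids random keys collide.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * space)).
    """
    space = alphabet_size ** key_length
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))
```

Under these assumptions, a million 8-character keys in one namespace have a collision chance in the low tens of percent, while 16 characters push it far below one in a trillion, so the useful range does sit well short of a 36-character UUID string.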

The next problem was that the history table was about 10 times larger than the entire rest of the database. This makes storage expensive, and any maintenance on the history table can be painful. Keeping that entire table allows users to roll back any previous change, but that started to feel like overkill. So I added a routine to the sync process where if the history item that a device last downloaded no longer exists in the history table, the server doesn't give it the recent history items, but instead gives it a file containing all the data for that account. Then I added a cron job to delete history items older than 90 days. This means users can still roll back changes less than 90 days old, and if they sync at least once every 90 days, the updates will be incremental as before. But if they wait longer than 90 days, the app will replace the entire database.
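The pruning rule and the fallback to a full download could look roughly like this. It is a sketch under assumptions: history items are plain dicts with hypothetical `seq` and `at` keys, the response shape is invented, and a device with no cursor at all is treated the same as one whose cursor has been pruned.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)


def prune_history(history, now):
    """Drop history items older than the retention window (the cron job's task)."""
    cutoff = now - RETENTION
    return [item for item in history if item["at"] >= cutoff]


def build_sync_response(history, account_data, last_seq):
    """Send increments if the device's last item still exists, else everything."""
    known = {item["seq"] for item in history}
    if last_seq in known:
        newer = [item for item in history if item["seq"] > last_seq]
        return {"mode": "incremental", "changes": newer}
    # The cursor was pruned (or never existed): replace the whole local database.
    return {"mode": "full", "data": dict(account_data)}
```

The check is deliberately on the existence of the last downloaded item rather than on dates, so syncing keeps working even if the history table is emptied entirely.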

That change reduced the size of the history table by almost 90%, so now maintaining the history table only makes the database twice as large instead of ten times as large. Another benefit of this system is that syncing could still work without the history table if needed -- like if I needed to do some maintenance that took it offline temporarily. Or I could offer different rollback time periods for accounts at different price points. And if there are more than 90 days of changes to download, the complete file is usually more efficient than the incremental format.

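If I were starting over today, I'd skip the ID conflict checking and just aim for a key length that's sufficient to eliminate conflicts, with some kind of error checking just in case. (It looks like YouTube uses 11-character random IDs.) The combination of the history table with incremental downloads for recent updates, or a full download when needed, has been working well.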

Other answers

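This page clearly describes most scenarios of data synchronization, with patterns and example code: Data Synchronization: Patterns, Tools, and Techniques

It is the most comprehensive source I have found, covering the whole of delta syncs, strategies for handling deletions, and server-to-client as well as client-to-server sync. It is a really good starting point and worth a look.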

As a member of a team, I have done quite a lot of projects involving data syncing, so I should be able to answer this question.

Data syncing is quite a broad concept, and there is far too much to discuss in one answer. It covers a range of approaches, each with its upsides and downsides. Here is one possible classification along two axes: synchronous / asynchronous and client-server / peer-to-peer. A syncing implementation depends heavily on these factors, as well as on data model complexity, the amount of data transferred and stored, and other requirements. So in each particular case the choice should favor the simplest implementation that meets the app's requirements.

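Based on a review of existing off-the-shelf solutions, we can outline several major classes of syncing, which differ in the granularity of the objects being synced: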

Syncing of a whole document or database is used in cloud-based applications, such as Dropbox, Google Drive or Yandex.Disk. When the user edits and saves a file, the new file version is uploaded to the cloud completely, overwriting the earlier copy. In case of a conflict, both file versions are saved so that the user can choose which version is more relevant.

Syncing of key-value pairs can be used in apps with a simple data structure, where the variables are considered to be atomic, i.e. not divided into logical components. This option is similar to syncing of whole documents, as both the value and the document can be overwritten completely. However, from a user perspective a document is a complex object composed of many parts, whereas a key-value pair is just a short string or a number. Therefore, in this case we can use a simpler conflict resolution strategy: the value that was changed last is considered the more relevant one.

Syncing of data structured as a tree or a graph is used in more sophisticated applications where the amount of data is too large to send the database in its entirety at every update. In this case, conflicts have to be resolved at the level of individual objects, fields or relationships. We are primarily focused on this option.
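The key-value strategy described above, where the most recently changed value wins, can be sketched in a few lines. The `(value, changed_at)` tuple layout and the function name are assumptions for the example, and the approach only behaves well if the timestamps from different devices are comparable.

```python
def merge_last_write_wins(local, remote):
    """Merge two key-value stores; for each key keep the most recently changed value.

    Both stores map key -> (value, changed_at). Ties keep the local value.
    """
    merged = dict(local)
    for key, (value, changed_at) in remote.items():
        if key not in merged or changed_at > merged[key][1]:
            merged[key] = (value, changed_at)
    return merged
```

For example, merging a local `{"theme": ("dark", 10)}` with a remote `{"theme": ("light", 12)}` keeps the remote value, because it was changed later.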

So we captured our knowledge in this article, which I think might be useful to anyone interested in the topic: Data Syncing in Core Data Based iOS Apps (http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en)

For delta (change) sync, you can use the pub/sub pattern to publish changes back to all subscribed clients; services like Pusher can do this.

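For database mirroring, some web frameworks use a local mini database to sync the server-side database to an in-browser database, and partial synchronization is supported. Check out Meteor.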

What you really need is Operational transformation (OT). It can even handle conflicts in many cases.
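To give a feel for the idea, here is a toy sketch of OT limited to concurrent text insertions. It is not taken from any particular OT library: the `Insert` type, the site-ID tie-break, and the function names are assumptions for illustration, and real systems also have to handle deletions and longer operation histories.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    pos: int    # index in the document where the text is inserted
    text: str   # text to insert
    site: int   # ID of the client that produced the operation (tie-breaker)


def apply_insert(doc, op):
    """Apply an insert operation to a string document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other):
    """Rewrite op so it can be applied after the concurrent operation other.

    If other inserted earlier in the document (or at the same position with a
    lower site ID), op's position shifts right by the inserted length.
    """
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return op._replace(pos=op.pos + len(other.text))
    return op
```

Two clients that start from the same text and apply their own operation first, followed by the transformed remote one, end up with the same document without either edit being discarded.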

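This is still an active area of research, but implementations of various OT algorithms already exist. I have been involved in this research for a number of years, so let me know if this route interests you and I'll be happy to point you to relevant resources.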