
[Solved]: “Standard” approach to syncing data?

Problem Detail: 

I'm wondering if there are any universally accepted "best" algorithms for syncing data sets.

I imagine there would be different solutions depending on circumstances:

  • Are all peers online all the time or not?
  • Are we more limited by bandwidth or processing power?
  • Are data items large or small?
  • Does data get modified and deleted or are we just accumulating a growing collection of data?

My question is: what are the equivalents of Dijkstra's and Kruskal's algorithms for my problem?

You might say that the two well-known algorithms mentioned solve problems which are much more specific. That's probably true, so let me narrow down my question:

What is a good way to calculate a diff between two versions of a large, growing set of very small chunks of data? Assume data is never deleted or modified, synchronisation happens somewhat regularly between a limited number of peers, and there is no real client-server situation: any peer could have some new data.
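Since the data is append-only, one way to frame the problem is as set reconciliation over content hashes: each peer advertises the digests of its chunks, and the diff is a plain set difference. Here is a minimal sketch of that framing (the function names and the use of SHA-256 are illustrative assumptions, not a prescribed protocol):

```python
import hashlib

def digest(chunk: bytes) -> str:
    # Content hash used as a stable, collision-resistant identifier for a chunk.
    return hashlib.sha256(chunk).hexdigest()

def missing_on_remote(local: dict, remote_digests: set) -> list:
    # `local` maps digest -> chunk; `remote_digests` is the set of digests
    # the other peer advertises. Because chunks are never modified or
    # deleted, the diff is just a set difference over digests.
    return [chunk for d, chunk in local.items() if d not in remote_digests]

# Example: two peers holding partially overlapping chunk sets.
a = {digest(c): c for c in [b"alpha", b"beta", b"gamma"]}
b = {digest(c): c for c in [b"beta", b"delta"]}

# Each side sends its digest set; each side learns what the other lacks.
a_to_b = missing_on_remote(a, set(b))  # chunks only peer A has
b_to_a = missing_on_remote(b, set(a))  # chunks only peer B has
```

For many small chunks, exchanging the full digest set can itself become the bandwidth bottleneck, which is where the techniques mentioned in the answer below come in.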

I feel I just don't know where to start looking or what papers to read.

Asked By : Higemaru

Answered By : D.W.

I don't know that there is any "standard" or "best" approach, as the technique that is appropriate will depend on the setting. However, I recommend you start by looking at the algorithms used by rsync and the Remote Differential Compression scheme. More generally, look at techniques for delta compression.
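The key trick in rsync is a weak checksum that can be "rolled" along a byte stream in O(1) per position, so one peer can cheaply find blocks the other already has. A minimal sketch of that rolling-checksum idea (simplified from rsync's actual implementation; the modulus, window size, and packing are illustrative):

```python
M = 1 << 16  # both halves of the checksum are kept modulo 2**16

def weak_checksum(block: bytes) -> int:
    # s1: plain sum of the bytes; s2: position-weighted sum.
    s1 = sum(block) % M
    s2 = sum((len(block) - i) * b for i, b in enumerate(block)) % M
    return (s2 << 16) | s1

def roll(checksum: int, old_byte: int, new_byte: int, block_len: int) -> int:
    # Slide the window one byte to the right: constant-time update
    # instead of re-summing the whole block.
    s1 = checksum & 0xFFFF
    s2 = checksum >> 16
    s1 = (s1 - old_byte + new_byte) % M
    s2 = (s2 - block_len * old_byte + s1) % M
    return (s2 << 16) | s1

data = b"the quick brown fox jumps over the lazy dog"
n = 8
c = weak_checksum(data[0:n])
for i in range(1, len(data) - n + 1):
    c = roll(c, data[i - 1], data[i + n - 1], n)
    # The rolled value matches a from-scratch computation at every offset.
    assert c == weak_checksum(data[i:i + n])
```

In rsync proper, matches on this cheap checksum are then confirmed with a strong hash, which keeps both the bandwidth and the CPU cost low.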

Best Answer from StackOverflow




