Data synchronization

Data synchronization is the process of establishing consistency between data in a source and a target data store, in both directions, and of continuously harmonizing that data over time. It is fundamental to a wide variety of applications, including file synchronization and mobile device synchronization (for example, for PDAs).[1]

File-based solutions

Tools are available for file synchronization, version control (CVS, Subversion, etc.), distributed filesystems (Coda, etc.), and mirroring (rsync, etc.); all of these attempt to keep sets of files synchronized. However, only version control and file synchronization tools can deal with modifications to more than one copy of the files.

Synchronization can also be useful in encryption, for example for synchronizing public key servers.[3]

Theoretical models

Several theoretical models of data synchronization exist in the research literature, and the problem is also related to the problem of Slepian–Wolf coding in information theory. The models are classified based on how they consider the data to be synchronized.

Unordered data

The problem of synchronizing unordered data (also known as the set reconciliation problem) is modeled as an attempt to compute the symmetric difference S_A \oplus S_B = (S_A - S_B) \cup (S_B - S_A) between two remote sets S_A and S_B of b-bit numbers.[4] Some solutions to this problem are typified by:

Wholesale transfer: All data is transferred to one host for a local comparison.
Timestamp synchronization: All changes to the data are marked with timestamps. Synchronization proceeds by transferring all data with a timestamp later than the previous synchronization.[5]
Mathematical synchronization: Data are treated as mathematical objects, and synchronization corresponds to a mathematical process.[4][6][7]
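
As an illustration of the first two approaches, the following minimal Python sketch treats S_A and S_B as in-memory sets. The function names and the change log that maps each item to an integer timestamp are hypothetical, chosen only for this example; the sketch does not describe any particular protocol.

```python
def wholesale_difference(set_a, set_b):
    """Wholesale transfer: one host receives both sets and compares them locally."""
    only_a = set_a - set_b          # S_A - S_B
    only_b = set_b - set_a          # S_B - S_A
    return only_a | only_b          # symmetric difference of S_A and S_B


def changes_since(log, last_sync):
    """Timestamp synchronization: pick items changed after the previous sync.

    `log` is an assumed mapping from item to the time of its last change.
    """
    return {item for item, stamp in log.items() if stamp > last_sync}


s_a = {1, 2, 3, 5}
s_b = {2, 3, 4}
diff = wholesale_difference(s_a, s_b)

log_a = {1: 10, 2: 10, 3: 12, 5: 17}
to_send = changes_since(log_a, last_sync=11)
```

Wholesale transfer costs communication proportional to the size of a whole set, whereas the timestamp variant sends only the items changed since the last synchronization, at the price of keeping a timestamp for every change.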

Ordered data

In this case, two remote strings \sigma_A and \sigma_B need to be reconciled. Typically, it is assumed that these strings differ by at most a fixed number of edits (i.e., character insertions, deletions, or modifications). Data synchronization is then the process of reducing the edit distance between \sigma_A and \sigma_B, ideally to zero. This model applies to all filesystem-based synchronization, where the data is ordered. Many practical applications of this are discussed or referenced above.
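
To make the notion of edit distance concrete, here is a minimal Python sketch of the standard dynamic-programming (Levenshtein) computation, counting insertions, deletions, and modifications as one edit each. It assumes both strings are available locally, which a real synchronization protocol between remote hosts cannot assume; the function name is chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, or modifications
    needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # modify ca into cb, or keep it
            ))
        prev = curr
    return prev[-1]


distance = edit_distance("kitten", "sitting")
```

A distance of zero means the two strings are identical, which is the goal of synchronization in this model.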

It is sometimes possible to transform the problem to one of unordered data through a process known as shingling (splitting the strings into shingles).[8]
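
The reduction can be sketched in Python as follows, assuming shingles are overlapping substrings of a fixed length k (the value k = 3 and the example strings are arbitrary choices for illustration). The sketch shows only how two strings become sets that can be compared as unordered data; recovering the original string from its shingles is a separate step that is not shown.

```python
def shingles(s: str, k: int) -> set:
    """Return the set of all length-k substrings (shingles) of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


sigma_a = "synchronize"
sigma_b = "synchronise"

shingles_a = shingles(sigma_a, 3)
shingles_b = shingles(sigma_b, 3)

# The strings differ by one character, so only the shingles that
# overlap that position appear in the symmetric difference.
diff = shingles_a ^ shingles_b
```

Because a set discards order and repeated shingles, it carries less information than the original string.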

Notes

  1. Agarwal, S.; Starobinski, D.; Ari Trachtenberg (2002). "On the scalability of data synchronization protocols for PDAs and mobile devices". Network, IEEE 16 (4): 22–28. doi:10.1109/MNET.2002.1020232. Retrieved 2007-07-27.
  2. A. Tridgell (February 1999). "Efficient algorithms for sorting and synchronization" (PDF). PhD thesis. The Australian National University.
  3. sks.dnsalias.net
  4. Minsky, Y.; Ari Trachtenberg; Zippel, R. (2003). "Set reconciliation with nearly optimal communication complexity". Information Theory, IEEE Transactions on 49 (9): 2213–2218. doi:10.1109/TIT.2003.815784. Retrieved 2007-07-27.
  5. Palm developer knowledgebase manuals
  6. Ari Trachtenberg; D. Starobinski; S. Agarwal. "Fast PDA Synchronization Using Characteristic Polynomial Interpolation" (PDF). IEEE INFOCOM 2002. doi:10.1109/INFCOM.2002.1019402.
  7. Y. Minsky; A. Trachtenberg (October 2002). "Scalable set reconciliation". Allerton Conference on Communication, Control, and Computing.
  8. S. Agarwal; V. Chauhan; Ari Trachtenberg (November 2006). "Bandwidth efficient string reconciliation using puzzles" (PDF). IEEE Transactions on Parallel and Distributed Systems 17 (11): 1217–1225. doi:10.1109/TPDS.2006.148. Retrieved 2007-05-23.
This article is issued from Wikipedia - version of the Tuesday, April 05, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.