Conflict-free replicated data type
In distributed computing, a conflict-free replicated data type (abbreviated CRDT) is a type of specially-designed data structure used to achieve strong eventual consistency (SEC) and monotonicity (absence of rollbacks)[1] . As their name indicates, a CRDT instance is distributed into several replicas; each replica can be mutated promptly and concurrently; the potential divergence between replicas is however guaranteed to be eventually reconciled through downstream synchronisation (off the critical path);[1] consequently CRDTs are known to be highly available.
There are two alternative routes to ensure SEC: operation-based CRDTs[2][3] and state-based CRDTs[4][5] . The two alternatives are equivalent, as one can emulate the other,[1] but there is a tradeoff: operation-based CRDTs require additional guarantees from the communication middleware[1] whereas state-based CRDTs have a high dissemination overhead, as the entire state must be disseminated. Delta state CRDTs[5][6] (or simply Delta CRDTs) are optimised state-based CRDTs where only recently applied mutations to a state are disseminated instead of the entire state. Pure operation-based CRDTs[3] are also an improved variant of operation-based CRDTs that reduce the meta-data size through exploiting the causality information of the middleware.
CRDTs are used to replicate data across multiple computers in a network, executing updates without the need for remote synchronization. This would lead to merge conflicts in systems using conventional eventual consistency technology, but CRDTs are designed such that conflicts are mathematically impossible.[7] Under the constraints of the CAP theorem they provide the strongest consistency guarantees for available/partition-tolerant (AP) settings. In contrast, consensus protocols such as Paxos are required for strongly-consistent/partition-tolerant (CP) settings.
The CRDT concept was first formally defined in 2007 by Marc Shapiro and Nuno Preguiça in terms of operation commutativity,[8] and development was initially motivated by collaborative text editing.[9][10][11] The concept of semilattice evolution of replicated states was first defined by Baquero and Moura in 1997,[4][12] and development was initially motivated by mobile computing. The two concepts were later unified in 2011.[1][7]
Overview
Eventual consistency
Informally, eventual consistency means that replicas eventually reach the same value if clients stop submitting updates. Eventually consistent systems accept local updates without remote synchronization, improving performance and scalability by sacrificing strong consistency. Without remote synchronization, replicas concurrently hold different values which are expected to converge over time. Convergence is complicated by conflicts which arise when merging values between replicas. A conflict is a combination of concurrent updates which may be individually correct, but taken together violate some system invariant. Conventional conflict-resolution schemes involve state roll-back, full consensus, or even user interaction.
Strong eventual consistency
Strong eventual consistency is a property of some eventually-consistent systems: replicas that have received and applied the same set of updates must immediately have equivalent state.[1] There is no conflict arbitration process, because conflicts do not exist in strongly-consistent systems. CRDTs are used to achieve strong eventual consistency in a distributed system.
Mathematical properties
If the system is monotonically increasing in state, clients never observe state rolling back. The set of system states is partially ordered, and the merge operation being commutative, associative and idempotent, the set of all system states is a semilattice, and the merge operation is the semilattice join.
CRDT classes
Two general classes of CRDTs are known to exist. Although any CRDT of one class has an other-class equivalent,[1] the classes differ in assumptions and performance characteristics.
Operation-based CRDTs
Operation-based CRDTs are called commutative replicated data types, or CmRDTs. CmRDT replicas propagate state by broadcasting the state update operation itself, which must be commutative. For example, a CmRDT of a single integer might broadcast the operations (+10) or (-20). Replicas receive the updates and apply them locally. The operations are commutative, so can be received and applied in any order; however, they are not idempotent, and additional network protocol guarantees are required to ensure unique delivery.
State-based CRDTs
State-based CRDTs are called convergent replicated data types, or CvRDTs. In contrast to CmRDTs, CvRDTs send their full local state to other replicas. CvRDTs have the following local interface:
- query - reads the state of the replica, with no side effects
- update - writes to the replica state in accordance with certain restrictions
- merge - merges local state with the state of some remote replica
The merge function must be commutative, associative, and idempotent. It provides a join for any pair of replica states, so the set of all states forms a semilattice. The update function must monotonically increase the internal state, according to the same partial order rules as the semilattice.
Comparison
While CmRDTs require additional guarantees from the network protocol, they use less bandwidth than CvRDTs when the number of transactions is small in comparison to the size of internal state. However, since the CvRDT merge function is associative, merging with the state of some replica yields all previous updates to that replica; gossip protocols work well for propagating CvRDT state to other replicas while reducing network use and handling topology changes.
Some lower bounds[13] on the storage complexity of state-based CRDTs are known.
Known CRDTs
State-based increment-only counter
payload integer[n] P initial [0,0,...,0] update increment() let g = myId() P[g] := P[g] + 1 query value() : integer v let v = P[i] compare (X, Y) : boolean b let b = ( [0, n - 1] : X.P[i] Y.P[i]) merge (X, Y) : payload Z let [0, n - 1] : Z.P[i] = max(X.P[i], Y.P[i])
This CvRDT implements a counter for a cluster of n nodes. Each node in the cluster is assigned an ID from 0 to n - 1, which is retrieved with a call to myId(). Thus each node is assigned its own slot in the array P, which it increments locally. Updates are propagated in the background, and merged by taking the max() of every element in P. The compare function is included to illustrate a partial order on the states. The merge function is commutative, associative, and idempotent. The update function monotonically increases the internal state according to the compare function. This is thus a correctly-defined CvRDT and will provide strong eventual consistency. The CmRDT equivalent broadcasts increment operations as they are received.
State-based PN-counter
payload integer[n] P, integer[n] N initial [0,0,...,0], [0,0,...,0] update increment() let g = myId() P[g] := P[g] + 1 update decrement() let g = myId() N[g] := N[g] + 1 query value() : integer v let v = P[i] - N[i] compare (X, Y) : boolean b let b = ( [0, n - 1] : X.P[i] Y.P[i] [0, n - 1] : X.N[i] Y.N[i]) merge (X, Y) : payload Z let [0, n - 1] : Z.P[i] = max(X.P[i], Y.P[i]) let [0, n - 1] : Z.N[i] = max(X.N[i], Y.N[i])
A common strategy in CRDT development is to stick multiple primitive CRDTs together to make a more complex CRDT. In this case, two increment-only counters were combined to create a CvRDT supporting both increment and decrement operations. Note that the CvRDT's internal state must increase monotonically, even though its external state as exposed through query can return to previous values.
State-based grow-only set
payload set A initial update add(element e) A := A {e} query lookup(element e) : boolean b let b = (e A) compare (S, T) : boolean b let b = (S.A T.A) merge (S, T) : payload U let U.A = S.A T.A
The grow-only set is a CvRDT implementing a set which only allows adds. Since it is impossible for adds and removes to commute (one must take precedence over the other), any CvRDT supporting both add and remove operations must pick and choose its semantics.
State-based 2P-set
payload set A, set R initial , query lookup(element e) : boolean b let b = (e A e R) update add(element e) A := A {e} update remove(element e) pre lookup(e) R := R {e} compare (S, T) : boolean b let b = (S.A T.A S.R T.R) merge (S, T) : payload U let U.A = S.A T.A let U.R = S.R T.R
Two grow-only set CvRDTs are combined to create the 2P-set CvRDT. With the addition of a "tombstone" set, elements can be added and also removed. Once removed, an element cannot be re-added; that is, once an element e is in the tombstone set, query will never again return True for that element. The 2P-set uses "remove-wins" semantics, so remove(e) takes precedence over add(e).
Sequence CRDT
Sequence CRDT can be used to build Collaborative real-time editor and can be considered as an alternative to Operational transformation (OT).
Known Sequence CRDT are Treedoc, Woot,[9] Logoot,[14] LSEQ.[15] CRATE[16] is a decentralized real-time editor built on top of LSEQ and runnable on a network of browsers thanks to WebRTC.
Others
- OR-Set: Add-wins set with tombstones, supporting multiple adds and removes by storing elements with GUIDs.
- LWW-element-set: Set with LWW timestamped adds and removes
- AWORSet: Add-wins optimized observed-remove set that allows adds and removes (a.k.a. ORSWOT)
- RWORSet: Remove-wins optimized observed-remove set that allows adds and removes
- MVRegister: Optimized multi-value register (Dynamo shopping cart)
- Graphs: Using set CRDTs for the sets of vertices and edges
Industry use
Support for CRDTs is implemented in Riak.[17] League of Legends uses the Riak CRDT implementation for its in-game chat system, which handles 7.5 million concurrent users and 11,000 messages per second.[18] SoundCloud open-sourced Roshi, a LWW-element-set CRDT for the SoundCloud stream implemented on top of Redis.[19]
Bet365 (the largest European on-line betting company with 2.5 million simultaneous users peak), store hundreds of megabytes of data in the Riak implementation of OR-Sets.[20]
TomTom employs CRDTs to synchronize navigation data between user devices.[21]
Phoenix, a web framework written in Elixir, will use CRDTs to support real time multi-node information sharing in version 1.2.[22]
References
- 1 2 3 4 5 6 7 Shapiro, Marc; Preguiça, Nuno; Baquero, Carlos; Zawirski, Marek (2011), Conflict-Free Replicated Data Types, Lecture Notes in Computer Science 6976 (Proc 13th International Symposium, SSS 2011), Grenoble, France: Springer Berlin Heidelberg, pp. 386–400, doi:10.1007/978-3-642-24550-3_29, ISBN 978-3-642-24549-7
- ↑ Letia, Mihai; Preguiça, Nuno; Shapiro, Marc (1 April 2010). "Consistency without Concurrency Control in Large, Dynamic Systems". SIGOPS Oper. Syst. Rev. (ACM): 29–34. doi:10.1145/1773912.1773921.
- 1 2 Baquero, Carlos; Almeida, Paulo Sérgio; Shoker, Ali (2014-06-03). Magoutis, Kostas; Pietzuch, Peter, eds. Making Operation-Based CRDTs Operation-Based. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 126–140. doi:10.1007/978-3-662-43352-2_11. ISBN 9783662433515.
- 1 2 Baquero, Carlos; Moura, Francisco (1 October 1999). "Using Structural Characteristics for Autonomous Operation". SIGOPS Oper. Syst. Rev. (ACM, New York, NY, USA): 90–96.
- 1 2 Almeida, Paulo Sérgio; Shoker, Ali; Baquero, Carlos (2015-05-13). Bouajjani, Ahmed; Fauconnier, Hugues, eds. Efficient State-Based CRDTs by Delta-Mutation. Lecture Notes in Computer Science. Springer International Publishing. pp. 62–76. doi:10.1007/978-3-319-26850-7_5. ISBN 9783319268491.
- ↑ Almeida, Paulo Sérgio; Shoker, Ali; Baquero, Carlos (2016-03-04). "Delta State Replicated Data Types". arXiv:1603.01529 [cs].
- 1 2 Shapiro, Marc; Preguiça, Nuno; Baquero, Carlos; Zawirski, Marek (13 January 2011). "A Comprehensive Study of Convergent and Commutative Replicated Data Types". RR-7506 (HAL - Inria).
- ↑ Shapiro, Marc; Preguiça, Nuno (2007). "Designing a Commutative Replicated Data Type". Computing Research Repository (CoRR). abs/0710.1784.
- 1 2 Oster, Gérald; Urso, Pascal; Molli, Pascal; Imine, Abdessamad (2006). "Data consistency for P2P collaborative editing": 259. doi:10.1145/1180875.1180916.
- ↑ Letia, Mihai; Preguiça, Nuno; Shapiro, Marc (2009). "CRDTs: Consistency without Concurrency Control". Computing Research Repository (CoRR). abs/0907.0929.
- ↑ Preguiça, Nuno; Marques, Joan Manuel; Shapiro, Marc; Letia, Mihai (June 2009), A Commutative Replicated Data Type for Cooperative Editing, Montreal, Quebec, Canada: IEEE Computer Society, pp. 395–403, doi:10.1109/ICDCS.2009.20, ISBN 978-0-7695-3659-0
- ↑ Baquero, Carlos; Moura, Francisco (1997). "Specification of Convergent Abstract Data Types for Autonomous Mobile Computing". Universidade do Minho.
- ↑ Burckhardt, Sebastian; Gotsman, Alexey; Yang, Hongseok; Zawirski, Marek (23 January 2014). "Replicated Data Types: Specification, Verification, Optimality". Int. Symp. on Principles of Prog. Lang. (POPL): 271–284. doi:10.1145/2535838.2535848.
- ↑ Weiss, Stephane; Urso, Pascal; Molli, Pascal (2010). "Logoot-Undo: Distributed Collaborative Editing System on P2P Networks". IEEE Transactions on Parallel and Distributed Systems 21 (8): 1162–1174. doi:10.1109/TPDS.2009.173. ISSN 1045-9219.
- ↑ Nédelec, Brice; Molli, Pascal; Mostefaoui, Achour; Desmontils, Emmanuel (2013). "LSEQ an adaptive structure for sequences in distributed collaborative editing": 37. doi:10.1145/2494266.2494278.
- ↑ Nédelec, Brice; Molli, Pascal; Mostefaoui, Achour (2016). "CRATE: Writing Stories Together with our Browsers". Proceedings of the 25th International Conference Companion on World Wide Web. p. 231. doi:10.1145/2872518.2890539.>
- ↑ "Introducing Riak 2.0: Data Types, Strong Consistency, Full-Text Search, and Much More". Basho Technologies, Inc. 29 October 2013.
- ↑ Hoff, Todd (13 October 2014). "How League of Legends Scaled Chat to 70 Million Players - It Takes Lots of Minions". High Scalability.
- ↑ Bourgon, Peter (9 May 2014). "Roshi: a CRDT system for timestamped events". SoundCloud.
- ↑ Macklin, Dan. "bet365: Why bet365 chose Riak". Basho.
- ↑ Ivanov, Dmitry. "Practical Demystification of CRDTs".
- ↑ McCord, Chris. "What makes Phoenix Presence Special".
External links
- A talk on CRDTs by Marc Shapiro
- Readings in conflict-free replicated data types by Christopher Meiklejohn