Self-synchronizing code
In coding theory, especially in telecommunications, a self-synchronizing code[1] is a uniquely decodable code in which the symbol stream formed by a portion of one code word, or by the overlapped portion of any two adjacent code words, is not a valid code word. Put another way, a set of strings (called "code words") over an alphabet is called a self-synchronizing code if for each string obtained by concatenating two code words, the substring starting at the second symbol and ending at the second-last symbol does not contain any code word as substring. Every self-synchronizing code is a prefix code, but not all prefix codes are self-synchronizing.
Other terms for self-synchronizing code are synchronized code[2] or, ambiguously, comma-free code.[3] A self-synchronizing code permits the proper framing of transmitted code words provided that no uncorrected errors occur in the symbol stream; external synchronization is not required. Self-synchronizing codes also allow recovery from uncorrected errors in the stream; with most prefix codes, an uncorrected error in a single bit may propagate errors further in the stream and make the subsequent data corrupted.
Importance of self-synchronizing codes is not limited to data transmission. Self-synchronization also facilitates some cases of data recovery, for example of a digitally encoded text.
Synchronizing word
A code X over an alphabet A has a synchronizing word (aka "syncword") w in A+ if
- x w y ∈ X * ⇒ {x w, w y} ⊆ X * .[2]
A prefix code is synchronized if and only if it has a synchronizing word.[4]
Examples
- The prefix code {ab,ba} has abba as a synchronizing word.[4]
- The prefix code b∗a has a as a synchronizing word.[4]
- The code 1100001100 produced by the words {11, 00}. The code can be represented by 11 00 00 11 00, with spaces added to show the different words (they are not really in the code).
Let's now assume that four letters (two code words) are read. The code 1000 is not a valid code, because 10 is not one of the two code words defined. Similarly, 0001. Even though 00 is a valid word, 01 is not. The only valid way to read two valid words from the example given is by starting at the very beginning, or just after one of the spaces (which have been inserted for clarity only).
Examples
- High-Level Data Link Control (HDLC)
- Advanced Data Communication Control Procedures (ADCCP)
- in UTF-8, bit patterns
0xxxxxxx
and11xxxxxx
are synchronizing words used to mark the beginning of the next valid character
See also
References
- Berstel, Jean; Perrin, Dominique (1985), Theory of Codes, Pure and Applied Mathematics 117, Academic Press, Zbl 0587.68066
- Berstel, Jean; Perrin, Dominique; Reutenauer, Christophe (2010). Codes and automata. Encyclopedia of Mathematics and its Applications 129. Cambridge: Cambridge University Press. ISBN 978-0-521-88831-8. Zbl 1187.94001.
- This article incorporates public domain material from the General Services Administration document "Federal Standard 1037C" (in support of MIL-STD-188).