Self-synchronizing code

Not to be confused with self-clocking signal.

In coding theory, especially in telecommunications, a self-synchronizing code^[1] is a uniquely decodable code in which the symbol stream formed by a portion of one code word, or by the overlapped portion of any two adjacent code words, is not a valid code word. Put another way, a set of strings (called "code words") over an alphabet is called a self-synchronizing code if, for each string obtained by concatenating two code words, the substring starting at the second symbol and ending at the second-last symbol does not contain any code word as a substring. Every self-synchronizing code is a prefix code, but not all prefix codes are self-synchronizing.
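
The pairwise condition above can be checked mechanically for a finite set of code words. The following is a minimal sketch, not a reference implementation; the function name `is_self_synchronizing` and the sample codes are assumptions made for illustration.

```python
from itertools import product


def is_self_synchronizing(code):
    """Check the pairwise-concatenation condition for a finite set of code words.

    Hypothetical helper: for every ordered pair of code words, the joined
    string with its first and last symbols removed must not contain any
    code word as a substring.
    """
    words = set(code)
    for u, v in product(words, repeat=2):
        joined = u + v
        # The part "starting at the second symbol and ending at the
        # second-last symbol" of the concatenation.
        interior = joined[1:-1]
        if any(w in interior for w in words):
            return False
    return True


# Illustrative inputs (assumed, not taken from the cited sources).
print(is_self_synchronizing({"110", "100"}))
print(is_self_synchronizing({"00", "11"}))
```

For the set {110, 100}, none of the four concatenations contains a code word at an interior offset, so the check passes. For {00, 11}, the concatenation 0000 has the interior 00, which is itself a code word, so the check fails.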

Other terms for self-synchronizing code are synchronized code^[2] or, ambiguously, comma-free code.^[3] A self-synchronizing code permits the proper framing of transmitted code words provided that no uncorrected errors occur in the symbol stream; external synchronization is not required. Self-synchronizing codes also allow recovery from uncorrected errors in the stream; with most prefix codes, an uncorrected error in a single bit may propagate further along the stream and corrupt the subsequent data.

The importance of self-synchronizing codes is not limited to data transmission. Self-synchronization also facilitates some cases of data recovery, for example the recovery of digitally encoded text.

Synchronizing word

A code $X$ over an alphabet $A$ has a synchronizing word (also called a "syncword") $w \in A^+$ if, for all $x, y \in A^*$,^[2]

$$x w y \in X^* \Rightarrow \{x w, w y\} \subseteq X^*.$$

A prefix code is synchronized if and only if it has a synchronizing word.^[4]
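
The defining implication quantifies over all contexts, so it cannot be verified by exhaustive enumeration. A bounded search for counterexamples is still a useful sanity check for a finite code. The sketch below assumes that every context $x$, $y$ up to a chosen length is tried; the helper names `in_star`, `words_up_to` and `looks_synchronizing` and the bound `max_len` are hypothetical.

```python
from itertools import product


def in_star(s, code):
    """Return True if s is a concatenation of zero or more code words (s in X*)."""
    ok = [True] + [False] * len(s)  # ok[i]: the prefix s[:i] is in X*
    for i in range(1, len(s) + 1):
        ok[i] = any(
            len(w) <= i and ok[i - len(w)] and s[i - len(w):i] == w
            for w in code
        )
    return ok[len(s)]


def words_up_to(alphabet, max_len):
    """Yield every word over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def looks_synchronizing(w, code, alphabet, max_len=4):
    """Search for a counterexample among contexts x, y of length <= max_len.

    True only means that no counterexample was found within the bound;
    it is not a proof that w is synchronizing.
    """
    contexts = list(words_up_to(alphabet, max_len))
    for x in contexts:
        for y in contexts:
            if in_star(x + w + y, code) and not (
                in_star(x + w, code) and in_star(w + y, code)
            ):
                return False
    return True


print(looks_synchronizing("abba", {"ab", "ba"}, "ab"))
print(looks_synchronizing("ab", {"ab", "ba"}, "ab"))
```

For the code {ab, ba}, no counterexample is found for abba within the bound. The word ab fails: with x = b and y = a, the string baba lies in X*, but bab does not.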

Examples

The prefix code {ab,ba} has abba as a synchronizing word.^[4]
The prefix code $b^*a$ has $a$ as a synchronizing word.^[4]
Consider the stream 1100001100 produced by the code words {11, 00}. It can be written as 11 00 00 11 00, with spaces added to show the individual words (the spaces are not part of the stream).
Now assume that four letters (two code words) are read at a time. The string 1000 is not valid, because 10 is not one of the two code words. Similarly, 0001 is not valid: although 00 is a code word, 01 is not. The only way to read two valid words from this stream is to start at the very beginning or just after one of the spaces (which have been inserted for clarity only). This holds for this particular stream; the code {11, 00} does not satisfy the definition above in general, since the concatenation 00 00 contains the code word 00 at an offset of one symbol.
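
The framing argument for the stream 1100001100 can be reproduced by trying every start offset and testing whether the next four symbols split into two code words. This is only an illustrative sketch; the function `valid_offsets` and its parameters are assumptions, and it handles fixed-length code words only.

```python
def valid_offsets(stream, code, word_len=2, n_words=2):
    """Return the start offsets at which n_words consecutive code words can be read.

    Hypothetical helper for fixed-length code words only.
    """
    span = word_len * n_words
    offsets = []
    for start in range(len(stream) - span + 1):
        chunk = stream[start:start + span]
        # Cut the chunk into consecutive pieces of word_len symbols.
        pieces = [chunk[i:i + word_len] for i in range(0, span, word_len)]
        if all(p in code for p in pieces):
            offsets.append(start)
    return offsets


print(valid_offsets("1100001100", {"11", "00"}))  # [0, 2, 4, 6]
```

Only the even offsets, which are the true word boundaries, are accepted for this stream. The result depends on the stream: for a run such as 000000 the same check accepts every offset, so a misaligned reader would not be detected.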

Examples

High-Level Data Link Control (HDLC)
Advanced Data Communication Control Procedures (ADCCP)
In UTF-8, the bit patterns 0xxxxxxx and 11xxxxxx are synchronizing words that mark the beginning of the next valid character
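
The UTF-8 case can be illustrated with a short resynchronization routine. A decoder that lands in the middle of a character skips bytes of the form 10xxxxxx (continuation bytes) until it reaches a byte matching 0xxxxxxx or 11xxxxxx. This is a minimal sketch; the function name `next_char_start` is an assumption, and it only locates the boundary without validating the sequence that follows.

```python
def next_char_start(data: bytes, pos: int) -> int:
    """Return the index of the first byte at or after pos that can start a character.

    Hypothetical helper: bytes of the form 10xxxxxx are continuation bytes
    and are skipped; 0xxxxxxx and 11xxxxxx mark the start of a character.
    Returns len(data) if no such byte is found.
    """
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


data = "a€b".encode("utf-8")       # b'a\xe2\x82\xacb'
start = next_char_start(data, 2)   # offset 2 is in the middle of the euro sign
print(start, data[start:].decode("utf-8"))  # 4 b
```

Starting at offset 2, the routine skips the two remaining continuation bytes of the euro sign and resumes decoding at offset 4.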

References

[1] US Federal Standard 1037C
[2] Berstel et al. (2010) p. 137
[3] Berstel & Perrin (1985) p. 377
[4] Berstel et al. (2010) p. 138

Berstel, Jean; Perrin, Dominique (1985), Theory of Codes, Pure and Applied Mathematics 117, Academic Press, Zbl 0587.68066
Berstel, Jean; Perrin, Dominique; Reutenauer, Christophe (2010). Codes and automata. Encyclopedia of Mathematics and its Applications 129. Cambridge: Cambridge University Press. ISBN 978-0-521-88831-8. Zbl 1187.94001.
This article incorporates public domain material from the General Services Administration document "Federal Standard 1037C" (in support of MIL-STD-188).

This article is issued from Wikipedia - version of the Sunday, February 07, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
