Dyck language

In the theory of formal languages of computer science, mathematics, and linguistics, the Dyck language is the language consisting of balanced strings of square brackets [ and ]. It is important in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions. It is named after the mathematician Walther von Dyck.
Formal definition
Let be the alphabet consisting of the symbols [ and ] and let
denote its Kleene closure. For any element
with length
we define partial functions
and
by
is
with "
" inserted into the
th position
is
with "
" deleted from the
th position
with the understanding that is undefined for
and
is undefined if
. We define an equivalence relation
on
as follows: for elements
we have
if and only if there exists a finite sequence of applications of the
and
functions starting with
and ending with
, where the empty sequence is allowed. That the empty sequence is allowed accounts for the reflexivity of
. Symmetry follows from the observation that any finite sequence of applications of
to a string can be undone with a finite sequence of applications of
. Transitivity is clear from the definition.
The equivalence relation partitions the language into equivalence classes. If we take
to denote the empty string, then the language corresponding to the equivalence class
is called the Dyck language.
Alternative definition
An alternative definition of the Dyck language can be formulated when we introduce the function.
for any
.
where and
are respectively the number of [ and ] in
. I.e.
counts the imbalance of [ over ]. If
is positive then
has more [ than ].
Now, the Dyck language can be defined as the language
Properties
- The Dyck language is closed under the operation of concatenation.
- By treating
as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient
, resulting in the syntactic monoid of the Dyck language. The class
will be denoted
.
- The syntactic monoid of the Dyck language is not commutative: if
and
then
.
- With the notation above,
but neither
nor
are invertible in
.
- The syntactic monoid of the Dyck language is isomorphic to the bicyclic semigroup by virtue of the properties of
and
described above.
- By the Chomsky–Schützenberger representation theorem, any context-free language is a homomorphic image of the intersection of some regular language with a homomorphic preimage of the Dyck language on two brackets.[1]
- The Dyck language with two distinct types of brackets can be recognized in the complexity class
.[2]
Examples
We can define an equivalence relation on the Dyck language
. For
we have
if and only if
, i.e.
and
have the same length. This relation partitions the Dyck language
where
. Note that
is empty for odd
. E.g. there are 14 words in
, i.e. [[[[]]]], [[[][]]], [[[]][]], [[][[]]], [[[]]][], [[][][]], [][[[]]], [[][]][], [[]][[]], [][[][]], [[]][][], [][[]][], [][][[]], [][][][].
Having introduced the Dyck words of length , we can introduce a relationship on them. For every
we define a relation
on
; for
we have
if and only if
can be reached from
by a series of proper swaps. A proper swap in a word
swaps an occurrence of '][' with '[]'.
For each
the relation
makes
into a partially ordered set. The relation
is reflexive because an empty sequence of proper swaps takes
to
. Transitivity follows because we can extend a sequence of proper swaps that takes
to
by concatenating it with a sequence of proper swaps that takes
to
forming a sequence that takes
into
. To see that
is also antisymmetric we introduce an auxiliary function
where
ranges over all prefixes of
. The following table illustrates that
is strictly monotonic with respect to proper swaps.
partial sums of ![]() |
![]() | ![]() | ![]() | ![]() |
---|---|---|---|---|
![]() |
![]() | ] | [ | ![]() |
![]() |
![]() | [ | ] | ![]() |
partial sums of ![]() |
![]() | ![]() | ![]() | ![]() |
Difference of partial sums | 0 | 2 | 0 | 0 |
Hence so
when there is a proper swap that takes
into
. Now if we assume that both
and
, then there are non-empty sequences of proper swaps such
is taken into
and vice versa. But then
which is nonsensical. Therefore, whenever both
and
are in
, we have
, hence
is antisymmetric.
The partial ordered set is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.