Dyck language

Lattice of the 14 Dyck words of length 8 - [ and ] interpreted as up and down

In the theory of formal languages of computer science, mathematics, and linguistics, the Dyck language is the language consisting of balanced strings of square brackets [ and ]. It is important in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions. It is named after the mathematician Walther von Dyck.

Formal definition

Let $\Sigma = \{ [, ] \}$ be the alphabet consisting of the symbols [ and ] and let $\Sigma^{*}$ denote its Kleene closure. For any element $u \in \Sigma^{*}$ with length $| u |$ we define partial functions $insert : \Sigma^{*} \times \mathbb{N} \cup \{ 0 \} \rightarrow \Sigma^{*}$ and $delete : \Sigma^{*} \times \mathbb{N} \rightarrow \Sigma^{*}$ by

insert(u, j)

u

with "

[]

" inserted into the

j

th position

delete(u, j)

u

with "

[]

" deleted from the

j

th position

with the understanding that $insert(u, j)$ is undefined for $j > |u|$ and $delete(u, j)$ is undefined if $j > |u| - 2$ . We define an equivalence relation $R$ on $\Sigma^{*}$ as follows: for elements $a, b \in \Sigma^{*}$ we have $(a, b) \in R$ if and only if there exists a finite sequence of applications of the $insert$ and $delete$ functions starting with $a$ and ending with $b$ , where the empty sequence is allowed. That the empty sequence is allowed accounts for the reflexivity of $R$ . Symmetry follows from the observation that any finite sequence of applications of $insert$ to a string can be undone with a finite sequence of applications of $delete$ . Transitivity is clear from the definition.

The equivalence relation partitions the language $\Sigma^{*}$ into equivalence classes. If we take $\epsilon$ to denote the empty string, then the language corresponding to the equivalence class $\operatorname{Cl}(\epsilon)$ is called the Dyck language.

Alternative definition

An alternative definition of the Dyck language can be formulated when we introduce the $imbalance : \Sigma^{*} \rightarrow \left(\mathbb{N} \cup \{0\}\right)$ function.

imbalance ( u ) = |u|_{[} - |u|_{]}

for any

u \in \Sigma^{*}

where $|u|_{[}$ and $|u|_{]}$ are respectively the number of [ and ] in $u$ . I.e. $imbalance$ counts the imbalance of [ over ]. If $imbalance(u)$ is positive then $u$ has more [ than ].

Now, the Dyck language can be defined as the language

\{ u \in \Sigma^{*} \vert imbalance(u) = 0 \text{ and } imbalance(v) \ge 0 \text{ for all prefixes } v \text{ of } u \}

Properties

The Dyck language is closed under the operation of concatenation.
By treating $\Sigma^{*}$ as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient $\Sigma^{*} / R$ , resulting in the syntactic monoid of the Dyck language. The class $\operatorname{Cl}(\epsilon)$ will be denoted $1$ .
The syntactic monoid of the Dyck language is not commutative: if $u = \operatorname{Cl}([)$ and $v = \operatorname{Cl}(])$ then $uv = \operatorname{Cl}([]) = 1 \ne \operatorname{Cl}(][) = vu$ .
With the notation above, $uv = 1$ but neither $u$ nor $v$ are invertible in $\Sigma^{*} / R$ .
The syntactic monoid of the Dyck language is isomorphic to the bicyclic semigroup by virtue of the properties of $\operatorname{Cl}([)$ and $\operatorname{Cl}(])$ described above.
By the Chomsky–Schützenberger representation theorem, any context-free language is a homomorphic image of the intersection of some regular language with a homomorphic preimage of the Dyck language on two brackets.^[1]
The Dyck language with two distinct types of brackets can be recognized in the complexity class $TC^{0}$ .^[2]

Examples

We can define an equivalence relation $L$ on the Dyck language $\mathcal{D}$ . For $u,v\in\mathcal{D}$ we have $(u,v)\in L$ if and only if $|u| = |v|$ , i.e. $u$ and $v$ have the same length. This relation partitions the Dyck language $\mathcal{D} / L = \mathcal{D}_{0} \cup \mathcal{D}_{2} \cup \mathcal{D}_{4} \cup \ldots = \bigcup_{n=0}^{\infty} \mathcal{D}_{n}$ where $\mathcal{D}_{n} = \{ u\in\mathcal{D} \mid |u| = n\}$ . Note that $\mathcal{D}_{n}$ is empty for odd $n$ . E.g. there are 14 words in $\mathcal{D}_{8}$ , i.e. [[[[]]]], [[[][]]], [[[]][]], [[][[]]], [[[]]][], [[][][]], [][[[]]], [[][]][], [[]][[]], [][[][]], [[]][][], [][[]][], [][][[]], [][][][].

Having introduced the Dyck words of length $n$ , we can introduce a relationship on them. For every $n \in \mathbb{N}$ we define a relation $S_{n}$ on $\mathcal{D}_{n}$ ; for $u,v\in\mathcal{D}_{n}$ we have $(u,v)\in S_{n}$ if and only if $v$ can be reached from $u$ by a series of proper swaps. A proper swap in a word $u\in\mathcal{D}_{n}$ swaps an occurrence of '][' with '[]'. For each $n\in\mathbb{N}$ the relation $S_{n}$ makes $\mathcal{D}_{n}$ into a partially ordered set. The relation $S_{n}$ is reflexive because an empty sequence of proper swaps takes $u$ to $u$ . Transitivity follows because we can extend a sequence of proper swaps that takes $u$ to $v$ by concatenating it with a sequence of proper swaps that takes $v$ to $w$ forming a sequence that takes $u$ into $w$ . To see that $S_{n}$ is also antisymmetric we introduce an auxiliary function $\sigma_{n}:\mathcal{D}_{n}\rightarrow\mathbb{N}:u\mapsto\sum_{vw=u} imbalance(v)$ where $v$ ranges over all prefixes of $u$ . The following table illustrates that $\sigma_{n}$ is strictly monotonic with respect to proper swaps.

Strict monotonicity of $\sigma_{n}$
partial sums of $\sigma_{n}(u)$	$P$	$P-1$	$P$	$Q$
$u$	$\ldots$	]	[	$\ldots$
$u'$	$\ldots$	[	]	$\ldots$
partial sums of $\sigma_{n}(u')$	$P$	$P+1$	$P$	$Q$
Difference of partial sums	0	2	0	0

Hence $\sigma_{n}(u') - \sigma_{n}(u) = 2 > 0$ so $\sigma_{n}(u) < \sigma_{n}(u')$ when there is a proper swap that takes $u$ into $u'$ . Now if we assume that both $(u,v), (v,u)\in S_{n}$ and $u\ne v$ , then there are non-empty sequences of proper swaps such $u$ is taken into $v$ and vice versa. But then $\sigma_{n}(u) < \sigma_{n}(v) < \sigma_{n}(u)$ which is nonsensical. Therefore, whenever both $(u,v)$ and $(v,u)$ are in $S_{n}$ , we have $u = v$ , hence $S_{n}$ is antisymmetric.

The partial ordered set $D_{8}$ is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.

Notes

↑ Kambites, Communications in Algebra Volume 37 Issue 1 (2009) 193-208
↑ Barrington and Corbett, Information Processing Letters 32 (1989) 251-256

References

This article is issued from Wikipedia - version of the Friday, May 06, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.