LL grammar

In formal language theory, an LL grammar is a formal grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser that constructs a rightmost derivation). A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

LL parsers are table-based parsers, similar to LR parsers. LL grammars can alternatively be characterized as precisely those that can be parsed by a predictive parser – a recursive descent parser without backtracking – and these can be readily written by hand. This article is about the formal properties of LL grammars; for parsing, see LL parser or recursive descent parser.

Relation to LL parsers

There is a separate LL(k) parser for each natural number k (0, 1, 2, ...). A LL parser is called a LL(k) parser if it uses k tokens of lookahead when parsing a sentence. A LL(k) parser recognizes the languages generated by some ε-free LL(k) grammar. As allowing more tokens of lookahead makes the parser strictly more powerful, the languages that can be recognized with a LL(k) parser are a strict subset of the languages that can be recognized by a LL(k+n), n > 0 parser. This creates a strictly increasing sequence of sets: LL(0) ⊊ LL(1) ⊊ LL(2) ⊊ …. Since these are all DCFLs, a corollary is that for any fixed k, there are DCFLs that cannot be recognized by a LL(k) parser.[1]

An LL parser is called an LL(*) parser if it is not restricted to a finite k tokens of lookahead, but can make parsing decisions by recognizing whether the following tokens belong to a regular language (for example by use of a Deterministic Finite Automaton), and accordingly there are the set of LL(*) grammars and the set of LL(*) languages.[2]

Although ε-free LL(k) grammars are considered for LL(k) parsers, allowing ε-rules increases the expressive power of the grammar: For every ε-free LL(k+1) grammar, there exists a LL(k) grammar with ε-rules that generates the same language.[3]

Relation to other grammar classes

Every LL(k) grammar is also a LR(k) grammar. It is also decidable if a given LR(k) grammar is also a LL(m) grammar for some m.[4] A ε-free LL(1) grammar is also a SLR(1) grammar. A LL(1) grammar with symbols that have both empty and non-empty derivations is also a LALR(1) grammar. A LL(1) grammar with symbols that have only the empty derivation may or may not be LALR(1).[5]

LL grammars cannot have rules left recursion. Removing left recursion from a context-free grammar is always possible. However, the resulting grammar may be bigger, require more lookahead tokens than preferred to be an LL grammar, or not be an LL grammar at all. LL(k) grammars that are ε-free can be transformed into Greibach normal form (which by definition do not have rules with left recursion) without increasing the lookeahead tokens.[6]

Simple deterministic languages

The class of languages having an ε-free LL(1) grammar equals the class of simple deterministic languages,[7] the languages generated by simple context-free grammars, which are the context-free grammars in Greibach normal form (i.e. for which each rule has the form Z \rightarrow aY_1 \ldots Y_n, n \geq 0) such that different right hand sides for the same nonterminal Z always start with different terminals a.

This language class includes the regular sets with end-markers.[8] Equivalence is decidable for it, while inclusion is not.[9]

Applications

LL grammars, particularly LL(1) grammars, are of great practical interest, as they are easy to parse, either by LL parsers or by recursive descent parsers, and many computer languages are designed to be LL(1) for this reason. Languages based on grammars with a high value of k have traditionally been considered to be difficult to parse, although this is less true now given the availability and widespread use of parser generators supporting LL(k) grammars for arbitrary k.

See also

References

  1. Rosenkrantz & Stearns (1970), p. 247
  2. Parr, T.; Fisher, K. (2011). "LL(*)". ACM SIGPLAN Notices 46 (6): 425. doi:10.1145/1993316.1993548.
  3. Rosenkrantz & Stearns (1970), p. 242
  4. Rozenkratz & Stearns (1970), pp. 254–255
  5. Beaty (1982)
  6. Rozenkratz & Stearns (1970), p. 242
  7. Rosenkrantz & Stearns (1970), p. 243
  8. Hopcroft & Ullman (1979), p. 229
  9. Korenjak, A.J., Hopcroft, J.E. (1966). "Simple deterministic languages". IEEE Conf. Rec. 7th Ann. Symp. on Switching and Automata Theory (SWAT). IEEE Pub. No. 16-C-40. pp. 36–46. doi:10.1109/SWAT.1966.22.

Further reading

This article is issued from Wikipedia - version of the Wednesday, March 16, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.