Shunting-yard algorithm
In computer science, the shunting-yard algorithm is a method for parsing mathematical expressions specified in infix notation. It can be used to produce either a postfix notation string, also known as Reverse Polish notation (RPN), or an abstract syntax tree (AST). The algorithm was invented by Edsger Dijkstra and named the "shunting yard" algorithm because its operation resembles that of a railroad shunting yard. Dijkstra first described the shunting-yard algorithm in the Mathematisch Centrum report MR 34/61.
Like the evaluation of RPN, the shunting-yard algorithm is stack-based. Infix expressions are the form of mathematical notation most people are used to, for instance "3+4" or "3+4*(2-1)". The conversion uses two text variables (strings): the input and the output, which is built up as an output queue. There is also a stack that holds operators not yet added to the output queue. To convert, the program reads each symbol in order and acts according to that symbol. The results for the two examples above are "3 4 +" and "3 4 2 1 - * +".
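The algorithm reads the input one token at a time, so the expression string must first be split into tokens. A minimal tokenizer sketch, assuming the only token kinds are unsigned decimal numbers, identifiers (function names such as sin or max), and single-character symbols (operators, parentheses, commas); the names `TOKEN_PATTERN` and `tokenize` are illustrative:

```python
import re

# Assumed token classes: unsigned decimal numbers, identifiers (function
# names), and any other single non-space character (operators, parentheses,
# commas). Whitespace is skipped because \S never matches it.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|\S")


def tokenize(expression: str) -> list[str]:
    """Split an infix expression into tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(expression)


print(tokenize("3+4*(2-1)"))
# ['3', '+', '4', '*', '(', '2', '-', '1', ')']
```

Unary minus and signed literals are not recognised; a "-" is always returned as a separate symbol token.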
The shunting-yard algorithm was later generalized into operator-precedence parsing.
A simple conversion
- Input: 3+4
- Add 3 to the output queue (whenever a number is read it is added to the output)
- Push + (or its ID) onto the operator stack
- Add 4 to the output queue
- After reading the expression, pop the operators off the stack and add them to the output.
  - In this case there is only one, "+".
- Output: 3 4 +
This already shows a couple of rules:
- All numbers are added to the output when they are read.
- At the end of reading the expression, pop all operators off the stack and onto the output.
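The two rules alone can be sketched as follows. This is deliberately incomplete: it ignores precedence and parentheses, so it is only correct for an expression with a single operator such as "3 + 4". The function name `simple_to_rpn` is hypothetical.

```python
def simple_to_rpn(tokens: list[str]) -> list[str]:
    """Apply only the two rules above (numbers to output, flush stack at end).

    Assumes the input holds numbers and exactly one operator; it is not a
    general infix-to-RPN converter.
    """
    output: list[str] = []
    stack: list[str] = []
    for token in tokens:
        if token.replace(".", "", 1).isdigit():
            output.append(token)      # rule 1: numbers go straight to the output
        else:
            stack.append(token)       # the operator waits on the stack
    while stack:
        output.append(stack.pop())    # rule 2: pop everything at the end
    return output


print(" ".join(simple_to_rpn(["3", "+", "4"])))  # 3 4 +
```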
The algorithm in detail
- While there are tokens to be read:
  - Read a token.
  - If the token is a number, then add it to the output queue.
  - If the token is a function token, then push it onto the stack.
  - If the token is a function argument separator (e.g., a comma):
    - Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue. If no left parenthesis is encountered, either the separator was misplaced or the parentheses are mismatched.
  - If the token is an operator, o1, then:
    - while there is an operator token o2 (not a left parenthesis) at the top of the operator stack, and either of the following holds, pop o2 off the operator stack onto the output queue:
      - o1 is left-associative and its precedence is less than or equal to that of o2, or
      - o1 is right-associative and its precedence is less than that of o2;
    - then push o1 onto the operator stack.
  - If the token is a left parenthesis (i.e. "("), then push it onto the stack.
  - If the token is a right parenthesis (i.e. ")"):
    - Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue.
    - Pop the left parenthesis from the stack, but not onto the output queue.
    - If the token at the top of the stack is a function token, pop it onto the output queue.
    - If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
- When there are no more tokens to read:
  - While there are still operator tokens in the stack:
    - If the operator token on the top of the stack is a parenthesis, then there are mismatched parentheses.
    - Pop the operator onto the output queue.
- Exit.
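The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation: tokens are already split (here by whitespace), numbers are unsigned decimal literals, unary minus is not supported, the operator table mirrors the precedence table in the detailed example, and `FUNCTIONS` is a hypothetical set of known function names. The stack is a Python list whose last element is the top (the tables in this article list the top first).

```python
import re

# Assumed operator table: (precedence, associativity).
OPERATORS = {
    "^": (4, "right"),
    "*": (3, "left"),
    "/": (3, "left"),
    "+": (2, "left"),
    "-": (2, "left"),
}
FUNCTIONS = {"sin", "max"}             # hypothetical known function names
NUMBER = re.compile(r"\d+(?:\.\d+)?")  # unsigned decimal literals only


def shunting_yard(tokens: list[str]) -> list[str]:
    """Convert a list of infix tokens into a list of RPN tokens."""
    output: list[str] = []
    stack: list[str] = []              # last element is the top of the stack
    for token in tokens:
        if NUMBER.fullmatch(token):
            output.append(token)
        elif token in FUNCTIONS:
            stack.append(token)
        elif token == ",":
            # Pop operators until the "(" that opened the argument list.
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("misplaced separator or mismatched parentheses")
        elif token in OPERATORS:
            prec1, assoc1 = OPERATORS[token]
            # Only operators are popped; a "(" or function token stops the loop.
            while stack and stack[-1] in OPERATORS:
                prec2, _ = OPERATORS[stack[-1]]
                if (assoc1 == "left" and prec1 <= prec2) or (
                    assoc1 == "right" and prec1 < prec2
                ):
                    output.append(stack.pop())
                else:
                    break
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()                # discard "(" without outputting it
            if stack and stack[-1] in FUNCTIONS:
                output.append(stack.pop())
        else:
            raise ValueError(f"unrecognised token: {token!r}")
    while stack:
        top = stack.pop()
        if top == "(":                 # ")" is never pushed, so only "(" can remain
            raise ValueError("mismatched parentheses")
        output.append(top)
    return output


print(" ".join(shunting_yard("3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3".split())))
# 3 4 2 * 1 5 - 2 3 ^ ^ / +
```

Because "(" is not a key in `OPERATORS`, the operator loop stops at a left parenthesis without any special case, which is what keeps operators inside parentheses from being popped early.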
To analyze the running time of this algorithm, note that each token is read once; each number, function, or operator is written to the output once; and each function, operator, or left parenthesis is pushed onto the stack and popped off it at most once. There is therefore at most a constant number of operations per token, and the running time is O(n), linear in the size of the input.
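The counting argument can be written as a bound. Here c is an assumed constant covering the cost of one read, output, push, or pop, and also absorbing the single stack test per token that ends a popping loop without popping. Pops cannot outnumber pushes, and each of the n tokens causes at most one push and at most one output:

```latex
T(n) \;\le\; c\,\bigl(\underbrace{n}_{\text{reads}}
  + \underbrace{n}_{\text{outputs}}
  + \underbrace{n}_{\text{pushes}}
  + \underbrace{n}_{\text{pops}}\bigr)
\;=\; 4cn \;=\; O(n)
```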
The shunting-yard algorithm can also be applied to produce prefix notation (also known as Polish notation). To do this, start from the end of the string of tokens and work backwards, reverse the output queue (making it an output stack), and swap the roles of the parentheses: a ")" is now pushed onto the stack, and a "(" now pops operators until it finds a ")". Associativity must be mirrored as well: when scanning backwards, a left-associative operator pops only operators of strictly higher precedence, while a right-associative operator also pops operators of equal precedence.
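A sketch of the backwards scan, assuming only binary operators, parentheses, and operands (function calls are left out for brevity) and the same precedence table as the detailed example. Scanning right to left also mirrors associativity, so the equal-precedence test is reversed compared with the forward algorithm; without that, "1 - 2 - 3" would come out as 1 - (2 - 3). The name `to_prefix` is illustrative.

```python
# Assumed operator table: (precedence, associativity).
OPERATORS = {"^": (4, "right"), "*": (3, "left"), "/": (3, "left"),
             "+": (2, "left"), "-": (2, "left")}


def to_prefix(tokens: list[str]) -> list[str]:
    """Infix -> prefix by running the shunting yard over the reversed tokens."""
    output: list[str] = []
    stack: list[str] = []
    for token in reversed(tokens):
        if token in OPERATORS:
            prec1, assoc1 = OPERATORS[token]
            # Mirrored rule: left-associative operators pop only strictly
            # higher precedence; right-associative ones also pop ties.
            while stack and stack[-1] in OPERATORS:
                prec2, _ = OPERATORS[stack[-1]]
                if prec1 < prec2 or (assoc1 == "right" and prec1 == prec2):
                    output.append(stack.pop())
                else:
                    break
            stack.append(token)
        elif token == ")":            # acts like "(" when scanning backwards
            stack.append(token)
        elif token == "(":            # acts like ")" when scanning backwards
            while stack and stack[-1] != ")":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()
        else:
            output.append(token)      # operand
    while stack:
        top = stack.pop()
        if top == ")":
            raise ValueError("mismatched parentheses")
        output.append(top)
    output.reverse()                  # read the output queue back as a stack
    return output


print(" ".join(to_prefix("3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3".split())))
# + 3 / * 4 2 ^ - 1 5 ^ 2 3
```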
Detailed example
Input: 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3
operator | precedence | associativity |
---|---|---|
^ | 4 | Right |
* | 3 | Left |
/ | 3 | Left |
+ | 2 | Left |
- | 2 | Left |
Note: the symbol ^ represents exponentiation (not XOR).
Token | Action | Output (in RPN) | Operator Stack | Notes |
---|---|---|---|---|
3 | Add token to output | 3 | | |
+ | Push token to stack | 3 | + | |
4 | Add token to output | 3 4 | + | |
* | Push token to stack | 3 4 | * + | * has higher precedence than + |
2 | Add token to output | 3 4 2 | * + | |
/ | Pop stack to output | 3 4 2 * | + | / and * have same precedence |
/ | Push token to stack | 3 4 2 * | / + | / has higher precedence than + |
( | Push token to stack | 3 4 2 * | ( / + | |
1 | Add token to output | 3 4 2 * 1 | ( / + | |
- | Push token to stack | 3 4 2 * 1 | - ( / + | |
5 | Add token to output | 3 4 2 * 1 5 | - ( / + | |
) | Pop stack to output | 3 4 2 * 1 5 - | ( / + | Repeated until "(" found |
) | Pop stack | 3 4 2 * 1 5 - | / + | Discard matching parenthesis |
^ | Push token to stack | 3 4 2 * 1 5 - | ^ / + | ^ has higher precedence than / |
2 | Add token to output | 3 4 2 * 1 5 - 2 | ^ / + | |
^ | Push token to stack | 3 4 2 * 1 5 - 2 | ^ ^ / + | ^ is evaluated right-to-left |
3 | Add token to output | 3 4 2 * 1 5 - 2 3 | ^ ^ / + | |
end | Pop entire stack to output | 3 4 2 * 1 5 - 2 3 ^ ^ / + | | |
Input: sin ( max ( 2 , 3 ) / 3 * 3.1415 )
Token | Action | Output (in RPN) | Operator Stack | Notes |
---|---|---|---|---|
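sin | Push token to stack | | sin | |
( | Push token to stack | | ( sin | |
max | Push token to stack | | max ( sin | |
( | Push token to stack | | ( max ( sin | |
2 | Add token to output | 2 | ( max ( sin | |
, | Pop stack to output | 2 | ( max ( sin | Nothing to pop; "(" is already on top |
3 | Add token to output | 2 3 | ( max ( sin | |
) | Pop stack | 2 3 | max ( sin | Discard matching parenthesis |
) | Pop stack to output | 2 3 max | ( sin | Function token on top is popped |
/ | Push token to stack | 2 3 max | / ( sin | "(" is on top, so nothing is popped |
3 | Add token to output | 2 3 max 3 | / ( sin | |
* | Pop stack to output | 2 3 max 3 / | ( sin | * and / have same precedence |
* | Push token to stack | 2 3 max 3 / | * ( sin | |
3.1415 | Add token to output | 2 3 max 3 / 3.1415 | * ( sin | |
) | Pop stack to output | 2 3 max 3 / 3.1415 * | ( sin | Repeated until "(" found |
) | Pop stack | 2 3 max 3 / 3.1415 * | sin | Discard matching parenthesis |
) | Pop stack to output | 2 3 max 3 / 3.1415 * sin | | Function token on top is popped |
end | Pop entire stack to output | 2 3 max 3 / 3.1415 * sin | | Stack is already empty |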
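To check the two results, the RPN output can be evaluated with a second stack. A minimal sketch, assuming numbers are floats, "^" is exponentiation, and each function has a fixed arity given in a hypothetical `FUNCTIONS` table (RPN by itself does not record how many arguments a call had):

```python
import math
import operator

# Binary operators from the precedence table; "^" is exponentiation.
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "/": operator.truediv, "^": operator.pow}
# Hypothetical function table: name -> (fixed arity, implementation).
FUNCTIONS = {"sin": (1, math.sin), "max": (2, max)}


def eval_rpn(tokens: list[str]) -> float:
    """Evaluate a list of RPN tokens."""
    stack: list[float] = []
    for token in tokens:
        if token in BINARY:
            right = stack.pop()           # the right operand is on top
            left = stack.pop()
            stack.append(BINARY[token](left, right))
        elif token in FUNCTIONS:
            arity, fn = FUNCTIONS[token]
            if len(stack) < arity:
                raise ValueError(f"not enough arguments for {token}")
            args = stack[-arity:]         # arguments in left-to-right order
            del stack[-arity:]
            stack.append(fn(*args))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(eval_rpn("3 4 2 * 1 5 - 2 3 ^ ^ / +".split()))  # 3.0001220703125
print(eval_rpn("2 3 max 3 / 3.1415 * sin".split()))
```

The first value agrees with evaluating 3 + 4 * 2 / (1 - 5) ^ 2 ^ 3 directly, since (1 - 5) ^ (2 ^ 3) = 65536 and 8 / 65536 = 0.0001220703125.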
If one were writing a compiler, this output would be tokenized and written to a compiled file to be interpreted later. Conversion from infix to RPN can also make expressions easier to simplify. To do this, evaluate the RPN expression as usual, except that every variable is given a null value. Whenever an operator has a null argument, the operator and its arguments are written to the output instead of being evaluated (this is a simplification: problems arise when the arguments are themselves operators). When an operator has no null arguments, its value can simply be computed and written to the output. This method does not perform every possible simplification; it is essentially a constant-folding optimization.
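A sketch of this constant-folding idea over RPN tokens, assuming only the binary operators from the detailed example and treating every non-numeric token as a variable. Instead of a literal null, each stack entry is either a known float or the list of RPN tokens for a sub-expression that could not be evaluated; keeping whole sub-expressions as token lists is one way, assumed here, around the problem with operator arguments noted above. The names `fold_constants` and `as_tokens` are illustrative.

```python
import operator
import re

BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "/": operator.truediv, "^": operator.pow}
NUMBER = re.compile(r"\d+(?:\.\d+)?")  # unsigned decimal literals only


def as_tokens(entry) -> list[str]:
    """Turn a stack entry back into RPN tokens (folded floats become literals)."""
    return [repr(entry)] if isinstance(entry, float) else entry


def fold_constants(tokens: list[str]) -> list[str]:
    """Partially evaluate an RPN token list containing variables."""
    stack: list = []  # entries: float (known value) or list[str] (unknown)
    for token in tokens:
        if token in BINARY:
            right = stack.pop()
            left = stack.pop()
            if isinstance(left, float) and isinstance(right, float):
                stack.append(BINARY[token](left, right))  # both known: fold
            else:
                # At least one argument is unknown: keep the sub-expression.
                stack.append(as_tokens(left) + as_tokens(right) + [token])
        elif NUMBER.fullmatch(token):
            stack.append(float(token))
        else:
            stack.append([token])                          # a variable
    (result,) = stack  # raises ValueError if the expression is malformed
    return as_tokens(result)


print(" ".join(fold_constants("x 2 3 * +".split())))  # x 6.0 +
```

Like the method described, this does not reassociate: "x 2 + 3 +" is not reduced to "x 5.0 +", because neither addition has two known arguments.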