Simple Parsing Explained – part 1

Someone in a Discord discussion was asking about simple parsing and my explanations started to get longer than simple messaging might allow. Hopefully, this can still be short and sweet.

If you’ve had university level computer science then some of this sort of thing was probably covered in CS 2 or so. At least it was for me. Back then, we never went over an abstract-syntax-tree or any of that as our simple solvers just worked ‘in place’ on the stack.

Let’s talk about that first.

A Super-simple Infix Notation Calculator:

(But Paul, what is ‘infix notation’?)

Infix notation is the one we humans usually use: “1 + 2” or “3 * 8”. This is as opposed to post-fix notation: “1 2 +” or “3 8 *” or prefix notation “+ 1 2” or “* 3 8”. Post-fix especially has some nice properties for calculations but is harder for humans to think about.

First, I’m going to limit our calculator’s operations to addition and multiplication. Multiplication will take precedence over addition. So “1 + 2 * 3 + 4” would be “1 + 6 + 4” or 11 and not “3 * 7” or 21.

For further simplication, we’ll also only support single digit numbers. Anyone vaguely familiar with String parsing should be able to extrapolate. (Even this single character parser could pretty easily accumulate numbers by accumulating literals as (last * 10 + current)… but it complicates the code a little so I’ve skipped it.)

This infix calculator will parse the string as it goes, adding numbers to an operand stack and operators to an operator stack. When we detect that it is time to perform an operation, we pop the top operator and the top two operands and run the operation.

(A rudimentary understanding of stacks is definitely a requirement for following this logic.)

Our first test string:
String input = “1 + 2 * 3 + 4”; (familiar)

A simple parser:

Stack operands = new Stack<>();
Stack operators = new Stack<>();

// We'll keep track of the last token just to prevent
// + * or 1 2 in a row.
char lastToken = 0;

for( int i = 0; i < input.length(); i++ ) {
char token = input.charAt(i);
if( token == ' ' ) {
continue;
}
if( Character.isDigit(token) ) {
// Check for double digits in a row
if( Character.isDigit(lastToken) ) {
throw new RuntimeException(
"Expected operator, found operand at:" + i);
}
int num = Character.getNumericValue(token);
operands.push(num);
} else if( token == '' ) {
// Check for double tokens in a row
if( lastToken == '' || lastToken == '+' ) {
throw new RuntimeException(
"Expected operand, found operator at:" + i);
}
operators.push('');
} else if( token == '+' ) {
// Check for double tokens in a row
if( lastToken == '' || lastToken == '+' ) {
throw new RuntimeException(
"Expected operand, found operator at:" + i);
}
// Multiplication takes precedence over addition, so we'll
// pop them off and calculate them until we run out.
while( !operators.isEmpty() && operators.peek() == '*' ) {
operators.pop();
// Perform the multiplication on the top of the stack
int op1 = operands.pop();
int op2 = operands.pop();
int result = op1 * op2;
// Push the result onto the stack for the next operator
operands.push(result);
}
operators.push('+');
}
lastToken = token;
}

// Calculate what's left on the stack
while( !operators.isEmpty() ) {
char op = operators.pop();
int op1 = operands.pop();
int op2 = operands.pop();
int result = 0;
if( op == '*' ) {
result = op1 * op2;
} else if( op == '+' ) {
result = op1 + op2;
}
operands.push(result);
}
// Should just be one left but we'll check just in case
if( operands.size() != 1 ) {
throw new RuntimeException("Parse error, missing operators");
}
System.out.println("Result:" + operands);

That should print [11] when all is said and done. In theory it should work for a variety of inputs and handle most errors.

More operators, parentheses, etc. are relatively straight-forward to add but complicate the code. More than two operators means calculating precedence is a little more complicated. Parentheses support would need code to pop+calculate until the starting parenthesis on the stack. These all confuse the essence of the “top of stack” style operations.

I know I didn’t go too deeply into explaining each little part. This example is mostly to serve as a foundation for the next article where we’ll build a version the creates an “abstract syntax tree” or AST.

Leave a Reply

Your email address will not be published. Required fields are marked *