Ladder of Precedence

Dear Computer

Chapter 2: Translation

Ladder of Precedence

When we see an expression like x - y * z, our brains immediately build a structure equivalent to this tree:

a tree representation of the expression x - y * za tree representation of the expression x - y * z

We build this structure quickly because we've internalized the fact that * is greedier than +. From a very young age, we learned that multiplication grabs up it operands before addition.

Predecence

The greediness of an operator is its precedence. What we may not have realized as we were learning PEMDAS is that an operator's precedence is a consequence of its depth within a grammar. The more steps it takes to reach an operator from the starting non-terminal, the higher its precedence. The more steps, the lower it will appear in the tree. The higher the precedence, the lower it will appear in the tree.

Suppose we name the non-terminals of a grammar level0, level1, level2, and so on. Each of these non-terminals represents a particular level of precedence. The arithmetic operators + and - are in one level, and the multiplicative operators * and / are in the next higher level. The starting and lowest level is level0. We can visualize how the grammar moves between levels by placing their names on a ladder:

a ladder with level0 as the bottom rung and level2 as the top runga ladder with level0 as the bottom rung and level2 as the top rung

As we write our grammar's productions, we ensure our desired precedences by observing these rules:

  1. On each rung, the productions must either remain on the current rung or climb a single rung to the next higher level of precedence.
  2. We go back down the ladder only when there's a syntactic boundary that resets the precedence, like parentheses.

Given these rules and the fact that + has lower precedence than *, we might write this grammar for arithmetic expressions:

level0 = level1 + level1
       | level1 - level1

level1 = level2 * level2

level2 = IDENTIFIER
       | INTEGER
level0 = level1 + level1
       | level1 - level1

level1 = level2 * level2

level2 = IDENTIFIER
       | INTEGER

One issue of this grammar is that every operator is required on each branch from the root. To make an operator optional, we need to add a production to its level that effectively lets us skip the rung. We still step on the rung as we climb the ladder, but it doesn't contribute anything. Here's our new draft:

level0 = level1 + level1
       | level1 - level1
       | level1

level1 = level2 * level2
       | level2

level2 = IDENTIFIER
       | INTEGER
level0 = level1 + level1
       | level1 - level1
       | level1

level1 = level2 * level2
       | level2

level2 = IDENTIFIER
       | INTEGER

Associativity

The other issue with this grammar is that it only allows a single instance of each operator within a branch. There's no way to add three numbers together. We fix this by revisiting rule 1: each production either climbs to the next higher rung or it remains on the same rung. To get multiple + operators, perhaps we can rewrite level0 to this:

level0 = level0 + level0
       | level0 - level0
       | level1
level0 = level0 + level0
       | level0 - level0
       | level1

There's a problem here. The grammar has become ambiguous. We can generate multiple derivations or parse trees from the same sequence of tokens depending on how we choose to expand the two level0 operands.

We eliminate the ambiguity by allowing only one of the non-terminals to remain on level0. Which one? That depends on the operator's associativity. Do we allow more subtractions in the left operand or in the right?

Since subtraction is left-associative, we allow more subtractions only in the left operand. That means the left operand remains on the level0 rung while the right operand climbs to the next higher rung. As long as we adhere to rule 2, no subtractions will sneak into the right operand once we've climbed past its rung. Our final grammar is this:

level0 = level0 + level1
       | level0 - level1
       | level1

level1 = level1 * level2
       | level2

level2 = IDENTIFIER
       | INTEGER
level0 = level0 + level1
       | level0 - level1
       | level1

level1 = level1 * level2
       | level2

level2 = IDENTIFIER
       | INTEGER

We could add new operators to the current levels or add completely new levels. As long as follow the ladder's rules, we'll end up with a well-formed language. Spending time to get the precedences and associativities correct in the grammar is worth it, as the grammar is essentially pseudocode for the parser.

All operators within a precedence level should have the same associativity. Mixing associativities would complicate the parsing code that we're about to write.

← Parse TreesParsing →