Derivations
When people talk about grammars and parsing, they use various tools to show that a program is legal, that it follows the language's grammar. The first is a derivation, which is a step-by-step expansion of a grammar's productions to the target program.
Suppose you want to prove that the string PIGS FLY
is a legal independent clause in English. The grammar for independent clauses includes this production:
clause = noun verb
clause = noun verb
Your proof that PIGS FLY
is legal is a derivation that shows the sequence of expansions from the starting non-terminal to the string. Only one production is applied at a time, and the expansion ends when only terminals remain. The derivation requires just two expansions and looks like this with annotations explaining each step:
clause -> noun verb # expand clause into: noun verb
-> PIGS verb # expand noun into: PIGS
-> PIGS FLY # expand verb into: FLY
clause -> noun verb # expand clause into: noun verb -> PIGS verb # expand noun into: PIGS -> PIGS FLY # expand verb into: FLY
Suppose you want to prove that the string X O X O X
is legal in this grammar:
start = X pair
pair = O X pair
| O X
start = X pair pair = O X pair | O X
The derivation requires three expansions and looks like this:
start -> X pair # expand start into: X pair
-> X O X pair # expand pair into: O X pair
-> X O X O X # expand pair into: O X
start -> X pair # expand start into: X pair -> X O X pair # expand pair into: O X pair -> X O X O X # expand pair into: O X
Each line shows a single expansion from the previous state. Exactly one non-terminal from the previous state is replaced by the terms of one of its productions. In this example, there's only one non-terminal in each previous state, so you don't have a choice. But more complex grammars will lead to productions with multiple non-terminals. You may be tempted to apply multiple steps at once or replace more than one non-terminal at a time. Don't do it.
As grammars get more complex, you will be confronted with choices when forming a derivation. Consider this grammar for composing boolean expressions out of boolean literals and the binary logical operators:
or = or || and
| and
and = and && atom
| atom
atom = TRUE
| FALSE
or = or || and | and and = and && atom | atom atom = TRUE | FALSE
Some of the productions insert two non-terminals into the derivation, which means you'll have to pick which one to expand next. Here's one way we might derive the expression TRUE && FALSE || TRUE
:
or -> or || and
-> and || and
-> and && atom || and
-> atom && atom || and
-> TRUE && atom || and
-> TRUE && FALSE || and
-> TRUE && FALSE || atom
-> TRUE && FALSE || TRUE
or -> or || and -> and || and -> and && atom || and -> atom && atom || and -> TRUE && atom || and -> TRUE && FALSE || and -> TRUE && FALSE || atom -> TRUE && FALSE || TRUE
Walk through this and make sure you understand how each line is transformed to the next. Notice how we only expand the leftmost non-terminal at each step? When we stick to this pattern, we end up with a leftmost derivation. If we had expanded the rightmost non-terminal instead, we'd have a rightmost derivation, which would look like this for the expression:
or -> or || and
-> or || atom
-> or || TRUE
-> and || TRUE
-> and && atom || TRUE
-> and && FALSE || TRUE
-> atom && FALSE || TRUE
-> TRUE && FALSE || TRUE
or -> or || and -> or || atom -> or || TRUE -> and || TRUE -> and && atom || TRUE -> and && FALSE || TRUE -> atom && FALSE || TRUE -> TRUE && FALSE || TRUE
Neither kind of derivation is better than the other. They both prove that the expression is legal. The steps of the proof are just in a different order. You could also pick one of the non-terminals at random. But that would make comparing your derivation with someone else's difficult.
A derivation is just one of the tools for proving a program is legal. Since derivations are text-heavy and verbose, let's examine another tool that uses a compact picture to encode the same proof.