Derivations

Dear Computer

Chapter 1: Programming Language

Derivations

When people talk about grammars and parsing, they use various tools to show that a program is legal, that it follows the language's grammar. The first is a derivation, which is a step-by-step expansion of a grammar's productions to the target program.

Suppose you want to prove that the string PIGS FLY is a legal independent clause in English. The grammar for independent clauses includes this production:

clause = noun verb

Your proof that PIGS FLY is legal is a derivation that shows the sequence of expansions from the starting non-terminal to the string. Only one production is applied at a time, and the expansion ends when only terminals remain. The derivation requires just two expansions and looks like this with annotations explaining each step:

clause -> noun verb     # expand clause into: noun verb
       -> PIGS verb     # expand noun into: PIGS
       -> PIGS FLY      # expand verb into: FLY

Suppose you want to prove that the string X O X O X is legal in this grammar:

start = X pair
pair  = O X pair
      | O X

The derivation requires three expansions and looks like this:

start -> X pair         # expand start into: X pair
      -> X O X pair     # expand pair into: O X pair
      -> X O X O X      # expand pair into: O X

Each line shows a single expansion from the previous state. Exactly one non-terminal from the previous state is replaced by the terms of one of its productions. In this example, there's only one non-terminal in each previous state, so you don't have a choice. But more complex grammars will lead to productions with multiple non-terminals. You may be tempted to apply multiple steps at once or replace more than one non-terminal at a time. Don't do it.

As grammars get more complex, you will be confronted with choices when forming a derivation. Consider this grammar for composing boolean expressions out of boolean literals and the binary logical operators:

or   = or || and
     | and
and  = and && atom
     | atom
atom = TRUE
     | FALSE

Some of the productions insert two non-terminals into the derivation, which means you'll have to pick which one to expand next. Here's one way we might derive the expression TRUE && FALSE || TRUE:

or -> or || and
   -> and || and
   -> and && atom || and
   -> atom && atom || and
   -> TRUE && atom || and
   -> TRUE && FALSE || and
   -> TRUE && FALSE || atom
   -> TRUE && FALSE || TRUE

Walk through this and make sure you understand how each line is transformed to the next. Notice how we only expand the leftmost non-terminal at each step? When we stick to this pattern, we end up with a leftmost derivation. If we had expanded the rightmost non-terminal instead, we'd have a rightmost derivation, which would look like this for the expression:

or -> or || and
   -> or || atom
   -> or || TRUE
   -> and || TRUE
   -> and && atom || TRUE
   -> and && FALSE || TRUE
   -> atom && FALSE || TRUE
   -> TRUE && FALSE || TRUE

Neither kind of derivation is better than the other. They both prove that the expression is legal. The steps of the proof are just in a different order. You could also pick one of the non-terminals at random. But that would make comparing your derivation with someone else's difficult.

A derivation is just one of the tools for proving a program is legal. Since derivations are text-heavy and verbose, let's examine another tool that uses a compact picture to encode the same proof.

← GrammarsParse Trees →