Lab: Lexing and Parsing

Dear Computer

Chapter 1: Programming Language

Lab: Lexing and Parsing

Your goal in this lab is to learn more about lexing and parsing. This exercise is designed to help you understand what the interpreters and compilers you use every day do once they receive your source code and to help prepare you for writing your own lexer and parser for the spreadsheet project, which will be part of milestone 2.

Alchemoji

You are designing an alchemy notation system for a client. The base materials are emoji. A material may be combined with another to yield a new material using the operator. Or a material may be removed from another using the operator. For example:

Complete the following exercises to help you reason about how a parser recognizes this language. Note that you are asked to draw several different kinds of trees.

  1. Imagine for a moment a different language that allows integers and a binary @ operator. You see the expression 5 @ 2 @ 9. Draw the abstract syntax tree of this expression if the language has this grammar:
    expression = expression @ INT
               | INT
    Note that an abstract syntax tree is simpler than a parse tree. For this language, an AST's internal nodes are operators, and its leaf nodes are integer literals.
  2. Draw the abstract syntax tree of 5 @ 2 @ 9 if the language has this grammar:
    expression = INT @ expression
               | INT
  3. We now return to the alchemical emoji language. Write a grammar that describes the structure of an alchemical equation based on the examples given. Use equation, combine, and remove as non-terminals. Assume the yield on the right-hand side of an equation is always singular, but there may be an arbitrary number of materials on the left-hand side. The materials and yield may be any emoji. Give higher precedence than . Higher-precedent operations are closer to the terminals and farther from the grammar's starting non-terminal. Stated another way, higher-precedence operations are closer to the bottom of the tree and grammar. Revisit the logical operations grammar as an example.
  4. Draw parse trees for the three example expressions. Note that a parse tree is more verbose than an AST. Its root is the starting non-terminal, and it reveals how non-terminals expand according to the rules of the grammar. Its internal nodes are non-terminals and its leaf nodes are terminals.
  5. Craft your own alchemoji expression that has a mix of at least three materials left of the yield. Use at least one and at least one . Think of your expression as telling a story.
  6. Draw a parse tree for your expression.
  7. Consider the arithmetic expression a + b + c. Given the conventions of mathematics, what does its abstract syntax tree look like?
  8. Consider the arithmetic expression 2 ^ 3 ^ 2. Given the conventions of mathematics, what does its abstract syntax tree look like? Verify your understanding with a calculator.
  9. Investigate the terms left-associative and right-associative. How do these terms apply to the arithmetic + and - operators? How about arithmetic ^?

Range Lexing

Consider a language in which numeric ranges like these can be expressed:

-5..7
8.2..15
3..-5

Complete the following exercises to help you reason about how a lexer recognizes the tokens of this language.

  1. The tokens of this language are either integer literals, float literals, or the range operator. Draw three separate state machines for these three types of tokens. Note that numeric literals may be negative. Assume that floats have numbers on both sides of the decimal point.
  2. Draw a new a state machine that merges the three separate machines into one. On any given run, the machine matches only a single token, not an entire range sequence. It should have three different accepting states.
  3. Translate your unified state machine into pseudocode for a lex function that returns a list of all valid tokens found in a string. Follow the approach we used in lecture for Settle. Assume that routines has(symbol), has_digit, capture, emit_token(type), and error have already been written.

Submission

To receive credit for this lab, you must submit a digital version of your answers on Canvas by Monday noon. Late labs or forgot-to-submits are not accepted because Monday at noon is when your instructor has time to grade.

← Lecture: Language to Language