Lab: Lexing and Parsing

Dear Computer

Chapter 1: Programming Language

Lab: Lexing and Parsing

Your goal in this lab is to learn more about lexing and parsing. This exercise is designed to help you understand what the interpreters and compilers you use every day do once they receive your source code and to help prepare you for writing your own lexer and parser for the spreadsheet project, which will be part of milestone 2.

Alchemoji

You are designing an alchemy notation system for a client. The base materials are emoji. A material may be joined with another using the operator. Or a material may be removed from another using the operator. The joining and removing yields a single new material, which is marked by the operator. For example:

Complete the following exercises to help you reason about how a grammar defines the structure of alchemoji equations. Write your answers on paper or in an electronic document.

  1. Ignore the emoji language for a moment, and imagine a different language that allows integers and a binary @ operator. You see the expression 5 @ 2 @ 9. Draw the abstract syntax tree of this expression if the language has this grammar:
    expression = expression @ INT
               | INT
    expression = expression @ INT
               | INT
    Note that an abstract syntax tree is simpler than a parse tree. The AST's internal nodes should be operators (@ for this language), and its leaf nodes are integer terminals. Verify the correctness of your tree using Treeformer. Paste the grammar in the textbox, click abstract syntax tree to place the root node, and expand nodes to match your solution. If you can't generate a tree whose shape matches your own, your solution has a problem.
  2. Draw the abstract syntax tree of 5 @ 2 @ 9 if the language has this grammar:
    expression = INT @ expression
               | INT
    expression = INT @ expression
               | INT
    Use Treeformer to verify the correctness of your tree.
  3. Consider the arithmetic expression a + b + c. Given the conventions of mathematics, what does its abstract syntax tree look like?
  4. Consider the arithmetic expression 2 ^ 3 ^ 2. That's exponentiation, not xor. Given the conventions of mathematics, what does its abstract syntax tree look like? Verify your understanding with a calculator like this one.
  5. Investigate the terms left-associative and right-associative. How do these terms apply to the arithmetic + and - operators? How about arithmetic ^?
  6. Draw the abstract syntax tree of 1 @ 2 @ 3 # 4 # 5 if the language has this grammar:
    atty = atty @ hashy
         | hashy
    hashy = INT # hashy
          | INT
    atty = atty @ hashy
         | hashy
    hashy = INT # hashy
          | INT
    Verify the correctness of your tree using Treeformer. What are the associativities of @ and #? Which operator has higher precedence?
  7. We now return to the alchemical emoji language. Draw abstract syntax trees for the three example expressions above.
  8. Did you make the alchemoji operators left- or right-associative?
  9. Write a grammar that describes the structure of an alchemical equation based on the examples given. Use equation, join, and remove as non-terminals. Make have the lowest precedence, middle precedence, and highest precedence. Revisit the logical operations grammar to consider a similar grammar.
  10. Paste your grammar into Treeformer. Adjust it until you can generate the trees for the example alchemoji equations.
  11. Craft your own alchemoji expression that has a mix of at least three materials left of the yield. Use at least one and at least one . Tell a story with your expression.
  12. Draw an abstract syntax tree for your expression. Verify that you can generate it in Treeformer.

Range Lexing

Consider a language in which numeric ranges like these can be expressed:

-5..7
8.2..15
3..-5
-5..7
8.2..15
3..-5

Complete the following exercises to help you reason about how a lexer recognizes the tokens of this language.

  1. The tokens of this language are either integer literals, float literals, or the range operator. Draw three separate state machines for these three types of tokens. Note that numeric literals may be negative. Assume that floats have numbers on both sides of the decimal point.
  2. Draw a new a state machine that merges the three separate machines into one. On any given run, the machine matches only a single token, not an entire range sequence. It should have three different accepting states.
  3. Translate your unified state machine into pseudocode for a lex function that returns a list of all valid tokens found in a string. Follow the approach we used in lecture for Settle. Assume that routines has(char), has_digit, capture, emit_token(type), and error have already been written.

Submission

To receive credit for this lab, you must submit a digital version of your answers on Canvas by Monday noon. Late labs or forgot-to-submits are not accepted because Monday at noon is when your instructor has time to grade.

← Lecture: Evaluating and Lexing