Lecture: Evaluating and Lexing

Dear Computer

Chapter 1: Programming Language

Lecture: Evaluating and Lexing

Dear students:

In this class, we continue to learn about programming languages in two ways: we're building an interpreter for a small language, and we're building it in Ruby, a language that few of you have used before. The first step of working with programs in an invented language is to translate the program into a tree structure in a language we already know. The second step is to operate on that tree. We've been thinking about this process in reverse. Last week we started modeling the tree and operating on it. We finish that step this week, but we also start looking at how to translating it.

What we talk about today is meant to mirror what you will need to do for milestones 1 and 2 in the spreadsheet project. You should take copious notes because everything we do today is on purpose.

Evaluating

Earlier we saw Serializer, a visitor that walked through an abstract syntax tree and turned its nodes into a string. Now we walk the tree and evaluate it down to a primitive node. Each method of the Evaluator must return a single primitive node, an Integer or a Vector2. That means the visit methods for the primitives have it easy:

Ruby
class Evaluator
  def visit_integer(node)
    node
  end

  def visit_vector2(node)
    node
  end
end
class Evaluator
  def visit_integer(node)
    node
  end

  def visit_vector2(node)
    node
  end
end

The print command is not an expression. It doesn't yield a value but instead achieves a side effect. It has a parameter node, which could be any arbitrary tree. How do we turn that parameter into something we can print? We evaluate it down to a primitive. Then we check the type of that primitive to see what it is. Like this:

Ruby
class Evaluator
  # ...

  def visit_print(node)
    primitive = node.parameter_node.visit(self)
    if primitive.is_a?(Ast::Integer)
      puts primitive.raw_value
    elsif primitive.is_a?(Ast::Vector2)
      puts "[#{primitive.raw_x_value}, #{primitive.raw_y_value}]"
    else
      raise "type can't be printed"
    end
  end
end
class Evaluator
  # ...

  def visit_print(node)
    primitive = node.parameter_node.visit(self)
    if primitive.is_a?(Ast::Integer)
      puts primitive.raw_value
    elsif primitive.is_a?(Ast::Vector2)
      puts "[#{primitive.raw_x_value}, #{primitive.raw_y_value}]"
    else
      raise "type can't be printed"
    end
  end
end

The visit methods for Add and Multiply are implemented similarly. Variables require a bit more work. We need the evaluator to keep track of what primitives are associated with what names. Let's create a helper data structure for tracking information that must be managed as a program executes. We'll call it Runtime. Currently it will just have a dictionary (hash) of the variables:

Ruby
class Runtime
  def initialize
    @variables = {}
  end

  def get_variable(identifier)
    @variables[identifier]
  end

  def set_variable(identifier, primitive)
    @variables[identifier] = primitive
  end
end
class Runtime
  def initialize
    @variables = {}
  end

  def get_variable(identifier)
    @variables[identifier]
  end

  def set_variable(identifier, primitive)
    @variables[identifier] = primitive
  end
end

Later on we may choose to add extra execution state. Maybe function definitions. Maybe a random number generator.

We pass a Runtime along to our Evaluator to do lookups and assignments:

Ruby
class Evaluator
  def initialize(runtime)
    @runtime = runtime
  end

  def visit_variable_reference(node)
    # TODO: raise error if unknown
    @runtime.get_variable(node.identifier)
  end

  def visit_assignment(node)
    primitive = node.right_node.visit(self)
    @runtime.set_variable(node.identifier, primitive)
  end
end
class Evaluator
  def initialize(runtime)
    @runtime = runtime
  end

  def visit_variable_reference(node)
    # TODO: raise error if unknown
    @runtime.get_variable(node.identifier)
  end

  def visit_assignment(node)
    primitive = node.right_node.visit(self)
    @runtime.set_variable(node.identifier, primitive)
  end
end

In the end, we'll have a machine that can turn a tree into a value.

The task of developing model classes and writing Serializer and Evaluator is exactly what you will be doing in milestone 1 of the spreadsheet project.

Lexing

Recall this Vex program from last time:

Ruby
p = [1, -1]
q = [4, 3]
s = p + q * 2
print(s)
p = [1, -1]
q = [4, 3]
s = p + q * 2
print(s)

We built a tree representation of this program by hand. Now we want to build a tool that turns text into a tree. The problem is easier if we build two tools, one to break the text into small chunks and another to organize the chunks into structures. Our brain does something similar when we read. We chunk the letters into words, and then organize the words into grammatical structures.

The words of a program are its tokens. What are the tokens of the program above? Here are a few:

Most of them are just a single character, but that's just a coincidence. Operators could be multiple symbols, like >=. We need to build a machine that eats characters and reports when it has digested a token. Such a machine is called a lexer.

We'll build lexers for recognizing the following tokens: a bingo call and a dimension string of the form WIDTH x HEIGHT. Ultimately a lexer is written in code, but we'll draw them as state machines to develop a visceral feel for what the code does.

Then we'll build a lexer for Vex. We'll draw state machines for all the kinds of tokens we might encounter. Many machines will have just a single accepting state and a single transition. The identifier machine will have two states with a loop.

Ultimately, however, we need these machines running in code. Here's how we might write a routine for lexing a left bracket:

Ruby
if has('[')
  capture
  emit_token(:left_bracket)
end
if has('[')
  capture
  emit_token(:left_bracket)
end

Those little machines recognize only one type of token. We need a mega-machine that recognizes all of them, so we glue them together. Once they are glued, we write the entire lexing routine, which might look like this:

Ruby
if has('[')
  capture
  emit_token(:left_bracket)
elsif has(']')
  capture
  emit_token(:right_bracket)
elsif has('=')
  capture
  emit_token(:equals)
elsif has('+')
  capture
  emit_token(:plus)
elsif has('(')
  capture
  emit_token(:left_parenthesis)
elsif has(')')
  capture
  emit_token(:right_parenthesis)
# ...
elsif has_alphabetic
  while has_alphabetic
    capture
  end
  emit_token(:identifier)
end
if has('[')
  capture
  emit_token(:left_bracket)
elsif has(']')
  capture
  emit_token(:right_bracket)
elsif has('=')
  capture
  emit_token(:equals)
elsif has('+')
  capture
  emit_token(:plus)
elsif has('(')
  capture
  emit_token(:left_parenthesis)
elsif has(')')
  capture
  emit_token(:right_parenthesis)
# ...
elsif has_alphabetic
  while has_alphabetic
    capture
  end
  emit_token(:identifier)
end

This code belongs in a lexer abstraction. We'll write it together and also define the helper methods like has, has_alphabetic, capture, and emit_token.

TODO

Here's your list of things to do before we meet next:

Complete the middle quiz as desired.
Continue working on milestone 1. Devote a little time each day, and it will be done. Commit to your GitHub repository after every work session.
Keep learning Ruby through YouTube or a book.
Default to #general in Discord for messages. I am not the star of this show. We are a community. Use a DM if you need to share personal matters. Don't share graded work like screenshots of book questions in #general.
Tell me through a DM if you are partnering with someone on the spreadsheet project.

See you next time.

Sincerely,

P.S. It's time for a haiku!

I learned English first But not till I learned Spanish Did I learn language
← Lecture: Visiting TreesLab: Lexing and Parsing →