Lecture: Evaluating Trees

Dear Computer

Chapter 1: Programming Language

Lecture: Evaluating Trees

Dear students:

We can't do much with a program in its text form. But if we turn it into a tree, then we can act on it. We can evaluate it, turn it back into nicely formatted source code, turn it into machine code, and so on. Today we consider a sustainable approach to defining tree operations. Instead of fragmenting the code for an operation across the many node classes, we'll put them in one place: a visitor.

Ruby Investigation

Before we dive into the problem of operating on trees, let's investigate the Ruby way of accomplishing certain programming tasks. We'll number off as tables. Answer your table's question from this list:

  1. What's weird about returning values in Ruby?
  2. What's the idiomatic way to loop through a collection in Ruby?
  3. How can I form a string in Ruby without ugly concatenation?
  4. How do we write getters and setters in Ruby?

Consult several sources to confirm your answer. Discuss it with your tablemates.

Node Operations

Once a program is in a tree form, we can do things with it. In general, we can feed it into a machine and get back something else.

What sort of operations might we want to perform on the tree? I can think of the following tasks:

To implement these operations across a tree, our object-oriented intuition tells us to put a method in every class of our node hierarchy. The code would look something like this:

Ruby
module Ast
  class Integer
    def initialize(raw_value)
      @raw_value = raw_value
    end

    def evaluate
      # ...
    end

    def serialize
      # ...
    end

    def check_types
      # ...
    end
  end

  class Vector2
    def initialize(raw_x, raw_y)
      @raw_x = raw_x
      @raw_y = raw_y
    end

    def evaluate
      # ...
    end

    def serialize
      # ...
    end

    def check_types
      # ...
    end
  end

  class Print
    def initialize(parameter_node)
      @parameter_node = parameter_node
    end

    def evaluate
      # ...
    end

    def serialize
      # ...
    end

    def check_types
      # ...
    end
  end
end
module Ast
  class Integer
    def initialize(raw_value)
      @raw_value = raw_value
    end

    def evaluate
      # ...
    end

    def serialize
      # ...
    end

    def check_types
      # ...
    end
  end

  class Vector2
    def initialize(raw_x, raw_y)
      @raw_x = raw_x
      @raw_y = raw_y
    end

    def evaluate
      # ...
    end

    def serialize
      # ...
    end

    def check_types
      # ...
    end
  end

  class Print
    def initialize(parameter_node)
      @parameter_node = parameter_node
    end

    def evaluate
      # ...
    end

    def serialize
      # ...
    end

    def check_types
      # ...
    end
  end
end

This approach makes for some very scattered logic. If there are \(n\) classes in our hierarchy, we'll have \(\frac{1}{n}\) of each algorithm in each class. Wouldn't it be nice if we could put all the evaluate logic together? And all the serialization logic together in a separate structure? We can, using a little indirection.

Visitor Pattern

Let's take all the serialize methods out of the individual classes and put them into one Serializer class that is focused entirely on serialization. As we extract the methods, we rename so they are unique. Since we're not in the classes anymore, the nodes must come in as a parameter. We end up with something like this:

Ruby
class Serializer
  def visit_integer(node)
    # ...
  end

  def visit_vector2(node)
    # ...
  end

  def visit_print(node)
    # ...
  end
end
class Serializer
  def visit_integer(node)
    # ...
  end

  def visit_vector2(node)
    # ...
  end

  def visit_print(node)
    # ...
  end
end

Likewise, all the evaluate methods go in a class named Evaluator:

Ruby
class Evaluator
  def visit_integer(node)
    # ...
  end

  def visit_vector2(node)
    # ...
  end

  def visit_print(node)
    # ...
  end
end
class Evaluator
  def visit_integer(node)
    # ...
  end

  def visit_vector2(node)
    # ...
  end

  def visit_print(node)
    # ...
  end
end

We'll make a class for each operation that we want to perform on a tree. The classes will be focused and easier to maintain. We will like our code and our smart selves.

We have lost something by moving these methods away from the objects. No longer can we ask an object to go serialize itself or evaluate itself. To serialize a node now, we have to know its type so we can call the appropriate method.

Ruby
node = Ast::Print.new(Ast::Vector2.new(6, 3))

# The old way was flexible.
# node.serialize

# The new way is not flexible.
serializer = Serializer.new
serializer.visit_print(node)
node = Ast::Print.new(Ast::Vector2.new(6, 3))

# The old way was flexible.
# node.serialize

# The new way is not flexible.
serializer = Serializer.new
serializer.visit_print(node)

However, we are not done. All our tree algorithm machines have the same interface. They visit nodes. They are visitors. We can give each object exactly one method that tells it how to be visited:

Ruby
module Ast
  class Integer
    def initialize(raw_value)
      @raw_value = raw_value
    end

    def visit(visitor)
      visitor.visit_integer(self)
    end
  end

  class Vector2
    def initialize(raw_x, raw_y)
      @raw_x = raw_x
      @raw_y = raw_y
    end

    def visit(visitor)
      visitor.visit_vector2(self)
    end
  end

  class Print
    def initialize(parameter_node)
      @parameter_node = parameter_node
    end

    def visit(visitor)
      visitor.visit_print(self)
    end
  end
end
module Ast
  class Integer
    def initialize(raw_value)
      @raw_value = raw_value
    end

    def visit(visitor)
      visitor.visit_integer(self)
    end
  end

  class Vector2
    def initialize(raw_x, raw_y)
      @raw_x = raw_x
      @raw_y = raw_y
    end

    def visit(visitor)
      visitor.visit_vector2(self)
    end
  end

  class Print
    def initialize(parameter_node)
      @parameter_node = parameter_node
    end

    def visit(visitor)
      visitor.visit_print(self)
    end
  end
end

With these bridging methods, we again put the authority back in the hands of the objects themselves. This node knows how to get itself visited:

Ruby
node = Ast::Print.new(Ast::Vector2.new(6, 3))
node.visit(Serializer.new)
node = Ast::Print.new(Ast::Vector2.new(6, 3))
node.visit(Serializer.new)

This way of organizing objects without polluting the hierarchy with fragments of their operations is called the visitor pattern. In some languages where types are checked early, like Java, we would need to make a Visitor interface or abstract class. Not so in Ruby. A visitor visits if it has the right visit methods.

Serializer

We're ready to fully implement our serializer. Each visit method must return a string representation of the node. The primitives aren't too bad:

Ruby
class Serializer
  def visit_integer(node)
    node.raw_value.to_s
  end

  def visit_vector2(node)
    "[#{node.x_value}, #{node.y_value}]"
  end
end
class Serializer
  def visit_integer(node)
    node.raw_value.to_s
  end

  def visit_vector2(node)
    "[#{node.x_value}, #{node.y_value}]"
  end
end

Oh, but serialization fails when we test this. We can't access instance variables from outside a class in Ruby. They are private, and that can't be changed. We must add getters to all our classes, like this:

Ruby
class Integer
  attr_reader :raw_value
  def initialize(raw_value)
    @raw_value = raw_value
  end
end
class Integer
  attr_reader :raw_value
  def initialize(raw_value)
    @raw_value = raw_value
  end
end

The remaining visit methods are implemented similarly, but most of them have other nodes as children. We will need to recurse across the node hierarchy to fully serialize a tree. We might serialize an Add node like this:

Ruby
class Serializer
  # ...

  def visit_add(node)
    "(#{node.left_node.visit(self)} + #{node.right_node.visit(self)})"
  end
end
class Serializer
  # ...

  def visit_add(node)
    "(#{node.left_node.visit(self)} + #{node.right_node.visit(self)})"
  end
end

We put parentheses around the string to make the association clear.

In the end, we'll have a machine that can turn trees into text.

Evaluator

Each method of the Evaluator must produce a tree with just a single primitive node, an Integer or a Vector2. That means the visit methods for the primitives have it easy:

Ruby
class Evaluator
  def visit_integer(node)
    node
  end

  def visit_vector2(node)
    node
  end
end
class Evaluator
  def visit_integer(node)
    node
  end

  def visit_vector2(node)
    node
  end
end

The print command is not an expression. It doesn't yield a value but instead achieves a side effect. It has a parameter node, which could be any arbitrary tree. How do we turn that parameter into something we can print? We evaluate it down to a primitive. Then we check the type of that primitive to see what it is. Like this:

Ruby
class Evaluator
  # ...

  def visit_print(node)
    primitive = node.parameter_node.visit(self)
    if primitive.is_a?(Ast::Integer)
      puts primitive.raw_value
    elsif primitive.is_a?(Ast::Vector2)
      puts "[#{primitive.raw_x_value}, #{primitive.raw_y_value}]"
    else
      raise "type can't be printed"
    end
  end
end
class Evaluator
  # ...

  def visit_print(node)
    primitive = node.parameter_node.visit(self)
    if primitive.is_a?(Ast::Integer)
      puts primitive.raw_value
    elsif primitive.is_a?(Ast::Vector2)
      puts "[#{primitive.raw_x_value}, #{primitive.raw_y_value}]"
    else
      raise "type can't be printed"
    end
  end
end

The visit methods for Add and Multiply are implemented similarly. Variables require a bit more work. We need the evaluator to keep track of what primitives are associated with what names. Let's create a helper data structure for tracking information that must be managed as a program executes. We'll call it Runtime. Currently it will just have a dictionary (hash) of the variables:

Ruby
class Runtime
  def initialize
    @variables = {}
  end

  def get_variable(identifier)
    @variables[identifier]
  end

  def set_variable(identifier, primitive)
    @variables[identifier] = primitive
  end
end
class Runtime
  def initialize
    @variables = {}
  end

  def get_variable(identifier)
    @variables[identifier]
  end

  def set_variable(identifier, primitive)
    @variables[identifier] = primitive
  end
end

Later on we may choose to add extra execution state. We pass a Runtime along to our Evaluator to do lookups and assignments:

Ruby
class Evaluator
  def initialize(runtime)
    @runtime = runtime
  end

  def visit_variable_reference(node)
    # TODO: raise error if unknown
    @runtime.get_variable(node.identifier)
  end

  def visit_assignment(node)
    primitive = node.right_node.visit(self)
    @runtime.set_variable(node.identifier, primitive)
  end
end
class Evaluator
  def initialize(runtime)
    @runtime = runtime
  end

  def visit_variable_reference(node)
    # TODO: raise error if unknown
    @runtime.get_variable(node.identifier)
  end

  def visit_assignment(node)
    primitive = node.right_node.visit(self)
    @runtime.set_variable(node.identifier, primitive)
  end
end

In the end, we'll have a machine that can turn a tree into a value.

The task of developing model classes and writing these two visitors is exactly what you will be doing in milestone 1 of the spreadsheet project.

TODO

Here's your list of things to do before we meet next:

Read chapter 1 and take the front quiz before Tuesday morning.
Read the introduction to the spreadsheet project and milestone 1.
If you plan to work with a partner on the spreadsheet project, send me a direct message on Discord.
Start working on milestone 1. Devote a little time each day, and it will be done. Commit to your GitHub repository after every work session.
Keep learning Ruby through YouTube or a book.

See you for our first, in which we continue exploring lexers, grammars, and precedence.

Sincerely,

P.S. It's time for a haiku!

Compilers are dogs They run, fetch, sniff, point, and bark And they visit trees
← Lecture: Hello LanguageLecture: Evaluating and Lexing →