Lecture: Evaluating Trees
Dear students:
We can't do much with a program in its text form. But if we turn it into a tree, then we can act on it. We can evaluate it, turn it back into nicely formatted source code, turn it into machine code, and so on. Today we consider a sustainable approach to defining tree operations. Instead of fragmenting the code for an operation across the many node classes, we'll put them in one place: a visitor.
Ruby Investigation
Before we dive into the problem of operating on trees, let's investigate the Ruby way of accomplishing certain programming tasks. We'll number off as tables. Answer your table's question from this list:
- What's weird about returning values in Ruby?
- What's the idiomatic way to loop through a collection in Ruby?
- How can I form a string in Ruby without ugly concatenation?
- How do we write getters and setters in Ruby?
Consult several sources to confirm your answer. Discuss it with your tablemates.
Node Operations
Once a program is in a tree form, we can do things with it. In general, we can feed it into a machine and get back something else.
What sort of operations might we want to perform on the tree? I can think of the following tasks:
- execute the program
- reformat the code (or serialize the tree back into text)
- check the types
- rename a variable in a particular scope
- generate documentation
- lint the code
- generate machine code
To implement these operations across a tree, our object-oriented intuition tells us to put a method in every class of our node hierarchy. The code would look something like this:
module Ast
class Integer
def initialize(raw_value)
@raw_value = raw_value
end
def evaluate
# ...
end
def serialize
# ...
end
def check_types
# ...
end
end
class Vector2
def initialize(raw_x, raw_y)
@raw_x = raw_x
@raw_y = raw_y
end
def evaluate
# ...
end
def serialize
# ...
end
def check_types
# ...
end
end
class Print
def initialize(parameter_node)
@parameter_node = parameter_node
end
def evaluate
# ...
end
def serialize
# ...
end
def check_types
# ...
end
end
end
module Ast class Integer def initialize(raw_value) @raw_value = raw_value end def evaluate # ... end def serialize # ... end def check_types # ... end end class Vector2 def initialize(raw_x, raw_y) @raw_x = raw_x @raw_y = raw_y end def evaluate # ... end def serialize # ... end def check_types # ... end end class Print def initialize(parameter_node) @parameter_node = parameter_node end def evaluate # ... end def serialize # ... end def check_types # ... end end end
This approach makes for some very scattered logic. If there are \(n\) classes in our hierarchy, we'll have \(\frac{1}{n}\) of each algorithm in each class. Wouldn't it be nice if we could put all the evaluate logic together? And all the serialization logic together in a separate structure? We can, using a little indirection.
Visitor Pattern
Let's take all the serialize
methods out of the individual classes and put them into one Serializer
class that is focused entirely on serialization. As we extract the methods, we rename so they are unique. Since we're not in the classes anymore, the nodes must come in as a parameter. We end up with something like this:
class Serializer
def visit_integer(node)
# ...
end
def visit_vector2(node)
# ...
end
def visit_print(node)
# ...
end
end
class Serializer def visit_integer(node) # ... end def visit_vector2(node) # ... end def visit_print(node) # ... end end
Likewise, all the evaluate
methods go in a class named Evaluator
:
class Evaluator
def visit_integer(node)
# ...
end
def visit_vector2(node)
# ...
end
def visit_print(node)
# ...
end
end
class Evaluator def visit_integer(node) # ... end def visit_vector2(node) # ... end def visit_print(node) # ... end end
We'll make a class for each operation that we want to perform on a tree. The classes will be focused and easier to maintain. We will like our code and our smart selves.
We have lost something by moving these methods away from the objects. No longer can we ask an object to go serialize itself or evaluate itself. To serialize a node now, we have to know its type so we can call the appropriate method.
node = Ast::Print.new(Ast::Vector2.new(6, 3))
# The old way was flexible.
# node.serialize
# The new way is not flexible.
serializer = Serializer.new
serializer.visit_print(node)
node = Ast::Print.new(Ast::Vector2.new(6, 3)) # The old way was flexible. # node.serialize # The new way is not flexible. serializer = Serializer.new serializer.visit_print(node)
However, we are not done. All our tree algorithm machines have the same interface. They visit nodes. They are visitors. We can give each object exactly one method that tells it how to be visited:
module Ast
class Integer
def initialize(raw_value)
@raw_value = raw_value
end
def visit(visitor)
visitor.visit_integer(self)
end
end
class Vector2
def initialize(raw_x, raw_y)
@raw_x = raw_x
@raw_y = raw_y
end
def visit(visitor)
visitor.visit_vector2(self)
end
end
class Print
def initialize(parameter_node)
@parameter_node = parameter_node
end
def visit(visitor)
visitor.visit_print(self)
end
end
end
module Ast class Integer def initialize(raw_value) @raw_value = raw_value end def visit(visitor) visitor.visit_integer(self) end end class Vector2 def initialize(raw_x, raw_y) @raw_x = raw_x @raw_y = raw_y end def visit(visitor) visitor.visit_vector2(self) end end class Print def initialize(parameter_node) @parameter_node = parameter_node end def visit(visitor) visitor.visit_print(self) end end end
With these bridging methods, we again put the authority back in the hands of the objects themselves. This node knows how to get itself visited:
node = Ast::Print.new(Ast::Vector2.new(6, 3))
node.visit(Serializer.new)
node = Ast::Print.new(Ast::Vector2.new(6, 3)) node.visit(Serializer.new)
This way of organizing objects without polluting the hierarchy with fragments of their operations is called the visitor pattern. In some languages where types are checked early, like Java, we would need to make a Visitor
interface or abstract class. Not so in Ruby. A visitor visits if it has the right visit
methods.
Serializer
We're ready to fully implement our serializer. Each visit method must return a string representation of the node. The primitives aren't too bad:
class Serializer
def visit_integer(node)
node.raw_value.to_s
end
def visit_vector2(node)
"[#{node.x_value}, #{node.y_value}]"
end
end
class Serializer def visit_integer(node) node.raw_value.to_s end def visit_vector2(node) "[#{node.x_value}, #{node.y_value}]" end end
Oh, but serialization fails when we test this. We can't access instance variables from outside a class in Ruby. They are private, and that can't be changed. We must add getters to all our classes, like this:
class Integer
attr_reader :raw_value
def initialize(raw_value)
@raw_value = raw_value
end
end
class Integer attr_reader :raw_value def initialize(raw_value) @raw_value = raw_value end end
The remaining visit methods are implemented similarly, but most of them have other nodes as children. We will need to recurse across the node hierarchy to fully serialize a tree. We might serialize an Add
node like this:
class Serializer
# ...
def visit_add(node)
"(#{node.left_node.visit(self)} + #{node.right_node.visit(self)})"
end
end
class Serializer # ... def visit_add(node) "(#{node.left_node.visit(self)} + #{node.right_node.visit(self)})" end end
We put parentheses around the string to make the association clear.
In the end, we'll have a machine that can turn trees into text.
Evaluator
Each method of the Evaluator
must produce a tree with just a single primitive node, an Integer
or a Vector2
. That means the visit methods for the primitives have it easy:
class Evaluator
def visit_integer(node)
node
end
def visit_vector2(node)
node
end
end
class Evaluator def visit_integer(node) node end def visit_vector2(node) node end end
The print command is not an expression. It doesn't yield a value but instead achieves a side effect. It has a parameter node, which could be any arbitrary tree. How do we turn that parameter into something we can print? We evaluate it down to a primitive. Then we check the type of that primitive to see what it is. Like this:
class Evaluator
# ...
def visit_print(node)
primitive = node.parameter_node.visit(self)
if primitive.is_a?(Ast::Integer)
puts primitive.raw_value
elsif primitive.is_a?(Ast::Vector2)
puts "[#{primitive.raw_x_value}, #{primitive.raw_y_value}]"
else
raise "type can't be printed"
end
end
end
class Evaluator # ... def visit_print(node) primitive = node.parameter_node.visit(self) if primitive.is_a?(Ast::Integer) puts primitive.raw_value elsif primitive.is_a?(Ast::Vector2) puts "[#{primitive.raw_x_value}, #{primitive.raw_y_value}]" else raise "type can't be printed" end end end
The visit methods for Add
and Multiply
are implemented similarly. Variables require a bit more work. We need the evaluator to keep track of what primitives are associated with what names. Let's create a helper data structure for tracking information that must be managed as a program executes. We'll call it Runtime
. Currently it will just have a dictionary (hash) of the variables:
class Runtime
def initialize
@variables = {}
end
def get_variable(identifier)
@variables[identifier]
end
def set_variable(identifier, primitive)
@variables[identifier] = primitive
end
end
class Runtime def initialize @variables = {} end def get_variable(identifier) @variables[identifier] end def set_variable(identifier, primitive) @variables[identifier] = primitive end end
Later on we may choose to add extra execution state. We pass a Runtime
along to our Evaluator
to do lookups and assignments:
class Evaluator
def initialize(runtime)
@runtime = runtime
end
def visit_variable_reference(node)
# TODO: raise error if unknown
@runtime.get_variable(node.identifier)
end
def visit_assignment(node)
primitive = node.right_node.visit(self)
@runtime.set_variable(node.identifier, primitive)
end
end
class Evaluator def initialize(runtime) @runtime = runtime end def visit_variable_reference(node) # TODO: raise error if unknown @runtime.get_variable(node.identifier) end def visit_assignment(node) primitive = node.right_node.visit(self) @runtime.set_variable(node.identifier, primitive) end end
In the end, we'll have a machine that can turn a tree into a value.
The task of developing model classes and writing these two visitors is exactly what you will be doing in milestone 1 of the spreadsheet project.
TODO
Here's your list of things to do before we meet next:
See you for our first, in which we continue exploring lexers, grammars, and precedence.
Sincerely,
P.S. It's time for a haiku!