Lecture: Settle, Part 3
Dear students:
Today is the day we finish talking about Settle. In particular, we'll look at the parser and the visitor pattern, which are things you need to know more about for milestones 1 and 2 of the spreadsheet project. But we'll also mix in a discussion of types, which you read about for today.
Typechecking
Programmers err. They commit syntax errors, perhaps forgetting a closing parenthesis. They commit logic errors, in which their algorithm runs but computes the wrong thing. And they commit type errors, in which they use an illegal value or operation. If a vending machine expects coins, and we feed in a washer, that's a type error. If a train can go forward or back up but we try to turn it, that's a type error. Our tools will check for type errors to avert disaster. What makes our programming experiences very different is when those tools check for type errors.
If they check early, before the program is run, we have static typechecking. Static means "without movement." If they check when the program is actually running, we have dynamic typechecking. What are the advantages of each?
Static type checking happens early—but not too early. It happens after a program has been lexed and parsed. The compile traverses the AST and maintains a data structure listing what variables are in scope, what their type is, and whether or not they've been initialized. For these checks to happen, the compiler must know the types. They might be declared explicitly, or the compiler might infer what they are from literals or function return types.
In Settle and milestone 1, we do not have explicit types. Nor will we infer types, because it's too much. That means these languages are going to be dynamically typechecked. Let's see how to do that typechecking. But to get to the point we do typechecking, we need to add a bit more logic to our parser from last week.
Parser
The parser's job is to turn a flat list of tokens into a very not-flat tree. Our parser just walks that list from left to right, visiting each token exactly once. What then is the complexity of our parsing algorithm? Linear. \(\mathcal{O}(n)\). Not all languages can be parsed by only looking one token ahead.
Recall this pair of productions that we came up with last time for one rung of our precedence ladder:
level1 = level1 % levelN
| levelN
Recall also that I said we wrote the grammar first because the parsing code sort of falls out from the grammar. Well, if that's true, let's write the parser method for level1
. That seems tricky because there are two productions, and they start with different non-terminals. Let's pretend for a moment that the second production doesn't exist. Our first production starts off something like this:
def level1
left = level1
# ...
end
This code has a problem. The very first statement is a recursive call to the method we are trying to write. Our program will quickly reach a stack overflow. One of the rules of recursion is that the problem must get a little smaller on each recursive call. It must get closer to the base case. That's not happening here.
To fix this, we rewrite the recursion as a loop. A level1
is always going to expand into an expression of the form levelN % levelN % ... % levelN
. There's going to be at least one levelN
.
def level1
left = levelN
while has(:percent)
advance
right = levelN
left = Ast::Intersect.new(left, right)
end
left
end
You should follow this pattern for all left-associative operators in milestone 2. Interestingly, right-associative operators do not suffer from infinite recursion. By the time you get to the right operand, you've advanced through some tokens. The problem has gotten smaller, so the recursion won't be infinite. Methods for right-associative operators read just as you see them in the grammar.
Serialization
We've got a lexer-parser that turns source code into a tree. That won't feel like an accomplishment unless we can do something with that tree. I know. Let's turn it back into text. Reducing data structures into text is called serialization, and it's part of milestone 1. Debugging trees is hard if you can't serialize and print them.
We could add a serialize
method to every single node the program part hierarchy. That's what object-oriented programmers do. They make a bunch of classes, and each class is a new type. Then they add operations to those types as methods. Object-oriented programming makes types the organizing principle of code.
The problem with organizing solely by type is that our brain gets spread across a lot of code. We solve \(\frac{1}{n}\) of the serialization task in one class, \(\frac{1}{n}\) in another, and so on. This is the method we'd write for SetPrimitive
:
def serialize
"{#{@members.join(', ')}}"
end
But we want to take all these serialize
methods out of the classes and put them in one spot. This is what compiler writers do. They put like code with like code, because writing software is tough. They organize code by operation rather than type. We'll make this Serializer
class with all the methods:
class Serializer
def visit_set_primitive(node)
"{#{node.members.join(', ')}}"
end
# ...
end
I made a few changes. The set primitive node has to be passed as a parameter since its method is no longer in the SetPrimitive
class. The instance variable must be accessed through node
. I also renamed the method to something more generic.
Any operation that I need to apply to the nodes of a tree is going to get a similar single class. That goes for evaluation, typechecking, machine code generation, and so on. My code will stay organized. However, I need a way to ask a node to get itself visited by one of these classes. I added this method to my SetPrimitive
class:
class SetPrimitive
# ...
def traverse(visitor)
visitor.visit_set_primitive(self)
end
end
This single method will work for serializers, evaluators, typecheckers, and so on. We'll add support for the %
operator to see how this organization of our code scales.
Evaluator
We also organize all the evaluation code into a single Evaluator
class. The job of each visit
method is to turn the tree into a primitive. That's an easy task if our node is already a primitive:
class Evaluator
def visit_set_primitive(node)
node
end
# ...
end
For other nodes, we must recursively evaluate their children. Those calls will give us back primitives, but because we allow bad trees to be formed, we have to add typechecks.
class Evaluator
# ...
def visit_intersect(node)
leftPrimitive = node.left.traverse(self)
rightPrimitive = node.right.traverse(self)
if !leftPrimitive.is_a?(Ast::SetPrimitive)
raise "operand isn't a set"
end
if !rightPrimitive.is_a?(Ast::SetPrimitive)
raise "operand isn't a set"
end
shareds = leftPrimitive.members & rightPrimitive.members
Ast::SetPrimitive.new(shareds)
end
end
This is the pattern for every single operator: evaluate operands, typecheck the primitives that we get back, perform the operation in the language we do know, and build a new value in the primitive types of our new language.
If we have time, we'll add more operations like ^
for xor, +
for union, and -
for difference.
TODO
Here's your list of things to do before we meet next:
See you next time.
Sincerely,
P.S. It's time for a haiku! As some background, the original meaning of the word static is still or unchanging.
voter_t
's no static type
It got amended