Lecture: Settle, Part 3

Dear students:

Today is the day we finish talking about Settle. In particular, we'll look at the parser and the visitor pattern, which are things you need to know more about for milestones 1 and 2 of the spreadsheet project. But we'll also mix in a discussion of types, which you read about for today.

Typechecking

Programmers err. They commit syntax errors, perhaps forgetting a closing parenthesis. They commit logic errors, in which their algorithm runs but computes the wrong thing. And they commit type errors, in which they use an illegal value or operation. If a vending machine expects coins, and we feed in a washer, that's a type error. If a train can go forward or back up but we try to turn it, that's a type error. Our tools will check for type errors to avert disaster. What makes our programming experiences very different is when those tools check for type errors.

If they check early, before the program is run, we have static typechecking. Static means "without movement." If they check when the program is actually running, we have dynamic typechecking. What are the advantages of each?

Static type checking happens early—but not too early. It happens after a program has been lexed and parsed. The compile traverses the AST and maintains a data structure listing what variables are in scope, what their type is, and whether or not they've been initialized. For these checks to happen, the compiler must know the types. They might be declared explicitly, or the compiler might infer what they are from literals or function return types.

In Settle and milestone 1, we do not have explicit types. Nor will we infer types, because it's too much. That means these languages are going to be dynamically typechecked. Let's see how to do that typechecking. But to get to the point we do typechecking, we need to add a bit more logic to our parser from last week.

Parser

The parser's job is to turn a flat list of tokens into a very not-flat tree. Our parser just walks that list from left to right, visiting each token exactly once. What then is the complexity of our parsing algorithm? Linear. \(\mathcal{O}(n)\). Not all languages can be parsed by only looking one token ahead.

Recall this pair of productions that we came up with last time for one rung of our precedence ladder:

level1 = level1 % levelN
       | levelN

level1 = level1 % levelN
       | levelN

Recall also that I said we wrote the grammar first because the parsing code sort of falls out from the grammar. Well, if that's true, let's write the parser method for level1. That seems tricky because there are two productions, and they start with different non-terminals. Let's pretend for a moment that the second production doesn't exist. Our first production starts off something like this:

Ruby

def level1
  left = level1
  # ...
end

def level1
  left = level1
  # ...
end

This code has a problem. The very first statement is a recursive call to the method we are trying to write. Our program will quickly reach a stack overflow. One of the rules of recursion is that the problem must get a little smaller on each recursive call. It must get closer to the base case. That's not happening here.

To fix this, we rewrite the recursion as a loop. A level1 is always going to expand into an expression of the form levelN % levelN % ... % levelN. There's going to be at least one levelN.

Ruby

def level1
  left = levelN
  while has(:percent)
    advance
    right = levelN
    left = Ast::Intersect.new(left, right)
  end
  left
end

def level1
  left = levelN
  while has(:percent)
    advance
    right = levelN
    left = Ast::Intersect.new(left, right)
  end
  left
end

You should follow this pattern for all left-associative operators in milestone 2. Interestingly, right-associative operators do not suffer from infinite recursion. By the time you get to the right operand, you've advanced through some tokens. The problem has gotten smaller, so the recursion won't be infinite. Methods for right-associative operators read just as you see them in the grammar.

Serialization

We've got a lexer-parser that turns source code into a tree. That won't feel like an accomplishment unless we can do something with that tree. I know. Let's turn it back into text. Reducing data structures into text is called serialization, and it's part of milestone 1. Debugging trees is hard if you can't serialize and print them.

We could add a serialize method to every single node the program part hierarchy. That's what object-oriented programmers do. They make a bunch of classes, and each class is a new type. Then they add operations to those types as methods. Object-oriented programming makes types the organizing principle of code.

The problem with organizing solely by type is that our brain gets spread across a lot of code. We solve \(\frac{1}{n}\) of the serialization task in one class, \(\frac{1}{n}\) in another, and so on. This is the method we'd write for SetPrimitive:

Ruby

def serialize
  "{#{@members.join(', ')}}"
end

def serialize
  "{#{@members.join(', ')}}"
end

But we want to take all these serialize methods out of the classes and put them in one spot. This is what compiler writers do. They put like code with like code, because writing software is tough. They organize code by operation rather than type. We'll make this Serializer class with all the methods:

Ruby

class Serializer
  def visit_set_primitive(node)
    "{#{node.members.join(', ')}}"
  end

  # ...
end

class Serializer
  def visit_set_primitive(node)
    "{#{node.members.join(', ')}}"
  end

  # ...
end

I made a few changes. The set primitive node has to be passed as a parameter since its method is no longer in the SetPrimitive class. The instance variable must be accessed through node. I also renamed the method to something more generic.

Any operation that I need to apply to the nodes of a tree is going to get a similar single class. That goes for evaluation, typechecking, machine code generation, and so on. My code will stay organized. However, I need a way to ask a node to get itself visited by one of these classes. I added this method to my SetPrimitive class:

Ruby

class SetPrimitive
  # ...
  def traverse(visitor)
    visitor.visit_set_primitive(self)
  end
end

class SetPrimitive
  # ...
  def traverse(visitor)
    visitor.visit_set_primitive(self)
  end
end

This single method will work for serializers, evaluators, typecheckers, and so on. We'll add support for the % operator to see how this organization of our code scales.

Evaluator

We also organize all the evaluation code into a single Evaluator class. The job of each visit method is to turn the tree into a primitive. That's an easy task if our node is already a primitive:

Ruby

class Evaluator
  def visit_set_primitive(node)
    node
  end

  # ...
end

class Evaluator
  def visit_set_primitive(node)
    node
  end

  # ...
end

For other nodes, we must recursively evaluate their children. Those calls will give us back primitives, but because we allow bad trees to be formed, we have to add typechecks.

Ruby

class Evaluator
  # ...

  def visit_intersect(node)
    leftPrimitive = node.left.traverse(self)
    rightPrimitive = node.right.traverse(self)
    if !leftPrimitive.is_a?(Ast::SetPrimitive)
      raise "operand isn't a set"
    end
    if !rightPrimitive.is_a?(Ast::SetPrimitive)
      raise "operand isn't a set"
    end
    shareds = leftPrimitive.members & rightPrimitive.members
    Ast::SetPrimitive.new(shareds)
  end
end

class Evaluator
  # ...

  def visit_intersect(node)
    leftPrimitive = node.left.traverse(self)
    rightPrimitive = node.right.traverse(self)
    if !leftPrimitive.is_a?(Ast::SetPrimitive)
      raise "operand isn't a set"
    end
    if !rightPrimitive.is_a?(Ast::SetPrimitive)
      raise "operand isn't a set"
    end
    shareds = leftPrimitive.members & rightPrimitive.members
    Ast::SetPrimitive.new(shareds)
  end
end

This is the pattern for every single operator: evaluate operands, typecheck the primitives that we get back, perform the operation in the language we do know, and build a new value in the primitive types of our new language.

If we have time, we'll add more operations like ^ for xor, + for union, and - for difference.

TODO

Here's your list of things to do before we meet next:

Keep working on your spreadsheet milestones. Commit to GitHub after every work session.

Complete the middle quiz as desired.

See you next time.

Sincerely,

P.S. It's time for a haiku! As some background, the original meaning of the word static is still or unchanging.

Black, woman, and young voter_t's no static type It got amended

Dear Computer

Lecture: Settle, Part 3

Typechecking

Parser

Serialization

Evaluator

TODO