Milestone 1: Model

Dear Computer

Project: Box

Milestone 1: Model

In this first milestone, you develop the model of your coding challenge application. A model is the set of abstractions that represent the core structures of an application. Models are meant to be independent of any particular user interface, making them reusable under different “skins” and easier to test. In the case of your application, your model is a set of abstractions representing the anatomical parts of a program and the runtime data, like variables, that must be tracked during its execution.

The code you write for your model should use the patterns of abstract syntax trees and visitors discussed in lecture. Before deviating from these patterns, talk with your instructor. If the code you submit is full of wild ideas disconnected from this class, your instructor will assume that you used AI in a way that doesn't support your learning and will ask you to resubmit at a later ready date.

Your application will eventually have its own programming language for writing code. Milestone 2 is about lexing, parsing, and translating that language into an executable form. This milestone doesn't touch on the language itself, but it does have you build the abstractions that your someday interpreter will use to assemble and evaluate abstract syntax trees. You'll design the interface for the application in milestone 3—with hopefully no changes to the model.

Node Hierarchy

A model of a program is an abstract syntax tree, which is built out of nodes for assigning variables, adding values, comparing values, printing things, and so on. For this milestone, write abstractions for these node structures:

If you were writing your application in Java (which isn't an option), you'd define each of these node types as a class and make them all implementers of an interface named AstNode. Choose an appropriate tool in your chosen language to define these abstractions. Give each abstraction the following interface:

Do not add any serialization or evaluation methods to these classes. That code will be placed elsewhere.

The constructor's job is to initialize the node so that it is the root of its subtree. For example, the constructor for arithmetic addition will initialize the state for its left and right child nodes. In milestone 2, these constructors will be called by the parser to construct an abstract syntax tree of a spreadsheet program.

The constructors for the primitive abstractions accept primitives from your target language. In a Java-like language, you might create instances of the integer primitives 5 and 3 with this code:

a = new IntegerPrimitive(5);
b = new IntegerPrimitive(3);
a = new IntegerPrimitive(5);
b = new IntegerPrimitive(3);

Only the four model primitive constructors may accept raw values like this. Every other node class will accept other nodes as parameters. These parameter nodes may be of any node type. For example, you might create an instance of a modulo operation between a sum and a cell with this code:

c = new Add(a, b);
d = new UnaryNegation(new Rvalue("x"));
e = new Multiply(c, d);  // same as (a + b) * -x
c = new Add(a, b);
d = new UnaryNegation(new Rvalue("x"));
e = new Multiply(c, d);  // same as (a + b) * -x

Operator nodes do not accept primitives of the target language. This code, for example, is illegal:

e = new Modulo(5, 3);  // bad: operands must be nodes
e = new Modulo(5, 3);  // bad: operands must be nodes

Compound structures are built out of simpler subtrees, like this:

product = new Multiply(
  new Add(
    new IntegerPrimitive(1),
    new IntegerPrimitive(2)
  ),
  new Add(
    new IntegerPrimitive(3),
    new IntegerPrimitive(4)
  )
);
product = new Multiply(
  new Add(
    new IntegerPrimitive(1),
    new IntegerPrimitive(2)
  ),
  new Add(
    new IntegerPrimitive(3),
    new IntegerPrimitive(4)
  )
);

Such complex expressions are painful to assemble by hand. The interpreter you write in the next milestone translates the more compact and conventional (1 + 2) * (3 + 4) into the unwieldy tree representation shown above.

Though you know that Add should only accept number expressions, Or should only accept boolean expressions, and LeftShift should only accept integer expressions, you should only assume that the operand parameters are of some abstract node type. The constructors should not typecheck their operands. Your application uses a dynamically typed language, and typechecking requires information that is only available at evaluation time. Suppose one of the operands is a variable reference. You won't know the type of value it represents until the variable is looked up, and you therefore cannot typecheck at construction time.

Nor should any constructor evaluate its operands. A child node might reference a variable, but the constructor has no knowledge of variables. Evaluation will happen later. At the constructor stage, you are only building a tree, not executing it.

Here's some advice: start by implementing just a few of these abstractions, like integer primitives and a couple of arithmetic operations. Complete the serialization and evaluation steps below for this small set. Once these steps are working, go back and implement the remaining abstractions.

Visitor

There are two behaviors that we want abstract syntax trees to offer: serialization (which builds a string representation of the tree) and evaluation (which computes a node's primitive value). If we are red-blooded object-oriented programmers, we implement these behaviors with methods scattered across a class hierarchy:

interface Node {
  // Emit a string representation.
  String translate();

  // Evaluate to an int, float, boolean, string, or address.
  PrimitiveNode evaluate(Runtime runtime);
}

class IntegerPrimitive implements PrimitiveNode {
  // ...

  String translate() {
    // ...
  }

  PrimitiveNode evaluate(Runtime runtime) {
    // ...
  }
}

class Add implements Node {
  // ...

  String translate() {
    // ...
  }

  PrimitiveNode evaluate(Runtime runtime) {
    // ...
  }
}
interface Node {
  // Emit a string representation.
  String translate();

  // Evaluate to an int, float, boolean, string, or address.
  PrimitiveNode evaluate(Runtime runtime);
}

class IntegerPrimitive implements PrimitiveNode {
  // ...

  String translate() {
    // ...
  }

  PrimitiveNode evaluate(Runtime runtime) {
    // ...
  }
}

class Add implements Node {
  // ...

  String translate() {
    // ...
  }

  PrimitiveNode evaluate(Runtime runtime) {
    // ...
  }
}

This implementation strategy would work, but it will result in code that is messy and hard to maintain. Suppose you have \(n\) abstractions in your node hierarchy. If you spread the translate and evaluate operations across the hierarchy, you'll have \(\frac{1}{n}\) of their solutions in one file, \(\frac{1}{n}\) in another, and so on. If something needs to change, you'll have to navigate through all the files. And a full-fledged compiler will support many more operations than just serialization and evaluation.

Suppose we could take all those tiny fragments of the solution and group them together in their own file? We can—by following the visitor design pattern. A visitor is an object that recursively processes a tree to achieve some outcome. We'll make one visitor that translates the tree and another that evaluates it. In Java, all visitors would have this common interface:

interface Visitor<R> {
  R visit(IntegerPrimitive node);
  R visit(Add node);
  // ...
}
interface Visitor<R> {
  R visit(IntegerPrimitive node);
  R visit(Add node);
  // ...
}

Type parameter R represents the return type of the visitor.

Then we'd change our node hierarchy to have a universal visit method that works for any Visitor. Each node defers the processing to the method of the visitor appropriate to its type:

interface Node {
  <R> R visit(Visitor<R> visitor);
}

class IntegerPrimitive implements PrimitiveNode {
  // ...
  <R> R visit(Visitor<R> visitor) {
    return visitor.visit(this);
  }
}

class Add implements Node {
  // ...
  <R> R visit(Visitor<R> visitor) {
    return visitor.visit(this);
  }
}
interface Node {
  <R> R visit(Visitor<R> visitor);
}

class IntegerPrimitive implements PrimitiveNode {
  // ...
  <R> R visit(Visitor<R> visitor) {
    return visitor.visit(this);
  }
}

class Add implements Node {
  // ...
  <R> R visit(Visitor<R> visitor) {
    return visitor.visit(this);
  }
}

This reusable visit method in the node classes means that our hierarchy won't get polluted by a bunch of fragments of tree operations. For instance, all the translation code would be tidily organized in this single class:

class Translater implements Visitor<String> {
  String visit(IntegerPrimitive node) {
    // ...
  }

  String visit(Add node) {
    // visit left node
    // visit right node
    // combine results
  }

  // visit methods for other node types
}
class Translater implements Visitor<String> {
  String visit(IntegerPrimitive node) {
    // ...
  }

  String visit(Add node) {
    // visit left node
    // visit right node
    // combine results
  }

  // visit methods for other node types
}

To translate a tree under this visitor pattern, we call its visit method with a new instance of the Translater class:

Node tree = new Add(...);
String text = tree.visit(new Translater());
Node tree = new Add(...);
String text = tree.visit(new Translater());

Evaluation is implemented and performed similarly, but we pass an Evaluator visitor to visit.

If you are using a statically typed language, you can likely implement the visitor pattern just as described for Java. If you are using a dynamically typed language, you've got a problem: dynamically typed languages don't generally support overloading of methods. You can't give a visitor class \(n\) methods named visit with different type signatures. However, you can move the static types into the method names:

class Translater 
  function visitIntegerPrimitive(node)
    # ...

  function visitAdd(node)
    # visit left
    # visit right
    # combine results
class Translater 
  function visitIntegerPrimitive(node)
    # ...

  function visitAdd(node)
    # visit left
    # visit right
    # combine results

A consequence of this renaming is that the node classes can't all share an inherited visit method. They each need their own visit method that explicitly calls their renamed method in the visitor:

class IntegerPrimitive
  function visit(visitor)
    visitor.visitIntegerPrimitive(this)

class Add
  function visit(visitor)
    visitor.visitAdd(this)
class IntegerPrimitive
  function visit(visitor)
    visitor.visitIntegerPrimitive(this)

class Add
  function visit(visitor)
    visitor.visitAdd(this)

The visitor pattern is at the heart of many language tools. If you are using an object-oriented programming language like Ruby, use visitor objects in your implementation. If you are using a functional language like Haskell or Rust, you are likely already organizing your code by operation. In these language, each visitor is a single function that processes the variants of an enum.

Translate

You must write two visitors for this application. The first visitor translates the tree to a string representation of the program in another language. If you translate to Ruby, the end result of visiting is the source code of an equivalent Ruby program. You will primarily translate trees so you can test and debug your code.

The exact format of the serialization is not specified, but the string should show the operations and operands in a readable manner. Consider this node and its translation:

product = new Multiply(
  new Add(
    new IntegerPrimitive(1),
    new IntegerPrimitive(2)
  ),
  new IntegerPrimitive(3),
);
text = product.visit(new Translater());
print(text);
product = new Multiply(
  new Add(
    new IntegerPrimitive(1),
    new IntegerPrimitive(2)
  ),
  new IntegerPrimitive(3),
);
text = product.visit(new Translater());
print(text);

The text ((1 + 2) * 3) is a reasonable result. Feel free to use parentheses liberally to show precedence. If you have time and interest, add logic to parenthesize only when a child operation has lower precedence than its parent.

Evaluate

The second visitor that evaluates an abstract syntax tree. Each node must evaluate to one of your primitive nodes—not one of your target language's primitives. An add operation will yield either your float or integer node. A comparison will yield a boolean node. And so on for all expression nodes. An assignment node returns primitive value of its right-hand side. A print node returns null. A block returns the primitive value of its last statement, or a null primitive if it has no statements.

If a node has operands, the first thing you must do in the visit method is evaluate them. The node is the root of a tree, and the evaluate operation recurses through it to produce a primitive. After evaluating an operation's operands, you must perform type checking. Assert that the arithmetic operations have numeric operands that are compatible. Assert that the logical operations have boolean operands. Assert that the bitwise operations have integer operands. And so on.

Use your target language's error mechanism to bail from the evaluation on a type error and return control to the caller, which can either handle the error or ignore it. In a later milestone you will have the caller present an error message to the user via the interface.

All told, the method for evaluating an add operation will follow this pseudocode structure:

class Evaluator implements Visitor<PrimitiveNode> {
  PrimitiveNode visitAdd(Add node) {
    // evaluate left node to model primitive
    // if left's model primitive has a bad type
    //   raise an exception
    // evaluate right node to model primitive
    // if right's model primitive has a bad type
    //   raise an exception
    // return new float or int primitive for sum
  }
}
class Evaluator implements Visitor<PrimitiveNode> {
  PrimitiveNode visitAdd(Add node) {
    // evaluate left node to model primitive
    // if left's model primitive has a bad type
    //   raise an exception
    // evaluate right node to model primitive
    // if right's model primitive has a bad type
    //   raise an exception
    // return new float or int primitive for sum
  }
}

For many operations, what type of primitive you return depends on the types of the operands. Also, logical operators should short-circuit. If the first operand determines your answer, don't evaluate the second.

The visit methods for evaluating primitives have an easy job. The nodes are already primitives, so they don't need to be evaluated further.

Runtime

During evaluation, we need a place to put persistent state like variables. This data must be carried through the tree traversal, so it can't be tied to any particular node. We'll call this place a runtime environment, or just runtime.

Define a Runtime abstraction to manage an executing program's runtime environment. Have it hold a dictionary of variable bindings. Pass an instance of Runtime along to the Evaluator constructor, and have Evaluator hang on to it. When visit method needs to read or write a variable, it consults the runtime. Most nodes will not need to consult the runtime.

Submission

To submit your milestone, follow the instructions. In particular, do these three things:

In your video, demonstrate adding several programs to a grid and serializing and evaluating them. Include these expressions that are expressed in a hypothetical syntax:

Also evaluate and inspect the output of the tree representations of these multiple-statement programs:

This syntax is just an example pseudocode to communicate the program. You will assemble the trees manually in this first milestone. In the second milestone, you will choose your own syntax.

Show also some programs that fail to typecheck. Incluse these three expressions and others of your own crafting:

In your video, don't comment on or show every single line of code in your video. Select a few representative snippets. Do comment on programming language ideas that interest you or challenged you.

← Recording and SubmittingMilestone 2: Interpreter →