Lecture: Two Polymorphisms

Dear Computer

Chapter 12: Polymorphism Revisited

Lecture: Two Polymorphisms

Dear students:

Programmers love reuse more than is perhaps healthy. Nothing feels greater than making a piece of code work for \(n\) types, even when \(n\) is only 2. Polymorphism is the name we give to the notion of having one chunk of code serve multiple types, and there are two very different mechanisms for achieving it. They are subtype polymorphism and parametric polymorphism. Both are present in Java, C++, and Rust. Today we examine them in Rust as we bumble our way through some data structures. First we'll look at modeling an HTML document using subtype polymorphism. Then we'll try building a binary search tree using parametric polymorphism.

Which Polymorphism?

First, though, let's have a discussion. We have two polymorphisms, and both make code reusable. That doesn't mean they are interchangeable. We must use the one that's suited to our circumstances. With your table, consider these five scenarios that could benefit from polymorphic abstraction:

  1. You are developing a graphical user interface library. Clients will add their widgets to your layout manager. Do you employ subtype polymorphism or parametric polymorphism?
  2. You are developing a sort routine. Do you employ subtype polymorphism or parametric polymorphism?
  3. You are developing a wrapper abstraction that adds a color to the wrapped value. That way you can have yellow or orange strings, black or red numbers, red or yellow alerts, and so on. Do you employ subtype polymorphism or parametric polymorphism?
  4. You are developing a logger that dumps exception messages out to a file. Do you employ subtype polymorphism or parametric polymorphism?
  5. You are developing a method that computes the average of a very large set of numeric values. Do you you employ subtype polymorphism or parametric polymorphism?

Document

Suppose we're building the next markup language: My Average Markup Language (MAML). Its users will write up documents in a lightweight syntax, which will be translated into HTML by an interpreter. Just as we did on milestone 1, we want to create a set of model classes that represent the different elements that may appear in a tree representation of a document. These are some of those elements:

The elements are gathered together in a Document. When we call generate on a Document, it turns all the elements into HTML and concatenates the tags together. Document operates at a high level; it doesn't ask about the type of its elements. Yet somehow it must be able to call methods on these elements. How is this possible? Each object maintains a directory of its methods. At runtime, we call an object's method by looking up its entry in this directory. This is dynamic dispatch. In Rust, dynamic dispatch is achieved through traits. We define this trait that abstracts all elements:

Rust
trait Element {
  fn to_html(&self) -> String;
}

The method receives an immutable reference to the element. It returns a heap-allocated string. Once we have this trait, we proceed to implement the Document type. It will have for state a vector of elements. If Rust were more like Java, we'd write this:

Rust
struct Document {
  elements: Vec<Element>,
}

But this fails. Element is not a type. It's a trait, which describes a set of methods that a type can implement. Rust demands that we have a Vec of some actual type, but we are purposefully trying to be vague about the type. In this case, we can turn the trait into a vanilla type by explicitly tagging the trait name with dyn:

Rust
struct Document {
  elements: Vec<dyn Element>,
}

The dyn qualifier announces that a pointer to the method directory—the virtual table—is tacked onto the value. We like as much stuff to get figured out at compile time as possible. But when Document has a collection of arbitrary elements, the compiler's not going to know exactly what to_html methods to call. That will have to get figured out at runtime. Rust makes it clear that a dynamic lookup is going to happen by demanding the dyn qualifier.

Alas, dyn Element also fails to compile. The compiler doesn't know its size and can't allocate space for the vector. The fix is to make a vector of pointers instead of a vector of the elements themselves. That's what Rust's Box type is for. This definition compiles:

Rust
struct Document {
  elements: Vec<Box<dyn Element>>,
}

That covers the data definition. Now let's think about behaviors. When we make a new type, it's nice to define a constructor function, like this:

Rust
impl Document {
  fn new() -> Self {
    Document {
      elements: Vec::new(),
    }
  }
}

We also add a method for appending an element and another for generating the whole HTML page:

Rust
impl Document {
  // ...

  fn add(&mut self, element: Box<dyn Element>) {
    self.elements.push(element);
  }

  fn generate(&self) -> String {
    let mut html = String::new();
    html.push_str("<html>\n<body>\n");
    for element in self.elements.iter() {
      html.push_str(&element.to_html());
    }
    html.push_str("\n</body>\n</html>\n");
    html
  }
}

These methods take advantage of subtype polymorphism. They will work with any type from the Element hierarchy. Document never has to ask an element what type it is in order to call to_html. The runtime will find the right method by dereferencing each element's vpointer.

The generate method is a little tricky. The push_str method expects a &str parameter, so we pass a reference to the element's generated HTML. The iterator call behind the for loop wants to own the memory, but that's not legal because self owns the memory and it's not mutable. So we iterate through a reference to the elements.

Let's add a type to the Element hierarchy to check that this works. Heading models a title node in a document. We implement it like this:

Rust
struct Heading {
  text: String,
}

impl Element for Heading {
  fn to_html(&self) -> String {
    format!("<h1>{}</h1>", self.text)
  }
}

Now we throw in some audience participation. We have the following elements that we want to make fall into the polymorphic hierarchy rooted at Element.

The Image element shows a picture. We define it like this:

Rust
struct Image {
  url: String,
}

impl Element for Image {
  fn to_html(&self) -> String {
    format!("<img src=\"{}\">", self.url)
  }
}

Now we can put it all together in a main:

Rust
fn main() {
  let doc = Document::new();
  doc.add(Box::new(Heading { text: "AI Arranged My Marriage".to_string() }));
  doc.add(Box::new(Image { url: "couple.jpg".to_string() }));
  println!("{}", doc.generate());
}

Our hierarchy only includes two types currently, but we could add many more. Document will never need to change as we add new types. They will all fit under the Element hierarchy.

Binary Search Tree

Subtype polymorphism was the de facto polymorphism of the golden age of object-oriented programming. But parametric polymorphism has advantages over subtype polymorphism. The compiler knows the types and can therefore make decisions earlier—it can achieve early binding. Let's explore parametric polymorphism in Rust by implementing a binary search tree.

There are several ways we define the type. In Java, we'd create a Node class with three pieces of state: the node payload and two references to child nodes. We could do something similar in Rust, but we have other options. This enum makes every node either a leaf node or an internal node:

Rust
enum Tree<T> {
  Empty,
  Node {
    value: T,
    left: Box<Tree<T>>,
    right: Box<Tree<T>>
  },
}

Using an enum with an empty variant is a lot like using using the None variant of Option to mark the leaf nodes. We need Box because otherwise we have a recursive type whose size can't be determined.

We'll implement just two behaviors on Tree today and save some others for lab. The first is contains. This method should only be available for trees that hold comparable values, so we add a type constraint to T and then examine how the target value compares the node's value as we recurse down to a leaf:

Rust
impl<T: PartialOrd> Tree<T> {
  fn contains(&self, target: T) -> bool {
    match self {
      Tree::Empty => false,
      Tree::Node { value, left, right } => {
        return if target < *value {
          left.contains(target)
        } else if target > *value {
          right.contains(target)
        } else {
          true
        };
      }, 
    }
  }
}

This method can't be tested unless we have a way to populate the tree with elements. The add method walks the tree until it hits the appropriate leaf node, at which point it overwrites the leaf with a non-empty node:

Rust
impl<T: PartialOrd> Tree<T> {
  fn add(&mut self, new_value: T) {
    match self {
      Tree::Empty => {
        *self = Tree::Node {
          value: new_value,
          left: Box::new(Tree::Empty),
          right: Box::new(Tree::Empty),
        };
      },
      Tree::Node { value, left, right } => {
        if new_value < *value {
          left.add(new_value);
        } else if new_value > *value {
          right.add(new_value);
        }
      }
    }
  }
}

Here we make a Tree of u32 and populate it:

Rust
fn main() {
  let mut tree = Tree::<u32>::Empty;
  tree.add(3);
  tree.add(4);
  tree.add(5);
  println!("tree.contains(3): {:?}", tree.contains(3));
  println!("tree.contains(4): {:?}", tree.contains(4));
  println!("tree.contains(5): {:?}", tree.contains(5));
  println!("tree.contains(6): {:?}", tree.contains(6));
}

There are no vtables or vpointers when this code gets compiled. Instead, the compiler sees what types I plug in to the Tree, and it makes a separate compilation for each different use. Each is tailored to the type. The Tree for i32 will have machine code instructions that work with i32. The Tree for String will have machine code instructions that work with String. Because the types are known by the compiler, the execution will be faster than with subtype polymorphism. But parametric polymorphism only serves a single type, not a whole hierarchy.

TODO

Here's your list of things to do before we meet next:

Complete the middle quiz as desired.
Last chapter is canceled. Freedom.
One ready date remains. It is your final extension.
Canvas isn't expressive enough to calculate your final grade. However, you can calculate it just fine with some code or a spreadsheet. Weights are listed in the syllabus. Take the greater of your front and middle quiz scores. If you build a spreadsheet, consider sharing a blank version on Discord.

See you next time.

Sincerely,

P.S. It's time for a haiku!

First it was NY Now you can ❤️ anything Because they're T-shirts
← Ad Hoc PolymorphismLab: Binary Search Tree →