Lecture: Two Polymorphisms
Dear students:
Programmers love reuse more than is perhaps healthy. Nothing feels greater than making a piece of code work for \(n\) types, even when \(n\) is only 2. Polymorphism is the name we give to the notion of having one chunk of code serve multiple types, and there are two very different mechanisms for achieving it. They are subtype polymorphism and parametric polymorphism. Both are present in Java, C++, and Rust. Today we examine them in Rust as we bumble our way through some data structures. First we'll look at modeling an HTML document using subtype polymorphism. Then we'll try building a binary search tree using parametric polymorphism.
Which Polymorphism?
First, though, let's have a discussion. We have two polymorphisms, and both make code reusable. That doesn't mean they are interchangeable. We must use the one that's suited to our circumstances. With your table, consider these five scenarios that could benefit from polymorphic abstraction:
- You are developing a graphical user interface library. Clients will add their widgets to your layout manager. Do you employ subtype polymorphism or parametric polymorphism?
- You are developing a sort routine. Do you employ subtype polymorphism or parametric polymorphism?
- You are developing a wrapper abstraction that adds a color to the wrapped value. That way you can have yellow or orange strings, black or red numbers, red or yellow alerts, and so on. Do you employ subtype polymorphism or parametric polymorphism?
- You are developing a logger that dumps exception messages out to a file. Do you employ subtype polymorphism or parametric polymorphism?
- You are developing a method that computes the average of a very large set of numeric values. Do you you employ subtype polymorphism or parametric polymorphism?
Document
Suppose we're building the next markup language: My Average Markup Language (MAML). Its users will write up documents in a lightweight syntax, which will be translated into HTML by an interpreter. Just as we did on milestone 1, we want to create a set of model classes that represent the different elements that may appear in a tree representation of a document. These are some of those elements:
- heading
- image
- link
The elements are gathered together in a Document
. When we call generate
on a Document
, it turns all the elements into HTML and concatenates the tags together. Document
operates at a high level; it doesn't ask about the type of its elements. Yet somehow it must be able to call methods on these elements. How is this possible? Each object maintains a directory of its methods. At runtime, we call an object's method by looking up its entry in this directory. This is dynamic dispatch. In Rust, dynamic dispatch is achieved through traits. We define this trait that abstracts all elements:
trait Element {
fn to_html(&self) -> String;
}
The method receives an immutable reference to the element. It returns a heap-allocated string. Once we have this trait, we proceed to implement the Document
type. It will have for state a vector of elements. If Rust were more like Java, we'd write this:
struct Document {
elements: Vec<Element>,
}
But this fails. Element
is not a type. It's a trait, which describes a set of methods that a type can implement. Rust demands that we have a Vec
of some actual type, but we are purposefully trying to be vague about the type. In this case, we can turn the trait into a vanilla type by explicitly tagging the trait name with dyn
:
struct Document {
elements: Vec<dyn Element>,
}
The dyn
qualifier announces that a pointer to the method directory—the virtual table—is tacked onto the value. We like as much stuff to get figured out at compile time as possible. But when Document
has a collection of arbitrary elements, the compiler's not going to know exactly what to_html
methods to call. That will have to get figured out at runtime. Rust makes it clear that a dynamic lookup is going to happen by demanding the dyn
qualifier.
Alas, dyn Element
also fails to compile. The compiler doesn't know its size and can't allocate space for the vector. The fix is to make a vector of pointers instead of a vector of the elements themselves. That's what Rust's Box
type is for. This definition compiles:
struct Document {
elements: Vec<Box<dyn Element>>,
}
That covers the data definition. Now let's think about behaviors. When we make a new type, it's nice to define a constructor function, like this:
impl Document {
fn new() -> Self {
Document {
elements: Vec::new(),
}
}
}
We also add a method for appending an element and another for generating the whole HTML page:
impl Document {
// ...
fn add(&mut self, element: Box<dyn Element>) {
self.elements.push(element);
}
fn generate(&self) -> String {
let mut html = String::new();
html.push_str("<html>\n<body>\n");
for element in self.elements.iter() {
html.push_str(&element.to_html());
}
html.push_str("\n</body>\n</html>\n");
html
}
}
These methods take advantage of subtype polymorphism. They will work with any type from the Element
hierarchy. Document
never has to ask an element what type it is in order to call to_html
. The runtime will find the right method by dereferencing each element's vpointer.
The generate
method is a little tricky. The push_str
method expects a &str
parameter, so we pass a reference to the element's generated HTML. The iterator call behind the for loop wants to own the memory, but that's not legal because self
owns the memory and it's not mutable. So we iterate through a reference to the elements.
Let's add a type to the Element
hierarchy to check that this works. Heading
models a title node in a document. We implement it like this:
struct Heading {
text: String,
}
impl Element for Heading {
fn to_html(&self) -> String {
format!("<h1>{}</h1>", self.text)
}
}
Now we throw in some audience participation. We have the following elements that we want to make fall into the polymorphic hierarchy rooted at Element
.
The Image
element shows a picture. We define it like this:
struct Image {
url: String,
}
impl Element for Image {
fn to_html(&self) -> String {
format!("<img src=\"{}\">", self.url)
}
}
Now we can put it all together in a main
:
fn main() {
let doc = Document::new();
doc.add(Box::new(Heading { text: "AI Arranged My Marriage".to_string() }));
doc.add(Box::new(Image { url: "couple.jpg".to_string() }));
println!("{}", doc.generate());
}
Our hierarchy only includes two types currently, but we could add many more. Document
will never need to change as we add new types. They will all fit under the Element
hierarchy.
Binary Search Tree
Subtype polymorphism was the de facto polymorphism of the golden age of object-oriented programming. But parametric polymorphism has advantages over subtype polymorphism. The compiler knows the types and can therefore make decisions earlier—it can achieve early binding. Let's explore parametric polymorphism in Rust by implementing a binary search tree.
There are several ways we define the type. In Java, we'd create a Node
class with three pieces of state: the node payload and two references to child nodes. We could do something similar in Rust, but we have other options. This enum
makes every node either a leaf node or an internal node:
enum Tree<T> {
Empty,
Node {
value: T,
left: Box<Tree<T>>,
right: Box<Tree<T>>
},
}
Using an enum with an empty variant is a lot like using using the None
variant of Option
to mark the leaf nodes. We need Box
because otherwise we have a recursive type whose size can't be determined.
We'll implement just two behaviors on Tree
today and save some others for lab. The first is contains
. This method should only be available for trees that hold comparable values, so we add a type constraint to T
and then examine how the target value compares the node's value as we recurse down to a leaf:
impl<T: PartialOrd> Tree<T> {
fn contains(&self, target: T) -> bool {
match self {
Tree::Empty => false,
Tree::Node { value, left, right } => {
return if target < *value {
left.contains(target)
} else if target > *value {
right.contains(target)
} else {
true
};
},
}
}
}
This method can't be tested unless we have a way to populate the tree with elements. The add
method walks the tree until it hits the appropriate leaf node, at which point it overwrites the leaf with a non-empty node:
impl<T: PartialOrd> Tree<T> {
fn add(&mut self, new_value: T) {
match self {
Tree::Empty => {
*self = Tree::Node {
value: new_value,
left: Box::new(Tree::Empty),
right: Box::new(Tree::Empty),
};
},
Tree::Node { value, left, right } => {
if new_value < *value {
left.add(new_value);
} else if new_value > *value {
right.add(new_value);
}
}
}
}
}
Here we make a Tree
of u32
and populate it:
fn main() {
let mut tree = Tree::<u32>::Empty;
tree.add(3);
tree.add(4);
tree.add(5);
println!("tree.contains(3): {:?}", tree.contains(3));
println!("tree.contains(4): {:?}", tree.contains(4));
println!("tree.contains(5): {:?}", tree.contains(5));
println!("tree.contains(6): {:?}", tree.contains(6));
}
There are no vtables or vpointers when this code gets compiled. Instead, the compiler sees what types I plug in to the Tree
, and it makes a separate compilation for each different use. Each is tailored to the type. The Tree
for i32
will have machine code instructions that work with i32
. The Tree
for String
will have machine code instructions that work with String
. Because the types are known by the compiler, the execution will be faster than with subtype polymorphism. But parametric polymorphism only serves a single type, not a whole hierarchy.
TODO
Here's your list of things to do before we meet next:
See you next time.
Sincerely,
P.S. It's time for a haiku!
T
-shirts