Box Type

Dear Computer

Chapter 11: Managing Memory

Box Type

Implementing our own collection in Rust is a great way to test our understanding of Rust's static ownership model. Collections by definition corral data away from the main program and into an abstraction. But the main program needs to use the data in the collection. Who then should own the data? Let's try writing a linked list and see how we might answer that question.

Data Structure

A linked list is a sequence of nodes that are chained together. Each node has two fields: the item at the given position in the list and a reference or pointer to the next node. We could write a Node type very naturally in at least two different ways in Rust: with an enum or with a struct. This enum implementation emulates Haskell lists with its two variants:

Rust
enum List<T> {
  Empty,
  Cons(T, List<T>),
}

The enum expects a generic type parameter. The list 1, 2, 3 would be created with the expression Cons(1, Cons(2, Cons(3, Empty))), which is equivalent to 1 : 2 : 3 : [] in Haskell.

A struct is also a reasonable choice for modeling a list. This Node types looks more like what we'd write in Java, probably as an inner class in a larger List abstraction:

Rust
struct Node<T> {
  item: T,
  next: Node<T>,
}

Let's stick with the struct implementation because it demonstrates some issues that we'll need to overcome as we write our own abstractions in Rust.

Recursive Definitions

Neither the struct nor the enum compiles. The compiler reports that the types have infinite size. How can that be? The struct has just two fields. The first field is item, and if T is u32, then it's just four bytes. The second field is next. It's however many bytes a Node is—and that's the problem. The size of Node depends on the size of Node. Both the struct and the enum are recursive types. The compiler rejects recursive types because they require infinite memory.

The fix is to make next not a Node itself, but a pointer or reference to a Node. These have known sizes. Here we try using a reference:

Rust
struct Node<T> {
  item: T,
  next: &Node<T>,
}

This code almost compiles. When we put a reference inside a data structure, we are stating that there's an owner of the data somewhere external to the data structure. Because the data structure is borrowing from this owner, it cannot outlive the owner, just as a bicycle does not keep going after its rider gets knocked off.

The compiler can't compare the lifetime of Node against the lifetime of the owner, so it rejects this definition. We could satisfy the compiler by adding a lifetime parameter, but these are too complex for our stage of learning. Really, we don't want our data structure to be dependent on external owners. The list should really own its nodes. References are not a good choice for fixing the infinite size issue.

The favored alternative is the Box type. A Box is a generic container for a pointer to owned and heap-allocated data. Pointers are predictably four or eight bytes wide, which fixes the infinite size issue. When the Box goes out of scope, the heap data to which it points is automatically freed. This Node struct wraps the next node up in a Box and compiles without any errors:

Rust
struct Node<T> {
  item: T,
  next: Box<Node<T>>,
}

Terminating a List

The definition compiles now, but no instance of Node can ever be made. To make a Node, we must have a Box instance to assign to next. But to make a Box, we must have a Node to put in the box. That's a circular dependency.

The fix is to allow next to be null. We've seen how Option wraps around a type to make it nullable. This final definition of Node wraps the next pointer in an Option:

Rust
struct Node<T> {
  item: T,
  next: Option<Box<Node<T>>>,
}

Thanks to Option, we can now assemble a list manually by making instances of Node and linking them together. The tail points to None, and the preceding nodes point to their successors wrapped in a Box wrapped in a Some:

Rust
let tail = Node { item: 3, next: None };
let middle = Node { item: 2, next: Some(Box::new(tail)) };
let head = Node { item: 1, next: Some(Box::new(middle)) };

Print

The list isn't very useful yet. Let's add a print method that recurses through and prints each element. Recall that methods are defined for enums and structs in a impl block separate from the type definition. Because Node has a generic parameter, the impl block needs a generic parameter too. Since we are printing each element, the generic type needs a bound that says its printable, just as a type in Haskell might be a member of the Show typeclass. A type that can be printed with println! implements the Display trait. This impl block defines print for only Node instances whose type T implements Display:

Rust
impl<T: Display> Node<T> {
  fn print(&self) {
    // ...
  }
}

Since the data type is recursive, we'll make this method recursive too. It prints the current element and then recurses on next. Since next is an Option, we need to unpack it. We could use match:

Rust
impl<T: Display> Node<T> {
  fn print(&self) {
    println!("{}", self.item);
    match self.next {
      Some(next) => next.print(),   // compile error       
      None => {},
    }
  }
}

This implementation is a little silly because there's nothing to do for None. An if-let statement targeting Some is less code, so we might try writing it like this:

Rust
impl<T: Display> Node<T> {
  fn print(&self) {
    println!("{}", self.item);
    if let Some(next) = self.next {  // compile error
      next.print();
    }
  }
}

See that destructuring assignment in the condition? That's a move operation. We are trying to transfer ownership from self.next to the local next variable. The receiver self is immutably borrowed here; it is declared as &self in the formal parameters of print. Values can't be moved out of immutable borrows. The fix is to also borrow self.next in the if statement by prefixing it with &:

Rust
impl<T: Display> Node<T> {
  fn print(&self) {
    println!("{}", self.item);
    if let Some(next) = &self.next {
      next.print();
    }
  }
}

We can now print the list with head.print().

Append

A linked list abstraction also needs a method to append a new item. Since appending requires no particular interface for T like printing did, this method should be defined in a separate impl block that doesn't bound T in any way. Additionally, self must be mutable because we are possibly modifying its next field. That leads to this impl block:

Rust
impl<T> Node<T> {
  fn append(&mut self, item: T) {
    // ...
  }
}

To add the new item, we must recurse through the nodes to get to the tail. The tail is identified by checking its next field for None. Since we need to act on both variants of Option, this match statement is appropriate:

Rust
impl<T> Node<T> {
  fn append(&mut self, item: T) {
    match self.next {
      None => ...,
      Some(next) => ...,    // compile error
    }
  }
}

However, just as with print, the destructuring of Some will trigger a move operation out of a reference. The fix is the same: borrow self.next. Since we are going to call append on next, we need a mutable reference:

Rust
impl<T> Node<T> {
  fn append(&mut self, item: T) {
    match &mut self.next {
      None => ...,
      Some(next) => ...,
    }
  }
}

When we reach the tail, which has None for its next value, we point next to a new node wrapped up in a Box and Some. If we're not at the tail, we recurse:

Rust
impl<T> Node<T> {
  fn append(&mut self, item: T) {
    match &mut self.next {
      None => {
        let node = Node {item, next: None};
        self.next = Some(Box::new(box));
      },
      Some(next) => next.add(item),
    }
  }
}

Now that we've got an append method, we use it to construct the list instead of manually sequencing the nodes:

Rust
let mut list = Node {item: 1, next: None};
list.append(2);
list.append(3);
list.print();

This isn't the greatest linked list abstraction in the world. It only supports appending and printing, and appending is a linear time operation. But it does demonstrate how Box and Option are used to model recursive types and how to deal with ownership concerns.

Box is useful when implementing data structures that have recursive definitions, but be careful not to overuse it. It is advertised as a manager of a pointer to heap data, but many types we use in Rust already manage heap data on their own, including String and Vec. We generally don't need to use Box except in the definition of our own recursive types.

← Static OwnershipRc Type →