Box Type
Implementing our own collection in Rust is a great way to test our understanding of Rust's static ownership model. Collections by definition corral data away from the main program and into an abstraction. But the main program needs to use the data in the collection. Who then should own the data? Let's try writing a linked list and see how we might answer that question.
Data Structure
A linked list is a sequence of nodes that are chained together. Each node has two fields: the item at the given position in the list and a reference or pointer to the next node. We could write a Node
type very naturally in at least two different ways in Rust: with an enum or with a struct. This enum implementation emulates Haskell lists with its two variants:
enum List<T> {
Empty,
Cons(T, List<T>),
}
The enum expects a generic type parameter. The list 1, 2, 3 would be created with the expression Cons(1, Cons(2, Cons(3, Empty)))
, which is equivalent to 1 : 2 : 3 : []
in Haskell.
A struct is also a reasonable choice for modeling a list. This Node
types looks more like what we'd write in Java, probably as an inner class in a larger List
abstraction:
struct Node<T> {
item: T,
next: Node<T>,
}
Let's stick with the struct implementation because it demonstrates some issues that we'll need to overcome as we write our own abstractions in Rust.
Recursive Definitions
Neither the struct nor the enum compiles. The compiler reports that the types have infinite size. How can that be? The struct has just two fields. The first field is item
, and if T
is u32
, then it's just four bytes. The second field is next
. It's however many bytes a Node
is—and that's the problem. The size of Node
depends on the size of Node
. Both the struct and the enum are recursive types. The compiler rejects recursive types because they require infinite memory.
The fix is to make next
not a Node
itself, but a pointer or reference to a Node
. These have known sizes. Here we try using a reference:
struct Node<T> {
item: T,
next: &Node<T>,
}
This code almost compiles. When we put a reference inside a data structure, we are stating that there's an owner of the data somewhere external to the data structure. Because the data structure is borrowing from this owner, it cannot outlive the owner, just as a bicycle does not keep going after its rider gets knocked off.
The compiler can't compare the lifetime of Node
against the lifetime of the owner, so it rejects this definition. We could satisfy the compiler by adding a lifetime parameter, but these are too complex for our stage of learning. Really, we don't want our data structure to be dependent on external owners. The list should really own its nodes. References are not a good choice for fixing the infinite size issue.
The favored alternative is the Box
type. A Box
is a generic container for a pointer to owned and heap-allocated data. Pointers are predictably four or eight bytes wide, which fixes the infinite size issue. When the Box
goes out of scope, the heap data to which it points is automatically freed. This Node
struct wraps the next node up in a Box
and compiles without any errors:
struct Node<T> {
item: T,
next: Box<Node<T>>,
}
Terminating a List
The definition compiles now, but no instance of Node
can ever be made. To make a Node
, we must have a Box
instance to assign to next
. But to make a Box
, we must have a Node
to put in the box. That's a circular dependency.
The fix is to allow next
to be null. We've seen how Option
wraps around a type to make it nullable. This final definition of Node
wraps the next
pointer in an Option
:
struct Node<T> {
item: T,
next: Option<Box<Node<T>>>,
}
Thanks to Option
, we can now assemble a list manually by making instances of Node
and linking them together. The tail points to None
, and the preceding nodes point to their successors wrapped in a Box
wrapped in a Some
:
let tail = Node { item: 3, next: None };
let middle = Node { item: 2, next: Some(Box::new(tail)) };
let head = Node { item: 1, next: Some(Box::new(middle)) };
The list isn't very useful yet. Let's add a print
method that recurses through and prints each element. Recall that methods are defined for enums and structs in a impl
block separate from the type definition. Because Node
has a generic parameter, the impl
block needs a generic parameter too. Since we are printing each element, the generic type needs a bound that says its printable, just as a type in Haskell might be a member of the Show
typeclass. A type that can be printed with println!
implements the Display
trait. This impl
block defines print
for only Node
instances whose type T
implements Display
:
impl<T: Display> Node<T> {
fn print(&self) {
// ...
}
}
Since the data type is recursive, we'll make this method recursive too. It prints the current element and then recurses on next
. Since next
is an Option
, we need to unpack it. We could use match
:
impl<T: Display> Node<T> {
fn print(&self) {
println!("{}", self.item);
match self.next {
Some(next) => next.print(), // compile error
None => {},
}
}
}
This implementation is a little silly because there's nothing to do for None
. An if-let
statement targeting Some
is less code, so we might try writing it like this:
impl<T: Display> Node<T> {
fn print(&self) {
println!("{}", self.item);
if let Some(next) = self.next { // compile error
next.print();
}
}
}
See that destructuring assignment in the condition? That's a move operation. We are trying to transfer ownership from self.next
to the local next
variable. The receiver self
is immutably borrowed here; it is declared as &self
in the formal parameters of print
. Values can't be moved out of immutable borrows. The fix is to also borrow self.next
in the if statement by prefixing it with &
:
impl<T: Display> Node<T> {
fn print(&self) {
println!("{}", self.item);
if let Some(next) = &self.next {
next.print();
}
}
}
We can now print the list with head.print()
.
Append
A linked list abstraction also needs a method to append a new item. Since appending requires no particular interface for T
like printing did, this method should be defined in a separate impl
block that doesn't bound T
in any way. Additionally, self
must be mutable because we are possibly modifying its next
field. That leads to this impl
block:
impl<T> Node<T> {
fn append(&mut self, item: T) {
// ...
}
}
To add the new item, we must recurse through the nodes to get to the tail. The tail is identified by checking its next
field for None
. Since we need to act on both variants of Option
, this match
statement is appropriate:
impl<T> Node<T> {
fn append(&mut self, item: T) {
match self.next {
None => ...,
Some(next) => ..., // compile error
}
}
}
However, just as with print
, the destructuring of Some
will trigger a move operation out of a reference. The fix is the same: borrow self.next
. Since we are going to call append
on next
, we need a mutable reference:
impl<T> Node<T> {
fn append(&mut self, item: T) {
match &mut self.next {
None => ...,
Some(next) => ...,
}
}
}
When we reach the tail, which has None
for its next
value, we point next
to a new node wrapped up in a Box
and Some
. If we're not at the tail, we recurse:
impl<T> Node<T> {
fn append(&mut self, item: T) {
match &mut self.next {
None => {
let node = Node {item, next: None};
self.next = Some(Box::new(box));
},
Some(next) => next.add(item),
}
}
}
Now that we've got an append
method, we use it to construct the list instead of manually sequencing the nodes:
let mut list = Node {item: 1, next: None};
list.append(2);
list.append(3);
list.print();
This isn't the greatest linked list abstraction in the world. It only supports appending and printing, and appending is a linear time operation. But it does demonstrate how Box
and Option
are used to model recursive types and how to deal with ownership concerns.
Box
is useful when implementing data structures that have recursive definitions, but be careful not to overuse it. It is advertised as a manager of a pointer to heap data, but many types we use in Rust already manage heap data on their own, including String
and Vec
. We generally don't need to use Box
except in the definition of our own recursive types.