Higher-order Functions
We've seen higher-order functions in Ruby and Haskell. Rust has them too, including the essential map
, filter
, and fold
. These three methods are not methods of arrays or lists. They are invoked on iterators. This code turns a vector into an iterator and then maps over it to double each number:
let xs = vec![1, 2, 3, 4, 5];
let ys = xs.iter().map(|x| x * 2);
What's the type of ys
? Surely it's a vector. Nope. It's another iterator. The map
function returns an iterator rather than a vector so that we can chain higher-order function calls into a gauntlet of computation:
let xs = vec![1, 2, 3, 4, 5];
let ys = xs.iter()
.map(|x| x * 2)
.filter(|&x| x < 10)
.fold(0, |sum, x| sum + x);
Note that the closure passed to filter
accepts a reference parameter, but map
and fold
do not. The filter
function does not assume ownership of the elements, but the others may.
Chaining is one benefit of these higher-order functions producing iterators rather than collections. Another benefit is that values are produced only on demand. In Ruby, we might write this map-take chain:
[1, 2, 3, 4, 5].map { |x| 2 * x }.take(2)
This expression doubles all five elements but ends up taking only two of them. That's a waste of memory and CPU. The equivalent in Rust looks like this:
let xs = vec![1, 2, 3, 4, 5];
let ys = xs.iter().map(|x| x * 2).take(2);
The second line of code only makes iterators. It doesn't traverse the vector xs
at all or allocate any memory for a new vector. The type of ys
is still an iterator. To pump values through this pipeline, we need to establish a demand. There are three common ways to do this: call next
on the final iterator to get values one at a time, walk through the iterator with a for loop, or call collect
to gather all the iterator's values together into a collection.
The collect
method is generic and can be used to build many different collections. We must tell it what kind of collection to build with either a turbofish or through a type determined from the surrounding context. Here a type declaration is used with collect
to produce a vector of doubled numbers:
let xs = vec![1, 2, 3, 4, 5];
let ys: Vec<u32> = xs.iter()
.map(|x| x * 2)
.take(2)
.collect();
The collect
call issues the demand to traverse the final iterator. That final iterator demands only two elements of its preceding iterator. That means the multiplication only happens twice.
Summary
Programs consume memory as they run, but memory is not in infinite supply. To avoid slowdowns and crashes, programs must also free memory. Some languages, like C, expect the programmer to explicitly manage memory through pointers. Manual memory management has two vulnerabilities: pointers might point at unowned memory and programmers might forget to free allocations. Other languages provide automatic garbage collection and do not expose pointers. In a tracing garbage collector, the allocation graph is periodically searched to discover which allocations may no longer be reached. The nondeterministic schedule of tracing is undesirable in some contexts. In a reference-counting system, an allocation is paired with an integer that tracks how many references to the allocation are in active use. The counts are inspected dynamically as references get assigned or go out of scope. When a count reaches 0, its allocation is released. If two allocations refer to each other, neither will be recognized as garbage. In a static ownership system, as we find in Rust, each allocation is allowed a single owner. When that single owner goes out of scope, the allocation is released. Because there's only a single owner, the compiler is able to schedule the release at compile time. Single ownership is a severe restriction that can make programming collections and higher-order functions difficult, and Rust offers several ways to ease the difficulty. Other components may borrow an allocation from its owner. For situations where there is no clear single owner, Rust offers reference-counted wrapper types.