Generators

Dear Computer

Chapter 4: Functions

Generators

When we call a function, we expect the following sequence of events to happen in the computer:

  1. the caller pauses its execution
  2. a new stack frame is allocated with enough memory for the function's parameters and local variables
  3. the function's body executes until it hits a return
  4. the stack frame is deallocated
  5. the caller resumes its execution

In this sense, a function call is a momentary and self-contained aside within some larger computation. As soon as the function returns, its memory is cleaned up, and we can't even tell it ran.

There's a special kind of function called a generator that behaves differently. As a generator executes, it yields a value back to the caller. But instead of getting wiped from memory, it merely pauses. When the caller calls it again, the execution unpauses exactly where it left off. It keeps yielding values and pausing until reaching its final return.

Generators are commonly used to abstract away iteration logic. Suppose we find ourselves frequently iterating over a two-dimensional grid, like this:

JavaScript
for (let y = 0; y < height; ++y) {
  for (let x = 0; x < width; ++x) {
    // process x and y
  }
}
for (let y = 0; y < height; ++y) {
  for (let x = 0; x < width; ++x) {
    // process x and y
  }
}

This isn't a lot of code, but if we have to repeatedly traverse the grid, it adds up. We'd like a way to express the iteration more abstractly. In object-oriented languages, we could write a special GridIterator class with a next method that gets called many times. The next method advances only a single iteration per call. Since each call to next must pick up where the previous call left off, the class must persist its state in instance variables. A class-based iterator quickly becomes complex and awkward to write. A generator function lets us write pausable loops with plain local variables that persist between yield calls.

A generator in JavaScript looks like a function, but it's got a * symbol in front of the function name in the header, and its body contains one or more yield statements. Here's the grid iteration generator:

JavaScript
function *iterator2(width, height) {
  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width; x += 1) {
      yield [x, y];
    }
  }
}
function *iterator2(width, height) {
  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width; x += 1) {
      yield [x, y];
    }
  }
}

The body of iterator2 is just the normal nested loops for traversing a grid; it's not like the inverted mess of GridIterator. The yield statement pauses the generator and hands control back to this code that consumes its values:

JavaScript
for (let [x, y] of iterator2(width, height)) {
  console.log(x, y);
}
for (let [x, y] of iterator2(width, height)) {
  console.log(x, y);
}

Ruby has generators too, but they aren't nearly as slick as JavaScript's. We create an instance of Enumerator and pass it a block that yields values to a special yielder object. This Ruby generator is equivalent to the JavaScript generator:

Ruby
def iterator2(width, height)
  Enumerator.new do |yielder|
    (0...height).each do |y|
      (0...width).each do |x|
        yielder << [x, y]
      end
    end
  end
end

iterator2(4, 3).each do |x, y|
  puts "#{x}, #{y}"
end
def iterator2(width, height)
  Enumerator.new do |yielder|
    (0...height).each do |y|
      (0...width).each do |x|
        yielder << [x, y]
      end
    end
  end
end

iterator2(4, 3).each do |x, y|
  puts "#{x}, #{y}"
end

Values are yielded back to the caller with the special yield operator <<.

Summary

Programmers decompose complex programs into functions. This decomposition has several benefits: each function is given a meaningful name, has a focused mission, is reusable, simplifies the calling context, and safely operates within its own scope. They can't be entirely separate from the main execution, however, as we need to send data to them and receive data from them. Data is sent to a function via parameters, which may be positional or named. They may be assigned default values. The number of a function's parameters is its arity, which may be fixed or variadic. Actual parameters may be passed as copies, aliases, or executables. Some languages allow multiple or overloaded definitions of a function—provided each definition has a unique signature. Parameters and return values are often plain data, but higher-order functions receive or return functions. Such functions allow algorithms to be written with procedural holes. Map, filter, and fold are three common and useful higher-order functions for iterating over collections. Decorators are higher-order functions that add extra behaviors to the functions they wrap. Generators are pausable functions that lazily yield values to the caller.

← Decorators