Parametric Polymorphism in C++

Dear Computer

Chapter 12: Polymorphism Revisited

Parametric Polymorphism in C++

Generics in C++ appeared in 1991, thirteen years earlier than in Java. They are usually called templates rather than generics, but the meaning is the same. Java's and C++'s syntaxes are similar, but C++ does not implement templates with erasure.

Formal Type Parameters

The formal type parameters of a class are declared in a template clause that precedes the class name. Apart from that, this ArrayList class looks a lot like the Java implementation:

C++
template<typename T>
class ArrayList {
  public:
    ArrayList() {
      size = 0;
      items = new T[10]; 
    }

    T get(int i) {
      assert_index(i);
      return items[i];
    }

    void set(int i, T item) {
      assert_index(i);
      items[i] = item;
    }

    // ...

  private:
    T *items;
    int size;
};

One notable difference in this code is that C++ allows us to make an array whose elements are of a generic type. We don't have to resort to an array of Object or some other supertype.

As in Java, individual methods may also be generic, like this flip method:

C++
template<typename T, typename U>
Pair<U, T> flip(const Pair<T, U>& p) {
  return Pair<U, T>(p.second, p.first); 
}

Actual Type Parameters

The actual type parameters are passed in a similar way to Java. The real types are attached to a generic class name using angle brackets:

C++
ArrayList<string> names;
ArrayList<int> xs = {1, 2, 3};
auto files = ArrayList<File>();

We may explicitly pass actual types to generic methods or let the compiler infer them, as in these two identical calls to flip:

C++
Pair<int, char> yx = flip<char, int>(xy);
Pair<int, char> yx = flip(xy);

Compared to Java, the C++ parser has to examine more tokens to avoid interpreting < as a relational operator.

Implementation

Though the syntaxes are similar, C++ implements generics differently. When the compiler sees a template being used, it creates a brand new version of the template code with the actual type parameters substituted in for the formals in a process called template instantiation. Behind the scenes, the three instances of ArrayList from the example above really become instances of three separate classes:

C++
ArrayListString names;
ArrayListInt xs;
ArrayListFile files;

The name template is an apt description of this behavior. A template class is not really a class but a pattern that gets instantiated into a class when the compiler demands it.

Template instantiation has its own set of costs and benefits. In Java, erasure yields a single version of the compiled generic code. In C++, template instantiation yields many different versions of the generic code, one for each unique set of actual type parameters. This makes for bloated executables. In Java, the generic types are replaced with a supertype, which means that information about the actual types is lost in the compilation process. Since the types aren't known, we can't instantiate new instances or arrays of the generic type. In C++, the actual types are burned into the template instantiation. We are therefore free to write new T(...) or new T[...]. This also means we can use primitives as actual type parameters in C++. They don't have to be subtypes of some supertype class like they do under erasure.

The compiler triggers a new template instantation only when it discovers a use that it hasn't encountered before. That means the template must ship as uncompiled source code to all potential clients. All the templates in the C++ standard library are defined in header files, which are the only parts of the library that haven't been compiled down to machine code. If we wish to share our own template, we will have to publish its source code.

Because templates are compiled on the fly, compiling a C++ program generally takes longer than compiling an equivalent Java program. Template error messages also have a reputation for being long and inscrutable.

Static Value Parameters

Template parameters aren't restricted to types in C++. They can also be values. We can, for example, make a template for an n-dimensional point:

C++
template<int n>
class Point {
  private:
    int coordinates[n];
};

When working in two dimensions, we'll use Point<2>. When in three, Point<3>. When in four, Point<4>.

We wouldn't have to use templates to make such an abstraction. The constructor for this similar but non-generic Point class accepts the number of dimensions as a parameter:

C++
class Point {
  public:
    Point(int n): n(n) {
      coordinates = new int[n];
    }

  private:
    int n;
    int *coordinates;
};

The difference is that in the case of the template the compiler knows the value of n statically and can therefore perform certain operations earlier. For example, the compiler knows the size of the array and can allocate space for it on the stack. If the class contains a loop that iterates up to n, the compiler might unroll the loop into a flat sequence of statements to improve performance. If n isn't known until runtime, then the array will have to be dynamically allocated on the heap, and the loop will have to check its condition to know when it's done iterating.

← Parametric Polymorphism in JavaParametric Polymorphism in Rust →