Parametric Polymorphism in Java

Java made its debut in 1996. Not until 2004 did generics appear in the language. The eight years between was bleak for Java developers who wanted to make code reusable. Library writers' only option to make an abstraction serve many types was to use subtype polymorphism. They targeted a supertype that could serve all clients. In the case of ArrayList, which assumes nothing about its elements, that supertype was Object. The state of ArrayList included an array of Object, and it was likely implemented like this:

Java

class ArrayList {
  private Object[] items;
  private int size;

  public ArrayList() {
    items = new Object[10];
  }

  public Object get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, Object item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}

class ArrayList {
  private Object[] items;
  private int size;

  public ArrayList() {
    items = new Object[10];
  }

  public Object get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, Object item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}

The interface of ArrayList is expressed entirely in terms of Object. Take a moment to imagine what it was like for the poor clients of this class. If they wanted to pull out a String from index 0 and print its length, they had to downcast it:

JavaScript

// int n = names.get(0).length(); <- doesn't typecheck
String name = (String) names.get(0);
int n = name.length();

// int n = names.get(0).length(); <- doesn't typecheck
String name = (String) names.get(0);
int n = name.length();

The cast is necessary because Object doesn't have a length method.

Furthermore, because the class targeted the greatest common denominator Object, programmers might accidentally have put items of different types in the list, like so:

Java

ArrayList numbers = new ArrayList();
list.add(Integer.valueOf(12));
list.add("13");

ArrayList numbers = new ArrayList();
list.add(Integer.valueOf(12));
list.add("13");

The dependence on Object also effectively banned primitives. Since int, double, boolean, and so on are not subtypes of Object, they could not be added to an ArrayList.

Both the excessive casting and the accidental heterogeneity were fixed in Java 5 when it added support for parametric polymorphism. Programmers no longer made a plain ArrayList but rather an ArrayList<String> or an ArrayList<Double>. The ArrayList class gained a type parameter or generic parameter.

A type parameter is like a function parameter in that it represents a hole in an abstraction that is filled by the client. A regular function parameter is filled by a value, and a type parameter is filled by a type. Function parameters have two views. Someone implementing a function sees a formal parameter: a named and possibly statically typed placeholder for real data. Someone calling the function sends in real data as an actual parameter. Type parameters have these two views as well. Someone implementing a generic abstraction announces its formal type parameters with generic identifiers like T or E in Java or a in Haskell. Someone using a generic abstraction provides actual type parameters like Integer or String.

Formal Type Parameters

Both classes and individual methods may be made generic. A class or interface announces its formal type parameters in its header. Here ArrayList announces that it expects the client to supply exactly one type, which it names T:

Java

class ArrayList<T> {
  // ...
}

class ArrayList<T> {
  // ...
}

There's nothing special about the name T. E is another popular choice and is short for element. K and V are common for key and value types. Because type parameters are only placeholders, the convention is to name them with a single uppercase letter. They don't merit a more descriptive name because they are generic.

A method also announces its formal type parameters in its header, right between its modifiers and its return type. For example, this flip method inside the non-generic Utilities class swaps the order of a 2-tuple whose fields have generic types T and U:

Java

public Utilities {
  //            ↓ type parameter declarations
  public static <T, U> Pair<U, T> flip(Pair<T, U> p) {
    //                 ↑ return type
    return new Pair<U, T>(p.second, p.first);
  }
}

public Utilities {
  //            ↓ type parameter declarations
  public static <T, U> Pair<U, T> flip(Pair<T, U> p) {
    //                 ↑ return type
    return new Pair<U, T>(p.second, p.first);
  }
}

Compiler writers generally prefer to encounter a declaration of a programming construct before its use. Since the return type may reference the type parameters, the formal declarations are placed before the return type.

The type parameters are referenced within the body of a generic abstraction just as if they were real types. In this generic implementation of ArrayList, most mentions of Object are replaced with T:

Java

class ArrayList<T> {
  private T[] items;
  private int size;

  public ArrayList() {
    items = (T[]) new Object[10]; // can't be new T[10]
  }

  public T get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, T item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}

class ArrayList<T> {
  private T[] items;
  private int size;

  public ArrayList() {
    items = (T[]) new Object[10]; // can't be new T[10]
  }

  public T get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, T item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}

Only the array instantation still uses Object. We'll discuss why soon.

Actual Type Parameters

The client supplies real types that get dropped into a type parameter when the generic abstraction is used. Here three lists of different element types are instantiated:

Java

ArrayList<String> names = new ArrayList<String>();
ArrayList<Integer> xs = new ArrayList<Integer>();
ArrayList<File> files = new ArrayList<File>();

ArrayList<String> names = new ArrayList<String>();
ArrayList<Integer> xs = new ArrayList<Integer>();
ArrayList<File> files = new ArrayList<File>();

Sometimes the compiler has enough information to infer the actual type parameters. In these declarations, the type has been omitted from the constructor calls:

Java

ArrayList<String> names = new ArrayList<>();
ArrayList<Integer> xs = new ArrayList<>();
ArrayList<File> files = new ArrayList<>();

ArrayList<String> names = new ArrayList<>();
ArrayList<Integer> xs = new ArrayList<>();
ArrayList<File> files = new ArrayList<>();

The actual type parameters give the compiler the information it needs to perform typechecking. It knows that an ArrayList of Integer receives and produces Integer data, so explicit downcasts are no longer needed when fetching an element, and no one can sneak a String into the list.

Most calls to generic methods rely on inference. This call to flip supplies no explicit actual type parameters:

Java

Pair<Integer, String> yx = Utilities.flip(xy);

Pair<Integer, String> yx = Utilities.flip(xy);

Explicitly qualifying a method call with actual type parameters is different than other languages. The actual type parameters must be placed between the . operator and the method name:

Java

Pair<Integer, String> yx = Utilities.<String, Integer>flip(xy);

Pair<Integer, String> yx = Utilities.<String, Integer>flip(xy);

Other languages often place the type parameters after the method name, as in Class.method<Type>. Parsing Class.<Type>method is easier since it's clear that the < is not being used as a relational operator if it appears after the member operator (.). Also, Java's placement matches the order found in the method definition.

Implementation

Generics give the impression that there are many versions of ArrayList, one for each different actual type parameter mentioned in a program. But this impression is false. When a Java compiler translates a generic class or method to bytecode, it compiles down to a single bytecode structure in which all mentions of the type parameter are erased in the bytecode. This process is erasure. Under erasure, there is no mention of the type parameters anywhere in the compiled code.

Under erasure, the generic ArrayList gets compiled down to exactly how we would have it implemented it before generics were introduced. Its compiled form is written in terms of Object, not T. How can an abstraction be polymorphic if its generic types are erased? The type parameters are replaced with overarching supertypes like Object. The Java compiler effectively replaces parametric polymorphism with subtype polymorphism at compile time.

The Java developers didn't have to implement parametric polymorphism this way. Why did they choose erasure? The developers wanted a polymorphic abstraction to compile down to a singular and universal bytecode structure so that generic classes written in Java 5 and later could still be run by older versions of the Java Virtual Machine that knew nothing about generics. So, generics only live long enough for the compiler to perform typechecking, and then they are thrown out.

Erasure is a clever means of adding support for generics that maintains compatibility and keeps the bytecode from exploding with many versions of an abstraction for each possible type. But it also has two big costs:

Primitives cannot be used with generic types because primitives don't belong to the hierarchy rooted on the erasure type. We can make an ArrayList<Integer> but not an ArrayList<int>. Autoboxing has partially eased this problem. If the compiler senses us adding an int where an Integer is expected, it will automatically wrap the int as an Integer instance. But the indirection of boxing degrades performance.
Because the type has been erased, the bytecode does not have enough information to make new instances of the generic type or make an array whose elements are of the generic type. That's why we had to make an array of Object rather than an array of T in the ArrayList above.

One of the recurring themes of programming language design and life in general is that meaningful decisions require us to choose amongst a set of tradeoffs. Erasure has advantages and disadvantages. The Java developers knew it wasn't perfect. A decade earlier, the developers of C++ also weighed the costs and benefits and chose a different implementation of generics.

← Overwhelmed By Types Parametric Polymorphism in C++ →

Dear Computer

Parametric Polymorphism in Java

Formal Type Parameters

Actual Type Parameters

Implementation