Parametric Polymorphism in Java

Dear Computer

Chapter 12: Polymorphism Revisited

Parametric Polymorphism in Java

Java made its debut in 1996. Not until 2004 did generics appear in the language. The eight years between was bleak for Java developers who wanted to make code reusable. Library writers' only option to make an abstraction serve many types was to use subtype polymorphism—they targeted a supertype that could serve all clients. In the case of ArrayList, which assumes nothing about its elements, that supertype was Object. The state of ArrayList included an array of Object, and it was likely implemented like this:

Java
class ArrayList {
  private Object[] items;
  private int size;

  public ArrayList() {
    items = new Object[10];
  }

  public Object get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, Object item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}
class ArrayList {
  private Object[] items;
  private int size;

  public ArrayList() {
    items = new Object[10];
  }

  public Object get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, Object item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}

The interface of ArrayList is expressed entirely in terms of Object. Take a moment to imagine what it was like for the poor clients of this class. If they wanted to pull out a String from index 0 and print its length, they had to downcast it:

JavaScript
// int n = names.get(0).length(); <- doesn't typecheck
String name = (String) names.get(0);
int n = name.length();
// int n = names.get(0).length(); <- doesn't typecheck
String name = (String) names.get(0);
int n = name.length();

The cast is necessary because Object doesn't have a length method.

Furthermore, because the class targeted the greatest common denominator Object, programmers might accidentally put items of different types in the list, like so:

Java
ArrayList numbers = new ArrayList();
list.add(Integer.valueOf(12));
list.add("13");
ArrayList numbers = new ArrayList();
list.add(Integer.valueOf(12));
list.add("13");

The dependence on Object also effectively banned primitives. Since int, double, boolean, and so on are not subtypes of Object, they could not be added to an ArrayList.

Both the excessive casting and the accidental heterogeneity were fixed in Java 5 when it added support for parametric polymorphism. Programmers no longer made a plain ArrayList but rather an ArrayList<String> or an ArrayList<Double>. The ArrayList class gained a type parameter or generic parameter.

A type parameter is like a function parameter in that it represents a hole in an abstraction that is filled by the client. A regular function parameter is filled by a value, and a type parameter is filled by a type. Function parameters have two views. Someone implementing a function sees a formal parameter: a named and possibly statically-typed placeholder for real data. Someone calling the function sends in real data as an actual parameter. Type parameters have these two views as well. Someone implementing a generic abstraction announces its formal type parameters with generic identifiers like T or E. Someone making use of a generic abstraction provides actual type parameters like Integer or String.

Formal Type Parameters

Both classes and individual methods may be made generic. A class or interface announces its formal type parameters in its header. Here ArrayList announces that it expects the client to supply exactly one type, which it names T:

Java
class ArrayList<T> {
  // ...
}
class ArrayList<T> {
  // ...
}

There's nothing special about the name T. E is common and is short for element. K and V are common for key and value types. Because type parameters are only placeholders, the convention is to name them with a single uppercase letter. They don't merit a more descriptive name.

A method also announces its formal type parameters in its header, right between its modifiers and its return type. For example, this flip method inside the non-generic Utilities class swaps the order of a 2-tuple whose fields have generic types T and U:

Java
public Utilities {
  //            ↓ type declarations
  public static <T, U> Pair<U, T> flip(Pair<T, U> p) {
    //                 ↑ return type
    return new Pair<U, T>(p.second, p.first);
  }
}
public Utilities {
  //            ↓ type declarations
  public static <T, U> Pair<U, T> flip(Pair<T, U> p) {
    //                 ↑ return type
    return new Pair<U, T>(p.second, p.first);
  }
}

Compiler writers generally prefer to encounter a declaration of a programming construct before its use. Since the return type may reference the type parameters, the formal declarations are placed before the return type.

The type parameters are referenced within the body of a generic abstraction just as if they were real types. In this generic implementation of ArrayList, most mentions of Object are replaced with T:

Java
class ArrayList<T> {
  private T[] items;
  private int size;

  public ArrayList() {
    items = (T[]) new Object[10]; // can't be new T[10]
  }

  public T get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, T item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}
class ArrayList<T> {
  private T[] items;
  private int size;

  public ArrayList() {
    items = (T[]) new Object[10]; // can't be new T[10]
  }

  public T get(int i) {
    assertIndex(i);
    return items[i];
  }

  public void set(int i, T item) {
    assertIndex(i);
    items[i] = item;
  }

  // ...
}

Only the array instantation still uses Object. We'll discuss why soon.

Actual Type Parameters

The client supplies real types that get dropped into a type parameter when the generic abstraction is used. Here three lists of different element types are instantiated:

Java
ArrayList<String> names = new ArrayList<String>();
ArrayList<Integer> xs = new ArrayList<Integer>();
ArrayList<File> files = new ArrayList<File>();
ArrayList<String> names = new ArrayList<String>();
ArrayList<Integer> xs = new ArrayList<Integer>();
ArrayList<File> files = new ArrayList<File>();

Sometimes the compiler has enough information to infer the actual type parameters. In these declarations, the type has been omitted from the constructor calls:

Java
ArrayList<String> names = new ArrayList<>();
ArrayList<Integer> xs = new ArrayList<>();
ArrayList<File> files = new ArrayList<>();
ArrayList<String> names = new ArrayList<>();
ArrayList<Integer> xs = new ArrayList<>();
ArrayList<File> files = new ArrayList<>();

The actual type parameters give the compiler the information it needs to perform typechecking. It knows that an ArrayList of Integer receives and produces Integer data, so explicit downcasts are no longer needed when fetching an element, and no one can sneak a String into the list.

Most calls to generic methods rely on inference. This call to flip supplies no explicit actual type parameters:

Java
Pair<Integer, String> yx = Utilities.flip(xy);
Pair<Integer, String> yx = Utilities.flip(xy);

Explicitly qualifying a method call with actual type parameters is different than other languages. The actuals must be placed between the . operator and the method name:

Java
Pair<Integer, String> yx = Utilities.<String, Integer>flip(xy);
Pair<Integer, String> yx = Utilities.<String, Integer>flip(xy);

Other languages often place the type parameters after the method name, as in Class.method<Type>. Parsing Class.<Type>method is easier since it's clear that the < is not being used as a relational operator if it appears after the member operator (.). Also, Java's placement matches the order found in the method definition.

Implementation

Generics give the impression that there are many versions of ArrayList, one for each different actual type parameter mentioned in a program. But this impression is false. When a Java compiler translates a generic class or method to bytecode, it compiles down to a single bytecode structure in which all mentions of the type parameter are erased in the bytecode. This process is erasure. Under erasure, there is no mention of the type parameters anywhere in the bytecode.

The Java developers didn't have to implement parametric polymorphism this way. Why did they choose erasure? The developers wanted a polymorphic abstraction to compile down to a single bytecode structure so that generic classes written in Java 5 and later could still be run by older versions of the Java Virtual Machine that knew nothing about generics. So, generics only live long enough for the compiler to perform typechecking, and then they are thrown out.

How can the generic code be polymorphic after erasure? The type parameters are erased to an overarching supertype like Object. The ArrayList<T> class above, for example, is compiled down to a single ArrayList of Object, just like the original, non-generic version. In a sense, the Java compiler replaces parametric polymorphism with subtype polymorphism at compile time.

Erasure is a clever means of adding support for generics that maintains compatibility and keeps the bytecode from exploding with many versions of an abstraction for each possible type. But it also has two big costs:

One of the recurring themes of programming language design and life in general is that meaningful decisions require us to choose amongst a set of tradeoffs. Erasure has advantages and disadvantages. The Java developers knew it wasn't perfect. A decade earlier, the developers of C++ also weighed the costs and benefits and chose a different implementation of generics.

← Overwhelmed By TypesParametric Polymorphism in C++ →