Parametric Polymorphism in Java
Java made its debut in 1996. Not until 2004 did generics appear in the language. The eight years between was bleak for Java developers who wanted to make code reusable. Library writers' only option to make an abstraction serve many types was to use subtype polymorphism. They targeted a supertype that could serve all clients. In the case of ArrayList, which assumes nothing about its elements, that supertype was Object. The state of ArrayList included an array of Object, and it was likely implemented like this:
class ArrayList {
private Object[] items;
private int size;
public ArrayList() {
items = new Object[10];
}
public Object get(int i) {
assertIndex(i);
return items[i];
}
public void set(int i, Object item) {
assertIndex(i);
items[i] = item;
}
// ...
}
class ArrayList {
private Object[] items;
private int size;
public ArrayList() {
items = new Object[10];
}
public Object get(int i) {
assertIndex(i);
return items[i];
}
public void set(int i, Object item) {
assertIndex(i);
items[i] = item;
}
// ...
}The interface of ArrayList is expressed entirely in terms of Object. Take a moment to imagine what it was like for the poor clients of this class. If they wanted to pull out a String from index 0 and print its length, they had to downcast it:
// int n = names.get(0).length(); <- doesn't typecheck
String name = (String) names.get(0);
int n = name.length();
// int n = names.get(0).length(); <- doesn't typecheck String name = (String) names.get(0); int n = name.length();
The cast is necessary because Object doesn't have a length method.
Furthermore, because the class targeted the greatest common denominator Object, programmers might accidentally have put items of different types in the list, like so:
ArrayList numbers = new ArrayList();
list.add(Integer.valueOf(12));
list.add("13");
ArrayList numbers = new ArrayList();
list.add(Integer.valueOf(12));
list.add("13");The dependence on Object also effectively banned primitives. Since int, double, boolean, and so on are not subtypes of Object, they could not be added to an ArrayList.
Both the excessive casting and the accidental heterogeneity were fixed in Java 5 when it added support for parametric polymorphism. Programmers no longer made a plain ArrayList but rather an ArrayList<String> or an ArrayList<Double>. The ArrayList class gained a type parameter or generic parameter.
A type parameter is like a function parameter in that it represents a hole in an abstraction that is filled by the client. A regular function parameter is filled by a value, and a type parameter is filled by a type. Function parameters have two views. Someone implementing a function sees a formal parameter: a named and possibly statically typed placeholder for real data. Someone calling the function sends in real data as an actual parameter. Type parameters have these two views as well. Someone implementing a generic abstraction announces its formal type parameters with generic identifiers like T or E in Java or a in Haskell. Someone using a generic abstraction provides actual type parameters like Integer or String.
Formal Type Parameters
Both classes and individual methods may be made generic. A class or interface announces its formal type parameters in its header. Here ArrayList announces that it expects the client to supply exactly one type, which it names T:
class ArrayList<T> {
// ...
}
class ArrayList<T> {
// ...
}There's nothing special about the name T. E is another popular choice and is short for element. K and V are common for key and value types. Because type parameters are only placeholders, the convention is to name them with a single uppercase letter. They don't merit a more descriptive name because they are generic.
A method also announces its formal type parameters in its header, right between its modifiers and its return type. For example, this flip method inside the non-generic Utilities class swaps the order of a 2-tuple whose fields have generic types T and U:
public Utilities {
// ↓ type parameter declarations
public static <T, U> Pair<U, T> flip(Pair<T, U> p) {
// ↑ return type
return new Pair<U, T>(p.second, p.first);
}
}
public Utilities {
// ↓ type parameter declarations
public static <T, U> Pair<U, T> flip(Pair<T, U> p) {
// ↑ return type
return new Pair<U, T>(p.second, p.first);
}
}Compiler writers generally prefer to encounter a declaration of a programming construct before its use. Since the return type may reference the type parameters, the formal declarations are placed before the return type.
The type parameters are referenced within the body of a generic abstraction just as if they were real types. In this generic implementation of ArrayList, most mentions of Object are replaced with T:
class ArrayList<T> {
private T[] items;
private int size;
public ArrayList() {
items = (T[]) new Object[10]; // can't be new T[10]
}
public T get(int i) {
assertIndex(i);
return items[i];
}
public void set(int i, T item) {
assertIndex(i);
items[i] = item;
}
// ...
}
class ArrayList<T> {
private T[] items;
private int size;
public ArrayList() {
items = (T[]) new Object[10]; // can't be new T[10]
}
public T get(int i) {
assertIndex(i);
return items[i];
}
public void set(int i, T item) {
assertIndex(i);
items[i] = item;
}
// ...
}Only the array instantation still uses Object. We'll discuss why soon.
Actual Type Parameters
The client supplies real types that get dropped into a type parameter when the generic abstraction is used. Here three lists of different element types are instantiated:
ArrayList<String> names = new ArrayList<String>();
ArrayList<Integer> xs = new ArrayList<Integer>();
ArrayList<File> files = new ArrayList<File>();
ArrayList<String> names = new ArrayList<String>(); ArrayList<Integer> xs = new ArrayList<Integer>(); ArrayList<File> files = new ArrayList<File>();
Sometimes the compiler has enough information to infer the actual type parameters. In these declarations, the type has been omitted from the constructor calls:
ArrayList<String> names = new ArrayList<>();
ArrayList<Integer> xs = new ArrayList<>();
ArrayList<File> files = new ArrayList<>();
ArrayList<String> names = new ArrayList<>(); ArrayList<Integer> xs = new ArrayList<>(); ArrayList<File> files = new ArrayList<>();
The actual type parameters give the compiler the information it needs to perform typechecking. It knows that an ArrayList of Integer receives and produces Integer data, so explicit downcasts are no longer needed when fetching an element, and no one can sneak a String into the list.
Most calls to generic methods rely on inference. This call to flip supplies no explicit actual type parameters:
Pair<Integer, String> yx = Utilities.flip(xy);
Pair<Integer, String> yx = Utilities.flip(xy);
Explicitly qualifying a method call with actual type parameters is different than other languages. The actual type parameters must be placed between the . operator and the method name:
Pair<Integer, String> yx = Utilities.<String, Integer>flip(xy);
Pair<Integer, String> yx = Utilities.<String, Integer>flip(xy);
Other languages often place the type parameters after the method name, as in Class.method<Type>. Parsing Class.<Type>method is easier since it's clear that the < is not being used as a relational operator if it appears after the member operator (.). Also, Java's placement matches the order found in the method definition.
Implementation
Generics give the impression that there are many versions of ArrayList, one for each different actual type parameter mentioned in a program. But this impression is false. When a Java compiler translates a generic class or method to bytecode, it compiles down to a single bytecode structure in which all mentions of the type parameter are erased in the bytecode. This process is erasure. Under erasure, there is no mention of the type parameters anywhere in the compiled code.
Under erasure, the generic ArrayList gets compiled down to exactly how we would have it implemented it before generics were introduced. Its compiled form is written in terms of Object, not T. How can an abstraction be polymorphic if its generic types are erased? The type parameters are replaced with overarching supertypes like Object. The Java compiler effectively replaces parametric polymorphism with subtype polymorphism at compile time.
The Java developers didn't have to implement parametric polymorphism this way. Why did they choose erasure? The developers wanted a polymorphic abstraction to compile down to a singular and universal bytecode structure so that generic classes written in Java 5 and later could still be run by older versions of the Java Virtual Machine that knew nothing about generics. So, generics only live long enough for the compiler to perform typechecking, and then they are thrown out.
Erasure is a clever means of adding support for generics that maintains compatibility and keeps the bytecode from exploding with many versions of an abstraction for each possible type. But it also has two big costs:
-
Primitives cannot be used with generic types because primitives don't belong to the hierarchy rooted on the erasure type. We can make an
ArrayList<Integer>but not anArrayList<int>. Autoboxing has partially eased this problem. If the compiler senses us adding anintwhere anIntegeris expected, it will automatically wrap theintas anIntegerinstance. But the indirection of boxing degrades performance. -
Because the type has been erased, the bytecode does not have enough information to make new instances of the generic type or make an array whose elements are of the generic type. That's why we had to make an array of
Objectrather than an array ofTin theArrayListabove.
One of the recurring themes of programming language design and life in general is that meaningful decisions require us to choose amongst a set of tradeoffs. Erasure has advantages and disadvantages. The Java developers knew it wasn't perfect. A decade earlier, the developers of C++ also weighed the costs and benefits and chose a different implementation of generics.