Enumerations

Dear Computer

Chapter 9: Types Revisited

Earlier we defined a type as a set of values and their operations. For many types, the set of possible values is implicit. There is, for example, no list of all 4.2 billion or so legal 4-byte integer values stored anywhere in a computer. However, sometimes we do want to define a new type by listing out its possible values. Such types are enumeration types, or enums.

Enums provide a menu of choices for a value. A meal option might be VEGETARIAN, VEGAN, GLUTENFREE, or OMNIVOROUS. A player class might be ELF, DWARF, WIZARD, ORC, or BALROG. A water temperature might be either SCALDING or FREEZING. Programmers often respond to the different choices with switch statements. Ideally, options not on the menu are forbidden. No one should be able to request ORC for a meal.
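
As a preview, here is a minimal C++ sketch of a switch statement responding to the meal options. The enum syntax is explained in the next section. The helper entree_for and the dishes it returns are hypothetical, invented only for illustration.

```cpp
#include <string>

enum meal_t { VEGETARIAN, VEGAN, GLUTENFREE, OMNIVOROUS };

// Hypothetical helper: picks an entree for each option on the menu.
std::string entree_for(meal_t meal) {
  switch (meal) {
    case VEGETARIAN:
      return "mushroom risotto";
    case VEGAN:
      return "lentil curry";
    case GLUTENFREE:
      return "rice noodles";
    case OMNIVOROUS:
      return "roast chicken";
  }
  // Reached only if meal holds a value that is not on the menu.
  return "unknown";
}
```

A call like entree_for(VEGAN) selects the matching case. Because the switch names every variant, many compilers can warn us if we later add a variant and forget to handle it.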

Languages differ widely in their support of enums. Ruby and JavaScript don't have them. C and C++ treat them as a thin veneer over integers. Java implements enums atop classes, allowing programmers to add custom behaviors. Haskell and Rust elevate enums to a fundamental custom data type. Let's examine these various treatments in more detail.

C and C++

In C and C++, we define an enumeration for the four classical elements of the ancient world with this syntax:

C++
enum element_t {
  FIRE,
  WATER,
  EARTH,
  AIR
};

The options of an enum are its variants. If we print the variants, we see that they look like integers:

C++
printf("%d\n", FIRE);   // prints 0
printf("%d\n", WATER);  // prints 1
printf("%d\n", EARTH);  // prints 2
printf("%d\n", AIR);    // prints 3

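Under the hood, enums in C and C++ are integers. Because they are integers, enums support integer operations like addition. C++ performs the arithmetic but demands an explicit cast, such as static_cast<element_t>(FIRE + 1), before storing the integer result in an enum variable. C converts the result back implicitly, so this declaration compiles in C:

C
enum element_t element = FIRE + 1;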

But what would the value of FIRE + 20 be? Certainly not one of the four elements. Nevertheless, the expression compiles without complaint and yields the value 20.
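
The arithmetic can be sketched in C++ as follows. The three helper functions are hypothetical names introduced for illustration, and the sketch assumes the default numbering in which FIRE is 0. The last helper shows one possible guard: it asserts that an integer is on the menu before casting it back to element_t.

```cpp
#include <cassert>

enum element_t { FIRE, WATER, EARTH, AIR };

// Adding to a variant promotes it to int, so the result is a plain integer.
int steps_past_fire(int steps) {
  return FIRE + steps;
}

// Hypothetical helper: true only when value matches one of the four variants.
bool names_a_variant(int value) {
  return value >= FIRE && value <= AIR;
}

// Hypothetical checked conversion: asserts that value is on the menu
// before casting it back to the enum type.
element_t to_element(int value) {
  assert(names_a_variant(value));
  return static_cast<element_t>(value);
}
```

With these definitions, steps_past_fire(1) equals WATER, while steps_past_fire(20) is the integer 20, which names_a_variant rejects.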

By default, the numbering starts at 0, and each successive variant is one more than its predecessor. This scheme may be manually overridden with explicit assignments. For example:

C++
enum element_t {
  FIRE = 1,
  WATER = 2,
  EARTH = 4,
  AIR = 8,
};
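
The text doesn't say why we might pick these particular values. One common reason for choosing powers of two, offered here as an assumption, is that each variant then occupies its own bit, so several variants can be packed into a single integer. The helpers combine and contains in this sketch are hypothetical:

```cpp
enum element_t {
  FIRE = 1,
  WATER = 2,
  EARTH = 4,
  AIR = 8,
};

// Hypothetical helper: packs two elements into one integer, one bit per element.
int combine(element_t a, element_t b) {
  return a | b;
}

// Hypothetical helper: tests whether an element's bit is set in a combination.
bool contains(int combination, element_t element) {
  return (combination & element) != 0;
}
```

Here combine(FIRE, EARTH) is 5, an integer that matches no single variant but records both elements.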

Consider this is_cold function that accepts an element_t parameter:

C++
bool is_cold(enum element_t element) {
  return element == EARTH || element == WATER;
}

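Aristotle classified earth and water as cold, and air and fire as hot. Because C converts integers to enums implicitly, we can call this function with any integer. In C, the call is_cold(156) typechecks. C++ rejects the bare integer, but the call compiles again once we wrap the argument in a cast, as in is_cold(static_cast<element_t>(156)). Apparently 156 is not cold. But it's not hot either.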

Since enum types in C and C++ can be subverted by integers not on the menu, they are not typesafe.
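
A minimal C++ sketch of the problem follows. The classifier temperature and the constant ROGUE are hypothetical, added for illustration. The sketch uses 3 rather than 156 as the rogue value so that the cast stays within the range of values this enum is guaranteed to represent. A default branch is one way to at least detect a value that is off the menu.

```cpp
#include <string>

enum element_t { FIRE = 1, WATER = 2, EARTH = 4, AIR = 8 };

bool is_cold(element_t element) {
  return element == EARTH || element == WATER;
}

// Hypothetical classifier whose default branch catches values off the menu.
std::string temperature(element_t element) {
  switch (element) {
    case EARTH:
    case WATER:
      return "cold";
    case FIRE:
    case AIR:
      return "hot";
    default:
      return "not an element";
  }
}

// Hypothetical rogue value: 3 matches no variant, yet the cast compiles.
const element_t ROGUE = static_cast<element_t>(3);
```

The compiler accepts is_cold(ROGUE) without complaint. Only the runtime check in temperature notices that the value is neither hot nor cold.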

Java

Java takes a different approach to enums that is typesafe. The syntax is similar to C:

Java
enum Element {
  FIRE,
  WATER,
  EARTH,
  AIR
}

However, instead of assigning each variant a unique integer, the Java compiler turns them into objects. The Element enum effectively translates to this normal class:

Java
final class Element extends Enum<Element> {
  public static final Element FIRE = new Element();
  public static final Element WATER = new Element();
  public static final Element EARTH = new Element();
  public static final Element AIR = new Element();

  private Element() {}
}

The variants are really instances of the Element class. Its constructor is marked private so that no other Element instances can be made. The class is sealed with final so that no subclasses can be made. The four variants are the only instances that will ever exist. If a method expects an Element, we won't be able to pass in a rogue element like AETHER or METAL. Java enums are therefore typesafe.

Java provides a handful of operations on the enum values, including toString, name, ordinal, and compareTo. But we can also add custom behaviors as regular methods:

Java
enum Element {
  FIRE,
  WATER,
  EARTH,
  AIR;

  public boolean isCold() {
    return this == EARTH || this == WATER;
  }
}

Normally we compare objects with the equals method. Since there are only four instances of Element total, each variant is uniquely identified by its reference. A shallow and fast identity comparison using == is sufficient.

Haskell

The data declaration in Haskell defines an enum:

Haskell
data Element = Fire | Water | Earth | Air

Unlike object-oriented languages, which organize related data and code into a single syntactic unit called a class, Haskell keeps the data and code separate. To add an operation to an enum, we define a standalone function that accepts a parameter of the enum type:

Haskell
isCold :: Element -> Bool
isCold element = element == Earth || element == Water

This code fails to compile because, by default, enums can't be compared with ==. We could use a case expression to avoid the comparison:

Haskell
isCold :: Element -> Bool
isCold element = 
  case element of
    Fire -> False
    Water -> True
    Earth -> True
    Air -> False

But that's a little wordy. A less verbose option is to let the compiler define == for the enum by adding a deriving clause to the data definition. After adding this clause, the Element type will have both an == function and a show function:

Haskell
data Element = Fire | Water | Earth | Air
  deriving (Eq, Show)

Consider this enum that lists the three motion states of a vehicle:

Haskell
data Gear = Forward | Reverse | Park
  deriving (Eq, Show)

Suppose we need a function that runs on every frame of an animation and moves the vehicle according to its current state. We could write this case expression:

Haskell
tick :: Gear -> Int -> Int
tick gear position =
  case gear of
    Forward -> position + 1
    Reverse -> position - 1
    Park -> position

But a function definition can also pattern match on the variants directly in its parameters. This definition is semantically equivalent and breaks the logic up into subdefinitions:

Haskell
tick :: Gear -> Int -> Int
tick Forward position = position + 1
tick Reverse position = position - 1
tick Park position = position

Java's enum classes and Haskell's data definitions implement the OR operation of a type algebra. An enum value can be this variant or that variant or that other variant. Both languages also support the AND operation, which we'll see next.
