Enumerations
Earlier we defined a type as a set of values and their operations. For many types, the set of possible values is implicit. There is, for example, no list of all legal 4.2 billion 4-byte integer values stored anywhere in a computer. However, sometimes we do want to define a new type by listing out its possible values. These are enumeration types or enums.
Enums provide a menu of choices for a value. A meal option might be VEGETARIAN, VEGAN, GLUTENFREE, or OMNIVOROUS. A player class might be ELF, DWARF, WIZARD, ORC, or BALROG. A water temperature might be either SCALDING or FREEZING. Programmers often respond to the different choices with switch statements. Ideally, options not on the menu are forbidden. No one should be able to request ORC for a meal.
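As a minimal sketch of this menu idea, here is how the meal example might look in Java, one of the languages examined below. The names `MealDemo` and `instructionFor`, and the instruction strings, are assumptions made for illustration only.

```java
class MealDemo {
    // The menus from the text, written as Java enums.
    enum Meal { VEGETARIAN, VEGAN, GLUTENFREE, OMNIVOROUS }
    enum PlayerClass { ELF, DWARF, WIZARD, ORC, BALROG }

    // Hypothetical helper: responds to each menu choice with a switch.
    static String instructionFor(Meal meal) {
        switch (meal) {
            case VEGETARIAN: return "no meat";
            case VEGAN:      return "no animal products";
            case GLUTENFREE: return "no gluten";
            case OMNIVOROUS: return "no restrictions";
            default: throw new IllegalArgumentException("unknown meal: " + meal);
        }
    }

    public static void main(String[] args) {
        System.out.println(instructionFor(Meal.VEGAN)); // prints no animal products
        // instructionFor(PlayerClass.ORC) would be rejected by the compiler:
        // ORC is not on the Meal menu.
    }
}
```

Because `instructionFor` accepts only a `Meal`, a request for ORC as a meal never gets past the compiler in this sketch.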
Languages differ widely in their support of enums. Ruby and JavaScript don't have them. C and C++ treat them as a thin veneer over integers. Java implements enums atop classes, allowing programmers to add custom behaviors. Haskell and Rust elevate enums to a fundamental custom data type. Let's examine these various treatments in more detail.
C and C++
In C and C++, we define an enumeration for the four classical elements of the ancient world with this syntax:
enum element_t {
    FIRE,
    WATER,
    EARTH,
    AIR
};
The options of an enum are its variants. If we print the variants, we see that they look like integers:
printf("%d\n", FIRE); // prints 0
printf("%d\n", WATER); // prints 1
printf("%d\n", EARTH); // prints 2
printf("%d\n", AIR); // prints 3
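Enums in C are in fact integers, and plain enums in C++ convert to integers implicitly. Because of this, enums support integer operations like addition. C also lets us assign the integer result straight back to an enum variable, whereas C++ demands an explicit cast for that step: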
enum element_t element = FIRE + 1;
But what would the value of FIRE + 20 be? Certainly not one of the four elements. Nevertheless, the expression compiles without complaint and yields the value 20.
By default, the numbering starts at 0, and each successive variant is one more than its predecessor. This scheme may be manually overridden with explicit assignments. For example:
enum element_t {
    FIRE = 1,
    WATER = 2,
    EARTH = 4,
    AIR = 8,
};
Consider this is_cold function that accepts an element_t parameter:
#include <stdbool.h> // provides bool in C; not needed in C++

bool is_cold(enum element_t element) {
    return element == EARTH || element == WATER;
}
Aristotle classified earth and water as cold, and air and fire as hot. Because C converts integers to enums implicitly, we can call this function with any integer. The call is_cold(156) typechecks in C, and in C++ it needs nothing more than an explicit cast. Apparently 156 is not cold. But it's not hot either.
Since enum types in C and C++ can be subverted by integers not on the menu, they are not typesafe.
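The same weakness appears in any language when a menu is modeled with bare integers. The following sketch reproduces it in Java using plain int constants. The class name `IntElements` and the `isHot` helper are hypothetical and serve only to mirror the C example.

```java
class IntElements {
    // Integer constants standing in for a C-style enum (illustrative only).
    static final int FIRE = 0;
    static final int WATER = 1;
    static final int EARTH = 2;
    static final int AIR = 3;

    static boolean isCold(int element) {
        return element == EARTH || element == WATER;
    }

    // Hypothetical counterpart to isCold.
    static boolean isHot(int element) {
        return element == FIRE || element == AIR;
    }

    public static void main(String[] args) {
        System.out.println(isCold(WATER)); // prints true
        // Nothing stops a caller from passing an integer that is not on the menu.
        System.out.println(isCold(156));   // prints false
        System.out.println(isHot(156));    // prints false
    }
}
```

The value 156 slips through both functions without any complaint from the compiler, which is exactly the gap a typesafe enum closes.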
Java
Java takes a different approach to enums that is typesafe. The syntax is similar to C:
enum Element {
    FIRE,
    WATER,
    EARTH,
    AIR
}
However, instead of assigning each variant a unique integer, the Java compiler turns them into objects. The Element enum translates roughly to this normal class (the real generated class also passes each variant's name and ordinal up to the Enum constructor):
final class Element extends Enum<Element> {
    public static final Element FIRE = new Element();
    public static final Element WATER = new Element();
    public static final Element EARTH = new Element();
    public static final Element AIR = new Element();
    private Element() {}
}
The variants are really instances of the Element class. Its constructor is marked private so that no other Element instances can be made. The class is marked final so that no subclasses can be made. The four variants are the only instances that will ever exist. If a method expects an Element, we won't be able to pass in a rogue element like AETHER or METAL. Java enums are therefore typesafe.
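A short sketch can make the closed set of instances concrete. It relies on the standard `values` and `valueOf` methods that every Java enum receives. The wrapper class `ElementSafety` and the helper `isVariant` are hypothetical names introduced for this illustration.

```java
class ElementSafety {
    enum Element { FIRE, WATER, EARTH, AIR }

    // Hypothetical helper: true only if the text names one of the variants.
    static boolean isVariant(String name) {
        try {
            Element.valueOf(name); // throws IllegalArgumentException for unknown names
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Element.values().length); // prints 4
        System.out.println(isVariant("FIRE"));       // prints true
        System.out.println(isVariant("AETHER"));     // prints false
    }
}
```

Even when a variant is requested by name at run time, Java refuses to conjure a fifth element.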
Java provides a handful of operations on the enum values, including toString, compareTo, ordinal, and name. But we can also add custom behaviors as regular methods:
enum Element {
    FIRE,
    WATER,
    EARTH,
    AIR;

    public boolean isCold() {
        return this == EARTH || this == WATER;
    }
}
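To see the built-in operations and the custom method side by side, consider this small usage sketch. The wrapper class `ElementOps` is an assumption added so the example runs on its own. The enum body is the one defined above.

```java
class ElementOps {
    enum Element {
        FIRE, WATER, EARTH, AIR;

        public boolean isCold() {
            return this == EARTH || this == WATER;
        }
    }

    public static void main(String[] args) {
        // name() and ordinal() are inherited from java.lang.Enum;
        // isCold() is the custom behavior.
        for (Element e : Element.values()) {
            System.out.println(e.name() + " " + e.ordinal() + " " + e.isCold());
        }
        // prints:
        // FIRE 0 false
        // WATER 1 true
        // EARTH 2 true
        // AIR 3 false
    }
}
```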
Normally we compare objects with the equals method. Since there are only four instances of Element total, they are uniquely identified by their references. A shallow and fast comparison using == is sufficient.
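The following sketch, wrapped in a hypothetical `ElementIdentity` class, suggests why == is safe here: however a variant is obtained, it is the same single object. As a side benefit, == tolerates a null operand, whereas calling equals on a null reference would throw.

```java
class ElementIdentity {
    enum Element { FIRE, WATER, EARTH, AIR }

    public static void main(String[] args) {
        Element a = Element.valueOf("EARTH"); // looked up by name
        Element b = Element.EARTH;            // referenced directly

        System.out.println(a == b);      // prints true
        System.out.println(a.equals(b)); // prints true

        Element none = null;
        System.out.println(none == Element.EARTH); // prints false, no exception
    }
}
```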
Haskell
The data declaration in Haskell defines an enum:
data Element = Fire | Water | Earth | Air
Unlike object-oriented languages, which organize related data and code into a single syntactic unit called a class, Haskell keeps the data and code separate. To add an operation to an enum, we define a standalone function that accepts a parameter of the enum type:
isCold :: Element -> Bool
isCold element = element == Earth || element == Water
This code fails to compile because, by default, enums can't be compared with ==. We could use a case expression to avoid the comparison:
isCold :: Element -> Bool
isCold element =
  case element of
    Fire -> False
    Water -> True
    Earth -> True
    Air -> False
But that's a little wordy. A less verbose option is to let the compiler define == for the enum by adding a deriving clause to the data definition. After adding this clause, the Element type will have both an == function and a show function:
data Element = Fire | Water | Earth | Air
  deriving (Eq, Show)
Consider this enum that lists the three motion states of a vehicle:
data Gear = Forward | Reverse | Park
  deriving (Eq, Show)
Suppose we need a function that runs on every frame of an animation and moves the vehicle according to its current state. We could write this case expression:
tick :: Gear -> Int -> Int
tick gear position =
  case gear of
    Forward -> position + 1
    Reverse -> position - 1
    Park -> position
But enums also support pattern matching directly in function definitions. This definition is semantically equivalent and breaks the logic up into one subdefinition per variant:
tick :: Gear -> Int -> Int
tick Forward position = position + 1
tick Reverse position = position - 1
tick Park position = position
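For comparison with the Java enums from the previous section, the same per-frame logic might be sketched in Java with a switch over the variants. The `GearDemo` wrapper and the upper-case variant names are assumptions that follow Java naming conventions. They are not part of the Haskell code above.

```java
class GearDemo {
    enum Gear { FORWARD, REVERSE, PARK }

    // One case per variant, mirroring the three Haskell subdefinitions.
    static int tick(Gear gear, int position) {
        switch (gear) {
            case FORWARD: return position + 1;
            case REVERSE: return position - 1;
            case PARK:    return position;
            default: throw new AssertionError("unreachable: " + gear);
        }
    }

    public static void main(String[] args) {
        System.out.println(tick(Gear.FORWARD, 5)); // prints 6
        System.out.println(tick(Gear.REVERSE, 5)); // prints 4
        System.out.println(tick(Gear.PARK, 5));    // prints 5
    }
}
```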
Java's enum classes and Haskell's data definitions implement the OR operation of a type algebra. An enum value can be this variant or that variant or that other variant. Both languages also support the AND operation, which we'll see next.