Chapter 9: Types Revisited

Algebraic Data Types

We include Haskell as one of the three lenses through which we are exploring programming languages because it reveals these three truths that we might not learn if we never leave the comfort of more mainstream languages:

We have discussed the first two ideas in recent chapters, and now we turn our attention to Haskell's type system.

Java's type system is an interesting counterpoint to Haskell's. From its earliest days, Java distinguished between primitives and objects. Generics were not available in its earliest versions, so data structures and algorithms were made polymorphic by writing them in terms of Object or some other supertype. Clients of polymorphic classes and methods would get back supertype references, which had to be explicitly downcast to their real type. Primitives weren't objects, so they couldn't be stored in polymorphic collections.

New type-related features were bolted on as Java matured. Java 5, released in 2004, brought generics and autoboxing of primitives into instances of wrapper classes like Integer and Double, fixing the casting and primitive issues of collections. Enums were also introduced. Java 8 brought lambdas, which provided short syntax for defining anonymous implementations of interfaces. Java 10 brought local variable type inference. Java 14 introduced record types and pattern matching as preview features. The result of these punctuated changes is a type system with inconsistencies and hidden costs. Java programmers are regularly surprised to discover that they can't make an array of a generic type, that data-intensive applications run slowly because of autoboxing, and that lambdas incur runtime costs that anonymous classes do not.

Haskell's type system has also evolved over time, but it has managed to remain relatively coherent. With just the data keyword, Haskell developers define enums, records, tuples, unions, and recursive collections. Types may have generic parameters. The compiler infers types and performs static typechecking. With just the class keyword, Haskell developers define interfaces that many types can implement.
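As a rough sketch of that range (the types Suit, Point, Option, and List and the Describable class below are illustrative placeholders, not part of this chapter or the standard library), a handful of data declarations and a single class declaration cover all of these shapes:

```haskell
-- An enumeration: one constructor per alternative.
data Suit = Clubs | Diamonds | Hearts | Spades

-- A record: one constructor bundling named fields.
data Point = Point { x :: Double, y :: Double }

-- A union with a generic parameter: a value is either absent or present.
data Option a = None | Some a

-- A recursive, parameterized collection: a list is either empty or an
-- element followed by the rest of the list.
data List a = Nil | Cons a (List a)

-- An interface that many types can implement.
class Describable a where
  describe :: a -> String

instance Describable Suit where
  describe Clubs    = "clubs"
  describe Diamonds = "diamonds"
  describe Hearts   = "hearts"
  describe Spades   = "spades"
```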

The type system in Haskell has held steady because it was designed from the outset to support algebraic data types. An algebra is a set of rules that dictate how symbols may be combined through operations. You are familiar with the algebra of arithmetic, which describes how variables and numbers may be combined via arithmetic operators. The number of rules in the arithmetic algebra is small, but through them you can define a wide variety of mathematical behavior.

A type algebra describes how data types may be combined with other data types via type operations. The two most common operations are known by various names. Let's call them AND and OR. Suppose we have types t1 and t2. In a language that supports algebraic data types, we can make a new type that is t1 AND t2. Values of this type have both t1 and t2 fields. We can also make t1 OR t2. Values of this type hold either a t1 or a t2, but not both at the same time. The AND operation produces tuples and records. The OR operation produces unions and enums. We've seen these compound types in other languages. Haskell's take on algebraic data types is unique in that it provides a single syntactic form for expressing both operations. We can define a wide variety of types in very little code using Haskell's type algebra.
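To make the two operations concrete, here is a minimal sketch (the names Both, OneOf, and Shape are placeholders chosen for this example, not standard library types): a product type holds a t1 AND a t2, a sum type holds a t1 OR a t2, and a single data declaration may mix the two.

```haskell
-- AND: a value of Both t1 t2 carries a t1 field and a t2 field.
data Both t1 t2 = Both t1 t2

-- OR: a value of OneOf t1 t2 holds either a t1 or a t2, never both.
data OneOf t1 t2 = First t1 | Second t2

-- One declaration mixes both operations: a shape is a circle (with a
-- radius) OR a rectangle (with a width AND a height).
data Shape = Circle Double | Rectangle Double Double
```

In many other languages these operations arrive through separate features, such as structs or classes for AND and tagged unions or enums for OR, which is what makes Haskell's single syntactic form notable.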

In this chapter we explore the algebraic data types of Haskell and other languages. By its end, you'll be able to answer the following questions:

This chapter also closes out our discussion of Haskell. Even if you don't continue to write in Haskell, you will see many of its ideas borrowed by other languages, including Rust.
