Algebraic Data Types

Dear Computer

Chapter 9: Types Revisited

Algebraic Data Types

We include Haskell as one of the three lenses through which we are are exploring programming languages because it reveals these three truths that we might not learn if we never leave the comfort of more mainstream languages:

We have discussed the first two ideas in recent chapters, and now we turn our attention to the third by examining Haskell's type system.

Java's type system offers a counterpoint to Haskell's. From its inception, Java distinguished between primitives and objects. Generics were not available in its earliest versions, so data structures and algorithms were made polymorphic by targeting Object or some other superclass. The array-based list class, for instance, maintained an array of Object. Primitives like int and double are not objects, so they were not served by superclass polymorphism. Clients of polymorphic classes and methods would get back supertype references, which they had to explicitly downcast to their real type.

New type features were bolted on as Java matured. Java 5, which was released in 2004, brought generics and autoboxing of primitives into instances of wrapper classes like Integer and Double. Collections could now hold primitives, and clients no longer needed to cast back to the element type. Enums were also introduced. Java 8 brought lambdas, which provided short syntax for defining anonymous implementations of interfaces. Java 10 brought type inference. Java 14 brought record types and pattern matching. The result of these punctuated changes is a type system that has inconsistencies and hidden costs. Java programmers are regularly surprised to discover that they can't make an array of a generic type, that data intensive applications run slowly because of autoboxing, and that lambdas incur runtime costs that anonymous classes do not.

Haskell's type system has also evolved over time, but it has managed to remain small and coherent. With just the data command, Haskell developers define enums, records, tuples, unions, and recursive collections. Types may have generic parameters. The compiler infers types and performs static typechecking. With just the class command, Haskell developers define interfaces that many types can implement.

The type system in Haskell has held steady because it was designed from the outset to support algebraic data types. An algebra is a set of rules that dictate how symbols may be combined through operations. You are familiar with the algebra of arithmetic, which describes how variables and numbers may be combined via arithmetic operators. The number of rules in the arithmetic algebra is small, but through them we can define a wide variety of mathematical behavior.

A type algebra describes how data types may be combined with other data types via type operations. The two most common operations are known by various names. Let's call them AND and OR. Suppose we have types t1 and t2. In a language with algebraic data types, we can make a new type that is t1 AND t2. Values of this type have both t1 and t2 fields. We can also make t1 OR t2. Values of this type hold either a t1 or a t2, but not both at the same time. The AND operation produces tuples, records, and classes. The OR operation produces unions and enums. We've seen these compound types in other languages. Haskell's take on algebraic data types is unique in that it provides a single syntactic form for expressing both operations. We can define a wide variety of types with very little code using Haskell's type algebra.

In this chapter we explore the algebraic data types of Haskell and other languages. By its end, you'll be able to answer the following questions:

This chapter also closes out our discussion of Haskell. Even if you don't continue to write in Haskell, you will see many of its ideas borrowed by other languages, including Rust.

Enumerations →