Pattern Matching

Dear Computer

Chapter 8: Functions Revisited

Pattern Matching

Frequently used words and phrases in a spoken language tend to erode and shorten over time. The English goodbye, for example, is a shortened form of God be with ye. Many of the most commonly used verbs in Spanish are irregular, like ir (to go), ser (to be), decir (to say), and poder (to be able to). Their conjugations have deviated from the standard rules to accommodate the ergonomics of speech.

Erosion also happens in programming languages. For example, we frequently process structs, tuples, and arrays by explicitly reaching inside the composite data and accessing individual components with a member (. and ->) or subscript ([]) operation. These explicit accesses appear so frequently in code that some languages offer a shorter alternative called pattern matching.

In pattern matching, composite data is matched against a pattern that describes the shape of a structure. If the data matches the pattern, it is destructured into variables named in the pattern. This pattern, for example, matches against a 3-tuple that will be destructured into three variables:

Haskell
(latitude, longitude, elevation)
(latitude, longitude, elevation)

This pattern matches against a non-empty list that will be destructured into two variables:

Haskell
first : rest
first : rest

The syntax of these destructuring patterns is the same syntax used to assemble the structures in the first place. The only difference is that patterns appear in lvalue contexts. The variables latitude, longitude, elevation, first, and rest don't already exist. They are declared and assigned as a result of the pattern match.

Where

One of the lvalue contexts in which patterns may appear is in the left-hand sides of where clauses. Suppose function label is handed a 3-tuple of a latitude, a longitude, and an elevation. The function's job is to produce a human-readable string describing the location. It destructures the tuple in a where clause:

Haskell
import Text.Printf

label :: (Double, Double, Double) -> String
label location =
  printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation
  where
    (latitude, longitude, elevation) = location
import Text.Printf

label :: (Double, Double, Double) -> String
label location =
  printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation
  where
    (latitude, longitude, elevation) = location

Without the pattern matching, we'd have to pull out the three values with calls to functions like fst3, snd3, and thd3. None of these functions are provided by Haskell's standard library; we'd have to write them. In Haskell, pattern matching is the only means of accessing the fields of a tuple with more than two elements.

This startsWith function similarly destructures its list parameter in a where clause:

Haskell
startsWith :: Eq a => a -> [a] -> Bool
startsWith target items = target == first
  where
    first : rest = items
startsWith :: Eq a => a -> [a] -> Bool
startsWith target items = target == first
  where
    first : rest = items

The : is the cons operator, which makes the code look like it's assembling a list. But because it's in an lvalue context, the inverse is happening. List items is being destructured.

Discards

If a component of the composite data is not needed, we discard it by placing the wildcard _ in the pattern. This elevation function extracts out only the third component of a location 3-tuple, discarding the latitude and longitude:

Haskell
elevation :: (Double, Double, Double) -> Double
elevation location = y
  where (_, _, y) = location
elevation :: (Double, Double, Double) -> Double
elevation location = y
  where (_, _, y) = location

Nesting

Patterns may also contain nested patterns. This function uses a nested pattern to access the neck of a list, which is the element just after the head:

Haskell
neck :: [a] -> a
neck items = second
  where
    _ : (second : _) = items
neck :: [a] -> a
neck items = second
  where
    _ : (second : _) = items

The parentheses aren't needed since they don't change the precedence.

This function first destructures a list of pairs into its head and tail, but it discards the tail and further destructures the head into x and y:

Haskell
start :: (Show a, Show b) => [(a, b)] -> String
start points = directive
  where
    (x, y) : _ = points
    directive = "go to " ++ show x ++ ", " ++ show y
start :: (Show a, Show b) => [(a, b)] -> String
start points = directive
  where
    (x, y) : _ = points
    directive = "go to " ++ show x ++ ", " ++ show y

The parentheses are needed because they destructure the tuple at the head of the list.

Case

Another lvalue context in which patterns may appear is in the branches of a case structure. In this context, we may wish to place literal values in the patterns that must be matched. For example, this function uses patterns to identify where a point is on a Cartesian grid:

Haskell
gridPart :: (Int, Int) -> String
gridPart xy = case xy of
  (0, 0) -> "origin"
  (0, _) -> "y-axis"
  (_, 0) -> "x-axis"
  (x, y) -> "point " ++ show x + "," ++ show y
gridPart :: (Int, Int) -> String
gridPart xy = case xy of
  (0, 0) -> "origin"
  (0, _) -> "y-axis"
  (_, 0) -> "x-axis"
  (x, y) -> "point " ++ show x + "," ++ show y

This example demonstrates that patterns may match both structures and values. When a value doesn't need to be matched exactly, we either discard it or capture it in a variable.

Formal Parameters

Perhaps the most compelling place to use patterns is in the formal parameter list. Instead of writing the parameter's name, we write a pattern. This revised label function replaces the location formal parameter with a tuple pattern:

Haskell
import Text.Printf

label :: (Double, Double, Double) -> String
label (latitude, longitude, elevation) =
  printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation
import Text.Printf

label :: (Double, Double, Double) -> String
label (latitude, longitude, elevation) =
  printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation

With the destructuring in the formal parameters, the function no longer needs a where clause.

The elevation function turns into a succinct one-liner:

Haskell
elevation :: (Double, Double, Double) -> Double
elevation (_, _, y) = y
elevation :: (Double, Double, Double) -> Double
elevation (_, _, y) = y

Cons patterns must be surrounded by parentheses when used in place of a formal parameter:

Haskell
neck :: [a] -> a
neck (_ : second : _) = second
neck :: [a] -> a
neck (_ : second : _) = second

When a function's body is a case expression, its definition can be rewritten as a set of subdefinitions, with one subdefinition per arm of the case. Here's the gridPart function rewritten as four subdefinitions that use pattern matching in their formal parameters:

Haskell
gridPart :: (Int, Int) -> String
gridPart (0, 0) = "origin"
gridPart (0, _) = "y-axis"
gridPart (_, 0) = "x-axis"
gridPart (x, y) = "point " ++ show x + "," ++ show y
gridPart :: (Int, Int) -> String
gridPart (0, 0) = "origin"
gridPart (0, _) = "y-axis"
gridPart (_, 0) = "x-axis"
gridPart (x, y) = "point " ++ show x + "," ++ show y

This subdefinition form is especially compelling for recursive functions because it clearly distinguishes the base cases and the recursive cases. Each subdefinition has a narrow task, reducing the cognitive load placed on the programmer. Consider again the function zeroes, which creates a list of n zeroes. Here it is written using a case expression instead of an if expression:

Haskell
zeroes :: Int -> [Int]
zeroes n =
  case n of
    0 -> []
    n -> 0 : zeroes (n - 1)
zeroes :: Int -> [Int]
zeroes n =
  case n of
    0 -> []
    n -> 0 : zeroes (n - 1)

By recasting it as a set of subdefinitions with pattern matching, the intent of the code becomes clearer:

Haskell
zeroes :: Int -> [Int]
zeroes 0 = []
zeroes n = 0 : zeroes (n - 1)
zeroes :: Int -> [Int]
zeroes 0 = []
zeroes n = 0 : zeroes (n - 1)

This isEmpty function is a clone of the builtin null and uses patterns to test the emptiness of its list parameter:

Haskell
isEmpty :: [a] -> Bool
isEmpty [] = True
isEmpty _ = False
isEmpty :: [a] -> Bool
isEmpty [] = True
isEmpty _ = False

The order of the subdefinitions matters. They are matched against the actual parameters in the order they appear in the source code. If the _ case appeared first, it would match every call because the wildcard matches anything.

← RecursionHigher-order Functions →