Pattern Matching
Frequently used words and phrases in a spoken language tend to erode and shorten over time. The English goodbye, for example, is a shortened form of God be with ye. Many of the most commonly used verbs in Spanish are irregular, like ir (to go), ser (to be), decir (to say), and poder (to be able to). Their conjugations have deviated from the standard rules to accommodate the ergonomics of speech.
Erosion also happens in programming languages. For example, we frequently process structs, tuples, and arrays by explicitly reaching inside the composite data and accessing individual components with a member (.
and ->
) or subscript ([]
) operation. These explicit accesses appear so frequently in code that some languages offer a shorter alternative called pattern matching.
In pattern matching, composite data is matched against a pattern that describes the shape of a structure. If the data matches the pattern, it is destructured into variables named in the pattern. This pattern, for example, matches against a 3-tuple that will be destructured into three variables:
(latitude, longitude, elevation)
(latitude, longitude, elevation)
This pattern matches against a non-empty list that will be destructured into two variables:
first : rest
first : rest
The syntax of these destructuring patterns is the same syntax used to assemble the structures in the first place. The only difference is that patterns appear in lvalue contexts. The variables latitude
, longitude
, elevation
, first
, and rest
don't already exist. They are declared and assigned as a result of the pattern match.
Where
One of the lvalue contexts in which patterns may appear is in the left-hand sides of where
clauses. Suppose function label
is handed a 3-tuple of a latitude, a longitude, and an elevation. The function's job is to produce a human-readable string describing the location. It destructures the tuple in a where
clause:
import Text.Printf
label :: (Double, Double, Double) -> String
label location =
printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation
where
(latitude, longitude, elevation) = location
import Text.Printf label :: (Double, Double, Double) -> String label location = printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation where (latitude, longitude, elevation) = location
Without the pattern matching, we'd have to pull out the three values with calls to functions like fst3
, snd3
, and thd3
. None of these functions are provided by Haskell's standard library; we'd have to write them. In Haskell, pattern matching is the only means of accessing the fields of a tuple with more than two elements.
This startsWith
function similarly destructures its list parameter in a where
clause:
startsWith :: Eq a => a -> [a] -> Bool
startsWith target items = target == first
where
first : rest = items
startsWith :: Eq a => a -> [a] -> Bool startsWith target items = target == first where first : rest = items
The :
is the cons operator, which makes the code look like it's assembling a list. But because it's in an lvalue context, the inverse is happening. List items
is being destructured.
Discards
If a component of the composite data is not needed, we discard it by placing the wildcard _
in the pattern. This elevation
function extracts out only the third component of a location 3-tuple, discarding the latitude and longitude:
elevation :: (Double, Double, Double) -> Double
elevation location = y
where (_, _, y) = location
elevation :: (Double, Double, Double) -> Double elevation location = y where (_, _, y) = location
Nesting
Patterns may also contain nested patterns. This function uses a nested pattern to access the neck of a list, which is the element just after the head:
neck :: [a] -> a
neck items = second
where
_ : (second : _) = items
neck :: [a] -> a neck items = second where _ : (second : _) = items
The parentheses aren't needed since they don't change the precedence.
This function first destructures a list of pairs into its head and tail, but it discards the tail and further destructures the head into x
and y
:
start :: (Show a, Show b) => [(a, b)] -> String
start points = directive
where
(x, y) : _ = points
directive = "go to " ++ show x ++ ", " ++ show y
start :: (Show a, Show b) => [(a, b)] -> String start points = directive where (x, y) : _ = points directive = "go to " ++ show x ++ ", " ++ show y
The parentheses are needed because they destructure the tuple at the head of the list.
Case
Another lvalue context in which patterns may appear is in the branches of a case structure. In this context, we may wish to place literal values in the patterns that must be matched. For example, this function uses patterns to identify where a point is on a Cartesian grid:
gridPart :: (Int, Int) -> String
gridPart xy = case xy of
(0, 0) -> "origin"
(0, _) -> "y-axis"
(_, 0) -> "x-axis"
(x, y) -> "point " ++ show x + "," ++ show y
gridPart :: (Int, Int) -> String gridPart xy = case xy of (0, 0) -> "origin" (0, _) -> "y-axis" (_, 0) -> "x-axis" (x, y) -> "point " ++ show x + "," ++ show y
This example demonstrates that patterns may match both structures and values. When a value doesn't need to be matched exactly, we either discard it or capture it in a variable.
Formal Parameters
Perhaps the most compelling place to use patterns is in the formal parameter list. Instead of writing the parameter's name, we write a pattern. This revised label
function replaces the location
formal parameter with a tuple pattern:
import Text.Printf
label :: (Double, Double, Double) -> String
label (latitude, longitude, elevation) =
printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation
import Text.Printf label :: (Double, Double, Double) -> String label (latitude, longitude, elevation) = printf "↔ %s ↕ %s 𓊍 %s" latitude longitude elevation
With the destructuring in the formal parameters, the function no longer needs a where
clause.
The elevation
function turns into a succinct one-liner:
elevation :: (Double, Double, Double) -> Double
elevation (_, _, y) = y
elevation :: (Double, Double, Double) -> Double elevation (_, _, y) = y
Cons patterns must be surrounded by parentheses when used in place of a formal parameter:
neck :: [a] -> a
neck (_ : second : _) = second
neck :: [a] -> a neck (_ : second : _) = second
When a function's body is a case expression, its definition can be rewritten as a set of subdefinitions, with one subdefinition per arm of the case. Here's the gridPart
function rewritten as four subdefinitions that use pattern matching in their formal parameters:
gridPart :: (Int, Int) -> String
gridPart (0, 0) = "origin"
gridPart (0, _) = "y-axis"
gridPart (_, 0) = "x-axis"
gridPart (x, y) = "point " ++ show x + "," ++ show y
gridPart :: (Int, Int) -> String gridPart (0, 0) = "origin" gridPart (0, _) = "y-axis" gridPart (_, 0) = "x-axis" gridPart (x, y) = "point " ++ show x + "," ++ show y
This subdefinition form is especially compelling for recursive functions because it clearly distinguishes the base cases and the recursive cases. Each subdefinition has a narrow task, reducing the cognitive load placed on the programmer. Consider again the function zeroes
, which creates a list of n
zeroes. Here it is written using a case expression instead of an if expression:
zeroes :: Int -> [Int]
zeroes n =
case n of
0 -> []
n -> 0 : zeroes (n - 1)
zeroes :: Int -> [Int] zeroes n = case n of 0 -> [] n -> 0 : zeroes (n - 1)
By recasting it as a set of subdefinitions with pattern matching, the intent of the code becomes clearer:
zeroes :: Int -> [Int]
zeroes 0 = []
zeroes n = 0 : zeroes (n - 1)
zeroes :: Int -> [Int] zeroes 0 = [] zeroes n = 0 : zeroes (n - 1)
This isEmpty
function is a clone of the builtin null
and uses patterns to test the emptiness of its list parameter:
isEmpty :: [a] -> Bool
isEmpty [] = True
isEmpty _ = False
isEmpty :: [a] -> Bool isEmpty [] = True isEmpty _ = False
The order of the subdefinitions matters. They are matched against the actual parameters in the order they appear in the source code. If the _
case appeared first, it would match every call because the wildcard matches anything.