Higher-order I/O
Imagine we've got a list of names, and we want to print them, one per line. We've learned that iteration is achieved through recursion in Haskell, so we write out the first draft of this routine as an impure recursive function:
printNames :: [String] -> IO ()
printNames names =
  if null names then
    return ()
  else do
    putStrLn $ head names
    printNames $ tail names
Ah, but this conditional structure is a prime candidate for a pattern matching makeover. Our second draft separates the two cases into two subdefinitions:
printNames :: [String] -> IO ()
printNames [] = return ()
printNames (first : rest) = do
  putStrLn first
  printNames rest
Perhaps this second draft is good enough. But applying an IO action to each element of a list is common in Haskell. Maybe we have a list of file paths, and we want to read in all their contents. Maybe we have a list of users, and we want to retrieve all their profiles from a web service. Maybe we have a list of URLs, and we want to fetch them all. If this pattern has been abstracted into a higher-order function, we can complete each of these tasks with a single call. It has been, and we can.
The builtin mapM_ function effectively has this signature:
mapM_ :: (a -> IO ()) -> [a] -> IO ()
This function is similar to map. The second parameter is a list of elements of any type a, and the first parameter is a function that is applied to each element. The difference is that map is a pure function that applies a pure transformation function to each element, while mapM_ is an impure function that applies an impure IO action to each element. The return type IO () indicates that the action doesn't return a meaningful value but achieves a side effect like printing.
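For instance, this small sketch with made-up numbers squares a list with the pure map and then prints each result with the impure mapM_:
main = do
  let squares = map (^ 2) [1, 2, 3]   -- pure: builds the new list [1, 4, 9]
  mapM_ print squares                 -- impure: runs the print action on each element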
This third and final draft of printNames defers all the recursion to mapM_:
printNames :: [String] -> IO ()
printNames names = mapM_ putStrLn names
The definition has been reduced considerably since the first draft. It's simple enough now that we might question whether we even need to define a separate printNames function. Maybe we could just call mapM_ putStrLn names directly.
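For example, a caller could skip the wrapper entirely. Here's a minimal sketch with made-up names:
main = do
  let names = ["Ada", "Grace", "Alan"]   -- hypothetical list of names
  mapM_ putStrLn names                   -- same effect, no printNames needed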
If the IO action we wish to perform on each element returns data tainted by IO, then we want to use mapM instead, which effectively has this type signature:
mapM :: (a -> IO b) -> [a] -> IO [b]
Do you see the difference between mapM and mapM_?
If we need a function that reads in a bunch of files and returns their contents as a list of String, then we can write this one-liner that maps the builtin readFile over the list of paths:
readFiles :: [String] -> IO [String]
readFiles paths = mapM readFile paths
This main function reads in three files, unboxes the IO [String] returned by readFiles, concatenates the file contents together, and writes a single merged string to a new file:
main = do
  texts <- readFiles ["a.txt", "b.txt", "c.txt"]
  let merged = concat texts
  writeFile "abc.txt" merged
You don't have to love Haskell, but imagine writing an equivalent program in C.
Summary
A Haskell program is really just a function call setting off a chain of other function calls. Even iteration is implemented by having a function call itself.

Since writing functions is the primary activity of Haskell developers, the language provides ways to make writing them easier. They can be defined with four different mechanics: named definitions, lambdas, partial application, and composition. Named definitions are the traditional way of defining functions that we have seen in other languages. Lambdas are anonymous and syntactically light functions that aren't needed in the current scope but are needed by the higher-order functions to which we pass them off. Partial application builds a low-arity version of a high-arity function by preloading some parameters. Composition welds functions together in a pipeline. Many iteration algorithms can be expressed by passing functions off to map, filter, and fold.

Two features make it easier for functions to access values. Compound data may be automatically destructured into more usable pieces by replacing a formal parameter name with a pattern. Functions may also access free variables outside of their scope, which is especially useful for callbacks that must conform to an inflexible interface. Because the free variables need to live as long as the function that may reference them, the language implementation creates a closure that packages up the function's definition along with its free variable bindings.
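As a quick refresher, here is a small sketch, with made-up names and numbers, showing one function built with each of the four mechanics:
double :: Int -> Int
double x = x * 2                     -- named definition
triple :: Int -> Int
triple = \x -> x * 3                 -- lambda
addTen :: Int -> Int
addTen = (+) 10                      -- partial application
doubleThenAddTen :: Int -> Int
doubleThenAddTen = addTen . double   -- composition
Calling doubleThenAddTen 3, for example, doubles 3 to 6 and then adds 10, yielding 16.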