Higher-order I/O

Dear Computer

Chapter 8: Functions Revisited

Higher-order I/O

Imagine we've got a list of names, and we print them, one per line. We've learned that iteration is achieved through recursion in Haskell, so we write out the first draft of this routine as an impure recursive function:

Haskell
printNames :: [String] -> IO ()
printNames names =
  if null names then
    return ()
  else do
    putStrLn $ head names
    printNames $ tail names

Ah, but this conditional structure is a prime candidate for a pattern matching makeover. Our second draft separates the two cases into two subdefinitions:

Haskell
printNames :: [String] -> IO ()
printNames [] = return ()
printNames (first : rest) = do
  putStrLn first
  printNames rest

Perhaps this second draft is good enough. But applying an IO action to each element of a list is common in Haskell. Maybe we have a list of file paths, and we want to read in all their contents. Maybe we have a list of users, and we want to retrieve all their profiles from a web service. Maybe we have a list of URLs, and we want to fetch them all. If this pattern has been abstracted into a higher-order abstraction, we could complete each of these tasks with a single call. It has been, and we can.

The builtin mapM_ function effectively has this signature:

Haskell
mapM_ :: (a -> IO ()) -> [a] -> IO ()

This function is similar to map. The second parameter is a list of any type a, and the first parameter is a function that is applied to each element. The difference is that map is a pure function that applies a pure transformation function to each element, while mapM_ is an impure function that applies an impure IO action to each element. The return type IO () indicates that the action doesn't return a value but achieves a side effect like printing.

This third and final draft of printNames defers all the recursion to mapM_:

Haskell
printNames :: [String] -> IO ()
printNames names = mapM_ putStrLn names

The definition has been reduced considerably since the first draft. It's simple enough now that we might question whether we even need to define a separate printNames function. Maybe we could just call mapM_ putStrLn names directly.

If the IO action we wish to perform on each element returns data tainted by IO, then we want to instead use mapM, which effectively has this type signature:

Haskell
mapM :: (a -> IO b) -> [a] -> IO [b]

Do you see the difference between mapM and mapM_?

If we need a function that reads in a bunch of files and returns their contents as a list of String, then we can write this one-liner that maps the builtin readFile over the list of paths:

Haskell
readFiles :: [String] -> IO [String]
readFiles paths = mapM readFile paths

This main function reads in three files, unboxes the IO [String] returned by readFiles, concatenates the file contents together, and writes a single merged string to a new file:

Haskell
main = do
  texts <- readFiles ["a.txt", "b.txt", "c.txt"]
  let merged = concat texts
  writeFile "abc.txt" merged

You don't have to love Haskell, but imagine writing an equivalent program in C.

Summary

A Haskell program is really just a function call setting off a chain of other function calls. Even iteration is implemented by having a function call itself. Since writing functions is the primary activity of Haskell developers, the language provides ways to make writing them easier. They can be defined with four different mechanics: named definitions, lambdas, partial application, and composition. Named definitions are the traditional way of defining functions that we have seen in other languages. Lambdas are anonymous and syntactically light functions that aren't needed in the current scope but are needed by the higher-order functions to which we pass them off. Partial application builds a low-arity version of a high-arity function by preloading some parameters. Composition welds functions together in a pipeline. Many iteration algorithms can be expressed by passing functions off to map, filter, and fold. Two features make it easier for functions to access values. Compound data may be automatically destructured into more usable pieces by replacing a formal parameter name with a pattern. Functions may also access free variables outside of their scope, which is especially useful for callbacks that must conform to an inflexible interface. Because the free variables need to live as long as the function that may reference them, the language's tooling creates a closure that packages up the function's definition along with its free variable bindings.

← CompositionLecture: Embracing Functions →