Map, Filter, and Fold

If you only learn three list functions in Haskell, learn map, filter, and fold. They cover most of what you'd write a loop for in a procedural language. The first two are simple. The third — fold — has three variants (foldr, foldl, foldl') and choosing between them is one of the rituals you have to go through to write production-quality Haskell.

This page covers all three, the differences between the fold variants, why foldl' is usually the right answer, why foldr can magically work on infinite lists, and how scan relates to fold.

`map`

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

map applies a function to every element, returning a new list of the same length:

map (* 2) [1, 2, 3]            -- [2, 4, 6]
map show [1, 2, 3]             -- ["1", "2", "3"]
map length ["hi", "world"]     -- [2, 5]

It's lazy: map f xs doesn't compute anything until you start consuming the result. take 3 (map expensive [1..]) only calls expensive three times.

`filter`

filter :: (a -> Bool) -> [a] -> [a]
filter _ []     = []
filter p (x:xs)
  | p x       = x : filter p xs
  | otherwise = filter p xs

filter keeps elements that satisfy the predicate:

filter even [1..10]            -- [2, 4, 6, 8, 10]
filter (> 'a') "banana"        -- "bnn"
filter (not . null) ["", "a", "", "b"]   -- ["a", "b"]

Like map, it's lazy. Combining them is idiomatic:

-- Squares of even numbers up to 100
squaresOfEvens :: [Int]
squaresOfEvens = map (^ 2) (filter even [1..100])

GHC fuses map and filter chains into a single traversal at compile time (via "list fusion" rewrite rules), so writing pipelines like this doesn't allocate intermediate lists in optimized builds.

`foldr`: The Right Fold

foldr is the right fold. It collapses a list into a single value by combining each element with the result of folding the rest:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

Conceptually, foldr f z [a, b, c] becomes f a (f b (f c z)). Read it as: replace each : with f, replace [] with z.

foldr (+) 0 [1, 2, 3]      -- 1 + (2 + (3 + 0))   == 6
foldr (*) 1 [1, 2, 3, 4]   -- 24
foldr (:) [] [1, 2, 3]     -- [1, 2, 3]   (literally rebuilds the list)

Many standard functions are folds in disguise:

sumF :: Num a => [a] -> a
sumF = foldr (+) 0

productF :: Num a => [a] -> a
productF = foldr (*) 1

mapF :: (a -> b) -> [a] -> [b]
mapF f = foldr (\x acc -> f x : acc) []

filterF :: (a -> Bool) -> [a] -> [a]
filterF p = foldr (\x acc -> if p x then x : acc else acc) []

lengthF :: [a] -> Int
lengthF = foldr (\_ n -> n + 1) 0

This is the "everything is a fold" lesson, and it's true: any function that processes a list one element at a time can be written as a fold. Category theorists call this the catamorphism view of folds — they're the canonical way to consume an inductive data structure.

Why `foldr` Works on Infinite Lists

Here's the trick that surprises people: foldr can work on infinite lists, depending on the combining function.

-- This terminates and returns True
any (== 5) [1..]

any is defined as any p = foldr (\x acc -> p x || acc) False. Because || is lazy in its second argument, once p x is True, the recursive call never happens. The fold short-circuits.

The deep reason: foldr f z (x:xs) is f x (foldr f z xs). If f doesn't force its second argument, the recursive call is just a thunk. It might never run.

This is why foldr is the right tool for:

Short-circuiting operations (any, all, find).
Infinite or large lazy lists.
Building lists element by element where consumers may not need the whole result.

`foldl`: The Left Fold (Lazy, Usually Wrong)

foldl walks left to right, accumulating into a single value:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _ z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

Conceptually, foldl f z [a, b, c] is f (f (f z a) b) c. The accumulator starts as z and gets updated leftward.

foldl (+) 0 [1, 2, 3]      -- ((0 + 1) + 2) + 3 == 6

Looks fine. The problem is that foldl is lazy in the accumulator. On a long list:

sum' :: [Int] -> Int
sum' = foldl (+) 0

main = print (sum' [1..10000000])  -- stack overflow

What happens: each step builds a thunk like ((0 + 1) + 2) + 3 + ... + 10000000. The thunks pile up because nothing forces them until the final result is demanded. Then GHC tries to evaluate the whole chain at once and the stack blows.

foldl is almost never what you want. Use it only when you specifically need a lazy left fold (which is rare and usually a sign of a different design problem).

`foldl'`: The Strict Left Fold (Usually What You Want)

foldl' (note the apostrophe) lives in Data.List and is the strict version of foldl:

import Data.List (foldl')

foldl' :: (b -> a -> b) -> b -> [a] -> b

It forces the accumulator at each step, which prevents the thunk buildup:

import Data.List (foldl')

sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

main = print (sumStrict [1..10000000])  -- runs in constant space

The default Prelude sum was lazy and buggy for years. Modern Prelude.sum uses foldl' under the hood. But for any custom accumulating fold, you should reach for foldl' explicitly.

A worked example: build a frequency map.

import qualified Data.Map.Strict as Map
import Data.List (foldl')

frequencies :: Ord a => [a] -> Map.Map a Int
frequencies = foldl' step Map.empty
  where
    step m x = Map.insertWith (+) x 1 m

This runs in constant memory beyond the size of the map itself. Using foldl (or foldr for that matter) here would build up thunks and likely blow the heap on large inputs.

When to Use Which

The decision tree:

Need to short-circuit, work on an infinite list, or build a list lazily? → foldr.

Reducing to a strict scalar (sum, product, count, max, accumulator into a strict map)? → foldl'.

Need lazy left fold semantics? → foldl. (Almost never. If you think you do, reconsider.)

A rule of thumb that holds 95% of the time: if the result is a number or a strict data structure (Map.Strict, Set, IntMap.Strict), use foldl'. If the result is a list or you might short-circuit, use foldr.

`scan`: Fold's Sibling

scanl, scanr, and scanl' are like folds but return all the intermediate accumulator values, not just the final one:

scanl :: (b -> a -> b) -> b -> [a] -> [b]

scanl (+) 0 [1, 2, 3, 4]       -- [0, 1, 3, 6, 10]
scanl1 max [3, 1, 4, 1, 5, 9, 2, 6]   -- [3, 3, 4, 4, 5, 9, 9, 9]

scanl is the running fold. It's perfect for:

Running sums or averages.
Tracking the maximum-so-far for a stock-price-style problem.
Producing animation frames where each frame depends on the previous.

-- Running balance from a list of transactions
balances :: [Int] -> [Int]
balances = scanl (+) 0

-- balances [100, -30, 50, -20] == [0, 100, 70, 120, 100]

Like foldl, scanl has a strict variant:

import Data.List (scanl')

scanl' :: (b -> a -> b) -> b -> [a] -> [b]

Use scanl' for the same reason you use foldl': avoid thunk buildup when the accumulator is strict by nature.

scanr exists too but is less common. It returns the result of folding from each position to the end:

scanr (+) 0 [1, 2, 3]   -- [6, 5, 3, 0]

A Realistic Example: Parsing Stats

Imagine you have a stream of HTTP response codes from a load balancer log and you want totals, error counts, and an error rate:

import Data.List (foldl')

data Stats = Stats
  { totalRequests :: !Int
  , errorCount    :: !Int
  } deriving Show

emptyStats :: Stats
emptyStats = Stats 0 0

addCode :: Stats -> Int -> Stats
addCode (Stats t e) code =
  Stats (t + 1) (if code >= 500 then e + 1 else e)

computeStats :: [Int] -> Stats
computeStats = foldl' addCode emptyStats

errorRate :: Stats -> Double
errorRate (Stats t e)
  | t == 0    = 0
  | otherwise = fromIntegral e / fromIntegral t

The bang patterns (!Int) on the Stats fields plus foldl' mean this runs in constant space over arbitrarily large input. This is exactly the pattern Mercury, Standard Chartered, and Sigma use for streaming analytics: strict accumulator + foldl' + strict fields.

Common Pitfalls

Using foldl instead of foldl'. The classic Haskell space leak. If you wrote a fold that produces a number and your program runs out of memory, this is the first thing to check.

Using foldr on a giant list to produce a strict scalar. foldr (+) 0 [1..10000000] blows the stack because the addition chain builds up before being forced. Use foldl' for strict aggregations.

Forgetting that foldr can short-circuit. If you need any, all, or "find the first thing that matches," foldr is the right tool because the lazy second argument lets the fold stop early. Writing this with foldl' would always traverse the full list.

Building a list with foldl' (++) []. Each ++ is O(n) in its left argument, and foldl' calls them in the wrong direction, giving you O(n²). Use concat (which is foldr (++) []) or build with : and reverse at the end.

Strict data fields without foldl', or foldl' without strict fields. You usually need both. A foldl' over a record with lazy fields still leaks because the field updates produce thunks. Add bang patterns to the fields.

Key Takeaways

map and filter cover most per-element work. foldr is the general consumer of inductive data and works correctly on infinite lists when the combining function is lazy in its second argument. foldl' is the strict left fold, and it's the right tool whenever you're collapsing to a strict scalar or building a strict map. Avoid plain foldl. scan is fold's sibling that returns intermediate values, perfect for running totals. The mantra: short-circuit or list result, use foldr. Strict scalar result, use foldl'.