Space Leaks and Debugging
A space leak is when your Haskell program holds onto memory it doesn't logically need. The garbage collector can't reclaim it because something — a thunk, a lazy field, a never-forced value buried in a long-lived data structure — still references it. The program runs slower and slower, eats more memory, and eventually crashes or gets OOM-killed.
Every nontrivial Haskell program has had a space leak at some point. Mercury, Standard Chartered, IOG, and Tweag have all written postmortems on production leaks. Diagnosing them is a learnable skill, and the tools have gotten genuinely good. This page covers what space leaks look like, why foldl is the classic source, how to find them with heap profiling, and how to fix the common cases.
What a Space Leak Is
A space leak is not the same as using too much memory. If you load a 10GB file into a Map, that's just expected memory use. A leak is when memory grows beyond what the program logically needs to hold:
- An accumulator in a fold that should be a single number, but is a chain of millions of unevaluated additions.
- A long-running event loop where each iteration stacks more thunks onto the state.
- A cache that "remembers" old values because a closure captured them.
- A lazy field in a record that holds a reference to gigabytes of data the rest of the structure has long since forgotten.
The hallmark: memory grows linearly (or faster) with input size or run time, when it should stay flat.
Why foldl Is Dangerous
The textbook leak:
import Data.List (foldl) -- the lazy one
mySum :: [Int] -> Int
mySum = foldl (+) 0
main = print (mySum [1..10_000_000])
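Compiled without optimization, this either overflows the stack or uses far more memory and time than a sum should. (With -O, GHC's strictness analysis often rescues this exact example, but it can't be relied on for less obvious accumulators.) Why?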
foldl (+) 0 [1, 2, 3] evaluates step by step like this:
foldl (+) 0 [1, 2, 3]
foldl (+) (0 + 1) [2, 3]
foldl (+) ((0 + 1) + 2) [3]
foldl (+) (((0 + 1) + 2) + 3) []
((0 + 1) + 2) + 3
6
The accumulator is a thunk. Each step nests another + around it. By the time the list is empty, the accumulator is a ten-million-deep chain of + thunks. Forcing it requires recursing into every level — and the runtime stack runs out.
The fix is foldl', which forces the accumulator at every step:
import Data.List (foldl')
mySum :: [Int] -> Int
mySum = foldl' (+) 0
Now each step computes a real number before continuing. Constant space, no stack overflow.
This same trap appears in any hand-written accumulator that doesn't force its state. The fix is always the same: bang patterns, seq, strict fields, or use foldl'/Map.Strict/etc.
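As a minimal sketch of the hand-written case, here is a one-pass mean with bang patterns on both accumulators. The names mean and go are illustrative, not from any library; the point is only that each recursive call receives evaluated numbers rather than a growing chain of (+) thunks.

```haskell
{-# LANGUAGE BangPatterns #-}

-- | Mean of a list in one pass. `mean` and `go` are illustrative names.
mean :: [Double] -> Double
mean = go 0 0
  where
    -- The bangs force the running total and count on every call,
    -- so neither accumulator can pile up unevaluated additions.
    go :: Double -> Int -> [Double] -> Double
    go !total !count [] =
      if count == 0 then 0 else total / fromIntegral count
    go !total !count (x : xs) =
      go (total + x) (count + 1) xs

main :: IO ()
main = print (mean [1 .. 1000000])
```

Removing the two bangs gives the same answer but, in an unoptimized build, reintroduces the thunk chain described above.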
Spotting Leaks in Production
The first sign is usually one of:
- Memory usage climbs steadily under load and never drops.
- Long-running processes start fast and slow down over hours.
- The program OOMs on inputs that are small compared to available RAM.
- GC pauses get longer and more frequent in +RTS -s output.
Run with +RTS -s to see the runtime statistics (the executable must be linked with -rtsopts, otherwise the runtime rejects the flag):
$ ./myprogram +RTS -s
10,234,567,890 bytes allocated in the heap
8,123,456,789 bytes copied during GC
500,123,456 bytes maximum residency
...
Total time 42.123s
Productivity 23.4% of total user
A maximum residency figure that's much larger than the data you'd expect to hold is a red flag. Productivity below 50% means you're spending more than half the program's time in garbage collection, which is usually a sign of leak-induced thrashing.
Heap Profiling with -hT
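The real diagnostic tool is heap profiling. The -hT mode breaks the heap down by closure type and works on an ordinary, non-profiled build. The more detailed modes need a profiling build (cabal build --enable-profiling or the prof flag in your package.yaml). Start with: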
$ ./myprogram +RTS -hT -RTS
This produces myprogram.hp, a heap profile broken down by closure type. Convert it to a graph with hp2ps:
$ hp2ps -c myprogram.hp
$ open myprogram.ps
You'll see a stacked area chart of memory use over time, colored by closure type. A leak typically looks like one band growing without bound. The closure-type breakdown tells you what kind of value is leaking — THUNK, BLACKHOLE, specific data constructors, or STACK.
For more granular profiles, use cost-center profiling (-p and -h<X> variants) to attribute heap to specific functions in your code.
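The eventlog2html tool is a modern alternative to hp2ps: run the program with +RTS -hT -l so the heap samples are written to the eventlog, and it renders them as an interactive HTML report. hs-speedscope is its time-profiling counterpart, converting a profiled eventlog into the speedscope format rather than reading heap data. Once you've used them on a real leak, you'll rarely go back to hp2ps.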
For long-running services, the ekg library exposes runtime stats over HTTP, and tools like Grafana can chart maximum residency and GC time. Mercury runs heap profiles continuously in staging to catch regressions before production.
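If a full metrics library is more than you need, the same numbers can be read directly from GHC.Stats in base. The sketch below is one possible shape, not how ekg or any particular service does it: productivityOf and reportStats are made-up names, and the statistics are only populated when the program runs with +RTS -T (or -s).

```haskell
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Share of elapsed time spent running the program rather than the GC.
-- Returns 1 when no time has been recorded. (Illustrative helper.)
productivityOf :: Int64 -> Int64 -> Double
productivityOf mutatorNs gcNs
  | total <= 0 = 1
  | otherwise  = fromIntegral mutatorNs / fromIntegral total
  where
    total = mutatorNs + gcNs

-- | Print a few leak-relevant statistics, if the RTS is collecting them.
reportStats :: IO ()
reportStats = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are off; rerun with +RTS -T"
    else do
      s <- getRTSStats
      putStrLn ("max live bytes: " ++ show (max_live_bytes s))
      putStrLn ("major GCs:      " ++ show (major_gcs s))
      putStrLn ("productivity:   "
                ++ show (productivityOf (mutator_elapsed_ns s) (gc_elapsed_ns s)))

main :: IO ()
main = do
  print (sum [1 .. 1000000 :: Int])
  reportStats
```

In a service you would call something like reportStats on a timer and ship the values to your metrics system; a max live bytes figure that keeps rising between samples is the signal to reach for a heap profile.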
Common Leak Patterns and Fixes
Lazy Accumulator in a Fold
-- Leaky
process = foldl step initial
  where step s x = updateState s x
-- Fixed
import Data.List (foldl')
process = foldl' step initial
  where step s x = updateState s x
If updateState returns a record, also make its fields strict.
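The reason is that foldl' only evaluates the accumulator to its outermost constructor. A sketch with a hypothetical Totals record (the type and its field names are invented for illustration) shows the combination that actually keeps the state evaluated:

```haskell
import Data.List (foldl')

-- Hypothetical accumulator. The strict fields mean that when foldl'
-- forces the record constructor, both counters are forced with it.
data Totals = Totals
  { totalBytes :: !Int
  , totalLines :: !Int
  } deriving (Eq, Show)

-- One step of the fold: add this line's length and bump the line count.
step :: Totals -> String -> Totals
step (Totals b l) line = Totals (b + length line) (l + 1)

summarize :: [String] -> Totals
summarize = foldl' step (Totals 0 0)

main :: IO ()
main = print (summarize ["alpha", "beta", "gamma"])
```

With lazy fields, foldl' would still force each Totals constructor, but the two Int fields inside it could remain chains of unevaluated additions.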
Lazy Map Values
import qualified Data.Map as Map -- lazy
-- Leaky: thunks pile up in map values
counts :: [String] -> Map.Map String Int
counts = foldr (\w -> Map.insertWith (+) w 1) Map.empty
Switch to Data.Map.Strict:
import qualified Data.Map.Strict as Map
counts :: [String] -> Map.Map String Int
counts = foldl' (\m w -> Map.insertWith (+) w 1 m) Map.empty
Both changes (strict map and foldl') are usually needed together.
Lazy Record Fields
-- Leaky: counts are thunks
data Stats = Stats { hits :: Int, misses :: Int }
bumpHit s = s { hits = hits s + 1 }
-- Fixed: bang the fields
data Stats = Stats { hits :: !Int, misses :: !Int }
Or use the StrictData extension at the top of the module:
{-# LANGUAGE StrictData #-}
data Stats = Stats { hits :: Int, misses :: Int }
-- both fields are now strict
StrictData only affects fields in the module where it's enabled. It doesn't ripple into types defined elsewhere.
Long-Running State Loop
-- Leaky: state grows as thunks
loop :: State -> IO ()
loop state = do
  event <- waitForEvent
  let newState = applyEvent state event
  loop newState
-- Fixed: bang the state
-- (needs {-# LANGUAGE BangPatterns #-} at the top of the module)
loop :: State -> IO ()
loop !state = do
  event <- waitForEvent
  let !newState = applyEvent state event
  loop newState
Forcing both the parameter and the locally-bound update is belt-and-suspenders, but cheap insurance.
Unintended Sharing
This one is sneaky. Sometimes laziness causes a value to be retained because something still references it:
-- Whole list is retained because xs is used in two places
firstAndLast :: [Int] -> (Int, Int)
firstAndLast xs = (head xs, last xs)
head xs is cheap; last xs walks the whole list. Both components of the tuple start out as thunks that point at the front of xs. If last xs is forced while head xs is still unevaluated, that thunk keeps the first cons cell alive, and with it every cell last has already walked past. If the list is generated lazily (e.g., from a file), it ends up fully loaded.
This is retention through sharing rather than a chain of thunks, and it surprises people. The fix depends on the situation: process the data in one pass, or use a Vector if you genuinely need both ends.
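The one-pass option might look like the following sketch. firstAndLastOnePass is an illustrative name, and returning Maybe for the empty list is a choice made here, not something the original function does. The first element is taken by pattern matching, so no head thunk is left pointing at the list while the fold walks it.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- | First and last element in a single traversal.
-- Nothing for an empty list; for a singleton both ends are the same element.
firstAndLastOnePass :: [Int] -> Maybe (Int, Int)
firstAndLastOnePass []       = Nothing
firstAndLastOnePass (x : xs) =
  -- The fold keeps only the most recent element; cells already passed
  -- are unreachable from here and can be collected.
  let !end = foldl' (\_ y -> y) x xs
  in Just (x, end)

main :: IO ()
main = print (firstAndLastOnePass [1 .. 1000000])
```

Whether the list is actually collected as it is consumed still depends on the caller not holding another reference to it.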
The StrictData Extension
Mentioned in passing in the previous page; worth highlighting again. StrictData is the closest thing to a "make this codebase less leaky" knob:
{-# LANGUAGE StrictData #-}
module App.User where
import Data.Text (Text)
data User = User
  { userId :: Int -- automatically strict
  , userName :: Text -- automatically strict
  , userBio :: Text -- automatically strict
  }
You can opt out per-field with ~:
data Settings = Settings
  { defaultLang :: Text
  , expensiveDerived :: ~Text -- explicitly lazy
  }
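To see the difference between the two kinds of field, the sketch below builds a Settings value with an error call in each position and checks which one blows up when the record is evaluated. It uses String instead of Text to stay within base, and failsWhenForced is a helper invented for this demonstration.

```haskell
{-# LANGUAGE StrictData #-}

import Control.Exception (ErrorCall, evaluate, try)

data Settings = Settings
  { defaultLang      :: String   -- strict, because StrictData is on
  , expensiveDerived :: ~String  -- explicitly lazy
  }

-- | True when evaluating the argument to WHNF raises an `error` call.
failsWhenForced :: a -> IO Bool
failsWhenForced x = isLeft' <$> try (evaluate x)
  where
    isLeft' :: Either ErrorCall b -> Bool
    isLeft' = either (const True) (const False)

main :: IO ()
main = do
  -- The lazy field is never looked at, so construction succeeds.
  lazyField   <- failsWhenForced (Settings "en" (error "never needed"))
  -- The strict field is evaluated as part of building the record.
  strictField <- failsWhenForced (Settings (error "forced on construction") "x")
  print (lazyField, strictField)  -- prints (False,True)
```

Note that a strict field is only forced to its outermost constructor; a strict String field can still contain unevaluated characters further down the list.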
Mercury and IOG default to StrictData in nearly every module. The Haskell community is increasingly converging on this as the right default for application code. Library code is more nuanced, since some abstractions genuinely need laziness, but for an app, default to strict and earn the laziness exceptions.
A more aggressive option is -XStrict, which makes all bindings strict (not just data fields). This changes a lot of behavior and breaks code that depends on laziness. Use it sparingly and in single modules.
Real Fixes for Common Cases
Here's a checklist for hunting leaks in a real Haskell service:
- Run with +RTS -s. If maximum residency is much larger than the working set, suspect a leak.
- Run with +RTS -hT and look at the resulting graph. A growing band is your leak. Rebuild with profiling if you need cost-center detail.
- Check every foldl, foldr (when accumulating to scalars), Map.insertWith, and IORef/MVar modification. Convert to strict variants.
- Audit data definitions for lazy fields. Add strictness annotations (!) or turn on StrictData.
- Check long-running loops for un-forced state. Add bang patterns to recursive parameters.
- Look for closures over large values. A small function that captures a large list keeps the list alive as long as the function does. Force the small piece you need with a strict binding (a bang pattern or a case expression) before building the closure, and let the rest be GC'd.
- Re-profile. Confirm the leak is gone, not just smaller.
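For the IORef item in the checklist, the strict variant is modifyIORef' from Data.IORef. A minimal sketch, with countEvents as an illustrative name:

```haskell
import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Count n events in a mutable cell.
-- modifyIORef' evaluates the new value before storing it, so the cell
-- always holds a plain Int. The lazy modifyIORef would instead store
-- an ever-longer chain of (+ 1) thunks until something reads the cell.
countEvents :: Int -> IO Int
countEvents n = do
  ref <- newIORef (0 :: Int)
  forM_ [1 .. n] $ \_ -> modifyIORef' ref (+ 1)
  readIORef ref

main :: IO ()
main = countEvents 1000000 >>= print
```

An MVar has the same hazard: modifyMVar_ stores whatever the callback returns without evaluating it, so the new value should be forced (for example with evaluate or $!) before it is returned.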
A specific Mercury postmortem: a logging middleware retained request bodies because the Logger closure captured the request. The fix was to extract just the small RequestId immediately and let the request body get GC'd. The leak vanished and memory dropped 30%.
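The shape of that fix can be sketched in a few lines. Everything here is hypothetical: the Request type, its fields, and both logger functions are stand-ins to show the pattern, not the code from the postmortem.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical request type, for illustration only.
data Request = Request
  { requestId   :: Int
  , requestBody :: String  -- stands in for a large payload
  }

-- Leaky shape: the returned closure mentions `req`, so the whole request
-- (body included) stays reachable for as long as the logger does.
mkLoggerLeaky :: Request -> (String -> String)
mkLoggerLeaky req = \msg -> "[" ++ show (requestId req) ++ "] " ++ msg

-- Fixed shape: extract and force the small field first. The closure then
-- captures only an evaluated Int, and the body can be collected.
mkLogger :: Request -> (String -> String)
mkLogger req =
  let !rid = requestId req
  in \msg -> "[" ++ show rid ++ "] " ++ msg

main :: IO ()
main = do
  let logLine = mkLogger (Request 42 (replicate 1000000 'x'))
  putStrLn (logLine "handled")  -- prints [42] handled
```

Both versions produce identical log lines; the only difference is what the closure keeps alive, which is why this kind of leak shows up in a heap profile rather than in tests.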
A Realistic Case Study
A naive log analyzer that leaks:
import qualified Data.Map as Map -- lazy!
countByUser :: [LogLine] -> Map.Map UserId Int
countByUser = foldr (\l -> Map.insertWith (+) (logUser l) 1) Map.empty
On 10 million log lines this consumes several gigabytes. Heap profile shows a THUNK_* band growing linearly. The fix:
import qualified Data.Map.Strict as Map
import Data.List (foldl')
countByUser :: [LogLine] -> Map.Map UserId Int
countByUser = foldl' (\m l -> Map.insertWith (+) (logUser l) 1 m) Map.empty
Two changes: strict map, strict fold. Memory drops to a few megabytes (just the map itself).
Common Pitfalls
Profiling without optimizations. GHC's optimizer turns many leaks into non-leaks via strictness analysis. Always profile -O1 or -O2 builds; profiling unoptimized code can show leaks that don't exist in production binaries (and miss leaks that do).
Assuming "I added a bang, leak fixed." The bang only forces the value at that point. If the leak is downstream (in a captured closure, in a Map's values, in a record field), the local bang doesn't help.
Confusing thunk leaks with retainer leaks. A thunk leak is "we built up unevaluated work." A retainer leak is "we're holding references to data we don't logically need." The fixes are different: forcing fixes the first; restructuring data flow fixes the second.
Reading the -hT graph backwards. The bands are stacked, not overlaid. The y-axis is total memory; each band is one closure type's contribution.
Not enabling profiling on dependencies. If a leak is in a library, you'll see it as <unknown> or aggregated unhelpfully unless your dependencies are also built with profiling. Build with cabal build --enable-library-profiling for full coverage.
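The "I added a bang" pitfall comes down to how far a bang reaches: it evaluates a value to its outermost constructor and stops. The sketch below makes that visible with a pair; shallowForce, forcePair, and countTo are names made up for the example.

```haskell
{-# LANGUAGE BangPatterns #-}

-- | A bang on the pair evaluates only the (,) constructor, so the
-- `error` thunk in the first component is never touched.
shallowForce :: Int
shallowForce =
  let !pair = (error "never forced" :: Int, 1 :: Int)
  in snd pair

-- | Force both components, then return the pair itself.
forcePair :: (Int, Int) -> (Int, Int)
forcePair p@(a, b) = a `seq` b `seq` p

-- | Count and sum 1..n in a pair accumulator.
-- The bang on `acc` alone would only reach the constructor; wrapping
-- the new pair in forcePair is what keeps the two Ints evaluated.
countTo :: Int -> (Int, Int)
countTo n = go (0, 0) 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (forcePair (fst acc + 1, snd acc + i)) (i + 1)

main :: IO ()
main = do
  print shallowForce
  print (countTo 1000000)
```

A data type with strict fields would achieve the same thing as forcePair without the extra call, which is usually the tidier fix.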
Key Takeaways
Space leaks happen when thunks or unneeded references accumulate in long-lived structures. The classic source is foldl (use foldl'). The next is lazy Map values (use Data.Map.Strict). The next is lazy record fields (use bang patterns or StrictData). Heap profiling with -hT shows you what's leaking; the runtime statistics from +RTS -s tell you whether you have a leak at all. Build with profiling enabled for any service you'll actually deploy, and run profiles in staging on realistic loads. The fix is almost always strictness in the right place.