Profiling and Deployment

Haskell's lazy evaluation and immutable data give you a lot for free, but they also create unique performance pathologies. Space leaks (where unevaluated thunks pile up), unintended sharing failures, and laziness-induced quadratic behavior are the bugs that take production-grade Haskell experience to diagnose. The tooling for finding them is good but takes some practice.

For deployment, Haskell binaries are easier than the language's reputation suggests. GHC produces standalone native binaries; Linux deployment is closer to a Go binary than to a JVM app.

GHC's profiler

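The basic invocation: rebuild with profiling enabled, run with +RTS -p, and read the resulting .prof file. Pass --enable-profiling to cabal run as well (or set profiling: True in cabal.project.local); otherwise cabal reconfigures and runs a non-profiled build, which rejects -p.

$ cabal build --enable-profiling
$ cabal run --enable-profiling myapp -- +RTS -p -RTS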
$ less myapp.prof

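The output shows time and allocation per cost center (a function or a manually annotated expression). With -fprof-auto, GHC adds a cost center to every binding not marked INLINE; -fprof-auto-top limits that to top-level functions, which distorts optimization less. A trimmed report: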

COST CENTRE   MODULE     %time  %alloc
parseRecord   Parser      45.2   38.1
encodeJson    Encoder     22.1   18.7
main          Main         8.0    5.2

This tells you where time is spent. A common failure mode is to find that 80% of time is attributed to evaluate or to some generic library combinator, which usually means thunks accumulated somewhere else and are only being forced at this spot. That's a space leak. The time profile shows the symptom; finding the cause needs heap profiling.
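When the automatic cost centers are too coarse, you can name an expression yourself with an SCC pragma. The following is a minimal sketch using only base; parseRecord and splitOn here are hypothetical stand-ins, not the parser from the report above, and the cost center name is an arbitrary label.

```haskell
-- Hypothetical record parser: split a comma-separated line into fields.
-- The SCC pragma gives the splitting work its own row in the .prof report.
-- Without -prof the pragma is ignored.
parseRecord :: String -> [String]
parseRecord line = {-# SCC "parseRecord_split" #-} splitOn ',' line

-- Small helper so the sketch depends only on base.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest
```

Built with profiling, the report should then list parseRecord_split as a separate cost center nested under whatever called parseRecord.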

Heap profiling

The killer feature for Haskell-specific bugs:

$ cabal run --enable-profiling myapp -- +RTS -hc -p -RTS

This produces myapp.hp, a heap profile broken down by cost center. Process it:

$ hp2ps -c myapp.hp
$ ps2pdf myapp.ps

Or, far better in 2026, use eventlog2html:

$ cabal run myapp -- +RTS -l -hT
$ eventlog2html myapp.eventlog

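This produces an interactive HTML page showing heap residency over time, broken down according to the -h flag you used:

-hc: by cost center (needs a profiling build)
-hT: by closure type (works without a profiling build; great default)
-hd: by closure description, such as the data constructor name (needs a profiling build)
-hy: by type (needs a profiling build)
-hm: by module (needs a profiling build)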

When you see a graph that grows without bound, you've found a space leak. The breakdown tells you what's leaking, and you trace back to figure out why.

Real example. A worker pulls jobs from a queue, processes each, and updates a counter:

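-- Buggy version
counter <- newIORef (0 :: Int)
forever $ do
  job <- atomically $ readTBQueue queue
  process job
  modifyIORef counter (+1)  -- BUG: lazy update, stores a growing chain of (+1) thunks

Nothing ever reads the counter, so nothing forces it, and the heap profile shows unevaluated thunks accumulating without bound. Use a strict update: modifyIORef' when a single thread owns the counter, or atomicModifyIORef' when several threads update it:

-- Fixed version
counter <- newIORef (0 :: Int)
forever $ do
  job <- atomically $ readTBQueue queue
  process job
  atomicModifyIORef' counter (\n -> (n + 1, ()))  -- forces the new value before storing it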

Heap is flat, problem solved.
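To see the strict update in isolation, here is a small self-contained sketch. countEvents is a hypothetical helper that replaces the queue and the job processing with a plain loop; it assumes a single writer, so modifyIORef' is enough.

```haskell
import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Count n events with a strict update, so the IORef always holds an
-- evaluated Int rather than a chain of (+1) thunks.
countEvents :: Int -> IO Int
countEvents n = do
  counter <- newIORef (0 :: Int)
  forM_ [1 .. n] $ \_ -> modifyIORef' counter (+ 1)
  readIORef counter
```

Swapping modifyIORef' for modifyIORef returns the same number, but the whole chain of additions is only evaluated at the final readIORef, which is the pattern the heap profile exposes.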

ThreadScope for concurrency

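If you're using multiple cores and threads, ThreadScope shows what each capability (a Haskell execution context, one per core requested with -N) is doing over time. The program must be linked with -threaded for -N to be accepted.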

$ cabal run myapp -- +RTS -ls -N4
$ threadscope myapp.eventlog

You see a timeline: which capabilities are running threads, when they're idle, when GC happens, when sparks succeed or fail. The most common discoveries:

Capabilities are mostly idle while one is busy: your work isn't actually parallelized.
GC takes 30%+ of total time: too much allocation, look at -S output.
Sparks (from par) are mostly fizzling: laziness is preventing actual parallelism.

For real parallel programs (not just concurrent), ThreadScope is essential.
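As a small program to try in ThreadScope, the sketch below sparks one computation with par while the current thread evaluates another. parSum2 is a hypothetical name; par and pseq are imported from GHC.Conc in base (Control.Parallel in the parallel package exports the same functions).

```haskell
import GHC.Conc (par, pseq)

-- Sum two lists, sparking the first sum so an idle capability can pick it up
-- while the current thread evaluates the second.
-- The result of `sum` on Int is a plain number, so evaluating the spark to
-- WHNF performs the whole computation instead of stopping at a constructor.
parSum2 :: [Int] -> [Int] -> Int
parSum2 xs ys = a `par` (b `pseq` (a + b))
  where
    a = sum xs
    b = sum ys
```

Run single-threaded, this still returns the right answer; the spark is simply never converted, which is exactly the kind of thing the ThreadScope spark view makes visible.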

Strictness and bang patterns

Most performance fixes in Haskell amount to forcing evaluation at the right point. A few tools for this:

! in patterns and field declarations:

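data Counter = Counter !Int !Int  -- both fields strict

-- Bangs in the pattern force a and b on match
-- (redundant here, since the fields are already strict).
incr :: Counter -> Counter
incr (Counter !a !b) = Counter (a+1) (b+1)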

seq and deepseq:

import Control.DeepSeq

let !result = force expensiveComputation

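force (from deepseq) evaluates a structure all the way to its leaves, but only once its own result is demanded, which is why the binding above carries a bang. It needs an NFData instance for the type. seq only evaluates to weak head normal form, that is, the outermost constructor.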
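The difference is easy to observe with a value that has a bottom buried inside it. This is a minimal sketch; partial and whnfOk are hypothetical names, and it assumes the deepseq package is available (it ships with GHC, but a cabal project still has to list it in build-depends).

```haskell
import Control.DeepSeq (force)
import Control.Exception (SomeException, evaluate, try)

-- A list whose spine is fine but whose second element is a bottom.
partial :: [Int]
partial = [1, undefined, 3]

-- True if evaluating the argument to weak head normal form succeeds,
-- False if doing so throws an exception.
whnfOk :: a -> IO Bool
whnfOk x = either failed (const True) <$> try (evaluate x)
  where
    failed :: SomeException -> Bool
    failed _ = False
```

Here whnfOk partial succeeds, because WHNF only exposes the first cons cell, while whnfOk (force partial) hits the undefined element and reports failure.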

The BangPatterns extension (already enabled under the GHC2021 language edition) lets you mark function arguments strict:

{-# LANGUAGE BangPatterns #-}

sumList :: [Int] -> Int
sumList = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs

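Without the bang, acc accumulates thunks (((0+1)+2)+3...) and either runs out of stack or takes O(n) extra memory. With optimization on, GHC's strictness analysis often rescues this particular case, but not at -O0 or in GHCi, so the bang is the reliable fix.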
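The same accumulator pattern is available from the standard library as foldl'. The sketch below shows it twice: once for the plain sum, and once with a two-field accumulator, where the strict fields matter because foldl' only forces the accumulator to weak head normal form. The names sumList', Acc, step, and mean are illustrative.

```haskell
import Data.List (foldl')

-- Equivalent to the hand-written loop: foldl' forces the accumulator
-- at every step.
sumList' :: [Int] -> Int
sumList' = foldl' (+) 0

-- Running sum and count. The strict fields ensure both numbers are
-- evaluated whenever the Acc constructor itself is forced.
data Acc = Acc !Int !Int

step :: Acc -> Int -> Acc
step (Acc s n) x = Acc (s + x) (n + 1)

-- Mean of a list in one pass and constant space; 0 for the empty list.
mean :: [Int] -> Double
mean xs = case foldl' step (Acc 0 0) xs of
  Acc _ 0 -> 0
  Acc s n -> fromIntegral s / fromIntegral n
```

With a lazy tuple (Int, Int) as the accumulator instead of Acc, foldl' would force only the pair constructor and the two components would still build thunks.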

Building static binaries

For Linux deployment, you want a binary that runs on any reasonable Linux distribution without needing to install GHC's runtime libraries on the target. A few approaches:

Dynamic with explicit deps. Just cabal build and ship the binary with documented runtime dependencies (libgmp, libffi, etc.). Works for most internal deployments where you control the target machine.

Static with musl. The cleanest path: build against musl libc instead of glibc, which lets you produce a fully static binary. The usual route is to build on Alpine Linux (whose system libc is musl) inside a container, or to set up a musl cross-compiler:

$ cabal build --enable-executable-static

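The catch is licensing: by default GHC's Integer type is backed by libgmp, which is LGPL, and statically linking it carries obligations some organizations prefer to avoid. Before GHC 9.0 the alternative was a GHC built with integer-simple (no GMP, slower bigint). Since 9.0 both are replaced by ghc-bignum, and the equivalent is a GHC built with its native backend instead of the gmp one.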

Nix-built static binary. With static-haskell-nix, you get reproducible static builds. Used by IOG for some Cardano tooling. Heavy infrastructure, great results.

Docker for Haskell apps

A standard multi-stage Dockerfile:

FROM haskell:9.6.4 AS builder
WORKDIR /build
COPY . .
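# Dynamic (glibc) build, matching the libgmp10 install in the runtime stage.
# --enable-executable-static belongs in a musl-based builder instead.
RUN cabal update && cabal build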

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates libgmp10 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /build/dist-newstyle/build/.../myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]

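The haskell image has GHC and cabal pre-installed. Build artifacts are copied into a slim runtime image, whose glibc must be at least as new as the builder's. The exact dist-newstyle path depends on your project; in practice, run cabal install --installdir=/out --install-method=copy in the builder so the binary lands in a known location as a real file rather than a symlink into the store.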

For cabal, the build is the slow step. CI caching of the cabal store (~/.cabal/store/, or ~/.local/state/cabal/store/ on fresh installs of cabal 3.10 and later) and dist-newstyle/ saves significant time.

For Stack, the equivalent uses the fpco/stack-build images. Stack's --docker flag can run the build inside a container automatically, useful but slower than a hand-rolled multi-stage build.

Nix for reproducibility

For projects where reproducibility matters (auditable builds, regulated industries, scientific computation), Nix is the answer. With haskell.nix (tooling from IOG, formerly IOHK) or the older haskellPackages infrastructure in nixpkgs, you describe your Haskell project as a Nix expression and get builds whose inputs are fully pinned, so they reproduce across machines and over time.

A minimal flake-based setup:

{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";
  outputs = { self, nixpkgs }: {
    packages.x86_64-linux.default =
      let pkgs = import nixpkgs { system = "x86_64-linux"; };
      in pkgs.haskellPackages.callCabal2nix "myapp" ./. {};
  };
}

$ nix build
$ ./result/bin/myapp

The full picture is more involved (you need to handle non-Hackage dependencies, GHC version pins, system libraries), but haskell.nix documentation walks through the patterns. Galois, IOG, and Tweag use Nix extensively for Haskell projects. For most projects, the overhead isn't worth it; for projects where reproducibility is non-negotiable, nothing else comes close.

Production deployment patterns

A few practices that distinguish production-quality Haskell deployments:

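RTS flags. On multicore hosts, set +RTS -N to use all cores. This needs an executable linked with -threaded, and any RTS flag beyond a small safe set needs -rtsopts at link time (or bake the flags in with -with-rtsopts). In containers with a CPU quota, prefer an explicit -N<n>, since -N may count the cores the process can see rather than the quota. Often -A32m or -A64m for a larger nursery (reduces GC frequency). For long-running services, -T enables runtime stats accessible via GHC.Stats, useful for dashboards.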

$ ./myapp +RTS -N -A64m -T
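With -T set, those stats can be read from inside the process. The following is a minimal sketch that formats a few fields as a single line, suitable for a log message or a metrics exporter. rtsSummary is a hypothetical helper name; the GHC.Stats functions and record fields come from base.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- One-line summary of GC statistics, or Nothing when the program was
-- started without +RTS -T (getRTSStats throws in that case, so check first).
rtsSummary :: IO (Maybe String)
rtsSummary = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else do
      s <- getRTSStats
      pure . Just $ unwords
        [ "gcs=" ++ show (gcs s)
        , "major_gcs=" ++ show (major_gcs s)
        , "max_live_bytes=" ++ show (max_live_bytes s)
        , "allocated_bytes=" ++ show (allocated_bytes s)
        ]
```

Returning Maybe rather than throwing keeps the dashboard code path harmless when someone starts the service without the flag.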

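GC tuning. The default GC settings are conservative. With several capabilities the parallel GC is on by default, and for some services it costs more than it saves: -qg turns it off, -qn<x> caps the number of GC threads, and -qa pins OS threads to cores. Measure before tuning; the defaults are usually fine.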

Health checks. Expose /healthz and /readyz endpoints. The latter should fail until your DB pool, caches, etc. are ready.

Metrics. prometheus-client for instrumentation. Export GC stats, request rates, latencies, error rates. Standard stuff, just more deliberate than what you'd get from a Spring Boot autoconfig.

Logging. Structured logs (JSON) via katip or co-log. Plain putStrLn is fine for small CLI tools, awful for services.

Common Pitfalls

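Profiling without -prof-built dependencies. Profiled code can only link against libraries that were themselves built for profiling, so a mismatch shows up as a build error about missing profiling libraries. cabal build --enable-profiling rebuilds dependencies as needed. Time spent in a dependency that has no cost centers of its own is attributed to the cost center that called it, which can make your own functions look more expensive than they are.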

Heap profiling overhead. -h* flags slow your program down significantly. Use them in staging or load tests, not in production (unless you specifically need to capture a leak that only manifests under real load).

Trusting GHC's strictness analyzer to fix your code. The analyzer is good but doesn't catch everything. Be explicit with ! and force for values you know need to be strict.

Forgetting the strict variants. foldl' not foldl, atomicModifyIORef' not atomicModifyIORef, Data.Map.Strict not Data.Map (which re-exports the lazy API). The strict versions are usually what you want; the lazy ones build thunks.
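A typical place where both choices matter at once is a frequency table. This is a minimal sketch, assuming the containers package is available; wordCounts is a hypothetical name.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Count occurrences of each word. Data.Map.Strict forces each updated
-- count before storing it, and foldl' forces the map at every step.
wordCounts :: [String] -> Map.Map String Int
wordCounts = foldl' bump Map.empty
  where
    bump m w = Map.insertWith (+) w 1 m
```

With the lazy Data.Map API, the same code would store each count as an unevaluated chain of additions until something finally looks the key up.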

Building with -O0 and assuming performance. Always benchmark with -O2. The optimizer is GHC's secret weapon; without it, even good code is slow.

Docker images that include GHC. The base image is 2GB+. Use multi-stage builds to ship only the binary in a slim runtime image.

Key Takeaways

GHC's profiler (-prof plus +RTS -p) tells you where time and allocation go. Heap profiling with -h* flags plus eventlog2html finds space leaks, the most common Haskell-specific performance bug.

ThreadScope visualizes concurrent and parallel execution. Use it when threads are involved.

Static binaries via musl or Nix give you Linux-portable artifacts. Multi-stage Dockerfiles produce small images.

Nix is overkill for most projects but unmatched when reproducibility is non-negotiable. IOG, Galois, and Tweag rely on it heavily.

Always set +RTS -N, expose metrics, log structurally, and benchmark with -O2. The boring deployment hygiene matters more than language-specific tricks.