Threads, MVar, and IORef

Haskell's concurrency story is unusual among compiled languages. A forkIO call creates a green thread that costs about 1.5KB of memory and is scheduled by the GHC runtime, not the OS. Facebook's Sigma spam-filtering system runs millions of these threads on a single machine, evaluating rules against incoming messages. This is the kind of scale you'd expect from Erlang, not from a typed functional language.

The cost model matters. OS threads (the kind pthread_create gives you) reserve roughly 8MB of stack each by default on Linux. Most of that is virtual address space rather than memory actually touched, but spawning a million of them is still a non-starter. GHC's threads are multiplexed onto a small number of OS threads (one per capability, set by +RTS -N), and the scheduler preempts them only at safe points, which in practice means heap allocations. You can spawn hundreds of thousands of them without thinking about it.

forkIO and the runtime

The basic primitive lives in Control.Concurrent:

import Control.Concurrent

main :: IO ()
main = do
  _ <- forkIO $ putStrLn "Hello from a thread"
  putStrLn "Hello from main"
  -- Sleep so main doesn't exit before the child gets to run.
  threadDelay 1000000  -- 1 second, in microseconds

The output ordering is nondeterministic. forkIO returns a ThreadId immediately and the new thread runs concurrently with whatever spawned it. The main thread doesn't wait for its children: when main returns, the program exits and any threads still running are killed. This is one of the first things that bites newcomers, and it's why the example ends with threadDelay.

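To use multiple cores you need to compile with -threaded and run with +RTS -N (one capability per processor) or +RTS -N4 for exactly four capabilities. GHC-built binaries reject most RTS flags on the command line by default, so either link with -rtsopts as well or bake the setting in with -with-rtsopts=-N. Without -N you get concurrency but not parallelism: all your Haskell threads share a single capability, so only one runs at a time.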
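To check what a given binary actually got, a small sketch can ask the runtime directly. describeRuntime is a hypothetical helper; rtsSupportsBoundThreads and getNumCapabilities are real Control.Concurrent exports. Built with something like ghc -threaded -rtsopts Main.hs and run as ./Main +RTS -N4, it should report a threaded RTS with four capabilities.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- Report which RTS this binary was linked against and how many
-- capabilities (Haskell execution contexts) it is running with.
-- rtsSupportsBoundThreads is True only when linked with -threaded;
-- getNumCapabilities is 1 by default and follows +RTS -N.
describeRuntime :: IO String
describeRuntime = do
  caps <- getNumCapabilities
  let rts = if rtsSupportsBoundThreads then "threaded" else "non-threaded"
  return (rts ++ " RTS with " ++ show caps ++ " capabilities")
```

Wire it up with main = describeRuntime >>= putStrLn and try it with and without the flags.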

import Control.Concurrent
import Control.Monad

main :: IO ()
main = do
  forM_ [1 .. 1000000 :: Int] $ \_ -> forkIO $ do
    -- some small task
    return ()
  threadDelay 5000000  -- crude: give the threads time to finish

A million threads. This will run, though forM_ on the main thread is the bottleneck, not the threads themselves. In real code you'd batch work or use a worker pool.

IORef for unsynchronized mutation

IORef is a mutable reference cell with no locking. Reads and writes of the reference itself are atomic on the platforms GHC supports, so you won't see torn values, but a readIORef followed by a writeIORef is two separate steps. If two threads both read the same value and then write back an update, one of the updates is lost.
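A minimal sketch of the lost-update problem, assuming hypothetical helpers runCounter, racyBump and atomicBump (it also uses one MVar per thread purely to wait for completion; MVar is covered below). Several threads increment one IORef, once with a separate read and write and once with atomicModifyIORef'. The atomic version always ends at workers * perWorker; the racy one can come up short, most visibly when built with -threaded and run with +RTS -N. On a single capability it may happen to give the right answer, which is part of what makes this bug easy to miss.

```haskell
import Control.Concurrent
import Control.Monad (forM_, replicateM, replicateM_)
import Data.IORef

-- Start `workers` threads that each apply `bump` to one shared IORef
-- `perWorker` times, wait for all of them, and return the final count.
runCounter :: (IORef Int -> IO ()) -> Int -> Int -> IO Int
runCounter bump workers perWorker = do
  ref <- newIORef 0
  dones <- replicateM workers newEmptyMVar
  forM_ dones $ \done -> forkIO $ do
    replicateM_ perWorker (bump ref)
    putMVar done ()
  mapM_ takeMVar dones
  readIORef ref

-- Racy: another thread can write between this read and this write,
-- and whichever write lands last silently discards the other.
racyBump :: IORef Int -> IO ()
racyBump ref = do
  n <- readIORef ref
  writeIORef ref $! n + 1

-- Atomic: the read-modify-write happens as a single step.
atomicBump :: IORef Int -> IO ()
atomicBump ref = atomicModifyIORef' ref (\n -> (n + 1, ()))
```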

import Data.IORef

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  writeIORef ref 42
  x <- readIORef ref
  print x  -- 42

Use IORef when a single thread owns the value, or when every update can be expressed as one atomic read-modify-write with atomicModifyIORef', as with a counter whose increments can happen in any order.

import Data.IORef

bumpCounter :: IORef Int -> IO Int
bumpCounter ref = atomicModifyIORef' ref $ \n -> (n + 1, n + 1)

atomicModifyIORef' is a CAS loop under the hood. It's fine for hot counters, but it gives you no way to wait for the value to change; you'd be polling.

The strict version (atomicModifyIORef', with the apostrophe) forces the new value to weak head normal form before storing it. Without that, you build up thunks inside the IORef, and eventually one read pays a huge cost or blows the stack. This is one of the most common space leaks in concurrent Haskell code, so always reach for the strict version.

MVar: a box with locking

An MVar a is a one-slot mailbox. It's either empty or full. takeMVar blocks until it's full and then empties it. putMVar blocks until it's empty and then fills it. This gives you both communication and synchronization in one primitive.

import Control.Concurrent
import Control.Concurrent.MVar

main :: IO ()
main = do
  mv <- newEmptyMVar
  forkIO $ do
    threadDelay 500000
    putMVar mv "result from worker"
  result <- takeMVar mv
  putStrLn result

This is essentially a future. The main thread blocks on takeMVar until the worker thread fills the slot. There's no polling and no busy waiting: the runtime knows the thread is parked.
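The same idea can be packaged as a small reusable future. spawn is a hypothetical helper, not a library function; the async library's async and wait provide a more complete version. This sketch changes two things relative to the example above: it uses readMVar instead of takeMVar, so the result can be read any number of times without emptying the slot, and it forks with forkFinally, so an exception in the worker is stored and rethrown to the waiter instead of leaving it blocked.

```haskell
import Control.Concurrent
import Control.Exception (throwIO)

-- Run `action` on its own thread and return an action that waits for
-- its result. readMVar leaves the MVar full, so the returned action can
-- be run repeatedly and from any thread. forkFinally hands the handler
-- either the exception or the result, so failures reach the waiter.
spawn :: IO a -> IO (IO a)
spawn action = do
  slot <- newEmptyMVar
  _ <- forkFinally action (putMVar slot)
  return $ do
    r <- readMVar slot
    either throwIO return r
```

Calling the returned action twice blocks only the first time; both calls yield the same value.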

You can use MVar as a lock around shared state by storing the state inside it:

import Control.Concurrent.MVar

newtype Counter = Counter (MVar Int)

newCounter :: IO Counter
newCounter = Counter <$> newMVar 0

incr :: Counter -> IO Int
incr (Counter mv) = modifyMVar mv $ \n ->
  -- force n' so the MVar stores an evaluated Int rather than a thunk chain
  let n' = n + 1 in n' `seq` return (n', n')

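modifyMVar takes the value, runs your function, and puts the new value back, with exception handling that restores the old value if your function throws. This is the standard idiom. It only protects the state if every thread that touches the MVar goes through the same take-then-put discipline.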
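The exception-safety half of that description is easy to observe. In this sketch, failingUpdate and unprotectedUpdate are hypothetical names; each attempts an update that throws partway through. Through modifyMVar_ the MVar still holds its original value afterwards; with a bare takeMVar/putMVar pair it is left empty, and the next thread to take it would block.

```haskell
import Control.Concurrent.MVar
import Control.Exception (IOException, try)

-- An update whose function throws. modifyMVar_ puts the original value
-- back and then rethrows; `try` only observes the rethrown exception.
failingUpdate :: MVar Int -> IO (Either IOException ())
failingUpdate mv = try $ modifyMVar_ mv $ \_ ->
  ioError (userError "update failed")

-- The same update with a bare take/put pair. When the middle step
-- throws, nothing refills the MVar and it stays empty.
unprotectedUpdate :: MVar Int -> IO (Either IOException ())
unprotectedUpdate mv = try $ do
  _ <- takeMVar mv
  n' <- ioError (userError "update failed")  -- stands in for any failing step
  putMVar mv n'
```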

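The catch: once you use an MVar as a lock, it has the same deadlock hazards as any lock. GHC's MVars are fair (blocked threads are woken in FIFO order), but fairness doesn't prevent deadlock. If thread A holds MVar X and waits for Y, while thread B holds Y and waits for X, you deadlock. GHC's runtime can sometimes detect this and throw BlockedIndefinitelyOnMVar, but only when the garbage collector can prove that no other live thread can reach the MVar. As long as some other thread holds a reference to it, the runtime has to assume the slot might still be filled, even if that never happens.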
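The case the runtime can detect is easy to reproduce. In this sketch (waitForever is a hypothetical name), the MVar is created and taken inside one action, so no other thread can ever reach it. Once the garbage collector notices that, the blocked takeMVar receives BlockedIndefinitelyOnMVar, which try catches like any other exception.

```haskell
import Control.Concurrent.MVar
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

-- Block on an MVar that nothing else references. The runtime can prove
-- no thread will ever fill it and throws BlockedIndefinitelyOnMVar to
-- the blocked thread.
waitForever :: IO (Either BlockedIndefinitelyOnMVar ())
waitForever = do
  mv <- newEmptyMVar
  try (takeMVar mv)
```

If mv were also stored somewhere a still-running thread could reach, the runtime could no longer prove the wait is hopeless, and the takeMVar would simply block.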

When to use what

Reach for IORef when you have a single thread, or when the value is a counter or cache where atomic CAS is enough. The constant factor is tiny.

Reach for MVar when you need a thread to wait for a value or when you want to serialize access to a piece of state. The classic pattern is a worker thread that owns some resource (a database connection, a file handle) and other threads send it work via an MVar or Chan.
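A minimal sketch of that owner-thread pattern, with a Chan for requests and a fresh MVar per request for the reply. Request, startServer and send are hypothetical names, and a running character count stands in for the real resource (a database connection or file handle). Because only the server thread ever touches the state, the state needs no lock of its own: the Chan serializes the requests.

```haskell
import Control.Concurrent

-- A request carries its input and an empty MVar for the reply.
data Request = Request String (MVar Int)

-- The server loop owns `total`; no other thread can see or modify it.
startServer :: IO (Chan Request)
startServer = do
  requests <- newChan
  let loop total = do
        Request msg reply <- readChan requests
        let total' = total + length msg
        total' `seq` putMVar reply total'  -- reply with an evaluated Int
        loop total'
  _ <- forkIO (loop 0)
  return requests

-- Client side: send a request, then block until the server replies.
send :: Chan Request -> String -> IO Int
send requests msg = do
  reply <- newEmptyMVar
  writeChan requests (Request msg reply)
  takeMVar reply
```

With a fresh server, send srv "hello" returns 5, and a following send srv "abc" returns 8.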

Reach for STM (covered next) when you need to coordinate multiple pieces of state atomically. Doing that with MVars means holding several locks at once, which is exactly how nested locking ends in deadlock.

Chan is worth a mention. It's an unbounded FIFO channel built on MVars. It's useful for producer-consumer pipelines, but writeChan never blocks, so the unboundedness can hide a memory leak if your producer outpaces your consumer. For bounded channels, look at TBQueue from STM or unagi-chan.
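A sketch of a Chan pipeline, assuming hypothetical names producer, consumer and sumViaChan. Chan has no built-in notion of end-of-stream, so this version sends Maybe values and uses Nothing as a terminator. Nothing in it limits how far the producer can get ahead of the consumer.

```haskell
import Control.Concurrent

-- Wrap each item in Just, then send Nothing to mark the end of the
-- stream. writeChan never blocks, so the producer is never slowed down.
producer :: Chan (Maybe Int) -> [Int] -> IO ()
producer ch xs = do
  mapM_ (writeChan ch . Just) xs
  writeChan ch Nothing

-- Keep a strict running sum until the end marker arrives.
consumer :: Chan (Maybe Int) -> IO Int
consumer ch = go 0
  where
    go acc = do
      item <- readChan ch
      case item of
        Nothing -> return acc
        Just x  -> let acc' = acc + x in acc' `seq` go acc'

-- Producer on a forked thread, consumer on the calling thread.
sumViaChan :: [Int] -> IO Int
sumViaChan xs = do
  ch <- newChan
  _ <- forkIO (producer ch xs)
  consumer ch
```

For example, sumViaChan [1 .. 100] returns 5050.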

A practical example: a bounded worker pool

{-# LANGUAGE LambdaCase #-}

import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad

workerPool :: Int -> [IO ()] -> IO ()
workerPool n tasks = do
  taskMV <- newMVar tasks                -- shared queue of pending tasks
  doneMVs <- replicateM n newEmptyMVar   -- one completion signal per worker
  forM_ doneMVs $ \done -> forkIO $ do
    let loop = do
          -- atomically pop the next task, if there is one
          next <- modifyMVar taskMV $ \case
            []     -> return ([], Nothing)
            (t:ts) -> return (ts, Just t)
          case next of
            Nothing -> putMVar done ()  -- queue drained: signal and stop
            Just t  -> t >> loop
    loop
  forM_ doneMVs takeMVar                 -- wait for every worker

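Each of the n workers pulls tasks from a shared list, and the MVar around the list serializes access. When the list is empty, each worker signals completion via its own MVar, and the main thread waits for all of them. This pattern works, but it has a hole: if a task throws, its worker dies without filling its done MVar, and the main thread never finishes waiting (at best the runtime eventually throws BlockedIndefinitelyOnMVar). In practice you'd reach for async (next-but-one section), which handles exception propagation properly.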
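One way to close that hole without async is to fork each worker with forkFinally, which runs its handler whether the worker returns or throws, and have the handler fill the done MVar with the outcome. workerPoolReporting is a hypothetical name, and this is still only a sketch: it reports failures rather than rethrowing them, and it does nothing to cancel the remaining workers.

```haskell
import Control.Concurrent
import Control.Exception (SomeException)
import Control.Monad (forM_, replicateM)

-- Like workerPool, but each worker's done MVar receives how it ended:
-- Right () if it drained the queue, Left e if a task threw e.
workerPoolReporting :: Int -> [IO ()] -> IO [Either SomeException ()]
workerPoolReporting n tasks = do
  taskMV <- newMVar tasks
  let loop = do
        next <- modifyMVar taskMV $ \ts -> case ts of
          []        -> return ([], Nothing)
          (t : ts') -> return (ts', Just t)
        case next of
          Nothing -> return ()
          Just t  -> t >> loop
  doneMVs <- replicateM n newEmptyMVar
  -- forkFinally fills `done` on normal exit and on exceptions alike
  forM_ doneMVs $ \done -> forkFinally loop (putMVar done)
  mapM takeMVar doneMVs
```

A failing task now costs one worker, the surviving workers finish the queue, and the caller gets one Left in the result list instead of hanging.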

Common Pitfalls

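Lazy values inside MVar and IORef. If you putMVar mv (x + 1) without forcing it, you store a thunk, and the next reader pays the cost of a possibly long chain of accumulated thunks. For IORef, use atomicModifyIORef' (the strict variant). The base library has no strict counterpart to modifyMVar, so force the value yourself: putMVar mv $! x + 1, or modifyMVar_ mv (\x -> return $! x + 1). Bang patterns on the values you store work too. Note that $! only evaluates to weak head normal form, so nested data structures may need deepseq.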
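Small helpers make the forcing hard to forget. modifyMVarStrict_ and putMVarStrict are hypothetical names (base does not provide them); both evaluate the new value before it goes into the MVar.

```haskell
import Control.Concurrent.MVar

-- Strict modify: ($!) evaluates the new value to weak head normal form
-- before modifyMVar_ stores it. Enough for Int; nested data needs deepseq.
modifyMVarStrict_ :: MVar a -> (a -> a) -> IO ()
modifyMVarStrict_ mv f = modifyMVar_ mv (\x -> return $! f x)

-- Strict put: evaluate first, then store.
putMVarStrict :: MVar a -> a -> IO ()
putMVarStrict mv x = x `seq` putMVar mv x
```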

Forgetting -threaded. Without it, forkIO still works, but all Haskell threads share one OS thread, so a blocking FFI call stalls the entire runtime until it returns. If your program calls C code, every forkIO thread can grind to a halt waiting on a single foreign call. Always compile production code with -threaded.

Main exiting before threads finish. forkIO is fire-and-forget: when main returns, the program exits and any unfinished threads are killed. Either use MVar to synchronize completion, or use the async library, which handles this for you.

Deadlock from nested MVar takes. If you take MVar A and then try to take B, and another thread does the reverse, you deadlock. Always take MVars in a consistent order, or better, use STM to compose multiple updates atomically.
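A sketch of consistent lock ordering, assuming a hypothetical Account type whose Int id gives every lock a fixed rank. withBoth always locks the lower id first, whichever direction the money moves, so two opposing transfers contend for the same first lock instead of each holding one and waiting for the other. withBoth, transfer and opposingTransfers are likewise illustrative names.

```haskell
import Control.Concurrent
import Data.Tuple (swap)

-- Hypothetical account: the id fixes the order in which locks are taken.
data Account = Account { accountId :: Int, balance :: MVar Int }

-- Run f on the balances of a and b (in that order), always locking the
-- lower id first. Assumes a and b are distinct accounts: taking the
-- same MVar twice would block forever.
withBoth :: Account -> Account -> (Int -> Int -> IO (Int, Int)) -> IO ()
withBoth a b f
  | accountId a < accountId b = lockInOrder a b f
  | otherwise = lockInOrder b a (\bBal aBal -> swap <$> f aBal bBal)
  where
    lockInOrder first second g =
      modifyMVar_ (balance first) $ \x ->
        modifyMVar (balance second) $ \y -> do
          (x', y') <- g x y
          return (y', x')  -- new value for `second`, result for `first`

transfer :: Account -> Account -> Int -> IO ()
transfer from to amount =
  withBoth from to $ \f t -> return (f - amount, t + amount)

-- Two threads moving money in opposite directions, n times each.
-- Returns the final (a, b) balances once both threads are done.
opposingTransfers :: Int -> IO (Int, Int)
opposingTransfers n = do
  a <- Account 1 <$> newMVar 1000
  b <- Account 2 <$> newMVar 1000
  doneA <- newEmptyMVar
  doneB <- newEmptyMVar
  _ <- forkIO $ mapM_ (\_ -> transfer a b 1) [1 .. n] >> putMVar doneA ()
  _ <- forkIO $ mapM_ (\_ -> transfer b a 2) [1 .. n] >> putMVar doneB ()
  takeMVar doneA
  takeMVar doneB
  (,) <$> readMVar (balance a) <*> readMVar (balance b)
```

opposingTransfers 1000 finishes with balances (2000, 0). If withBoth simply locked from before to, the same run could deadlock.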

Key Takeaways

GHC's lightweight threads are cheap enough that you can spawn millions, and the scheduler handles them efficiently. Compile with -threaded and run with +RTS -N (linking with -rtsopts, or using -with-rtsopts=-N) to get parallelism.

IORef is for unsynchronized mutation, single-owner state, or atomic counters. Always use the strict variants (atomicModifyIORef').

MVar is a one-slot mailbox that doubles as a lock. It's the right tool when one thread waits for another, or when you need to serialize access to mutable state.

MVar does not compose. Two operations on different MVars are not atomic together. For that, you want STM.