Software Transactional Memory

STM is the feature that makes Haskell concurrency genuinely different. Locks don't compose, two correct lock-based operations combined into one operation can deadlock or violate invariants. STM solves this by making transactions composable. Two correct transactions combined are still correct.

The original STM paper came out of Microsoft Research in 2005 (Harris, Marlow, Peyton Jones, Herlihy). Twenty years later it's still the best implementation of TM in any mainstream language. Clojure has refs, JVM has experimental implementations, but neither matches GHC's STM in performance or expressiveness, largely because GHC's runtime makes the necessary tracking cheap.

The STM monad

STM lives in its own monad. You can't do arbitrary IO inside a transaction, only STM-specific operations. A transaction is run by atomically:

import Control.Concurrent.STM

transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer src dst amount = do
  s <- readTVar src
  writeTVar src (s - amount)
  d <- readTVar dst
  writeTVar dst (d + amount)

main :: IO ()
main = do
  alice <- atomically $ newTVar 100
  bob   <- atomically $ newTVar 50
  atomically $ transfer alice bob 30

TVar is a transactional variable. Reads and writes inside atomically are tracked. At commit time, the runtime checks whether any of the values you read have changed since you read them. If they have, the transaction restarts. If not, all your writes are committed atomically.

This is optimistic concurrency control with read-set tracking. It's the same model databases use, but in-process and very fast.

Why composition works

Suppose you have withdraw :: TVar Int -> Int -> STM () and deposit :: TVar Int -> Int -> STM (). With locks, you'd need to know the locking discipline of both, hold both locks in the right order, and not deadlock. With STM, you just sequence them:

transfer src dst amount = do
  withdraw src amount
  deposit dst amount

Both happen atomically together. There's no lock to hold, no order to get right. Either both succeed or neither does. This is the core argument for STM, you can't get this with locks.

retry: blocking on conditions

retry is what makes STM more than just optimistic locking. When a transaction calls retry, the runtime aborts it and parks the thread until one of the TVars the transaction read changes. Then it wakes the thread up and re-runs.

withdraw :: TVar Int -> Int -> STM ()
withdraw acc amount = do
  balance <- readTVar acc
  if balance < amount
    then retry
    else writeTVar acc (balance - amount)

If there's not enough money, the thread blocks. When someone deposits and the balance changes, the runtime re-runs the transaction. If it succeeds this time, the withdrawal completes. If not (still insufficient), it blocks again.

There's no condition variable, no manual signal/wait, no risk of missing a wakeup. The runtime tracks what you read and wakes you when it changes. This is dramatically simpler than the equivalent with MVar.

orElse: combining alternatives

orElse lets you say "try this, but if it would block, try that instead":

takeFromEither :: TVar [a] -> TVar [a] -> STM a
takeFromEither v1 v2 = takeFrom v1 `orElse` takeFrom v2
  where
    takeFrom v = do
      xs <- readTVar v
      case xs of
        []     -> retry
        (x:rest) -> do
          writeTVar v rest
          return x

If v1 is empty, the first branch retries. orElse catches the retry and tries the second branch. If that also retries, the whole transaction blocks on both TVars and wakes when either changes. You can't do this with locks at all, you'd need a custom condition variable for the disjunction.

The dining philosophers

This is the canonical concurrency exercise. Five philosophers, five forks, each needs both adjacent forks to eat. With locks, you have to be careful about the order in which philosophers grab forks or you deadlock (the classic solution: one philosopher grabs in the opposite order).

With STM, the problem evaporates:

import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad

type Fork = TVar Bool  -- True = available

takeForks :: Fork -> Fork -> STM ()
takeForks f1 f2 = do
  a1 <- readTVar f1
  a2 <- readTVar f2
  if a1 && a2
    then do
      writeTVar f1 False
      writeTVar f2 False
    else retry

releaseForks :: Fork -> Fork -> STM ()
releaseForks f1 f2 = do
  writeTVar f1 True
  writeTVar f2 True

philosopher :: Int -> Fork -> Fork -> IO ()
philosopher i left right = forever $ do
  atomically $ takeForks left right
  putStrLn $ "Philosopher " ++ show i ++ " eating"
  threadDelay 100000
  atomically $ releaseForks left right
  threadDelay 100000

main :: IO ()
main = do
  forks <- replicateM 5 (atomically $ newTVar True)
  forM_ [0..4] $ \i ->
    forkIO $ philosopher i (forks !! i) (forks !! ((i+1) `mod` 5))
  threadDelay 5000000

No deadlock is possible. The transaction either gets both forks atomically or it retries until both are available. There's no order dependency, no asymmetric solution, just the natural specification.

Performance characteristics

STM is not free. Each TVar access goes through a transaction log. Conflicts cause restarts, which waste work. For uncontended workloads, STM has overhead in the 2-5x range compared to a raw IORef. For contended workloads, the picture is more complex, STM scales well when conflicts are rare but degrades when many transactions touch the same TVar.

Practical guidance from production systems:

Keep transactions short. The longer they run, the more likely they conflict.
Don't do IO inside atomically (the type system prevents it, but you can't even put IO actions to run after commit, you have to structure your code so the IO happens after atomically returns).
Avoid huge TVars. A TVar containing a giant Map will conflict on any update. Split it into smaller TVars if updates are independent.
For high-throughput queues, look at TBQueue (bounded), it's tuned for the producer-consumer pattern.

Facebook's Sigma uses STM at scale. Standard Chartered uses STM in their Mu trading systems. IOG uses STM extensively in Cardano's node software for managing the chain state. The performance is good enough for these systems, which makes it good enough for almost anything.

TMVar, TChan, TQueue

There's a transactional version of most concurrency primitives:

TMVar is MVar but transactional. Useful when you want MVar semantics composable with other STM ops.
TChan is an unbounded channel.
TBQueue is a bounded queue, what you usually want for backpressure.
TArray is a transactional mutable array.

import Control.Concurrent.STM
import Control.Concurrent.STM.TBQueue

worker :: TBQueue Job -> IO ()
worker queue = forever $ do
  job <- atomically $ readTBQueue queue
  process job

producer :: TBQueue Job -> Job -> IO ()
producer queue job = atomically $ writeTBQueue queue job

If the queue is full, writeTBQueue retries until there's space. If empty, readTBQueue retries until there's something to read. Both operations block in the right way for free.

Common Pitfalls

Doing too much inside atomically. The transaction is replayed on conflict, so if you compute a 10MB value inside, you re-compute it on every conflict. Compute outside, write inside.

Lazy values in TVars. Same problem as IORef, thunks accumulate. Use strict data types and force values before writing.

Starvation under high contention. A long transaction can be repeatedly aborted by short ones. There's no priority system, you have to keep transactions short or accept that long ones may take a while.

Trying to do IO inside atomically. The type system prevents this, but newcomers sometimes try unsafeIOToSTM. Don't. The transaction may be replayed, your IO will run multiple times, and your code will eventually break in production.

Using STM where IORef would do. If you have a single counter and no need for composition, atomicModifyIORef' is dramatically faster. STM's value is composition, not raw speed.

Key Takeaways

STM gives you composable atomic transactions. Two correct transactions combine into a correct transaction, which is something locks can't deliver.

retry blocks until one of the read TVars changes. orElse tries an alternative if the first would retry. Together these make conditional waiting trivial.

The dining philosophers problem dissolves under STM, you specify the requirement (need both forks) and the runtime handles the coordination.

Performance is good but not free. Keep transactions short, split large TVars, use the right primitive (TBQueue for producer-consumer), and use IORef when you don't need composition.