5 min read
On this page

Databases

The Haskell database library landscape is fragmented in a way that surprises newcomers. There's no Active Record, no single SQLAlchemy. Instead there are five viable libraries, each with a different philosophy. The choice matters because switching later is painful.

The shortlist:

  • postgresql-simple: thin wrapper around libpq. Write SQL strings, parse results into Haskell types. Lowest abstraction, highest control.
  • persistent: schema-first ORM with a Template Haskell DSL. Type-safe queries via selectList and friends.
  • beam: Haskell-embedded query DSL. SQL queries written as Haskell code, type-checked end to end.
  • opaleye: arrow-based query DSL with SQL-like semantics. Fewer users than beam but well regarded.
  • esqueleto: builds on persistent's models but adds proper SQL queries with joins, subqueries, etc.

postgresql-simple: the practical default

For most projects, postgresql-simple is the right choice. You write SQL, you get back Haskell values, the types of parameters and results are checked at the boundary. No magic.

import Database.PostgreSQL.Simple

data User = User
  { userId :: Int
  , email  :: Text
  , name   :: Text
  } deriving Show

instance FromRow User where
  fromRow = User <$> field <*> field <*> field

getUserByEmail :: Connection -> Text -> IO (Maybe User)
getUserByEmail conn em = do
  rs <- query conn
    "SELECT id, email, name FROM users WHERE email = ?"
    (Only em)
  case rs of
    [u] -> return (Just u)
    _   -> return Nothing

The ? is parameter placeholder, the values come from the tuple in the second argument. You can't accidentally interpolate user input into the SQL, the API requires parameters.

Inserts and updates work the same way:

createUser :: Connection -> Text -> Text -> IO Int
createUser conn em nm = do
  [Only uid] <- query conn
    "INSERT INTO users (email, name) VALUES (?, ?) RETURNING id"
    (em, nm)
  return uid

The library also has executeMany for batch inserts (much faster than one-at-a-time for large loads) and fold for streaming over large result sets without loading everything into memory.

This is the library Mercury uses for most of their database access. The argument is simple: SQL is what your DBA reviews, what your monitoring tools understand, what shows up in slow query logs. Hiding it behind layers of abstraction is a net loss when something goes wrong.

persistent and Esqueleto

persistent defines your schema in a TH-quoted DSL:

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
User
  email Text
  name  Text
  UniqueEmail email
  deriving Show
Post
  title  Text
  body   Text
  author UserId
  deriving Show
|]

This generates Haskell types (User, Post), a migration function, and a query API. Simple queries are easy:

import Database.Persist
import Database.Persist.Sql

getUserByEmail :: Text -> SqlPersistT IO (Maybe (Entity User))
getUserByEmail em = getBy (UniqueEmail em)

allUsersByName :: SqlPersistT IO [Entity User]
allUsersByName = selectList [] [Asc UserName]

The downside: persistent's query API can't express joins, group by, or anything beyond simple selectList-style queries. For real SQL, you bring in esqueleto:

import qualified Database.Esqueleto.Experimental as E

postsWithAuthors :: SqlPersistT IO [(E.Entity Post, E.Entity User)]
postsWithAuthors = E.select $ do
  (p E.:& u) <- E.from $ E.table @Post
    `E.innerJoin` E.table @User
    `E.on` (\(p E.:& u) -> p E.^. PostAuthor E.==. u E.^. UserId)
  return (p, u)

This is type-checked, you can't reference a column that doesn't exist on the table you're joining. The cost is that the syntax is heavier than raw SQL.

Yesod uses persistent by default. If you're using Yesod, this is the path of least resistance. Otherwise, weigh it against postgresql-simple.

beam and opaleye

beam lets you write queries as Haskell:

allUsers :: Q PgQueryBuilder MyDb s (UserT (QExpr PgExpressionSyntax s))
allUsers = all_ (users myDb)

usersWithEmail :: Text -> Q ...
usersWithEmail em = filter_ (\u -> _userEmail u ==. val_ em) allUsers

It's expressive, type-safe, and the queries are fully checked against your schema. The cost is a steeper learning curve and slower compile times. IOG uses beam in some Cardano-adjacent tooling.

opaleye uses arrow notation and has a smaller community. Both beam and opaleye are valid choices for projects where you want maximum type safety in queries.

Schema migrations

This is where Haskell ORMs are weaker than Rails or Django. There's no canonical migration tool. Options:

  1. Hand-written SQL migrations, tracked in a migrations/ folder, applied with a tool like sqitch or dbmate. This is what most production Haskell apps do.
  2. persistent's built-in migration via runMigration migrateAll. Convenient for prototypes, but it's destructive in some cases (it'll drop columns) and not flexible for complex changes.
  3. The postgresql-migration package for application-level migrations driven by SQL files.

For production, hand-written SQL with a tracking table is the safest. You see exactly what runs, you can hand-review every migration, and rolling back is straightforward.

-- migrations/001_create_users.sql
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Apply with dbmate up or your tool of choice. Track applied migrations in a schema_migrations table.

Connection pooling

You don't open a fresh database connection per request. You use a pool. The standard library is resource-pool (often via Data.Pool):

import Data.Pool
import Database.PostgreSQL.Simple

makePool :: ConnectInfo -> IO (Pool Connection)
makePool ci = newPool $ defaultPoolConfig
  (connect ci)      -- create
  close             -- destroy
  60                -- keepalive seconds
  10                -- max connections

withDb :: Pool Connection -> (Connection -> IO a) -> IO a
withDb pool action = withResource pool action

withResource checks out a connection, runs your action, and returns the connection to the pool (or destroys it if the action threw). The pool size should match your database's max_connections setting divided by the number of app instances. For Postgres, 10-20 per instance is typical.

persistent and beam integrate with resource-pool and provide their own pool helpers.

Transactions

In postgresql-simple:

import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Transaction

transferMoney :: Connection -> Int -> Int -> Int -> IO ()
transferMoney conn from to amount = withTransaction conn $ do
  execute conn
    "UPDATE accounts SET balance = balance - ? WHERE id = ?"
    (amount, from)
  execute conn
    "UPDATE accounts SET balance = balance + ? WHERE id = ?"
    (amount, to)

withTransaction does BEGIN, runs your action, COMMITs on success, ROLLBACKs on exception. There are also withTransactionLevel for explicit isolation levels and withSavepoint for nested transactions.

A common mistake: holding a transaction open while doing slow non-DB work (HTTP calls, heavy computation). Postgres locks held in a transaction block other writers. Keep transactions short, do slow work outside.

A complete service slice

module Db where

import Data.Pool
import Data.Text (Text)
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Transaction

data User = User { userId :: Int, email :: Text, name :: Text }
  deriving Show

instance FromRow User where
  fromRow = User <$> field <*> field <*> field

getUser :: Pool Connection -> Int -> IO (Maybe User)
getUser pool uid = withResource pool $ \c -> do
  rs <- query c "SELECT id, email, name FROM users WHERE id = ?" (Only uid)
  return $ case rs of
    [u] -> Just u
    _   -> Nothing

createUserWithProfile
  :: Pool Connection -> Text -> Text -> Text -> IO Int
createUserWithProfile pool em nm bio = withResource pool $ \c ->
  withTransaction c $ do
    [Only uid] <- query c
      "INSERT INTO users (email, name) VALUES (?, ?) RETURNING id"
      (em, nm)
    _ <- execute c
      "INSERT INTO profiles (user_id, bio) VALUES (?, ?)"
      (uid, bio)
    return uid

Two operations, one transaction, one connection from the pool. This is the typical shape.

Common Pitfalls

Lazy field reads from query results. postgresql-simple returns Text and ByteString that are evaluated when forced, but if you store them in a long-lived structure without forcing, the connection's buffer might be reused. Force values you intend to keep (use strict fields in your record types).

Long transactions. As mentioned, transactions hold locks. A 30-second transaction with HTTP calls inside is a common cause of database stalls. Move side effects outside the transaction.

Connection leaks from withResource exceptions. The library handles this correctly, but if you forkIO inside withResource, the spawned thread won't have access to the connection after the outer block returns. Don't share connections across threads.

Picking the wrong library too early. The libraries are not cleanly interchangeable. Try postgresql-simple first, switch to persistent or beam only when the project's query patterns clearly call for them.

Schema migrations via persistent's auto-migrate in production. It can drop columns silently. Use hand-written SQL migrations for anything you care about.

Key Takeaways

postgresql-simple is the practical default. SQL strings, type-checked params and results, no surprises.

persistent plus esqueleto is the right choice if you're using Yesod or want code-generated models. Pure persistent is too limiting for non-trivial queries.

beam and opaleye give you fully type-safe queries written in Haskell. The cost is compile time and a learning curve. Worth it for some projects, overkill for many.

Migrations are not solved at the library level. Hand-written SQL files plus a tracker (dbmate, sqitch) is the production path.

Pool connections with resource-pool, keep transactions short, force values you intend to outlive the connection.