Databases
The Haskell database library landscape is fragmented in a way that surprises newcomers. There's no Active Record, no single SQLAlchemy. Instead there are five viable libraries, each with a different philosophy. The choice matters because switching later is painful.
The shortlist:
postgresql-simple: thin wrapper around libpq. Write SQL strings, parse results into Haskell types. Lowest abstraction, highest control.persistent: schema-first ORM with a Template Haskell DSL. Type-safe queries viaselectListand friends.beam: Haskell-embedded query DSL. SQL queries written as Haskell code, type-checked end to end.opaleye: arrow-based query DSL with SQL-like semantics. Fewer users than beam but well regarded.esqueleto: builds on persistent's models but adds proper SQL queries with joins, subqueries, etc.
postgresql-simple: the practical default
For most projects, postgresql-simple is the right choice. You write SQL, you get back Haskell values, the types of parameters and results are checked at the boundary. No magic.
import Database.PostgreSQL.Simple
data User = User
{ userId :: Int
, email :: Text
, name :: Text
} deriving Show
instance FromRow User where
fromRow = User <$> field <*> field <*> field
getUserByEmail :: Connection -> Text -> IO (Maybe User)
getUserByEmail conn em = do
rs <- query conn
"SELECT id, email, name FROM users WHERE email = ?"
(Only em)
case rs of
[u] -> return (Just u)
_ -> return Nothing
The ? is parameter placeholder, the values come from the tuple in the second argument. You can't accidentally interpolate user input into the SQL, the API requires parameters.
Inserts and updates work the same way:
createUser :: Connection -> Text -> Text -> IO Int
createUser conn em nm = do
[Only uid] <- query conn
"INSERT INTO users (email, name) VALUES (?, ?) RETURNING id"
(em, nm)
return uid
The library also has executeMany for batch inserts (much faster than one-at-a-time for large loads) and fold for streaming over large result sets without loading everything into memory.
This is the library Mercury uses for most of their database access. The argument is simple: SQL is what your DBA reviews, what your monitoring tools understand, what shows up in slow query logs. Hiding it behind layers of abstraction is a net loss when something goes wrong.
persistent and Esqueleto
persistent defines your schema in a TH-quoted DSL:
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
User
email Text
name Text
UniqueEmail email
deriving Show
Post
title Text
body Text
author UserId
deriving Show
|]
This generates Haskell types (User, Post), a migration function, and a query API. Simple queries are easy:
import Database.Persist
import Database.Persist.Sql
getUserByEmail :: Text -> SqlPersistT IO (Maybe (Entity User))
getUserByEmail em = getBy (UniqueEmail em)
allUsersByName :: SqlPersistT IO [Entity User]
allUsersByName = selectList [] [Asc UserName]
The downside: persistent's query API can't express joins, group by, or anything beyond simple selectList-style queries. For real SQL, you bring in esqueleto:
import qualified Database.Esqueleto.Experimental as E
postsWithAuthors :: SqlPersistT IO [(E.Entity Post, E.Entity User)]
postsWithAuthors = E.select $ do
(p E.:& u) <- E.from $ E.table @Post
`E.innerJoin` E.table @User
`E.on` (\(p E.:& u) -> p E.^. PostAuthor E.==. u E.^. UserId)
return (p, u)
This is type-checked, you can't reference a column that doesn't exist on the table you're joining. The cost is that the syntax is heavier than raw SQL.
Yesod uses persistent by default. If you're using Yesod, this is the path of least resistance. Otherwise, weigh it against postgresql-simple.
beam and opaleye
beam lets you write queries as Haskell:
allUsers :: Q PgQueryBuilder MyDb s (UserT (QExpr PgExpressionSyntax s))
allUsers = all_ (users myDb)
usersWithEmail :: Text -> Q ...
usersWithEmail em = filter_ (\u -> _userEmail u ==. val_ em) allUsers
It's expressive, type-safe, and the queries are fully checked against your schema. The cost is a steeper learning curve and slower compile times. IOG uses beam in some Cardano-adjacent tooling.
opaleye uses arrow notation and has a smaller community. Both beam and opaleye are valid choices for projects where you want maximum type safety in queries.
Schema migrations
This is where Haskell ORMs are weaker than Rails or Django. There's no canonical migration tool. Options:
- Hand-written SQL migrations, tracked in a
migrations/folder, applied with a tool likesqitchordbmate. This is what most production Haskell apps do. persistent's built-in migration viarunMigration migrateAll. Convenient for prototypes, but it's destructive in some cases (it'll drop columns) and not flexible for complex changes.- The
postgresql-migrationpackage for application-level migrations driven by SQL files.
For production, hand-written SQL with a tracking table is the safest. You see exactly what runs, you can hand-review every migration, and rolling back is straightforward.
-- migrations/001_create_users.sql
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Apply with dbmate up or your tool of choice. Track applied migrations in a schema_migrations table.
Connection pooling
You don't open a fresh database connection per request. You use a pool. The standard library is resource-pool (often via Data.Pool):
import Data.Pool
import Database.PostgreSQL.Simple
makePool :: ConnectInfo -> IO (Pool Connection)
makePool ci = newPool $ defaultPoolConfig
(connect ci) -- create
close -- destroy
60 -- keepalive seconds
10 -- max connections
withDb :: Pool Connection -> (Connection -> IO a) -> IO a
withDb pool action = withResource pool action
withResource checks out a connection, runs your action, and returns the connection to the pool (or destroys it if the action threw). The pool size should match your database's max_connections setting divided by the number of app instances. For Postgres, 10-20 per instance is typical.
persistent and beam integrate with resource-pool and provide their own pool helpers.
Transactions
In postgresql-simple:
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Transaction
transferMoney :: Connection -> Int -> Int -> Int -> IO ()
transferMoney conn from to amount = withTransaction conn $ do
execute conn
"UPDATE accounts SET balance = balance - ? WHERE id = ?"
(amount, from)
execute conn
"UPDATE accounts SET balance = balance + ? WHERE id = ?"
(amount, to)
withTransaction does BEGIN, runs your action, COMMITs on success, ROLLBACKs on exception. There are also withTransactionLevel for explicit isolation levels and withSavepoint for nested transactions.
A common mistake: holding a transaction open while doing slow non-DB work (HTTP calls, heavy computation). Postgres locks held in a transaction block other writers. Keep transactions short, do slow work outside.
A complete service slice
module Db where
import Data.Pool
import Data.Text (Text)
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Transaction
data User = User { userId :: Int, email :: Text, name :: Text }
deriving Show
instance FromRow User where
fromRow = User <$> field <*> field <*> field
getUser :: Pool Connection -> Int -> IO (Maybe User)
getUser pool uid = withResource pool $ \c -> do
rs <- query c "SELECT id, email, name FROM users WHERE id = ?" (Only uid)
return $ case rs of
[u] -> Just u
_ -> Nothing
createUserWithProfile
:: Pool Connection -> Text -> Text -> Text -> IO Int
createUserWithProfile pool em nm bio = withResource pool $ \c ->
withTransaction c $ do
[Only uid] <- query c
"INSERT INTO users (email, name) VALUES (?, ?) RETURNING id"
(em, nm)
_ <- execute c
"INSERT INTO profiles (user_id, bio) VALUES (?, ?)"
(uid, bio)
return uid
Two operations, one transaction, one connection from the pool. This is the typical shape.
Common Pitfalls
Lazy field reads from query results. postgresql-simple returns Text and ByteString that are evaluated when forced, but if you store them in a long-lived structure without forcing, the connection's buffer might be reused. Force values you intend to keep (use strict fields in your record types).
Long transactions. As mentioned, transactions hold locks. A 30-second transaction with HTTP calls inside is a common cause of database stalls. Move side effects outside the transaction.
Connection leaks from withResource exceptions. The library handles this correctly, but if you forkIO inside withResource, the spawned thread won't have access to the connection after the outer block returns. Don't share connections across threads.
Picking the wrong library too early. The libraries are not cleanly interchangeable. Try postgresql-simple first, switch to persistent or beam only when the project's query patterns clearly call for them.
Schema migrations via persistent's auto-migrate in production. It can drop columns silently. Use hand-written SQL migrations for anything you care about.
Key Takeaways
postgresql-simple is the practical default. SQL strings, type-checked params and results, no surprises.
persistent plus esqueleto is the right choice if you're using Yesod or want code-generated models. Pure persistent is too limiting for non-trivial queries.
beam and opaleye give you fully type-safe queries written in Haskell. The cost is compile time and a learning curve. Worth it for some projects, overkill for many.
Migrations are not solved at the library level. Hand-written SQL files plus a tracker (dbmate, sqitch) is the production path.
Pool connections with resource-pool, keep transactions short, force values you intend to outlive the connection.