
Algebraic Data Types

Algebraic data types are the workhorse of Haskell. The phrase sounds intimidating but the idea is simple: you build new types by combining existing ones with two operations — "or" and "and." A sum type expresses "this or that." A product type expresses "this and that." Most domain modeling in Haskell is one of these, often nested.

Mastering ADTs is the single highest-leverage skill in Haskell. The reason Haskell programs feel like they "just work" once they compile is that ADTs make illegal states unrepresentable. You can't accidentally have a user with no email and no phone number when your type says you must have one or the other. The compiler refuses.
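To make the email-or-phone claim concrete, here is a minimal sketch. The names Email, Phone, Contact, and contactLines are hypothetical, chosen only for this illustration, and the syntax is explained in the sections that follow.

```haskell
-- Hypothetical wrapper types; a real codebase would validate their contents.
newtype Email = Email String deriving (Show, Eq)
newtype Phone = Phone String deriving (Show, Eq)

-- Every Contact carries at least one way to reach the user.
-- There is simply no constructor for "neither".
data Contact
    = EmailOnly Email
    | PhoneOnly Phone
    | EmailAndPhone Email Phone
    deriving (Show, Eq)

-- Total function: every Contact yields at least one line.
contactLines :: Contact -> [String]
contactLines (EmailOnly (Email e))               = ["email: " ++ e]
contactLines (PhoneOnly (Phone p))               = ["phone: " ++ p]
contactLines (EmailAndPhone (Email e) (Phone p)) = ["email: " ++ e, "phone: " ++ p]
```

Because the "no contact details" case has no constructor, code that consumes a Contact never has to check for it.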

The data Keyword

You introduce a new type with data. The simplest form looks like this:

data Bool = True | False

That's the actual definition of Bool from the standard library. There's nothing special about it — you could define it yourself in a file and use it.

The pieces:

  • data introduces a new type.
  • Bool is the type name.
  • True and False are data constructors — values of type Bool.
  • The | separates alternatives. This is a sum type with two cases.

A slightly richer example:

data Direction = North | South | East | West
    deriving (Show, Eq)

Now Direction has four possible values and we've asked GHC to derive Show (so we can print it) and Eq (so we can compare it). Without deriving Show you can still type North at the GHCi prompt, but GHCi has no way to display the result and reports a missing Show instance.
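Once the type exists, functions over it are written one constructor at a time. A small sketch, where opposite and turnRight are hypothetical helpers invented for this example:

```haskell
data Direction = North | South | East | West
    deriving (Show, Eq)

-- One equation per constructor; no default case is needed.
opposite :: Direction -> Direction
opposite North = South
opposite South = North
opposite East  = West
opposite West  = East

-- A quarter turn clockwise.
turnRight :: Direction -> Direction
turnRight North = East
turnRight East  = South
turnRight South = West
turnRight West  = North
```

The derived Eq instance is what lets you check a property such as turnRight (turnRight North) == opposite North, and the derived Show instance is what lets GHCi print the answer.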

Sum Types

A sum type has multiple constructors, each potentially carrying different data. The classic example is Maybe:

data Maybe a = Nothing | Just a

This is parameterized — a is a type variable, so Maybe Int, Maybe String, Maybe (Maybe Bool) are all valid types. Nothing is a value of any Maybe a. Just is a constructor that takes one argument and produces a Maybe a.

ghci> Just 42 :: Maybe Int
Just 42
ghci> Nothing :: Maybe String
Nothing
ghci> :type Just
Just :: a -> Maybe a

Note that Just is a function from a to Maybe a. Constructors that take arguments are ordinary functions you can pass around like any other; Nothing takes no arguments, so it is a plain value.
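A quick sketch of what "pass around" means in practice. The names wrapAll and safeSucc are hypothetical and exist only for this illustration:

```haskell
-- Just :: a -> Maybe a, so it can be handed to map like any function.
wrapAll :: [a] -> [Maybe a]
wrapAll = map Just

-- It also composes with other functions using (.).
safeSucc :: Int -> Maybe Int
safeSucc = Just . (+ 1)
```

Here wrapAll [1, 2, 3] gives [Just 1,Just 2,Just 3], and safeSucc 41 gives Just 42.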

Either is the other ubiquitous sum type:

data Either a b = Left a | Right b

Two type parameters. By convention Left carries an error and Right carries success — "right" as in correct. This is how you express functions that can fail with information:

parseInt :: String -> Either String Int
parseInt s = case reads s of
    [(n, "")] -> Right n
    _         -> Left ("not a number: " ++ s)

ghci> parseInt "42"
Right 42
ghci> parseInt "abc"
Left "not a number: abc"
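Either values chain naturally: in do-notation, the first Left short-circuits the rest. The sketch below reuses parseInt from above; addStrings and report are hypothetical helpers added for illustration.

```haskell
parseInt :: String -> Either String Int
parseInt s = case reads s of
    [(n, "")] -> Right n
    _         -> Left ("not a number: " ++ s)

-- Parse two inputs and add them. If either parse fails,
-- the whole computation returns that Left and stops.
addStrings :: String -> String -> Either String Int
addStrings a b = do
    x <- parseInt a
    y <- parseInt b
    Right (x + y)

-- 'either' from the Prelude collapses both cases into one result.
report :: Either String Int -> String
report = either ("error: " ++) (\n -> "ok: " ++ show n)
```

With these definitions, addStrings "2" "40" is Right 42, while addStrings "2" "x" is Left "not a number: x".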

You define your own sum types when modeling a domain with several distinct cases. A common pattern:

data PaymentMethod
    = CreditCard CardNumber CVV ExpiryDate
    | BankTransfer AccountNumber RoutingNumber
    | PayPal EmailAddress
    | ApplePay DeviceId
    deriving (Show, Eq)

Each case carries the fields that case actually needs. There's no "PayPal payment with a credit card number": the type makes that state unrepresentable, so trying to build one is a compile error rather than something a runtime check has to catch. Compare this to a typical OO approach with a base class and optional fields, which lets you create nonsense at runtime.
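The snippet above leaves the field types undefined. Here is one way to fill them in so it compiles, assuming plain String wrappers for every field; that choice, and the describe and lastFour helpers, are assumptions made for this sketch rather than a recommended design.

```haskell
-- Placeholder field types (assumed): each just wraps a String.
newtype CardNumber    = CardNumber String    deriving (Show, Eq)
newtype CVV           = CVV String           deriving (Show, Eq)
newtype ExpiryDate    = ExpiryDate String    deriving (Show, Eq)
newtype AccountNumber = AccountNumber String deriving (Show, Eq)
newtype RoutingNumber = RoutingNumber String deriving (Show, Eq)
newtype EmailAddress  = EmailAddress String  deriving (Show, Eq)
newtype DeviceId      = DeviceId String      deriving (Show, Eq)

data PaymentMethod
    = CreditCard CardNumber CVV ExpiryDate
    | BankTransfer AccountNumber RoutingNumber
    | PayPal EmailAddress
    | ApplePay DeviceId
    deriving (Show, Eq)

-- Hypothetical helper: a short label for a receipt.
-- Each equation can only see the fields its constructor carries.
describe :: PaymentMethod -> String
describe (CreditCard (CardNumber n) _ _)    = "card ending in " ++ lastFour n
describe (BankTransfer (AccountNumber a) _) = "bank account " ++ a
describe (PayPal (EmailAddress e))          = "PayPal (" ++ e ++ ")"
describe (ApplePay _)                       = "Apple Pay"

-- Last four characters, or the whole string if it is shorter.
lastFour :: String -> String
lastFour s = drop (length s - 4) s
```

Writing PayPal (CardNumber "4111") is rejected by the type checker, which is exactly the mistake the optional-fields design would let through.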

Product Types and Records

Product types combine multiple values. The simplest form uses positional fields:

data Point = Point Double Double
    deriving (Show, Eq)

origin :: Point
origin = Point 0.0 0.0

Point here is both the type name and the constructor name. They live in different namespaces (types vs values) so there's no conflict. This is conventional when there's only one constructor.

For more than two or three fields, use record syntax:

data User = User
    { userName  :: String
    , userEmail :: String
    , userAge   :: Int
    }
    deriving (Show, Eq)

ada :: User
ada = User { userName = "Ada Lovelace", userEmail = "ada@example.com", userAge = 36 }

Record syntax gives you free accessor functions. userName ada returns "Ada Lovelace". It also gives you a record-update syntax:

olderAda :: User
olderAda = ada { userAge = 37 }
-- a new User with everything copied except age, which is 37

The original ada is unchanged because Haskell values are immutable. olderAda is a new value with the updated field.

A modern wrinkle: GHC's automatic field accessors create top-level functions that can collide if two records share field names. The DuplicateRecordFields and OverloadedRecordDot extensions handle this, and Mercury and similar production codebases use them heavily. The convention used to be prefixing fields like userName, userEmail to avoid collisions; with OverloadedRecordDot, you can write user.name like in most other languages.

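{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedRecordDot #-}

data User = User { name :: String, email :: String }
data Company = Company { name :: String, address :: String }

-- DuplicateRecordFields lets both records declare `name`;
-- OverloadedRecordDot resolves u.name from the type of u: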
greet :: User -> String
greet u = "Hello, " ++ u.name

Combining Sum and Product

Real domain types usually combine both. Each constructor of a sum type can carry product-type data. This is where the modeling power shines.

data Shape
    = Circle    { radius :: Double }
    | Rectangle { width  :: Double, height :: Double }
    | Triangle  { base   :: Double, triHeight :: Double }
    deriving (Show, Eq)

area :: Shape -> Double
area (Circle r)        = pi * r * r
area (Rectangle w h)   = w * h
area (Triangle b h)    = 0.5 * b * h

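You can pattern-match on each constructor, and GHC checks that you handle every case: with -Wall on, a missing case produces an incomplete-patterns warning (add -Werror to turn it into a hard error). Adding a new shape and forgetting to update area then shows up at compile time, not at runtime. One caveat: the field accessors in a multi-constructor record are partial, so radius applied to a Rectangle fails at runtime. Prefer pattern matching, as area does.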

A classic real-world example — modeling JSON:

data JSON
    = JNull
    | JBool   Bool
    | JNumber Double
    | JString String
    | JArray  [JSON]
    | JObject [(String, JSON)]
    deriving (Show, Eq)

This is a recursive type — JArray and JObject contain more JSON values. Recursive types are how you express trees, lists, and other nested structures. Haskell's pattern matching plus recursion make these natural to work with.

import Data.List (intercalate)

-- Render a JSON value as compact text. Strings are not escaped here,
-- so a quote or backslash inside a string or key produces invalid output.
prettyPrint :: JSON -> String
prettyPrint JNull         = "null"
prettyPrint (JBool True)  = "true"
prettyPrint (JBool False) = "false"
prettyPrint (JNumber n)   = show n
prettyPrint (JString s)   = "\"" ++ s ++ "\""
prettyPrint (JArray xs)   = "[" ++ intercalate "," (map prettyPrint xs) ++ "]"
prettyPrint (JObject kvs) = "{" ++ intercalate "," (map kv kvs) ++ "}"
  where
    kv (k, v) = "\"" ++ k ++ "\":" ++ prettyPrint v

This is recursive, exhaustive, and the compiler tells you if you miss a case. Try writing this in Java without a visitor pattern.
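The same shape works for any traversal of the JSON type. As a sketch, here are two more recursive functions; lookupPath and size are hypothetical helpers written for this example, not part of any library.

```haskell
data JSON
    = JNull
    | JBool   Bool
    | JNumber Double
    | JString String
    | JArray  [JSON]
    | JObject [(String, JSON)]
    deriving (Show, Eq)

-- Follow a list of object keys into a JSON value.
-- Returns Nothing if a key is missing or a non-object is reached early.
lookupPath :: [String] -> JSON -> Maybe JSON
lookupPath []       v             = Just v
lookupPath (k : ks) (JObject kvs) = lookup k kvs >>= lookupPath ks
lookupPath _        _             = Nothing

-- Count every node in the value, containers included.
size :: JSON -> Int
size (JArray xs)   = 1 + sum (map size xs)
size (JObject kvs) = 1 + sum (map (size . snd) kvs)
size _             = 1
```

For JObject [("user", JObject [("name", JString "Ada")])], the path ["user", "name"] yields Just (JString "Ada") and the path ["user", "age"] yields Nothing.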

Parameterized Types

Types can take type parameters, the way Maybe a and Either a b do. You define them by adding type variables in the declaration:

data Tree a
    = Leaf
    | Node (Tree a) a (Tree a)
    deriving (Show, Eq)

This is a binary tree holding values of any type a. Tree Int, Tree String, and Tree (Maybe User) are all valid. The type parameter is a placeholder that gets filled in at each use site.

example :: Tree Int
example = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

depth :: Tree a -> Int
depth Leaf         = 0
depth (Node l _ r) = 1 + max (depth l) (depth r)

Note the underscore in Node l _ r — we don't care about the value, just the children. The wildcard pattern matches anything.
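Functions that do care about the stored values usually need a constraint on a. A minimal sketch, treating Tree as a binary search tree; insert, toList, and fromList are hypothetical helpers defined here, and the Ord constraint is what permits the comparisons.

```haskell
data Tree a
    = Leaf
    | Node (Tree a) a (Tree a)
    deriving (Show, Eq)

-- Insert into a binary search tree; duplicates are ignored.
insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
    | x < v     = Node (insert x l) v r
    | x > v     = Node l v (insert x r)
    | otherwise = t

-- In-order traversal: left subtree, value, right subtree.
toList :: Tree a -> [a]
toList Leaf         = []
toList (Node l v r) = toList l ++ [v] ++ toList r

-- Build a tree by inserting each element in turn.
fromList :: Ord a => [a] -> Tree a
fromList = foldr insert Leaf
```

With these, toList (fromList [3, 1, 2, 3]) gives [1,2,3]: the in-order walk returns the elements sorted, and the duplicate 3 is dropped.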

The deriving Mechanism

deriving asks GHC to generate boilerplate instances for common typeclasses automatically. The most useful ones:

  • Show — generates show, which gives you a string representation. Used for printing and debugging.
  • Eq — generates == and /=, structural equality.
  • Ord — generates compare, <, >, etc., based on constructor order.
  • Read — generates read, the inverse of show. Rarely used.
  • Enum, Bounded — for types that form a sequence, like Direction.
  • Generic — used by libraries like aeson (JSON) for free serialization.

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)
import Data.Aeson (FromJSON, ToJSON)

data User = User { name :: String, email :: String }
    deriving (Show, Eq, Generic)

instance ToJSON User
instance FromJSON User

That's all you need to serialize and deserialize User to and from JSON. The Generic instance plus the empty ToJSON/FromJSON declarations let aeson figure out the rest from the type structure. This is the kind of leverage that ADTs give you.

For more control, GHC supports DerivingStrategies and DerivingVia, which let you pick how an instance is derived. These show up in serious codebases but you don't need them on day one.
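Going back to the stock classes in the list above, here is a small sketch of what Ord, Enum, and Bounded give you for Direction, with no extensions or libraries needed. The allDirections and next names are hypothetical helpers for this example.

```haskell
data Direction = North | South | East | West
    deriving (Show, Eq, Ord, Enum, Bounded)

-- Enum and Bounded together let you enumerate every constructor,
-- in declaration order.
allDirections :: [Direction]
allDirections = [minBound .. maxBound]

-- Step to the next constructor, wrapping around at the last one.
-- The check matters: succ on the last constructor is a runtime error.
next :: Direction -> Direction
next d
    | d == maxBound = minBound
    | otherwise     = succ d
```

Here allDirections is [North,South,East,West], North < West holds because Ord follows constructor order, and next West wraps back to North.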

The Smart Constructor Pattern

When a type has invariants the data constructor can't enforce — say, a non-negative integer or a non-empty list — the smart constructor pattern hides the data constructor and exposes a function that validates input.

module Age (Age, mkAge, getAge) where

newtype Age = Age Int

mkAge :: Int -> Maybe Age
mkAge n
    | n < 0     = Nothing
    | n > 150   = Nothing
    | otherwise = Just (Age n)

getAge :: Age -> Int
getAge (Age n) = n

By exporting only Age (the type), mkAge, and getAge, but not the Age constructor itself, you guarantee that every Age in your program was validated. Anywhere you see a value of type Age, you know the invariant holds. This is a stronger guarantee than scattering runtime checks through the code: validation happens once, in mkAge, and the type carries the proof from there. The wrapper is also free at runtime because newtype is erased at compile time.

Common Pitfalls

Forgetting to derive Show. You define a new ADT, try to print it in GHCi, and get a missing-instance error that looks cryptic the first time you see it. Always derive Show when prototyping.

Overusing records when sum types fit better. A record with five optional fields where only certain combinations are valid is begging to be a sum type. If you find yourself writing if user.method == "credit_card" then ... else ..., model it as a sum.

Mixing concerns in one ADT. A User type with database concerns, presentation concerns, and authentication concerns all jammed in is harder to maintain than three separate types. ADTs are cheap; make new ones.

Using String fields for things that should be types. email :: String lets any string in. email :: EmailAddress (with a smart constructor) means every EmailAddress in the codebase has been validated.
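A minimal sketch of that idea. The validation rule is deliberately naive and is an assumption made for this example (exactly one '@' with text on both sides); mkEmailAddress, emailText, and mkUser are hypothetical names.

```haskell
-- In a real project this type would live in its own module that exports
-- EmailAddress without its constructor, plus mkEmailAddress and emailText.
newtype EmailAddress = EmailAddress String
    deriving (Show, Eq)

-- Naive check (assumed): non-empty text, one '@', non-empty text.
-- Real email validation is considerably more involved.
mkEmailAddress :: String -> Maybe EmailAddress
mkEmailAddress s = case break (== '@') s of
    (_ : _, '@' : domain@(_ : _))
        | '@' `notElem` domain -> Just (EmailAddress s)
    _ -> Nothing

emailText :: EmailAddress -> String
emailText (EmailAddress s) = s

data User = User
    { userName  :: String
    , userEmail :: EmailAddress
    }
    deriving (Show, Eq)

-- A User can only be built once the raw email string has passed validation.
mkUser :: String -> String -> Maybe User
mkUser name rawEmail = User name <$> mkEmailAddress rawEmail
```

With this in place, mkUser "Ada" "not-an-email" is Nothing, and any function that receives a User can rely on userEmail having passed the check.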

Reaching for deriving Generic and JSON instances by default. These are convenient but couple your wire format to your internal type. For public APIs, define a separate type with explicit ToJSON/FromJSON so you can change the internal representation without breaking clients.

Key Takeaways

Algebraic data types are sums (alternatives, with |) and products (collections of fields). Sum types model "this or that" — Maybe, Either, custom domain types like PaymentMethod. Product types model "this and that" — records with fields. They combine freely; recursive ADTs give you trees and other nested structures. deriving generates instances for Show, Eq, Ord, Generic, and others automatically. Smart constructors enforce invariants at the type level. Modeling your domain with ADTs is the single biggest source of Haskell's correctness payoff.