Testing Haskell

Testing in Haskell has one feature that the rest of the industry envies: property-based testing via QuickCheck. The idea is older than the language's mainstream adoption (the original paper is from 2000) and it's been copied into roughly every modern language: Hypothesis for Python, fast-check for JavaScript, proptest for Rust. The Haskell version is still the reference implementation, and using it well is one of the highest-leverage skills you can develop.

For ordinary unit tests, the ecosystem has converged on hspec and tasty. For modern property testing, hedgehog is replacing QuickCheck in many projects.

Hspec for behavior tests

Hspec gives you the RSpec-style describe/it structure familiar from Ruby:

import Test.Hspec

spec :: Spec
spec = do
  describe "reverse" $ do
    it "reverses an empty list" $
      reverse ([] :: [Int]) `shouldBe` []

    it "is its own inverse" $
      reverse (reverse [1,2,3]) `shouldBe` [1,2,3]

    it "reverses a singleton" $
      reverse [42] `shouldBe` [42]

main :: IO ()
main = hspec spec

shouldBe, shouldThrow, shouldSatisfy, and friends are the assertion vocabulary. before and after handle setup and teardown; around brackets a resource around each test.
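
As a rough sketch of that vocabulary, the spec below exercises shouldThrow, shouldSatisfy, shouldReturn, and before. The counter and its IORef are illustrative assumptions, chosen only to show that before runs its setup once per example.

```haskell
import Control.Exception (ArithException (..), evaluate)
import Data.IORef (modifyIORef, newIORef, readIORef)
import Test.Hspec

spec :: Spec
spec = do
  describe "div" $
    it "throws on division by zero" $
      -- evaluate forces the pure expression inside IO so the exception surfaces here
      evaluate (1 `div` 0 :: Int) `shouldThrow` (== DivideByZero)

  describe "length" $
    it "is never negative" $
      length "hello" `shouldSatisfy` (>= 0)

  -- before runs the setup action once per example and passes the result in
  before (newIORef (0 :: Int)) $
    describe "a counter" $ do
      it "starts at zero" $ \ref ->
        readIORef ref `shouldReturn` 0

      it "gets a fresh IORef per example" $ \ref -> do
        modifyIORef ref (+ 1)
        readIORef ref `shouldReturn` 1
```

Run it with hspec spec. The last example passes only because the second test does not see the first test's counter.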

For testing IO code:

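import System.IO (hClose)
import System.IO.Temp (withSystemTempFile)
import Test.Hspec

-- Create a temp file, close its handle, and pass only the path to the test.
-- GHC locks files that are open for writing, so writeFile would fail
-- if the handle from withSystemTempFile were still open.
withTempPath :: (FilePath -> IO a) -> IO a
withTempPath action = withSystemTempFile "test" $ \path h ->
  hClose h >> action path

spec :: Spec
spec = around withTempPath $ do
  it "writes and reads" $ \path -> do
    writeFile path "hello"
    contents <- readFile path
    contents `shouldBe` "hello"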

around runs setup that produces a value passed to each test, with cleanup guaranteed even on failure.

QuickCheck: the killer feature

Instead of writing input/output pairs, you state a property and let QuickCheck generate inputs:

import Test.QuickCheck

prop_reverseInverse :: [Int] -> Bool
prop_reverseInverse xs = reverse (reverse xs) == xs

prop_reverseLength :: [Int] -> Bool
prop_reverseLength xs = length (reverse xs) == length xs

Run with quickCheck prop_reverseInverse. By default it generates 100 random inputs and reports the first failure (with shrinking, so you get the smallest counterexample).
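
A minimal sketch of what running properties looks like in practice. The false property prop_reverseIsIdentity and the runExamples wrapper are assumptions added for illustration; quickCheckWith, stdArgs, and maxSuccess are the standard QuickCheck knobs for changing the number of tests.

```haskell
import Test.QuickCheck

prop_reverseInverse :: [Int] -> Bool
prop_reverseInverse xs = reverse (reverse xs) == xs

-- Deliberately false: most lists are not palindromes.
prop_reverseIsIdentity :: [Int] -> Bool
prop_reverseIsIdentity xs = reverse xs == xs

runExamples :: IO ()
runExamples = do
  -- Default configuration: 100 random inputs.
  quickCheck prop_reverseInverse
  -- Same property with 1000 inputs instead.
  quickCheckWith (stdArgs { maxSuccess = 1000 }) prop_reverseInverse
  -- Fails, and the reported counterexample is shrunk before it is printed.
  quickCheck prop_reverseIsIdentity
```

The failing run typically reports a two-element list rather than whatever larger list first triggered the failure.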

The power shows up when the property is non-obvious:

import Data.List (permutations, sort)

prop_sortIsIdempotent :: [Int] -> Bool
prop_sortIsIdempotent xs = sort (sort xs) == sort xs

prop_sortPreservesLength :: [Int] -> Bool
prop_sortPreservesLength xs = length (sort xs) == length xs

prop_sortPreservesElements :: [Int] -> Bool
prop_sortPreservesElements xs = sort xs `elem` permutations xs

(The last one is illustrative only: permutations grows factorially, so it is far too slow for lists of more than about ten elements.)

Real bugs found by QuickCheck:

Off-by-one errors in pagination (property: concat (paginate n xs) == xs)
Round-trip serialization bugs (property: decode (encode x) == Just x)
State machine inconsistencies (model the system, generate sequences of operations, check observed state matches model)
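
The first two properties in that list can be sketched as follows. The paginate, encode, and decode functions here are hypothetical stand-ins (a splitAt-based pager and a show/readMaybe codec), not code from any particular project; Positive is QuickCheck's modifier for strictly positive numbers.

```haskell
import Test.QuickCheck
import Text.Read (readMaybe)

-- Hypothetical pager: split a list into pages of at most n items.
paginate :: Int -> [a] -> [[a]]
paginate n xs
  | n <= 0 = error "paginate: page size must be positive"
  | null xs = []
  | otherwise = let (page, rest) = splitAt n xs in page : paginate n rest

-- Gluing the pages back together must give the original list.
prop_paginateConcat :: Positive Int -> [Int] -> Bool
prop_paginateConcat (Positive n) xs = concat (paginate n xs) == xs

-- No page may exceed the requested page size.
prop_pageSize :: Positive Int -> [Int] -> Bool
prop_pageSize (Positive n) xs = all ((<= n) . length) (paginate n xs)

-- Hypothetical codec: serialize with show, parse with readMaybe.
encode :: [Int] -> String
encode = show

decode :: String -> Maybe [Int]
decode = readMaybe

-- Decoding an encoded value must return the original value.
prop_roundTrip :: [Int] -> Bool
prop_roundTrip xs = decode (encode xs) == Just xs
```

An off-by-one in paginate (for example, dropping the final short page) would fail prop_paginateConcat almost immediately.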

Galois (the security-focused consultancy) uses QuickCheck extensively for testing cryptographic implementations. The classic example is testing AES against a reference implementation: generate random keys and plaintexts, and check that both implementations agree.
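
The same reference-implementation technique works at any scale. The sketch below is a toy stand-in, not AES: it checks a hypothetical optimized bit counter (popCountFast) against an obviously correct slow one (popCountRef) on random 32-bit words.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32)
import Test.QuickCheck

-- Reference: slow but obviously correct, one bit at a time.
popCountRef :: Word32 -> Int
popCountRef 0 = 0
popCountRef w = fromIntegral (w .&. 1) + popCountRef (w `shiftR` 1)

-- Implementation under test: clear the lowest set bit until none remain.
popCountFast :: Word32 -> Int
popCountFast = go 0
  where
    go acc 0 = acc
    go acc w = go (acc + 1) (w .&. (w - 1))

-- Both implementations must agree on every input.
prop_matchesReference :: Word32 -> Bool
prop_matchesReference w = popCountFast w == popCountRef w
```

The reference does not need to be fast or clever; its only job is to be easy to trust.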

Generators and Arbitrary

For your own types, you write a generator:

import Test.QuickCheck

data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving Show

instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = sized go
    where
      go 0 = return Leaf
      go n = frequency
        [ (1, return Leaf)
        , (n, do
            l <- go (n `div` 2)
            x <- arbitrary
            r <- go (n `div` 2)
            return (Node l x r))
        ]

The sized combinator gives you access to QuickCheck's size parameter, which grows over the course of a test run. Halving it on every recursive call bounds the depth of the tree, so generation always terminates.

Shrinking

When QuickCheck finds a failing input, it shrinks it toward a minimal counterexample. For a list, it tries shorter lists. For an integer, smaller magnitudes. For your own types, you provide a shrink function, either by hand or by deriving Generic and setting shrink = genericShrink.

This is the part that makes property testing practical. Without shrinking, a failure on a list of 47 randomly-generated elements is hard to debug. With shrinking, you usually get a 2- or 3-element counterexample.
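
For a recursive type, a hand-written shrink usually follows one recipe: try the subterms first, then shrink each field in place. The sketch below applies that recipe to the Tree type; the nodeCount helper is an assumption added only to make "smaller" measurable.

```haskell
import Test.QuickCheck

data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Eq, Show)

instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = sized go
    where
      go 0 = pure Leaf
      go n = frequency
        [ (1, pure Leaf)
        , (n, Node <$> go (n `div` 2) <*> arbitrary <*> go (n `div` 2))
        ]

  shrink Leaf = []
  shrink (Node l x r) =
    -- First try replacing the node with something structurally smaller...
    [Leaf, l, r]
      -- ...then shrink one component at a time, keeping the shape.
      ++ [Node l' x r | l' <- shrink l]
      ++ [Node l x' r | x' <- shrink x]
      ++ [Node l x r' | r' <- shrink r]

-- Number of Node constructors in a tree.
nodeCount :: Tree a -> Int
nodeCount Leaf = 0
nodeCount (Node l _ r) = nodeCount l + 1 + nodeCount r
```

Every candidate has at most as many nodes as the original, and the cheapest candidates come first, which is the order QuickCheck tries them in.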

Hedgehog: the modern alternative

Hedgehog rethinks property testing with a few important differences:

Generators carry their own shrinking. You write the generator, you get shrinking automatically that respects the generator's invariants. (QuickCheck's shrinking sometimes produces invalid inputs.)
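Properties are tested with explicit randomness control: every failure report includes the seed needed to replay it.
The API is more uniform: generators are ordinary values built from combinators in Hedgehog.Gen and Hedgehog.Range, with no Arbitrary-style type class.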

import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

prop_reverse :: Property
prop_reverse = property $ do
  xs <- forAll $ Gen.list (Range.linear 0 100) (Gen.int (Range.linear 0 1000))
  reverse (reverse xs) === xs
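
To see what "generators carry their own shrinking" means, consider a generator with an invariant. In the sketch below, genEven and both properties are illustrative assumptions; the second property is deliberately false so that Hedgehog has something to shrink.

```haskell
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- The invariant (evenness) is built into the generator by construction.
genEven :: Gen Int
genEven = (* 2) <$> Gen.int (Range.linear 0 500)

-- Holds: every generated value is even.
prop_alwaysEven :: Property
prop_alwaysEven = property $ do
  n <- forAll genEven
  assert (even n)

-- Deliberately false: many generated values are 10 or more.
prop_belowTen :: Property
prop_belowTen = property $ do
  n <- forAll genEven
  assert (n < 10)
```

Running check prop_belowTen fails, and because shrinking happens on the underlying Gen.int before the (* 2) is applied, the shrunk counterexample is still an even number. A separate QuickCheck-style shrink on Int would have no way of knowing about that invariant.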

The integration with tasty (tasty-hedgehog) makes Hedgehog work in the same suite as your other tests:

import Test.Tasty
import Test.Tasty.Hedgehog

main :: IO ()
main = defaultMain $ testGroup "all"
  [ testProperty "reverse is involutive" prop_reverse
  ]

For new projects in 2026, Hedgehog is increasingly the default. QuickCheck is still everywhere and isn't going away, but for a fresh codebase, Hedgehog's better shrinking is hard to argue with.

Golden tests with tasty-golden

When you have a function that produces text or binary output and you want to track changes, golden tests are the right tool. The first run creates a "golden" file. Subsequent runs compare output to the golden file:

import Test.Tasty
import Test.Tasty.Golden

main :: IO ()
main = defaultMain $ testGroup "golden"
  [ goldenVsString "render homepage"
      "test/golden/homepage.html"
      (renderHomepage <$> loadFixture)
  ]

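If the output changes, the test fails and reports the mismatch; goldenVsStringDiff lets you plug in an external diff command for a readable diff. If the change is intentional, rerun the suite with tasty-golden's --accept option (or delete the golden file and run again) to regenerate it. This is great for testing renderers, code generators, JSON serialization formats, anything where the output is the spec.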

Doctests

Comments that double as tests:

-- | Returns the factorial of a non-negative integer.
--
-- >>> factorial 5
-- 120
-- >>> factorial 0
-- 1
factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial (n - 1)

Run with the doctest tool, which evaluates each example and checks the result against the output written in the comment. This is excellent for documentation that can't go stale: the doctest run fails if the docs claim something untrue.

The downside: doctests are slow because they spin up the interpreter. Use them for documentation, not for the bulk of your test suite.
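
doctest also understands prop> lines, which it checks with QuickCheck instead of comparing printed output. A small sketch, using a hypothetical clampTo helper:

```haskell
-- | Clamp a value into the closed interval [lo, hi].
--
-- >>> clampTo 0 10 15
-- 10
-- >>> clampTo 0 10 (-3)
-- 0
--
-- prop> \x -> clampTo 0 10 (x :: Int) >= 0
-- prop> \x -> clampTo 0 10 (x :: Int) <= 10
clampTo :: Ord a => a -> a -> a -> a
clampTo lo hi = max lo . min hi
```

The >>> lines document specific cases; the prop> lines document the contract. Both are verified on every doctest run, assuming QuickCheck is available to the doctest environment.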

Testing IO code

The naive approach is to use mocks. Haskell offers a better path: parameterize your code over an effect interface and run it against a pure implementation in tests.

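{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.State (MonadState, State, gets, modify)
import Data.Map (Map)
import qualified Data.Map as Map

-- Minimal stand-in; the real User type lives in your domain code.
data User = User { userId :: Int, userName :: String }

class Monad m => MonadDB m where
  fetchUser :: Int -> m (Maybe User)
  saveUser :: User -> m ()

-- Production instance: talks to the real database.
instance MonadDB IO where
  fetchUser _uid = error "real implementation goes here"
  saveUser _u = error "real implementation goes here"

-- Test instance: an in-memory Map threaded through State.
newtype TestDB a = TestDB (State (Map Int User) a)
  deriving (Functor, Applicative, Monad, MonadState (Map Int User))

instance MonadDB TestDB where
  fetchUser uid = gets (Map.lookup uid)
  saveUser u = modify (Map.insert (userId u) u)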

The same business logic runs against IO in production and TestDB in tests. No mock library, no global mutable state. The test version is just a pure function in State.
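
Here is a sketch of what a test against the pure interpreter can look like. The User fields, the renameUser function, and the runTestDB helper are assumptions introduced for this example; only the MonadDB class and the TestDB instance mirror the code above.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.State (MonadState, State, gets, modify, runState)
import Data.Map (Map)
import qualified Data.Map as Map

-- Hypothetical domain type for the example.
data User = User { userId :: Int, userName :: String }
  deriving (Eq, Show)

class Monad m => MonadDB m where
  fetchUser :: Int -> m (Maybe User)
  saveUser :: User -> m ()

newtype TestDB a = TestDB (State (Map Int User) a)
  deriving (Functor, Applicative, Monad, MonadState (Map Int User))

instance MonadDB TestDB where
  fetchUser uid = gets (Map.lookup uid)
  saveUser u = modify (Map.insert (userId u) u)

-- Run a TestDB action against an initial in-memory database.
runTestDB :: TestDB a -> Map Int User -> (a, Map Int User)
runTestDB (TestDB m) = runState m

-- Hypothetical business logic, written only against the interface.
-- Returns False when the user does not exist.
renameUser :: MonadDB m => Int -> String -> m Bool
renameUser uid newName = do
  found <- fetchUser uid
  case found of
    Nothing -> pure False
    Just u -> do
      saveUser u { userName = newName }
      pure True
```

A test then becomes a pure comparison: runTestDB (renameUser 1 "grace") applied to a one-user map returns True together with the updated map, with no database, no IO, and no cleanup.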

This pattern shows up in production code at Standard Chartered, Mercury, and Anduril. The exact mechanism varies (mtl-style classes, free monads, effect systems like polysemy or effectful), but the principle is the same: the effect interface is a type class or interpreter, and the test version is pure.

Putting it together with tasty

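tasty is the meta-framework that runs Hspec, QuickCheck, Hedgehog, golden tests, and HUnit suites under one runner with shared options, each through its own adapter package (tasty-hspec, tasty-quickcheck, tasty-hedgehog, tasty-golden, tasty-hunit):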

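import Test.Tasty
import Test.Tasty.HUnit
-- Both adapters export testProperty, so import them qualified
-- to avoid an ambiguous-name error.
-- Hedgehog properties go in as H.testProperty "name" prop.
import qualified Test.Tasty.Hedgehog as H
import qualified Test.Tasty.QuickCheck as QC

main :: IO ()
main = defaultMain $ testGroup "all tests"
  [ testGroup "unit tests"
    [ testCase "trivial" $ 1 + 1 @?= (2 :: Int) ]
  , testGroup "properties"
    [ QC.testProperty "reverse involutive" $
        \xs -> reverse (reverse xs) == (xs :: [Int])
    ]
  ]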

You get parallel test execution, filtering by name, and consistent output formatting.

Common Pitfalls

Treating property tests as unit tests. A property is a universal claim: "for all inputs, this holds." Writing prop_addZero x = x + 0 == x checks only the right identity of addition; it says nothing about 0 + x, commutativity, or associativity. Think about what you're claiming, not what you're computing.

Generators that don't shrink. The default shrink returns no candidates, so an Arbitrary instance without its own shrink reports the raw random counterexample, however large. Always provide shrinkers, or use Hedgehog (which handles this automatically).

Property tests that always pass because the precondition fails. prop_x x = x > 0 ==> ... discards every input that fails the precondition, so if your generator rarely produces positive values, the run either gives up or passes on a small, skewed sample. QuickCheck counts discards; pay attention to the "discarded" and "Gave up" messages. Better, use a generator that produces only valid inputs, such as the Positive modifier.
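
The difference between the two styles can be sketched with the classic sorted-insert property. The isSorted helper and both property names are assumptions for this example; collect and OrderedList are standard QuickCheck.

```haskell
import Data.List (insert)
import Test.QuickCheck

isSorted :: [Int] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Precondition style: unsorted lists are generated and then thrown away.
-- collect reports the lengths of the lists that survived the filter.
prop_insertPrecondition :: Int -> [Int] -> Property
prop_insertPrecondition x xs =
  isSorted xs ==> collect (length xs) (isSorted (insert x xs))

-- Generator style: OrderedList only ever produces sorted lists,
-- so nothing is discarded and longer lists are covered too.
prop_insertOrdered :: Int -> OrderedList Int -> Bool
prop_insertOrdered x (Ordered xs) = isSorted (insert x xs)
```

With the precondition version, the collect histogram typically shows that the surviving cases are dominated by very short lists, because long random lists are almost never sorted. The generator version tests the same claim without that bias.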

IO tests that share state. Two tests both modifying /tmp/foo will race when the runner executes them in parallel. Use withSystemTempFile or per-test temp directories.

Mocking everything. The Haskell idiom is to abstract over effects and provide pure interpretations, not to mock with library magic. If you find yourself reaching for a mock library, your code probably wants restructuring.

Key Takeaways

Hspec is the standard for behavior-style tests. tasty ties everything together if you mix frameworks.

Property-based testing is Haskell's signature testing technique. QuickCheck is everywhere; Hedgehog is the modern choice for new projects. Use it for round-trips, invariants, and anything you can express as "for all inputs."

Golden tests via tasty-golden are the right tool for output-shaped tests (renderers, serializers, code generators).

Doctests keep examples in your documentation honest. Use them for docs, not as your main test suite.

Test IO code by abstracting effects and providing pure interpretations. This is more work upfront than mocking, but the tests are faster, more deterministic, and force better separation.