Lists and List Comprehensions

Haskell lists are singly linked lists. That sentence is the most important thing to remember about them. Every textbook example uses lists, every tutorial uses lists, and the standard library's Prelude is built around them. So beginners reach for lists for everything, including problems that scream for Vector, Seq, or Map. The fix isn't to avoid lists. It's to know what they are and where they break down.

A list is either empty ([]) or a head element prepended to another list (x : xs). That's it. Indexing is O(n). Length is O(n). Random access is O(n). Concatenating to the back is O(n). Concatenating to the front is O(1). This shapes everything that follows.

Constructing Lists

The literal syntax:

xs :: [Int]
xs = [1, 2, 3, 4, 5]

ys :: [String]
ys = ["alice", "bob", "carol"]

empty :: [Int]
empty = []

Internally, [1, 2, 3] is 1 : 2 : 3 : []. The : operator (called "cons") prepends an element:

zs :: [Int]
zs = 0 : [1, 2, 3]   -- [0, 1, 2, 3]

-- Equivalent
zs = 0 : 1 : 2 : 3 : []

Pattern matching on lists uses the same operator:

describe :: [Int] -> String
describe []        = "empty"
describe [x]       = "one element: " ++ show x
describe (x:_:_)   = "starts with " ++ show x ++ " and has more"

The [x] pattern is sugar for x : []. The (x:_:_) pattern matches a list with at least two elements, binding the first to x and ignoring the second and the rest.

Ranges

Haskell has built-in syntax for arithmetic sequences:

oneToTen :: [Int]
oneToTen = [1..10]                   -- [1,2,3,4,5,6,7,8,9,10]

evens :: [Int]
evens = [2, 4..20]                   -- [2,4,6,8,10,12,14,16,18,20]

countDown :: [Int]
countDown = [10, 9..1]               -- [10,9,8,7,6,5,4,3,2,1]

letters :: [Char]
letters = ['a'..'z']                 -- "abcdefghijklmnopqrstuvwxyz"

infinite :: [Int]
infinite = [1..]                     -- lazy, never finishes if forced fully

[1..] is the kind of thing that breaks intuition coming from strict languages. It produces a list that conceptually has infinite length, but because Haskell is lazy, you can take 10 [1..] and only the first ten get computed. More on laziness in topic 06.

Common List Functions

You'll use these constantly:

head :: [a] -> a              -- first element; crashes on []
tail :: [a] -> [a]            -- everything but the first; crashes on []
last :: [a] -> a              -- final element; crashes on []
init :: [a] -> [a]            -- everything but the last; crashes on []

length :: [a] -> Int          -- O(n), forces the spine
null   :: [a] -> Bool         -- O(1), preferable to length xs == 0

take :: Int -> [a] -> [a]     -- first n elements (or fewer)
drop :: Int -> [a] -> [a]     -- skip first n
splitAt :: Int -> [a] -> ([a], [a])

reverse :: [a] -> [a]         -- O(n)
(++) :: [a] -> [a] -> [a]     -- O(n) in the LEFT list
elem :: Eq a => a -> [a] -> Bool   -- O(n) membership

head and tail are partial: they crash on empty input. Modern Haskell style prefers pattern matching or the safe variants from Data.Maybe (listToMaybe) and safe package (headMay):

firstOrDefault :: a -> [a] -> a
firstOrDefault d []    = d
firstOrDefault _ (x:_) = x

-- Or with Maybe
firstMaybe :: [a] -> Maybe a
firstMaybe []    = Nothing
firstMaybe (x:_) = Just x

GHC will warn you about partial functions if you turn on -Wall, which you should.

List Comprehensions

List comprehensions look like set-builder notation from math class:

squares :: [Int]
squares = [x * x | x <- [1..10]]
-- [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

evenSquares :: [Int]
evenSquares = [x * x | x <- [1..10], even x]
-- [4, 16, 36, 64, 100]

The pieces:

Generator (x <- xs): pulls elements from a source list.
Guard (even x): a Bool filter.
Output expression (x * x): what to produce per element.

You can have multiple generators and guards:

pairs :: [(Int, Int)]
pairs = [(x, y) | x <- [1..3], y <- [1..3], x /= y]
-- [(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)]

Multiple generators behave like nested loops, with the rightmost generator varying fastest. This is exactly how SQL JOIN reads if you squint.

A common pattern: pythagorean triples up to some bound:

triples :: Int -> [(Int, Int, Int)]
triples n =
  [ (a, b, c)
  | c <- [1..n]
  , b <- [1..c]
  , a <- [1..b]
  , a * a + b * b == c * c
  ]

You can also bind values inside the comprehension with let:

squareDistances :: [(Double, Double)] -> [Double]
squareDistances pts =
  [ d
  | (x, y) <- pts
  , let d = x * x + y * y
  , d < 100
  ]

When NOT to Use Lists

This is where most Haskell code goes wrong. Lists feel like the universal collection, but they're optimized for sequential access from the front and nothing else.

Reach for Data.Vector when:

You need O(1) random access.
You're doing numeric work or anything that benefits from contiguous memory.
You're processing fixed-size data and want predictable performance.

Reach for Data.Sequence when:

You need fast access to both ends.
You're building a queue or deque.
You're concatenating frequently in the middle of a workflow.

Reach for Data.Map or Data.HashMap.Strict when:

You need key-based lookup. (elem and lookup on lists are O(n).)
You're tracking counts, configurations, or any keyed collection.

Reach for Data.Set when:

You need membership tests on a collection that's not just being walked once.

The classic mistake is nub :: Eq a => [a] -> [a], which removes duplicates from a list in O(n²). For anything more than a few hundred elements, convert to a Set and back:

import qualified Data.Set as Set

dedup :: Ord a => [a] -> [a]
dedup = Set.toList . Set.fromList

This is O(n log n) and won't bite you when the input grows.

Mercury, IOG, and most production Haskell shops use Vector and Map heavily. Lists show up for streaming, recursive algorithms, and IO-bounded work where the list itself is consumed lazily.

Strings Are Lists

String is [Char]. This is a famous wart. It means string operations have all the inefficiencies of lists:

type String = [Char]

-- These are equivalent
greeting :: String
greeting = "hello"

greeting' :: [Char]
greeting' = ['h', 'e', 'l', 'l', 'o']

For real string work, use Data.Text (UTF-16 internally, the standard for most apps) or Data.ByteString (raw bytes, for IO and binary protocols). The OverloadedStrings extension lets you write string literals that resolve to Text or ByteString:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

name :: Text
name = "Alice"

shout :: Text -> Text
shout = T.toUpper

Use String for tutorial code, error messages, file paths from the standard library, and small fixed text. Use Text for everything else.

A Realistic Example

Suppose you're processing log entries. The naive list-based version:

import Data.List (sort, group)

data LogEntry = LogEntry { user :: String, action :: String }

-- Most active users
topUsers :: Int -> [LogEntry] -> [(String, Int)]
topUsers n entries =
  take n
  . reverse
  . sort
  . map (\xs -> (head xs, length xs))
  . group
  . sort
  . map user
  $ entries

This works for 10,000 entries. For 10 million, the sort allocates a giant list, group produces another giant list of lists, and length walks each group. Replace with Map.fromListWith:

import qualified Data.Map.Strict as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

topUsers :: Int -> [LogEntry] -> [(String, Int)]
topUsers n entries =
    take n
  . sortBy (comparing (Down . snd))
  . Map.toList
  $ Map.fromListWith (+) [(user e, 1) | e <- entries]

The list comprehension stays — it's just data transformation. The aggregation moves to a Map, where it belongs.

Common Pitfalls

Using length xs == 0 instead of null xs. length walks the entire list. null checks the first cons cell. For finite small lists it doesn't matter; for long lists or in tight loops, it does. For infinite lists, length never returns.

xs !! n for indexing. It works, but it's O(n) and partial (crashes on out-of-bounds). If you find yourself writing this, you want a Vector.

Repeated ++ on the left. (a ++ b) ++ c walks a twice. a ++ (b ++ c) walks a once. ++ is right-associative, so the second form is what you get by default — but if you're building a result with foldl (++) [] you'll get the bad version. Use foldr (++) [] or, better, concat.

nub on large lists. O(n²). Use Set-based deduplication.

Treating String as if it were Text. A character-by-character traversal of a 10MB log file in String form will allocate gigabytes of cons cells. Use Text or ByteString.

Key Takeaways

Lists are linked lists, which means O(1) at the front and O(n) everywhere else. List comprehensions give you set-builder notation: generators, guards, and let-bindings combine into compact data transformations. Ranges like [1..n] and [1..] are lazy, so infinite lists are fine as long as you don't force the whole spine. Reach for Vector, Seq, Map, or Text when the access pattern doesn't fit a singly linked list. The Haskell ecosystem has these for a reason.