
Files and Network

The textbook IO examples — getLine, putStrLn — are fine for hello-world, but the moment you write anything real you need files, HTTP, JSON, and bytes. Haskell's standard library handles the basics; the Hackage ecosystem covers everything else. This page is about the working set you actually use.

File handles vs convenience functions

System.IO exposes a Unix-flavoured handle API:

import System.IO

main :: IO ()
main = do
  h <- openFile "data.txt" ReadMode
  contents <- hGetContents h
  putStr contents
  hClose h

That code looks reasonable. It is also wrong, in a couple of ways that production code has learned to avoid:

  1. hGetContents is lazy. It does not read the whole file. It reads as you consume the result. If you forget to consume everything before hClose, you get partial data or surprising behaviour.
  2. If putStr throws, hClose never runs and the handle leaks.

Both problems are fixed by using strict reads and bracket for safety.
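Here is a minimal sketch of the opening example rewritten along those lines. It assumes the text package, and the helper names readViaHandle and printFile are illustrative. Data.Text.IO.hGetContents is strict, so the whole file is read before the value is returned, and withFile closes the handle even if something throws.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (IOMode (ReadMode), withFile)

-- Hypothetical helper: the handle-based version of the opening example, made safe.
-- TIO.hGetContents reads everything strictly before returning;
-- withFile guarantees the handle is closed even if reading throws.
readViaHandle :: FilePath -> IO T.Text
readViaHandle path = withFile path ReadMode TIO.hGetContents

-- Same behaviour as the opening example: print only after the read has completed.
printFile :: FilePath -> IO ()
printFile path = readViaHandle path >>= TIO.putStr
```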

Strict reads

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

readWholeFile :: FilePath -> IO T.Text
readWholeFile = TIO.readFile

Data.Text.IO.readFile reads the entire file into a Text value strictly, decoding it with the current locale's encoding. This is what you want 95% of the time. For binary data, use Data.ByteString.readFile. Both close the handle automatically.

For writing:

TIO.writeFile  :: FilePath -> T.Text -> IO ()
TIO.appendFile :: FilePath -> T.Text -> IO ()

These exist for ByteString too.
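The write, append, and read functions can be combined without ever touching a handle. In this sketch, writeThenAppend and copyBytes are hypothetical helpers, and the file contents are arbitrary.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: overwrite, append, then read back. Each call opens and
-- closes the file itself, so no handle stays open between steps.
writeThenAppend :: FilePath -> IO T.Text
writeThenAppend path = do
  TIO.writeFile path (T.pack "first line\n")
  TIO.appendFile path (T.pack "second line\n")
  TIO.readFile path

-- The ByteString versions move bytes unchanged, with no text decoding involved.
copyBytes :: FilePath -> FilePath -> IO ()
copyBytes src dst = BS.readFile src >>= BS.writeFile dst
```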

When you need bracket

For anything where you open a resource and must close it — a file handle, a database connection, a socket — use bracket from Control.Exception:

import Control.Exception (bracket)
import System.IO

withLogFile :: FilePath -> (Handle -> IO a) -> IO a
withLogFile path = bracket (openFile path AppendMode) hClose

bracket acquire release use guarantees that release runs even if use throws an exception or is interrupted by an asynchronous one. It is the standard pattern for resource handling. Library functions such as withFile and withBinaryFile exist precisely so you do not write bracket calls by hand for common cases:

import System.IO

logLine :: FilePath -> String -> IO ()
logLine path msg =
  withFile path AppendMode $ \h -> do
    hPutStrLn h msg

withFile is bracket with openFile/hClose baked in. Reach for it before bracket whenever it fits.
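To see the guarantee bracket makes, here is a toy check. The "resource" is just an IORef flag standing in for a real handle, and releaseRunsOnError is an illustrative name. The body throws, yet the release action still runs before the exception reaches the caller.

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Toy demonstration: an IORef flag stands in for a real resource such as a handle.
-- Returns the outcome of the bracket call and whether cleanup ran.
releaseRunsOnError :: IO (Either ErrorCall (), Bool)
releaseRunsOnError = do
  released <- newIORef False
  result <- try $
    bracket
      (pure ())                           -- acquire: nothing to open in this toy
      (\_ -> writeIORef released True)    -- release: record that cleanup happened
      (\_ -> throwIO (ErrorCall "boom"))  -- use: fails partway through
  wasReleased <- readIORef released
  pure (result, wasReleased)
```

The exception is not swallowed: bracket re-throws it after cleanup, which is why try sees a Left.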

String, Text, ByteString — pick correctly

Three string-shaped types, each with a different job.

Type              When to use it
String ([Char])   Examples, throwaway scripts, error messages from old APIs
Text              Anything that is human-readable text in production
ByteString        Raw bytes: files, network protocols, binary formats

Text comes in Data.Text (strict) and Data.Text.Lazy (lazy). Default to strict. Same for ByteString.

Conversions:

import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString as BS

T.pack       :: String -> T.Text
T.unpack     :: T.Text -> String
TE.encodeUtf8 :: T.Text -> BS.ByteString
TE.decodeUtf8 :: BS.ByteString -> T.Text   -- throws on invalid UTF-8
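A quick sketch of these conversions in use. The helper names roundTrip and charAndByteCounts are made up for illustration. The point of the second one is that T.length counts characters while BS.length counts encoded bytes, and the two diverge as soon as the text leaves ASCII.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- String -> Text -> UTF-8 bytes -> Text. For any String this is lossless.
roundTrip :: String -> T.Text
roundTrip = TE.decodeUtf8 . TE.encodeUtf8 . T.pack

-- Characters versus bytes: T.length counts code points,
-- BS.length counts the bytes of the UTF-8 encoding.
charAndByteCounts :: String -> (Int, Int)
charAndByteCounts s = (T.length t, BS.length (TE.encodeUtf8 t))
  where
    t = T.pack s
```

For "na\239ve" ("naïve"), the character count is 5 but the byte count is 6, because ï takes two bytes in UTF-8.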

For network or binary, you start in ByteString, decode to Text for textual processing, then operate in Text-land. The OverloadedStrings extension lets string literals become Text or ByteString directly:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

greeting :: T.Text
greeting = "hello"   -- no T.pack needed

Most production codebases turn this on globally.
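The "start in ByteString, decode, work in Text" flow might look like this in a request handler. shoutBody and its error message are hypothetical. It uses decodeUtf8', which returns an Either instead of throwing, because bytes from the network are not trusted. With OverloadedStrings on, the same literal syntax produces a String, a Text, or a ByteString depending on where it is used.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical handler body: bytes arrive (say, from a socket), are decoded once at
-- the boundary, processed as Text, and encoded once on the way out.
-- Under OverloadedStrings, "!" below is a Text literal; no T.pack needed.
shoutBody :: BS.ByteString -> Either String BS.ByteString
shoutBody raw =
  case TE.decodeUtf8' raw of
    Left _    -> Left "request body is not valid UTF-8"
    Right txt -> Right (TE.encodeUtf8 (T.toUpper txt <> "!"))
```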

HTTP

Two libraries dominate.

http-client (with http-client-tls for HTTPS)

The lower-level option, used directly when you want explicit control:

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)
import qualified Data.ByteString.Lazy.Char8 as L8

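-- The Manager is a parameter so a single instance can be shared by every request.
-- Create it once at startup: manager <- newManager tlsManagerSettings
fetch :: Manager -> String -> IO L8.ByteString
fetch manager url = do
  req <- parseRequest url
  resp <- httpLbs req manager
  pure (responseBody resp)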

You create a Manager once (it pools connections) and reuse it. Calling newManager on every request is a common mistake that destroys throughput.
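One way to apply the reuse advice, as a sketch: build requests in a small helper and pass the same Manager to every call. The names mkGet and fetchAll and the User-Agent value are illustrative. Building a Request with parseRequest does not touch the network, so that half can be checked offline.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import Network.HTTP.Client

-- Hypothetical helper: parse a URL and add a User-Agent header.
-- parseRequest only builds a Request value; nothing is sent yet.
mkGet :: String -> IO Request
mkGet url = do
  req <- parseRequest url
  pure req { requestHeaders = ("User-Agent", "haskell-example") : requestHeaders req }

-- Every request goes through the same Manager, so connections to a host
-- can be pooled and reused instead of being re-established each time.
fetchAll :: Manager -> [String] -> IO [BL.ByteString]
fetchAll mgr = mapM $ \url -> do
  req  <- mkGet url
  resp <- httpLbs req mgr
  pure (responseBody resp)
```

At startup you would create the Manager once, for example with newManager tlsManagerSettings from http-client-tls, and hand it to fetchAll and every other call site.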

req

Higher-level, type-safer, easier:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DataKinds #-}
import Network.HTTP.Req
import qualified Data.Text as T
-- Assumes the User type and its FromJSON instance from the aeson section below.

getUser :: Int -> IO (JsonResponse User)
getUser uid = runReq defaultHttpConfig $
  req GET (https "api.example.com" /: "users" /: T.pack (show uid))
      NoReqBody jsonResponse mempty

req builds the URL with combinators, picks the right body and response handlers based on type, and produces a typed result. For most application code, prefer it over http-client directly.

wreq is a third option, popular for ad-hoc scripts. It uses lens heavily, which some teams love and others avoid.

JSON with Aeson

aeson is the default JSON library. You parse and serialize through FromJSON and ToJSON typeclasses:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import GHC.Generics
import qualified Data.Text as T

data User = User
  { userId    :: Int
  , userName  :: T.Text
  , userEmail :: T.Text
  } deriving (Show, Generic)

instance FromJSON User
instance ToJSON User

Generic derivation gives you JSON encoding for free if your field names match the JSON keys. Often they do not: the JSON wants "id" while your record field is userId. You can customize the generic instance with an Options record, generate the instances with deriveJSON, or write the instance by hand:

-- Replaces the empty FromJSON instance above; a type can have only one.
instance FromJSON User where
  parseJSON = withObject "User" $ \o ->
    User <$> o .: "id"
         <*> o .: "name"
         <*> o .: "email"

That <$>/<*> shape is the applicative idiom from the previous topic — parseJSON returns a Parser, which is an Applicative.
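A self-contained sketch pairing the hand-written FromJSON instance with a matching ToJSON instance, so encoding and decoding agree on the keys "id", "name", and "email". It assumes aeson 2.x, where object keys are Key values that can be written as string literals under OverloadedStrings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import qualified Data.Text as T

data User = User
  { userId    :: Int
  , userName  :: T.Text
  , userEmail :: T.Text
  } deriving (Show, Eq)

-- Parse: JSON keys on the right, record fields filled in order on the left.
instance FromJSON User where
  parseJSON = withObject "User" $ \o ->
    User <$> o .: "id"
         <*> o .: "name"
         <*> o .: "email"

-- Serialize: the same three keys, so encode and eitherDecode round-trip.
instance ToJSON User where
  toJSON u = object
    [ "id"    .= userId u
    , "name"  .= userName u
    , "email" .= userEmail u
    ]
```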

For options on naming, deriveJSON from Data.Aeson.TH accepts a field-name modifier:

{-# LANGUAGE TemplateHaskell #-}
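import Data.Aeson (camelTo2)
import Data.Aeson.TH

-- Use this instead of the hand-written instances, not alongside them.
-- drop 4 strips the "user" prefix ("userId" -> "Id"); camelTo2 '_' then lowercases it ("Id" -> "id").
$(deriveJSON defaultOptions { fieldLabelModifier = camelTo2 '_' . drop 4 } ''User)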

In practice, many projects settle on DeriveGeneric with genericParseJSON and genericToJSON, plus a small Options record that strips a prefix and lowercases the first letter.
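A sketch of that Generic-plus-Options setup, assuming aeson's genericParseJSON, genericToJSON, and camelTo2. The name userOptions is illustrative. camelTo2 '_' lowercases after the prefix is dropped, so userId becomes "id"; a longer field such as userFirstName would become "first_name".

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import GHC.Generics (Generic)
import qualified Data.Text as T

data User = User
  { userId    :: Int
  , userName  :: T.Text
  , userEmail :: T.Text
  } deriving (Show, Eq, Generic)

-- One Options value shared by both instances, so parsing and encoding agree.
-- drop 4 removes "user"; camelTo2 '_' lowercases and snake_cases the rest.
userOptions :: Options
userOptions = defaultOptions { fieldLabelModifier = camelTo2 '_' . drop 4 }

instance FromJSON User where
  parseJSON = genericParseJSON userOptions

instance ToJSON User where
  toJSON     = genericToJSON userOptions
  toEncoding = genericToEncoding userOptions
```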

Decoding from a response body

import qualified Data.ByteString.Lazy as BL
import Data.Aeson (eitherDecode)

parseUsers :: BL.ByteString -> Either String [User]
parseUsers = eitherDecode

eitherDecode returns Left on parse error with a (sometimes terse) message. decode returns Maybe, throwing away the reason. Always prefer eitherDecode in real code.
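The difference is easiest to see by feeding the same malformed input to both. In this sketch the input is an arbitrary list with one bad element, and the binding names are illustrative. The exact wording of aeson's error message varies between versions, so only its presence is worth relying on.

```haskell
import Data.Aeson (decode, eitherDecode)
import qualified Data.ByteString.Lazy.Char8 as BL8

-- A JSON array whose third element is not an Int.
badInput :: BL8.ByteString
badInput = BL8.pack "[1, 2, \"three\"]"

-- decode only reports that something went wrong.
viaDecode :: Maybe [Int]
viaDecode = decode badInput

-- eitherDecode keeps aeson's description of what went wrong.
viaEitherDecode :: Either String [Int]
viaEitherDecode = eitherDecode badInput
```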

Putting it together

A small "fetch and decode" function:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

import Network.HTTP.Client
import Network.HTTP.Client.TLS
import Data.Aeson
import GHC.Generics
import qualified Data.ByteString.Lazy as BL

data Repo = Repo
  { repoName     :: String
  , stargazers   :: Int
  } deriving (Show, Generic)

instance FromJSON Repo where
  parseJSON = withObject "Repo" $ \o ->
    Repo <$> o .: "name"
         <*> o .: "stargazers_count"

fetchRepo :: Manager -> String -> IO (Either String Repo)
fetchRepo mgr fullName = do
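  -- parseUrlThrow (unlike parseRequest) makes non-2xx responses throw,
  -- so a 404 or rate-limit page is not handed to the JSON decoder.
  req <- parseUrlThrow ("https://api.github.com/repos/" ++ fullName)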
  let req' = req { requestHeaders = [("User-Agent", "haskell")] }
  resp <- httpLbs req' mgr
  pure (eitherDecode (responseBody resp))

main :: IO ()
main = do
  mgr <- newManager tlsManagerSettings
  result <- fetchRepo mgr "haskell/cabal"
  case result of
    Left err -> putStrLn ("decode failed: " ++ err)
    Right r  -> print r

Note the shape:

  • Manager is created once.
  • The HTTP request returns a lazy bytestring.
  • Aeson decodes; failures come back as Left String.
  • The function returns IO (Either String Repo) — IO because the network call happens, Either for the decode failure that is not a true exception.

That last point is a recurring pattern in production Haskell: network errors are exceptions (the http-client family throws on connection failures), but business-level errors like "the JSON did not parse" are Either. The next topic covers when each is appropriate.
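A sketch of that split as a reusable pair of functions. fetchJson and describeFetch are hypothetical names. parseUrlThrow is used so that bad URLs, connection failures, and non-2xx statuses all surface as HttpException and propagate, while a body that fails to decode comes back as an ordinary Left. Only the outermost caller decides what to do with each kind of failure.

```haskell
import Control.Exception (try)
import Data.Aeson (FromJSON, Value, eitherDecode)
import Network.HTTP.Client

-- Library-level function: exceptions stay exceptions, decode failures are values.
fetchJson :: FromJSON a => Manager -> String -> IO (Either String a)
fetchJson mgr url = do
  req  <- parseUrlThrow url          -- throws HttpException on a bad URL or non-2xx
  resp <- httpLbs req mgr
  pure (eitherDecode (responseBody resp))

-- Application edge: catch the HTTP exception here and report both failure kinds.
describeFetch :: Manager -> String -> IO String
describeFetch mgr url = do
  outcome <- try (fetchJson mgr url) :: IO (Either HttpException (Either String Value))
  pure $ case outcome of
    Left httpErr     -> "HTTP failure: " ++ show httpErr
    Right (Left err) -> "decode failure: " ++ err
    Right (Right _)  -> "ok"
```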

Streaming for large data

Reading a 50 GB log file with readFile is a bad idea — it loads the whole thing. Use a streaming library:

  • conduit — Michael Snoyman's library, used by Yesod, persistent, amazonka.
  • pipes — Gabriella Gonzalez's library, smaller core, similar capability.
  • streamly — newer, focuses on performance and concurrency.

A conduit example, reading lines and counting non-empty ones:

import Conduit
import qualified Data.Text as T

countNonEmpty :: FilePath -> IO Int
countNonEmpty path = runConduitRes $
       sourceFile path
    .| decodeUtf8C
    .| linesUnboundedC
    .| filterC (not . T.null)
    .| lengthC

This processes the file in memory bounded by the longest line rather than by the file size (linesUnboundedC buffers one whole line at a time). For anything large or unbounded (logs, network streams), reach for streaming.
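A second sketch in the same style, assuming the conduit package: a grep-like filter that copies only the lines containing "ERROR" into another file. extractErrors and the "ERROR" marker are illustrative. Both the input and the output are streamed, so memory use again depends on the longest line, not on either file's size.

```haskell
import Conduit
import qualified Data.Text as T

-- Hypothetical log filter: read bytes, decode, split into lines, keep the
-- matching ones, re-join with newlines, encode, and write, one chunk at a time.
extractErrors :: FilePath -> FilePath -> IO ()
extractErrors src dst = runConduitRes $
       sourceFile src
    .| decodeUtf8C
    .| linesUnboundedC
    .| filterC (T.isInfixOf (T.pack "ERROR"))
    .| unlinesC
    .| encodeUtf8C
    .| sinkFile dst
```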

Common pitfalls

Using hGetContents and closing the handle too early. Lazy IO and explicit handle management do not mix. Either go strict (Data.Text.IO.readFile) or stream (conduit). Avoid lazy hGetContents in production.

Creating an HTTP Manager per request. Connection pooling is the entire point of Manager. Create one at startup, share it through your application.

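Ignoring HTTP error status codes. A request built with parseRequest does not throw on non-2xx responses: a 404 or 500 comes back as an ordinary Response, and its body flows on to whatever you do next, often straight into the JSON decoder. Either check responseStatus yourself or build the request with parseUrlThrow, which throws on non-2xx. req, by contrast, throws on non-2xx by default.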

Using String for binary data. String is [Char], where Char is a Unicode code point. Putting raw bytes in a String corrupts them silently. Use ByteString.

Trusting TE.decodeUtf8 on untrusted input. It throws on invalid UTF-8. Use decodeUtf8' (returns Either UnicodeException Text) or decodeUtf8Lenient (replaces invalid bytes with U+FFFD) when you do not control the source.

Forgetting OverloadedStrings. Without it, "hello" :: Text is a type error. Most projects turn it on globally.

Reading the whole file when streaming would do. A request handler that reads a 100 MB upload into memory is a memory-spike waiting to happen. conduit or streamly keeps it bounded.

Key takeaways

  • Data.Text.IO and Data.ByteString for strict reads and writes; withFile for resource-safe handle work.
  • bracket is the universal pattern for "acquire, use, release"; withFile and the similar with* functions found across libraries are convenience wrappers.
  • Text for human-readable text, ByteString for bytes, String only for examples and error messages.
  • http-client (with tlsManagerSettings) for low-level HTTP; req for typed, ergonomic requests. Reuse the Manager.
  • aeson is the JSON standard. Use eitherDecode not decode so you keep the parse error.
  • For files larger than memory, use conduit, pipes, or streamly to keep memory bounded.
  • Push business-level failure into Either and let library exceptions stay exceptions — the next topic explores why.