Files and Network
The textbook IO examples — getLine, putStrLn — are fine for hello-world, but the moment you write anything real you need files, HTTP, JSON, and bytes. Haskell's standard library handles the basics; the Hackage ecosystem covers everything else. This page is about the working set you actually use.
File handles vs convenience functions
System.IO exposes a Unix-flavoured handle API:
import System.IO
main :: IO ()
main = do
h <- openFile "data.txt" ReadMode
contents <- hGetContents h
putStr contents
hClose h
That code looks reasonable. It is also wrong, in a couple of ways that production code has learned to avoid:
- hGetContents is lazy. It does not read the whole file up front; it reads as you consume the result. If you forget to consume everything before hClose, you get partial data or surprising behaviour.
- If putStr throws, hClose never runs and the handle leaks.
Both problems are fixed by using strict reads and bracket for safety.
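If you have to stay on the handle API, one way to avoid both problems is to force the contents while the handle is still open and let withFile do the closing. A minimal sketch using only base; readFileForced is a hypothetical name, and evaluate (length ...) is just one of several ways to force the string:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read a whole file through the handle API,
-- forcing the lazy contents before withFile closes the handle.
readFileForced :: FilePath -> IO String
readFileForced path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- length walks the entire string, so every chunk is read here,
    -- while the handle is still open.
    _ <- evaluate (length contents)
    pure contents
```

Because the read happens inside withFile, the handle is closed whether or not the caller later throws while using the result.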
Strict reads
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
readWholeFile :: FilePath -> IO T.Text
readWholeFile = TIO.readFile
Data.Text.IO.readFile reads the entire file into a Text value strictly. This is what you want in the vast majority of cases. For binary data, use Data.ByteString.readFile. Both close the handle automatically.
For writing:
TIO.writeFile :: FilePath -> T.Text -> IO ()
TIO.appendFile :: FilePath -> T.Text -> IO ()
These exist for ByteString too.
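A small round trip shows how these fit together. This is a sketch only: writeJournal, readJournal, and the "# journal" header line are hypothetical names chosen for illustration.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: start a fresh file with a header line,
-- then append each entry on its own line.
writeJournal :: FilePath -> [T.Text] -> IO ()
writeJournal path entries = do
  TIO.writeFile path (T.pack "# journal\n")   -- creates or truncates
  mapM_ (\e -> TIO.appendFile path (T.snoc e '\n')) entries

-- Read the file back strictly and split it into lines.
readJournal :: FilePath -> IO [T.Text]
readJournal path = T.lines <$> TIO.readFile path
```

Each call opens and closes the file on its own, which is fine for occasional writes; for many small appends a single handle via withFile is likely the better fit.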
When you need bracket
For anything where you open a resource and must close it — a file handle, a database connection, a socket — use bracket from Control.Exception:
import Control.Exception (bracket)
import System.IO
withLogFile :: FilePath -> (Handle -> IO a) -> IO a
withLogFile path = bracket (openFile path AppendMode) hClose
bracket acquire release use guarantees that release runs even if use throws an exception or is killed by an async exception. It is the standard safe pattern for resource handling. Library functions like withFile and withBinaryFile exist precisely so you do not write bracket calls by hand for common cases:
import System.IO
logLine :: FilePath -> String -> IO ()
logLine path msg =
withFile path AppendMode $ \h -> do
hPutStrLn h msg
withFile is bracket with openFile/hClose baked in. Reach for it before bracket whenever it fits.
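To see the guarantee in action without touching a real resource, the sketch below uses an IORef as a stand-in for "the resource was released". The name releaseStillRuns and the "boom" error are made up for the demonstration.

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Run a bracket whose "use" step always throws, then report whether
-- the release step ran and what the caller observed.
releaseStillRuns :: IO (Bool, Either ErrorCall ())
releaseStillRuns = do
  released <- newIORef False
  result <- try $
    bracket
      (pure ())                           -- acquire: nothing real to open
      (\_ -> writeIORef released True)    -- release: record that we ran
      (\_ -> throwIO (ErrorCall "boom"))  -- use: always fails
  wasReleased <- readIORef released
  pure (wasReleased, result)
```

The exception still reaches the caller (here it is captured by try), but only after the release action has run.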
String, Text, ByteString — pick correctly
Three string-shaped types, each with a different job.
| Type | When |
|---|---|
| String ([Char]) | Examples, throwaway scripts, error messages from old APIs |
| Text | Anything that is human-readable text in production |
| ByteString | Raw bytes: files, network protocols, binary formats |
Text comes in Data.Text (strict) and Data.Text.Lazy (lazy). Default to strict. Same for ByteString.
Conversions:
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString as BS
T.pack :: String -> T.Text
T.unpack :: T.Text -> String
TE.encodeUtf8 :: T.Text -> BS.ByteString
TE.decodeUtf8 :: BS.ByteString -> T.Text -- throws on invalid UTF-8
For network or binary, you start in ByteString, decode to Text for textual processing, then operate in Text-land. The OverloadedStrings extension lets string literals become Text or ByteString directly:
{-# LANGUAGE OverloadedStrings #-}
greeting :: T.Text
greeting = "hello" -- no T.pack needed
Most production codebases turn this on globally.
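The difference between characters and bytes is easy to check directly. A minimal sketch, with charAndByteCount and roundTrips as hypothetical helper names:

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Number of characters versus number of UTF-8 bytes for the same value.
charAndByteCount :: T.Text -> (Int, Int)
charAndByteCount t = (T.length t, BS.length (TE.encodeUtf8 t))

-- Encoding and then decoding should give back the original text.
roundTrips :: T.Text -> Bool
roundTrips t =
  case TE.decodeUtf8' (TE.encodeUtf8 t) of
    Right t' -> t' == t
    Left _   -> False
```

For example, charAndByteCount (T.pack "caf\233") gives (4, 5): four characters, but the accented letter takes two bytes in UTF-8. This is why lengths and offsets from a wire protocol should be computed on the ByteString, not on the decoded Text.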
HTTP
Two libraries dominate.
http-client (with http-client-tls for HTTPS)
The lower-level option, used directly when you want explicit control:
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)
import qualified Data.ByteString.Lazy.Char8 as L8
fetch :: String -> IO L8.ByteString
fetch url = do
manager <- newManager tlsManagerSettings
req <- parseRequest url
resp <- httpLbs req manager
pure (responseBody resp)
You create a Manager once (it pools connections) and reuse it. Calling newManager on every request is a common mistake that destroys throughput.
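One way to make reuse the default is to have every function take the Manager as an argument instead of creating it. The sketch below assumes the http-types package is available for statusCode; fetchStatuses is a hypothetical helper.

```haskell
import Network.HTTP.Client
import Network.HTTP.Types.Status (statusCode)

-- Hypothetical helper: fetch each URL with the same Manager and
-- report the status code that came back.
fetchStatuses :: Manager -> [String] -> IO [(String, Int)]
fetchStatuses mgr = mapM fetchOne
  where
    fetchOne url = do
      req <- parseRequest url
      resp <- httpLbs req mgr   -- connections are drawn from mgr's pool
      pure (url, statusCode (responseStatus resp))
```

The caller creates the Manager once, for instance with newManager tlsManagerSettings in main, and passes it down, so repeated requests to the same host can reuse connections.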
req
Higher-level, type-safer, easier:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DataKinds #-}
import Network.HTTP.Req
import Data.Aeson (FromJSON)
import qualified Data.Text as T
getUser :: Int -> IO (JsonResponse User)
getUser uid = runReq defaultHttpConfig $
req GET (https "api.example.com" /: "users" /: T.pack (show uid))
NoReqBody jsonResponse mempty
req builds the URL with combinators, picks the right body and response handlers based on type, and produces a typed result. For most application code, prefer it over http-client directly.
wreq is a third option, popular for ad-hoc scripts. It uses lens heavily, which some teams love and others avoid.
JSON with Aeson
aeson is the default JSON library. You parse and serialize through FromJSON and ToJSON typeclasses:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import GHC.Generics
data User = User
{ userId :: Int
, userName :: T.Text
, userEmail :: T.Text
} deriving (Show, Generic)
instance FromJSON User
instance ToJSON User
Generic derivation gives you JSON encoding for free if your field names match the JSON keys. They almost never do — JSON usually wants "id", your record field is userId. Customize with deriveJSON or write the instance by hand:
instance FromJSON User where
parseJSON = withObject "User" $ \o ->
User <$> o .: "id"
<*> o .: "name"
<*> o .: "email"
That <$>/<*> shape is the applicative idiom from the previous topic — parseJSON returns a Parser, which is an Applicative.
For options on naming, deriveJSON from Data.Aeson.TH accepts a field-name modifier:
{-# LANGUAGE TemplateHaskell #-}
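import Data.Aeson.TH
import Data.Char (toLower)
$(deriveJSON defaultOptions { fieldLabelModifier = map toLower . drop 4 } ''User)
-- drop 4 strips the "user" prefix, map toLower lowercases the rest:
-- "userId" -> "Id" -> "id", "userName" -> "Name" -> "name"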
In practice, many projects converge on Generic deriving plus a small Options record, passed to genericParseJSON and genericToJSON, that strips a prefix and lowercases the first letter.
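A sketch of that convention, assuming Generic deriving with aeson's genericParseJSON and genericToJSON. The Account type, the prefixOptions helper, and the expected value are hypothetical and exist only to show the field-name mapping.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.Char (toLower)
import GHC.Generics (Generic)

-- Hypothetical record; the 4-character "acct" prefix gets stripped.
data Account = Account
  { acctId :: Int
  , acctOwnerName :: String
  } deriving (Show, Eq, Generic)

-- Drop a fixed-length prefix, then lowercase the first remaining letter:
-- "acctOwnerName" -> "OwnerName" -> "ownerName".
prefixOptions :: Int -> Options
prefixOptions n = defaultOptions { fieldLabelModifier = lowerFirst . drop n }
  where
    lowerFirst (c : cs) = toLower c : cs
    lowerFirst [] = []

instance FromJSON Account where
  parseJSON = genericParseJSON (prefixOptions 4)

instance ToJSON Account where
  toJSON = genericToJSON (prefixOptions 4)

-- The JSON value that Account 7 "ann" should encode to.
expected :: Value
expected = object ["id" .= (7 :: Int), "ownerName" .= ("ann" :: String)]
```

Using the same Options value for both instances keeps encoding and decoding symmetric, so a value survives an encode/decode round trip.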
Decoding from a response body
import qualified Data.ByteString.Lazy as BL
import Data.Aeson (eitherDecode)
parseUsers :: BL.ByteString -> Either String [User]
parseUsers = eitherDecode
eitherDecode returns Left on parse error with a (sometimes terse) message. decode returns Maybe, throwing away the reason. Always prefer eitherDecode in real code.
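The difference is easiest to see on a deliberately bad input. In this sketch the Tag type and both sample inputs are hypothetical:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import qualified Data.ByteString.Lazy as BL

-- Hypothetical type with a hand-written parser.
data Tag = Tag { tagId :: Int, tagLabel :: String }
  deriving (Show, Eq)

instance FromJSON Tag where
  parseJSON = withObject "Tag" $ \o ->
    Tag <$> o .: "id" <*> o .: "label"

goodInput, badInput :: BL.ByteString
goodInput = "{\"id\": 1, \"label\": \"urgent\"}"
badInput  = "{\"id\": \"one\", \"label\": \"urgent\"}"  -- id has the wrong type

-- decode only reports that parsing failed.
viaMaybe :: BL.ByteString -> Maybe Tag
viaMaybe = decode

-- eitherDecode also carries a message describing the failure.
viaEither :: BL.ByteString -> Either String Tag
viaEither = eitherDecode
```

viaMaybe badInput is Nothing, while viaEither badInput is a Left whose message points at the offending field, which is usually what you want in a log line.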
Putting it together
A small "fetch and decode" function:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
import Network.HTTP.Client
import Network.HTTP.Client.TLS
import Data.Aeson
import GHC.Generics
import qualified Data.ByteString.Lazy as BL
data Repo = Repo
{ repoName :: String
, stargazers :: Int
} deriving (Show, Generic)
instance FromJSON Repo where
parseJSON = withObject "Repo" $ \o ->
Repo <$> o .: "name"
<*> o .: "stargazers_count"
fetchRepo :: Manager -> String -> IO (Either String Repo)
fetchRepo mgr fullName = do
req <- parseRequest ("https://api.github.com/repos/" ++ fullName)
let req' = req { requestHeaders = [("User-Agent", "haskell")] }
resp <- httpLbs req' mgr
pure (eitherDecode (responseBody resp))
main :: IO ()
main = do
mgr <- newManager tlsManagerSettings
result <- fetchRepo mgr "haskell/cabal"
case result of
Left err -> putStrLn ("decode failed: " ++ err)
Right r -> print r
Note the shape:
- Manager is created once.
- The HTTP request returns a lazy bytestring.
- Aeson decodes; failures come back as Left String.
- The function returns IO (Either String Repo): IO because the network call happens, Either for the decode failure that is not a true exception.
That last point is a recurring pattern in production Haskell: network errors are exceptions (the http-client family throws on connection failures), but business-level errors like "the JSON did not parse" are Either. The next topic covers when each is appropriate.
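The same split can be illustrated with a file instead of a network call. In this sketch loadPort is a hypothetical helper: an unreadable file surfaces as an exception from the library, while content that does not parse comes back as Left.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Read (readMaybe)

-- Hypothetical helper: read a port number from a one-line file.
-- A missing or unreadable file raises an IOException (not caught here);
-- a file that does not contain a number is reported as Left.
loadPort :: FilePath -> IO (Either String Int)
loadPort path = do
  raw <- TIO.readFile path
  let trimmed = T.unpack (T.strip raw)
  pure $ case readMaybe trimmed of
    Just n  -> Right n
    Nothing -> Left ("not a port number: " ++ show trimmed)
```

Callers that care about the missing-file case can wrap the call in try; callers that do not simply pattern match on the Either.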
Streaming for large data
Reading a 50 GB log file with readFile is a bad idea — it loads the whole thing. Use a streaming library:
- conduit: Michael Snoyman's library, used by Yesod, persistent, amazonka.
- pipes: Gabriella Gonzalez's library, smaller core, similar capability.
- streamly: newer, focuses on performance and concurrency.
A conduit example, reading lines and counting non-empty ones:
import Conduit
import qualified Data.Text as T
countNonEmpty :: FilePath -> IO Int
countNonEmpty path = runConduitRes $
sourceFile path
.| decodeUtf8C
.| linesUnboundedC
.| filterC (not . T.null)
.| lengthC
This processes the file in constant memory regardless of size. For anything large or unbounded (logs, network streams), reach for streaming.
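To make "constant memory" concrete, here is a hand-rolled sketch of the idea underneath a streaming library: read fixed-size chunks and keep only a running total. It is not a replacement for conduit; countNewlines and the 64 KiB chunk size are arbitrary choices for illustration.

```haskell
import qualified Data.ByteString as BS
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: count newline bytes while holding only one
-- chunk of the file in memory at a time.
countNewlines :: FilePath -> IO Int
countNewlines path = withBinaryFile path ReadMode (loop 0)
  where
    chunkSize = 64 * 1024
    loop n h = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then pure n                        -- an empty chunk means end of file
        else do
          let n' = n + BS.count 10 chunk   -- 10 is the byte value of '\n'
          n' `seq` loop n' h               -- keep the running total evaluated
```

A streaming library packages this loop up so that decoding, line splitting, and filtering compose as separate stages instead of being written into one recursive function.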
Common pitfalls
Using hGetContents and closing the handle too early. Lazy IO and explicit handle management do not mix. Either go strict (Data.Text.IO.readFile) or stream (conduit). Avoid lazy hGetContents in production.
Creating an HTTP Manager per request. Connection pooling is the entire point of Manager. Create one at startup, share it through your application.
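Ignoring HTTP error status codes. A request built with parseRequest does not throw on non-2xx responses: a 404 or 500 comes back as an ordinary Response, so check responseStatus yourself. If you would rather get an exception, build the request with parseUrlThrow or set checkResponse = throwErrorStatusCodes on the request.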
Using String for binary data. String is [Char], where Char is a Unicode code point. Putting raw bytes in a String corrupts them silently. Use ByteString.
Trusting TE.decodeUtf8 on untrusted input. It throws on invalid UTF-8. Use decodeUtf8' (returns Either) or decodeUtf8Lenient (replaces invalid bytes with U+FFFD) when you do not control the source.
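A sketch of both safe options. It uses decodeUtf8With lenientDecode for the lossy path, on the assumption that this older spelling is available across more releases of text than decodeUtf8Lenient. The helper names and the sample bytes are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- Report invalid input as a value instead of throwing.
decodeChecked :: BS.ByteString -> Either String T.Text
decodeChecked bytes =
  case TE.decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right text -> Right text

-- Replace invalid input with U+FFFD and carry on.
decodeLossy :: BS.ByteString -> T.Text
decodeLossy = TE.decodeUtf8With lenientDecode

-- "hi" followed by 0xFF, a byte that is never valid in UTF-8.
untrusted :: BS.ByteString
untrusted = BS.pack [0x68, 0x69, 0xFF]
```

decodeChecked suits inputs where bad bytes should reject the request; decodeLossy suits logs and diagnostics where losing a character is better than losing the line.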
Forgetting OverloadedStrings. Without it, "hello" :: Text is a type error. Most projects turn it on globally.
Reading the whole file when streaming would do. A request handler that reads a 100 MB upload into memory is a memory-spike waiting to happen. conduit or streamly keeps it bounded.
Key takeaways
- Data.Text.IO and Data.ByteString for strict reads and writes; withFile for resource-safe handle work.
- bracket is the universal pattern for "acquire, use, release"; withFile, withConnection, etc. are convenience wrappers.
- Text for human-readable text, ByteString for bytes, String only for examples and error messages.
- http-client (with tlsManagerSettings) for low-level HTTP; req for typed, ergonomic requests. Reuse the Manager.
- aeson is the JSON standard. Use eitherDecode, not decode, so you keep the parse error.
- For files larger than memory, use conduit, pipes, or streamly to keep memory bounded.
- Push business-level failure into Either and let library exceptions stay exceptions; the next topic explores why.