Message Passing

Processes are useless without communication. Message passing is how Elixir processes coordinate, share results, and react to each other's lifecycles. The primitives are simple — send/2, receive, Process.link/1, Process.monitor/1 — but the patterns they enable scale all the way from a quick Task to a fault-tolerant supervision tree.

send/2 and receive

You send a message to a process by its pid:

send(pid, {:hello, "world"})

send/2 is non-blocking. It deposits the message in the target process's mailbox and returns immediately. The message can be any Elixir term — atoms, tuples, structs, functions, anything that can be copied across heaps.

The receiver pulls messages out of its mailbox with receive:

receive do
  {:hello, name} -> IO.puts("Hello, #{name}!")
end

receive blocks until a matching message arrives. If the mailbox already has a matching message, receive returns immediately. If not, the process is parked and woken when one arrives.

The basic round-trip pattern looks like this:

defmodule Echo do
  def loop do
    receive do
      {:echo, from, message} ->
        send(from, {:echoed, message})
        loop()
    end
  end
end

echo = spawn(&Echo.loop/0)
send(echo, {:echo, self(), "hi"})

receive do
  {:echoed, msg} -> IO.puts(msg)
end
# hi

Note the convention of including from in the message. Processes do not automatically know who sent them a message — send/2 is fire-and-forget. If you want a reply, you have to include your pid in the request.

The Mailbox

Each process has a mailbox: an unbounded FIFO queue of messages waiting to be received. Messages stay in the mailbox until receive consumes them. There is no limit other than available memory.

This is where one of the worst classes of Elixir bugs hides: an unbounded mailbox. If a process receives messages faster than it processes them, the mailbox grows without limit, memory usage climbs, and eventually the node falls over. Tools like Process.info(pid, :message_queue_len) and :observer show mailbox sizes in production, and a process whose mailbox keeps growing is a sign of a bug worth investigating.
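
As a minimal sketch of checking a mailbox by hand, the snippet below spawns a process that never matches what it is sent, then reads its queue length with Process.info/2. The process and the {:job, n} message shape are made up for illustration.

```elixir
# A process that never matches the messages we send it, so they pile up.
stuck =
  spawn(fn ->
    receive do
      :never_sent -> :ok
    end
  end)

# Three messages that will sit in the mailbox unread.
for n <- 1..3, do: send(stuck, {:job, n})

{:message_queue_len, queue_len} = Process.info(stuck, :message_queue_len)
IO.puts("queued messages: #{queue_len}")

# Clean up the illustrative process.
Process.exit(stuck, :kill)
```

In production you would sample this value over time for long-lived processes rather than read it once.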

GenServer.call/3 is synchronous, and that gives you backpressure: the caller waits for the callee, so a slow callee is felt by the caller instead of messages silently piling up. GenServer.cast/2, by contrast, returns immediately and offers no such protection.

Selective Receive

receive does not have to take the first message in the mailbox. It can pattern match and skip non-matching messages.

receive do
  {:priority, msg} -> handle_priority(msg)
  {:normal, msg} -> handle_normal(msg)
end

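If the mailbox has {:normal, "first"} followed by {:priority, "urgent"}, this receive takes the normal message. receive tests messages in arrival order and takes the first message that matches any clause; clause order only matters when a single message could match more than one clause. Skipping happens when a message matches no clause at all: a receive with only the {:priority, msg} clause would pass over {:normal, "first"} and take the priority message, leaving the normal message in the mailbox until a later receive picks it up.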
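
Because receive walks the mailbox in arrival order, giving one kind of message real priority takes two steps: first look only for priority messages without blocking, then fall back to accepting anything. A minimal sketch, where PriorityInbox is a hypothetical helper and not part of any library:

```elixir
defmodule PriorityInbox do
  # Hypothetical helper: return the next message, preferring {:priority, _}.
  def next do
    receive do
      # Step 1: scan the whole mailbox for a priority message only.
      {:priority, msg} -> {:priority, msg}
    after
      0 ->
        # Step 2: nothing urgent is queued, so block for either kind.
        receive do
          {:priority, msg} -> {:priority, msg}
          {:normal, msg} -> {:normal, msg}
        end
    end
  end
end

send(self(), {:normal, "first"})
send(self(), {:priority, "urgent"})

first = PriorityInbox.next()
second = PriorityInbox.next()
IO.inspect({first, second})
```

Here the priority message comes out first even though it arrived second. The cost is a full mailbox scan on every call, so this only suits mailboxes that stay small.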

This is called selective receive, and it is powerful but dangerous. If a message in your mailbox never matches any clause, it stays there forever, and every receive has to scan past it. A mailbox holding 10,000 messages that never match makes every receive slow, because the BEAM scans them all looking for a match.

The fix is to either add a wildcard clause that drains unknown messages, or to be careful that messages always eventually match.

receive do
  {:expected, msg} -> handle(msg)
  other ->
    Logger.warning("unexpected message: #{inspect(other)}")
    # discard, do not loop back into receive
end

GenServer handles this discipline for you — handle_info runs on any message that does not match a known call/cast, so unexpected messages do not pile up.

receive with after

You can give receive a timeout:

receive do
  {:reply, value} -> value
after
  5_000 -> :timeout
end

After 5 seconds with no matching message, the after clause runs. This is how you avoid blocking forever on a reply that may never come. GenServer.call/3 uses this internally with a default timeout of 5 seconds (which is famously the source of "GenServer call timeout" errors when a server is overloaded).

after 0 is a non-blocking poll — return immediately if there is no matching message. Useful for draining a mailbox without blocking.
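
A minimal sketch of that draining pattern: a catch-all clause consumes whatever is queued, and after 0 ends the loop as soon as the mailbox is empty. The Mailbox module name is an assumption made for this example.

```elixir
defmodule Mailbox do
  # Hypothetical helper: remove every queued message from the calling
  # process's mailbox and return them in arrival order, without blocking.
  def drain(acc \\ []) do
    receive do
      msg -> drain([msg | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end

send(self(), :a)
send(self(), {:b, 2})
send(self(), "c")

drained = Mailbox.drain()
IO.inspect(drained)
```

A second call to Mailbox.drain() right afterwards returns an empty list, because nothing is left to match.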

Links: Shared Fate

A link is a bidirectional connection between two processes. If one of them crashes, the other crashes too, by default with the same exit reason. A process that exits normally does not take its linked processes down.

spawn_link(fn ->
  raise "boom"
end)
# The current process crashes when the spawned one does

spawn_link/1 is the linked version of spawn/1. You can also link an existing process with Process.link/1.

Links are how supervisors work. A supervisor links itself to its children. When a child crashes, the link triggers, the supervisor receives an exit signal, and (because supervisors trap exits) it runs its restart strategy instead of dying.

Without trapping, links are loud and fatal. That is on purpose — the default behavior is "if I depend on a process that died, I should also die so my supervisor can restart me." This is the let-it-crash philosophy in concrete form.

Trapping Exits

A process can opt into receiving exit signals as messages instead of dying:

Process.flag(:trap_exit, true)

spawn_link(fn ->
  raise "boom"
end)

receive do
  {:EXIT, pid, reason} ->
    IO.inspect(reason, label: "child died")
end

With trap_exit on, exits become normal messages of the form {:EXIT, pid, reason}. Supervisors do this. Most other processes should not — trapping exits everywhere subverts the let-it-crash model.
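
To make the supervisor idea concrete, here is a deliberately tiny restart loop built on trap_exit. TinySupervisor is a hypothetical module written only for illustration; it is not how Supervisor is implemented, and real code should use Supervisor.

```elixir
defmodule TinySupervisor do
  # Hypothetical sketch: run child_fun in a linked process and restart it
  # up to max_restarts times when it exits abnormally.
  # Note: this leaves trap_exit enabled in the calling process.
  def run(child_fun, max_restarts) do
    Process.flag(:trap_exit, true)
    supervise(child_fun, spawn_link(child_fun), max_restarts)
  end

  defp supervise(child_fun, pid, restarts_left) do
    receive do
      # The child finished its work without crashing.
      {:EXIT, ^pid, :normal} ->
        :ok

      # Abnormal exit with restarts remaining: start a fresh child.
      {:EXIT, ^pid, _reason} when restarts_left > 0 ->
        supervise(child_fun, spawn_link(child_fun), restarts_left - 1)

      # Abnormal exit and no restarts left: give up.
      {:EXIT, ^pid, reason} ->
        {:error, reason}
    end
  end
end

clean = TinySupervisor.run(fn -> :ok end, 2)
crashed = TinySupervisor.run(fn -> exit(:boom) end, 2)
IO.inspect({clean, crashed})
```

The first call returns :ok because the child exits normally. The second child fails on every attempt, so the loop gives up after the initial run plus two restarts and returns {:error, :boom}.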

Monitors: Observation Without Coupling

A monitor is a one-way, non-fatal version of a link. You ask to be notified when a process dies, but its death does not affect you.

ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} ->
    IO.inspect(reason, label: "monitored process died")
end

Process.monitor/1 returns a reference. When the monitored process dies, the monitoring process receives a {:DOWN, ref, :process, pid, reason} message. That is it. No exit signal, no propagation, no need to trap exits.

Monitors are what you reach for when one process needs to know about another's death without sharing its fate. Examples:

  • A connection pool monitors borrowed connections so it knows when to reclaim a slot.
  • A registry monitors registered processes so it can clean up dead entries.
  • A GenServer.call/3 monitors the callee so it can fail fast if the callee dies during the call.

Monitors are also automatically cleaned up when the monitoring process dies, so you do not leak references.

Links vs. Monitors

A rough guide:

Use a link when the two processes are part of the same logical unit and one cannot meaningfully continue without the other. A worker process linked to its coordinator, a connection process linked to its parser. If one dies, the other should die too so the supervisor can restart the unit.

Use a monitor when you need to know about a death but you can handle it gracefully and continue. A pool noticing one of its workers died, a server tracking which clients are still connected, code that wants to detect timeouts.

If you find yourself trapping exits to handle a link gracefully, that is usually a sign you wanted a monitor instead. Trapping exits inside a process that is not a supervisor is rare and almost always a smell.

Spawn Variants

A few spawning functions worth knowing:

  • spawn/1 and spawn/3: plain spawn, no link, no monitor.
  • spawn_link/1 and spawn_link/3: spawn with a link.
  • spawn_monitor/1 and spawn_monitor/3: spawn with a monitor, returns {pid, ref}.
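
As a small sketch of the monitored variant, spawn_monitor/1 hands back both the pid and the monitor reference, so the :DOWN message can be matched precisely. The :done exit reason is arbitrary.

```elixir
# Spawn a process that exits straight away with a custom reason.
{pid, ref} = spawn_monitor(fn -> exit(:done) end)

reason =
  receive do
    # Pin both ref and pid so only this monitor's message matches.
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :no_down_message
  end

IO.inspect(reason)
```

The spawning process keeps running and simply learns the exit reason, here :done.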

In real code, you rarely call these directly. You use Task.async/1, GenServer.start_link/2, Supervisor.start_link/2, or DynamicSupervisor.start_child/2, all of which wrap these primitives with sensible defaults and OTP integration.

A Worked Example: Request With Timeout

Putting it all together — a process that asks another for a value, with a timeout and proper cleanup:

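defmodule Asker do
  def request(server, payload, timeout \\ 5_000) do
    # The monitor ref doubles as a unique tag for this request.
    ref = Process.monitor(server)
    send(server, {:request, self(), ref, payload})

    receive do
      {:response, ^ref, value} ->
        # Drop the monitor and any :DOWN message already queued for it.
        Process.demonitor(ref, [:flush])
        {:ok, value}

      {:DOWN, ^ref, :process, _pid, reason} ->
        # The monitor has already fired, so there is nothing to demonitor.
        {:error, {:server_died, reason}}
    after
      timeout ->
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end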

This is essentially what GenServer.call/3 does internally. Monitor the callee, send the request with a unique ref, wait for a tagged response, give up cleanly on timeout, detect callee death distinctly from timeout. The Process.demonitor(ref, [:flush]) cleans up the monitor and discards any {:DOWN, ...} message that may have arrived while we were processing the response.
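
For completeness, here is a sketch of a server that speaks the same {:request, from, ref, payload} and {:response, ref, value} protocol. Doubler is a hypothetical module invented for this example, and the client side is written inline so the snippet runs on its own.

```elixir
defmodule Doubler do
  # Hypothetical server: replies to each request with the payload doubled.
  def start, do: spawn(&loop/0)

  defp loop do
    receive do
      {:request, from, ref, n} when is_integer(n) ->
        # Echo the caller's ref back so it can match the reply to the request.
        send(from, {:response, ref, n * 2})
        loop()
    end
  end
end

server = Doubler.start()
ref = Process.monitor(server)
send(server, {:request, self(), ref, 21})

reply =
  receive do
    {:response, ^ref, value} -> {:ok, value}
    {:DOWN, ^ref, :process, _pid, reason} -> {:error, {:server_died, reason}}
  after
    1_000 -> {:error, :timeout}
  end

Process.demonitor(ref, [:flush])
IO.inspect(reply)
```

With the Asker module defined, Asker.request(server, 21) would perform the same exchange and return {:ok, 42}.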

Common Pitfalls

Forgetting to include self() or a reply ref in messages. Receivers do not know who sent them a message. If you want a reply, the sender must include its pid (and usually a unique ref so multiple in-flight requests do not get confused).
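
A short sketch of why the ref matters: with two requests in flight, pinning each ref lets the caller pick out a specific reply regardless of arrival order. The Upcase module and its message shapes are assumptions made for this example.

```elixir
defmodule Upcase do
  # Hypothetical server: upcases text and tags the reply with the caller's ref.
  def loop do
    receive do
      {:upcase, from, ref, text} ->
        send(from, {:upcased, ref, String.upcase(text)})
        loop()
    end
  end
end

server = spawn(&Upcase.loop/0)

ref_a = make_ref()
ref_b = make_ref()
send(server, {:upcase, self(), ref_a, "first"})
send(server, {:upcase, self(), ref_b, "second"})

# Deliberately wait for the second reply first; the pinned ref selects it.
reply_b =
  receive do
    {:upcased, ^ref_b, text} -> text
  after
    1_000 -> :timeout
  end

reply_a =
  receive do
    {:upcased, ^ref_a, text} -> text
  after
    1_000 -> :timeout
  end

IO.inspect({reply_a, reply_b})
```

Without the refs, both replies would have the same shape and the caller could not tell which request each one answers.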

Mailboxes that grow unbounded. A slow process accepting messages from a fast producer will accumulate them in its mailbox until the node runs out of memory. Use synchronous calls (which create backpressure), bounded queues, or rate limiting at the source.

Selective receive over a large mailbox. Each receive scans the mailbox for matching messages. With a mailbox of thousands of unmatched messages, every receive is slow. Either drain unknown messages or design messages so they always match a clause.

Trapping exits in non-supervisors. Trapping exits subverts the let-it-crash philosophy. If a process needs to know about another's death without dying, use a monitor.

Assuming messages preserve global order across senders. Messages from a single sender to a single receiver arrive in the order they were sent. Across multiple senders or multiple receivers, ordering is not guaranteed. If you need a total order, you have to design for it.
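
The single-sender guarantee can be seen in a small sketch: one process sends a numbered sequence, and the receiver reads it back in the same order. The {:seq, n} message shape is made up for illustration.

```elixir
parent = self()

# One sender, one receiver: ordering between this pair is preserved.
spawn(fn -> Enum.each(1..5, &send(parent, {:seq, &1})) end)

received =
  for _ <- 1..5 do
    receive do
      {:seq, n} -> n
    after
      1_000 -> :timeout
    end
  end

IO.inspect(received)
```

With two or more senders, each sender's own messages still arrive in order, but how the streams interleave is not defined.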

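Confusing Process.exit(pid, :normal) with Process.exit(pid, :kill). Sending :normal to another process does not stop it: a process that is not trapping exits ignores the signal, and one that is trapping exits receives it as an {:EXIT, from, :normal} message. A :kill exit cannot be trapped. It bypasses trap_exit and unconditionally terminates the process, which exits with reason :killed. Use :kill only when you need the absolute guarantee that the process will go away.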
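
A minimal sketch of the difference, using a throwaway process that waits for a :stop message it never gets:

```elixir
pid =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

# A :normal exit signal is ignored by a process that is not trapping exits.
Process.exit(pid, :normal)
alive_after_normal = Process.alive?(pid)

# :kill cannot be ignored or trapped; the observed exit reason is :killed.
ref = Process.monitor(pid)
Process.exit(pid, :kill)

kill_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :no_down_message
  end

IO.inspect({alive_after_normal, kill_reason})
```

The process survives the :normal signal and dies on :kill, with the monitor reporting :killed rather than :kill.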

Key Takeaways

  • send/2 is non-blocking and deposits a message in the target process's mailbox. Any term can be sent.
  • receive blocks until a message matches one of its clauses, or the after timeout fires.
  • Mailboxes are unbounded. A process that cannot keep up with its inbox is a production hazard.
  • Selective receive is powerful — receive matches against patterns, not just FIFO order — but a non-matching message stays in the mailbox.
  • Links are bidirectional and fatal. They make two processes share fate. Used by supervisors.
  • Monitors are unidirectional and non-fatal. They tell you when another process dies without affecting you.
  • Reach for monitors over links when you need observability without shared fate. Trapping exits is for supervisors, not normal code.