Binary Comprehensions
Binary comprehensions are the feature you do not appreciate until the first time you reach for them and realize there is no comparable construct in most other languages. They extend for so that it can iterate over a binary in chunks defined by a bitstring pattern, pulling out fields by size and type the same way you would in a <<>> match. You can also use into: <<>> to pack data back into a binary in the same compact form. For protocol work, file format parsing, image processing, and anything else where you are looking at raw bytes, this turns what would be a tedious recursive parser in most languages into a few lines.
Binary comprehensions lean on Elixir's binary syntax (sizes, types, modifiers), which is its own topic. The shape <<r::8, g::8, b::8>> reads as "three 8-bit fields named r, g, b." If that syntax is unfamiliar, the topic on binaries and bitstrings covers it in depth; for here, just trust that the size/type annotations work the same inside a comprehension generator as they do in a regular pattern match.
The Basic Shape
A binary generator uses <<pattern <- binary>> (note the double angle brackets around the generator itself):
pixels = <<255, 0, 0, 0, 255, 0, 0, 0, 255>>
for <<r::8, g::8, b::8 <- pixels>>, do: {r, g, b}
# [{255, 0, 0}, {0, 255, 0}, {0, 0, 255}]
Read it as: "walk the binary in chunks where each chunk matches the pattern, bind the fields, run the body." The size of each chunk is determined by the total bit-width of the pattern — 8 + 8 + 8 = 24 bits per element, so a 72-bit binary yields three results.
Filters and the into: option work exactly the same way they do in regular comprehensions:
for <<r::8, g::8, b::8 <- pixels>>, r > 0, do: {r, g, b}
# [{255, 0, 0}]
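Because filters and into: compose, a single comprehension can drop chunks and repack the survivors. A minimal sketch, assuming a hypothetical PixelFilter module, that strips pure-black pixels from a packed RGB binary:

```elixir
defmodule PixelFilter do
  # Hypothetical helper: keep only pixels that are not pure black,
  # re-packing the survivors into a new RGB binary.
  def drop_black(pixels) do
    for <<r::8, g::8, b::8 <- pixels>>, {r, g, b} != {0, 0, 0}, into: <<>> do
      <<r::8, g::8, b::8>>
    end
  end
end

PixelFilter.drop_black(<<255, 0, 0, 0, 0, 0, 0, 0, 255>>)
# <<255, 0, 0, 0, 0, 255>>
```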
Packing With into: <<>>
The reverse direction is just as clean. With into: <<>>, the comprehension produces a binary by concatenating each iteration's result.
rgb_triples = [{255, 0, 0}, {0, 255, 0}, {0, 0, 255}]
for {r, g, b} <- rgb_triples, into: <<>> do
  <<r::8, g::8, b::8>>
end
# <<255, 0, 0, 0, 255, 0, 0, 0, 255>>
The body returns a bitstring per iteration; the into: <<>> accumulates them. You can use the same pattern to encode any field-based format: protocol frames, binary file headers, compact serialization.
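As a minimal sketch of that idea, assume a hypothetical record format where each entry is a 16-bit little-endian ID followed by a 32-bit little-endian count. The same field list drives both the encoder and the decoder; only the direction of the arrow changes:

```elixir
defmodule CountTable do
  # Hypothetical format: <<id::16-little, count::32-little>> per entry (6 bytes).
  def encode(entries) do
    for {id, count} <- entries, into: <<>> do
      <<id::16-little, count::32-little>>
    end
  end

  def decode(binary) do
    for <<id::16-little, count::32-little <- binary>>, do: {id, count}
  end
end

encoded = CountTable.encode([{1, 300}, {2, 7}])
# <<1, 0, 44, 1, 0, 0, 2, 0, 7, 0, 0, 0>>
CountTable.decode(encoded)
# [{1, 300}, {2, 7}]
```

The -little modifier is the only difference from the big-endian default used in the other examples in this section.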
Pixel Processing
The canonical binary comprehension example. Suppose you have a raw RGB image as a binary and want to convert it to grayscale using the standard luminance formula.
defmodule Image do
  def to_grayscale(rgb_binary) do
    for <<r::8, g::8, b::8 <- rgb_binary>>, into: <<>> do
      lum = round(0.299 * r + 0.587 * g + 0.114 * b)
      <<lum::8, lum::8, lum::8>>
    end
  end
end
For a 1920x1080 image, that is about 2 million iterations (1920 x 1080 = 2,073,600) through for, each pulling three bytes and writing three. The cost stays reasonable because a for with into: <<>> appends each result to a single accumulator binary, and the BEAM optimizes that append pattern: it reserves spare capacity at the end of the binary, so most appends extend it in place instead of copying everything built so far. The result is code that looks like the formula on the whiteboard.
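If downstream code only needs one luminance value per pixel, the same comprehension can emit a single-channel image a third the size. A sketch of that variant; Gray8.to_gray8/1 is a hypothetical name, not part of any library:

```elixir
defmodule Gray8 do
  # Hypothetical variant: emit one luminance byte per RGB pixel
  # (a single-channel image, one third the size of the input).
  def to_gray8(rgb_binary) do
    for <<r::8, g::8, b::8 <- rgb_binary>>, into: <<>> do
      <<round(0.299 * r + 0.587 * g + 0.114 * b)::8>>
    end
  end
end

Gray8.to_gray8(<<255, 0, 0, 0, 255, 0, 255, 255, 255>>)
# <<76, 150, 255>>
```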
Inverting a channel:
def invert_red(rgb_binary) do
  for <<r::8, g::8, b::8 <- rgb_binary>>, into: <<>> do
    <<255 - r::8, g::8, b::8>>
  end
end
Reducing color depth from 8-bit to 4-bit per channel:
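def quantize_to_4bit(rgb_binary) do
  for <<r::8, g::8, b::8 <- rgb_binary>>, into: <<>> do
    # Keep the top 4 bits of each channel; 0::4 pads each pixel to 16 bits
    # so the output stays byte-aligned.
    <<div(r, 16)::4, div(g, 16)::4, div(b, 16)::4, 0::4>>
  end
end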
The output is a different size than the input. The comprehension does not care — it walks the input by 24-bit chunks and writes 16-bit chunks. As long as both sides use the binary syntax correctly, the construct adjusts.
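A worked example makes the size change concrete: two pixels (6 bytes) in, two 16-bit packed pixels (4 bytes) out. The function is copied into a throwaway module (the name Quantize is hypothetical) so the snippet runs on its own:

```elixir
defmodule Quantize do
  # Same body as quantize_to_4bit/1 above, wrapped in a module so it runs standalone.
  def quantize_to_4bit(rgb_binary) do
    for <<r::8, g::8, b::8 <- rgb_binary>>, into: <<>> do
      <<div(r, 16)::4, div(g, 16)::4, div(b, 16)::4, 0::4>>
    end
  end
end

input = <<255, 128, 0, 16, 32, 48>>
# Pixel 1 -> nibbles 15, 8, 0, pad 0; pixel 2 -> nibbles 1, 2, 3, pad 0.
Quantize.quantize_to_4bit(input)
# <<248, 0, 18, 48>>
```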
Parsing Fixed-Width Records
The other classic application. Suppose you receive a binary file where each record is a fixed structure — say, sensor readings from an IoT device, each record being a 4-byte timestamp, a 2-byte sensor ID, and a 4-byte float value.
defmodule SensorLog do
  def parse(binary) do
    for <<ts::32, sensor_id::16, value::float-32 <- binary>> do
      %{timestamp: ts, sensor_id: sensor_id, value: value}
    end
  end
end
Ten-byte records, walked end-to-end, each emitted as a map. There is no manual loop, no offset arithmetic, no :binary.part/3 calls. The pattern describes the record shape, and the comprehension does the rest.
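To try the parser without a device, you can build records with the same binary syntax in the other direction. A sketch with made-up sample values; SensorLogDemo is a hypothetical copy of the module above so the snippet runs on its own:

```elixir
defmodule SensorLogDemo do
  # Same pattern as SensorLog.parse/1: 4-byte timestamp, 2-byte ID, 4-byte float.
  def parse(binary) do
    for <<ts::32, sensor_id::16, value::float-32 <- binary>> do
      %{timestamp: ts, sensor_id: sensor_id, value: value}
    end
  end
end

# Two hand-built 10-byte records; 21.5 and -3.25 are exact in 32-bit floats.
log =
  <<1_700_000_000::32, 7::16, 21.5::float-32,
    1_700_000_060::32, 9::16, -3.25::float-32>>

byte_size(log)
# 20
SensorLogDemo.parse(log)
```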
You can mix in filters to drop records you do not want:
def parse_for_sensor(binary, target_id) do
  for <<ts::32, sensor_id::16, value::float-32 <- binary>>,
      sensor_id == target_id do
    %{timestamp: ts, value: value}
  end
end
Or destructure further inside the pattern when the value itself has structure:
def parse_events(binary) do
  for <<event_type::8, payload_size::16, payload::binary-size(payload_size) <- binary>> do
    {event_type, payload}
  end
end
This last one is doing real work: each event has a 1-byte type, a 2-byte length, and a variable-length payload sized by that length field. The generator walks variable-width chunks, advancing by 1 + 2 + payload_size bytes each time. Try writing that in Python without a buffer-position variable and you will appreciate the syntax.
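Here is the same generator run against a hand-built stream of three events. EventDemo is a hypothetical wrapper around parse_events/1 so the snippet runs on its own, and the sample bytes are made up:

```elixir
defmodule EventDemo do
  # Same generator as parse_events/1: 1-byte type, 2-byte length, then that many payload bytes.
  def parse_events(binary) do
    for <<event_type::8, payload_size::16, payload::binary-size(payload_size) <- binary>> do
      {event_type, payload}
    end
  end
end

# Three events with payloads of 2, 0, and 5 bytes.
events_bin = <<1, 2::16, "hi", 2, 0::16, 3, 5::16, "hello">>
EventDemo.parse_events(events_bin)
# [{1, "hi"}, {2, ""}, {3, "hello"}]
```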
Encoding a Small Protocol
A more complete example: encoding and decoding a tiny binary protocol where messages are [1-byte type][2-byte length][payload bytes].
defmodule WireProtocol do
  def encode(messages) do
    for {type, payload} <- messages, into: <<>> do
      size = byte_size(payload)
      <<type::8, size::16, payload::binary>>
    end
  end
  def decode(binary) do
    for <<type::8, size::16, payload::binary-size(size) <- binary>> do
      {type, payload}
    end
  end
end
frames = [{1, "hello"}, {2, "world"}, {3, ""}]
encoded = WireProtocol.encode(frames)
WireProtocol.decode(encoded)
# [{1, "hello"}, {2, "world"}, {3, ""}]
Two functions, maybe twelve lines in total. This is the level of work a binary comprehension covers cleanly. For anything more than the simplest framing (varints, optional fields, checksums, nested length-delimited substructures), you graduate to a custom recursive parser. But for the kind of one-off binary work that crops up around the edges of a Phoenix or Nerves project (parsing a device handshake, encoding a request to a hardware peripheral, reading a fixed-format export), comprehensions cover a surprising amount of ground.
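One concrete reason to graduate: the comprehension-based decode/1 above silently stops at a frame it cannot complete, so a truncated message looks like a shorter valid one. A minimal sketch of a hypothetical strict decoder for the same framing, written as a recursive parser that reports leftover bytes instead:

```elixir
defmodule StrictWire do
  # Hypothetical strict decoder for the same [type][2-byte length][payload] framing.
  # Returns {:ok, frames} only if the binary ends exactly on a frame boundary.
  def decode(binary), do: decode(binary, [])

  defp decode(<<>>, acc), do: {:ok, Enum.reverse(acc)}

  defp decode(<<type::8, size::16, payload::binary-size(size), rest::binary>>, acc) do
    decode(rest, [{type, payload} | acc])
  end

  # Anything that is neither empty nor a complete frame is reported, not dropped.
  defp decode(leftover, _acc), do: {:error, {:truncated, byte_size(leftover)}}
end

StrictWire.decode(<<1, 5::16, "hello">>)
# {:ok, [{1, "hello"}]}
StrictWire.decode(<<1, 5::16, "hello", 2, 9::16, "wor">>)
# {:error, {:truncated, 6}}
```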
Bitstrings, Not Just Bytes
The <<>> syntax operates on bitstrings — sequences of bits that may or may not align to byte boundaries. Binary comprehensions inherit this. You can pull sub-byte fields out of a packed format:
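# A 16-bit color in RGB565 format: 5 bits red, 6 bits green, 5 bits blue.
def rgb565_to_rgb888(packed) do
  for <<r::5, g::6, b::5 <- packed>>, into: <<>> do
    # Shift each channel up to 8 bits (x8 for 5-bit, x4 for 6-bit).
    # Full intensity maps to 248/252/248 rather than 255.
    <<r * 8::8, g * 4::8, b * 8::8>>
  end
end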
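Every iteration consumes exactly 16 bits of input and produces 24 bits of output. The pattern handles the bit-level slicing without you reaching for shift and mask operations. Note that it reads each pixel most-significant bit first, which assumes the 16-bit words are stored big-endian; many framebuffers store RGB565 little-endian, and for those you match the whole word with the little modifier before splitting it.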
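For little-endian data, one approach is to match each pixel as a 16-bit little-endian word and then split it. The sketch below also swaps the plain multiply for bit replication, a common technique that copies each channel's high bits into its low bits so full intensity maps to 255. Both the storage order and the module name RGB565 are assumptions for illustration:

```elixir
defmodule RGB565 do
  # Hypothetical variant for framebuffers that store each RGB565 pixel as a
  # little-endian 16-bit word. Match the whole word first, then split it.
  def le_to_rgb888(packed) do
    for <<word::16-little <- packed>>, into: <<>> do
      <<r::5, g::6, b::5>> = <<word::16>>
      # Bit replication: (r << 3) | (r >> 2) and (g << 2) | (g >> 4),
      # written with * and div, so 31 -> 255 and 63 -> 255.
      <<r * 8 + div(r, 4)::8, g * 4 + div(g, 16)::8, b * 8 + div(b, 4)::8>>
    end
  end
end

# White (0xFFFF) and pure red (0xF800), stored little-endian.
RGB565.le_to_rgb888(<<0xFF, 0xFF, 0x00, 0xF8>>)
# <<255, 255, 255, 255, 0, 0>>
```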
This shows up most often in formats designed before storage was free — old image formats, audio codecs, network packet headers — where every bit was being squeezed. Most modern formats are byte-aligned, but when you do encounter the older ones, the syntax is right there.
Real Uses You Will See
- Phoenix LiveView projects that handle uploaded images sometimes apply simple effects to decoded raw pixel data with binary comprehensions rather than shelling out to ImageMagick.
- Nerves projects (Elixir running on embedded hardware) lean on binary comprehensions constantly for talking to sensors and parsing device output.
- Discord runs much of its real-time backend on Elixir and has written publicly about moving performance hot spots into Rust NIFs while keeping the surrounding orchestration in Elixir. That split (native code for the heaviest lifting, Elixir and its binary syntax for the glue around it) is a common pattern.
- Any project that has to parse a non-JSON, non-XML wire format — game protocols, custom telemetry, legacy file types — will find this useful at some point.
Performance Notes
The compiler handles binary comprehensions well in the common case where the result is built by appending. When you write into: <<>> and the body returns a bitstring, the comprehension appends each result to one accumulator binary, and the runtime's binary append optimization over-allocates that binary as it grows, so the cost of appending is amortized instead of copying the whole result on every iteration.
A few things to keep in mind:
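- For very large binaries (hundreds of megabytes), consider streaming: read in chunks, comprehend each chunk, write to an output stream. Make each chunk a whole number of records (or carry a partial record over to the next chunk); otherwise a record split across a chunk boundary is silently dropped. Holding multi-GB binaries in memory works, but is rarely what you actually want.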
- If a binary comprehension is hot in your profile, look at whether you can move the work to a NIF. For pixel-level image processing on large images, a Rust NIF using something like Rustler is often the right answer. But profile first — most binary comprehensions are not bottlenecks.
- Pattern matching on dynamic sizes (binary-size(n) where n is a runtime value) is slower than fixed-size patterns. If your record size is known at compile time, write it as a literal.
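A minimal sketch of the streaming approach, assuming the input arrives as an enumerable of binary chunks of arbitrary size. ChunkedGrayscale is a hypothetical module; it carries any partial pixel over to the next chunk so no pixel is lost at a boundary:

```elixir
defmodule ChunkedGrayscale do
  # Hypothetical streaming wrapper: converts an enumerable of raw RGB chunks
  # to grayscale lazily, so the whole image never sits in memory at once.
  # Chunks may split a pixel, so partial trailing bytes are carried forward.
  def stream(chunks) do
    Stream.transform(chunks, <<>>, fn chunk, leftover ->
      data = leftover <> chunk
      whole = div(byte_size(data), 3) * 3
      <<complete::binary-size(whole), rest::binary>> = data
      {[to_grayscale(complete)], rest}
    end)
  end

  # Same comprehension as Image.to_grayscale/1.
  defp to_grayscale(rgb) do
    for <<r::8, g::8, b::8 <- rgb>>, into: <<>> do
      lum = round(0.299 * r + 0.587 * g + 0.114 * b)
      <<lum::8, lum::8, lum::8>>
    end
  end
end

# Two chunks that split the first pixel between them.
[<<255, 0>>, <<0, 0, 255, 0>>]
|> ChunkedGrayscale.stream()
|> Enum.to_list()
|> IO.iodata_to_binary()
# <<76, 76, 76, 150, 150, 150>>
```

In a real pipeline the chunks would come from something like File.stream! in byte mode, and the output could be written with Stream.into/2 against a file stream. Bytes left over after the final chunk are dropped, matching the comprehension's own behavior.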
Common Pitfalls
Forgetting the double angle brackets on the generator. It is <<pattern <- binary>>, not <<pattern>> <- binary or pattern <- <<binary>>. The whole generator, arrow and all, lives inside one pair of <<...>>.
Pattern size that does not divide the binary evenly. If your binary is 25 bytes and your pattern is 3 bytes wide, the comprehension stops after 8 iterations and silently ignores the trailing 1 byte. There is no error. If trailing data matters, check byte_size/1 explicitly before the comprehension or change the pattern.
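One way to make truncation explicit is to check the length up front and return a tagged result. A sketch, using a hypothetical CheckedPixels module for 3-byte RGB records:

```elixir
defmodule CheckedPixels do
  # Hypothetical guard against silent truncation: refuse input whose length
  # is not a whole number of 3-byte RGB pixels.
  def parse(binary) when rem(byte_size(binary), 3) == 0 do
    pixels = for <<r::8, g::8, b::8 <- binary>>, do: {r, g, b}
    {:ok, pixels}
  end

  def parse(binary), do: {:error, {:trailing_bytes, rem(byte_size(binary), 3)}}
end

CheckedPixels.parse(<<1, 2, 3>>)
# {:ok, [{1, 2, 3}]}
CheckedPixels.parse(<<1, 2, 3, 4>>)
# {:error, {:trailing_bytes, 1}}
```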
Using into: <<>> with a body that returns a non-bitstring. The body must return something that can be appended to a bitstring. Returning a list, integer, or other arbitrary term raises an ArgumentError at runtime. Wrap the value in <<...>> with an explicit size, such as <<n::16>>.
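A short sketch of the fix: integers are not bitstrings, so each one is wrapped in <<...>> with an explicit size. The commented-out line shows the failing form:

```elixir
# Fails: n is an integer, not a bitstring, so this raises an ArgumentError.
# for n <- [1, 256], into: <<>>, do: n

# Works: each integer becomes a 16-bit big-endian segment.
bytes = for n <- [1, 256, 65_535], into: <<>>, do: <<n::16>>
# <<0, 1, 1, 0, 255, 255>>
```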
Mixing up bit sizes and byte sizes. r::8 is 8 bits — one byte. payload::binary-size(n) is n bytes by default. If you write r::1 thinking you are pulling one byte, you are actually pulling one bit. The size modifier defaults to bits for integers and bytes for the binary type. Check the binaries topic if this trips you up.
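A few one-liners make the units visible. The sample bytes are arbitrary:

```elixir
one_byte = <<0b10100000>>

# bit::1 pulls ONE BIT per iteration, not one byte: 8 results from 1 byte.
for <<bit::1 <- one_byte>>, do: bit
# [1, 0, 1, 0, 0, 0, 0, 0]

# Integer segments count bits...
<<word::16, _::binary>> = <<1, 2, 3>>
word
# 258

# ...while binary segments count bytes.
<<two::binary-size(2), _::binary>> = <<1, 2, 3>>
two
# <<1, 2>>
```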
Hand-rolling a recursive parser when a comprehension would do. People who have not internalized binary comprehensions often write a recursive function with <<head::8, rest::binary>> and a base case for <<>>. That works, but it is more code, more places for off-by-ones, and harder to read than the comprehension. If the format is fixed-shape records, reach for for first.
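For comparison, here is a sketch of both styles summing a binary of 16-bit unsigned integers (Sum16 is a hypothetical module). The comprehension version also shows the reduce: option working with a binary generator:

```elixir
defmodule Sum16 do
  # Recursive version: explicit head/rest matching plus a base case.
  def recursive(<<>>), do: 0
  def recursive(<<n::16, rest::binary>>), do: n + recursive(rest)

  # Comprehension version: the pattern alone describes the record shape.
  def comprehension(binary) do
    for <<n::16 <- binary>>, reduce: 0 do
      acc -> acc + n
    end
  end
end

data = <<1::16, 2::16, 300::16>>
Sum16.recursive(data)
# 303
Sum16.comprehension(data)
# 303
```

One behavioral difference: given an odd number of bytes, the recursive version raises a FunctionClauseError, while the comprehension ignores the trailing byte as described in the truncation pitfall.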
Believing this works for non-bitstring data. Binary comprehensions only operate on binaries and bitstrings. You cannot use the <<pattern <- list>> syntax on a list. It is a separate, specialized generator form.
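What you can do is combine both kinds of generator in one for, or convert a list to a binary before walking it with a binary pattern. A short sketch with arbitrary sample data:

```elixir
# A binary generator and a list generator can share one for;
# the list generator runs once per byte produced by the binary generator.
for <<c <- "ab">>, n <- [1, 2], do: {c, n}
# [{97, 1}, {97, 2}, {98, 1}, {98, 2}]

# To walk a list of bytes with a binary pattern, convert it to a binary first.
for <<hi::4, lo::4 <- :erlang.list_to_binary([0x12, 0xAB])>>, do: {hi, lo}
# [{1, 2}, {10, 11}]
```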
Key Takeaways
- Binary comprehensions use for <<pattern <- binary>>, do: ... to walk a binary in fixed-shape chunks defined by the pattern.
- The pattern uses the same size/type syntax as regular binary pattern matching (::8, ::16, ::float-32, binary-size(n)).
- into: <<>> packs results back into a binary by appending each iteration's bitstring result.
- Filters and other for options work normally inside binary comprehensions.
- Canonical use cases: pixel/image processing, fixed-width record parsing, encoding and decoding small binary protocols.
- For very large binaries, stream in chunks; for hot paths, profile and consider a NIF.
- Mismatched sizes silently truncate — there is no error if the binary length is not a multiple of the pattern width.
- This feature depends on the binary syntax topic, but unlocks a class of problem that is genuinely awkward in most other languages.