The Built-In Sigils
The basics covered the shape and the four sigils you reach for constantly. This one goes deeper into the workhorses — regex with all its modifiers, ~w with its quiet a/c/s options, the calendar sigils, and heredocs. These are the parts of the standard library that turn ugly text-handling code into something you can read.
~r and the Regex Module
~r compiles a regex at compile time. The result is a Regex struct, which means every function in the Regex module works on it, and so do the higher-level wrappers in String.
pattern = ~r/^(\w+)@(\w+\.\w+)$/
Regex.run(pattern, "alice@example.com")
# ["alice@example.com", "alice", "example.com"]
Regex.named_captures(~r/^(?<user>\w+)@(?<host>\w+\.\w+)$/, "alice@example.com")
# %{"host" => "example.com", "user" => "alice"}
Named captures are underused. When you have more than two groups, naming them is the difference between a regex you can read in six months and a regex you have to mentally re-execute every time you look at it.
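As a small illustration of why named groups read better, here is a sketch that destructures the captures map directly in a case clause. The VersionParser module and its three-part version format are hypothetical, chosen only for the example.

```elixir
defmodule VersionParser do
  # Hypothetical helper: splits "1.14.3" style strings into integer parts.
  def parse(string) do
    pattern = ~r/^(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)$/

    case Regex.named_captures(pattern, string) do
      # The map keys are the group names, so the match documents itself.
      %{"major" => major, "minor" => minor, "patch" => patch} ->
        {:ok,
         %{
           major: String.to_integer(major),
           minor: String.to_integer(minor),
           patch: String.to_integer(patch)
         }}

      # Regex.named_captures/2 returns nil when the pattern does not match.
      nil ->
        :error
    end
  end
end

VersionParser.parse("1.14.3")
# {:ok, %{major: 1, minor: 14, patch: 3}}
VersionParser.parse("1.14")
# :error
```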
Modifiers
The trailing letters change how the regex behaves:
- i: case-insensitive. ~r/error/i matches ERROR, Error, and eRrOr.
- m: multiline mode. ^ and $ match at line boundaries within the string, not just the start and end.
- s: dotall mode. . matches newlines too. Default is for . to skip them.
- u: Unicode mode. \w, \d, \s, and character classes respect Unicode categories. Default is ASCII-only.
- x: extended mode. Whitespace and # comments inside the pattern are ignored, which lets you format complex regexes across multiple lines.
- U: ungreedy. Quantifiers become lazy by default, and ? after them makes them greedy. Inverts the usual behavior.
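The modifiers are easiest to compare side by side. The following sketch runs the same pattern with and without each flag; the input strings are made up for illustration.

```elixir
# i: case-insensitive
Regex.match?(~r/error/, "ERROR")     # false
Regex.match?(~r/error/i, "ERROR")    # true

# m: ^ and $ match at every line boundary
Regex.scan(~r/^\w+/, "one\ntwo")     # [["one"]]
Regex.scan(~r/^\w+/m, "one\ntwo")    # [["one"], ["two"]]

# s: the dot also matches a newline
Regex.match?(~r/a.b/, "a\nb")        # false
Regex.match?(~r/a.b/s, "a\nb")       # true

# u: \w understands non-ASCII letters
Regex.run(~r/\w+/u, "José")          # ["José"]

# U: quantifiers are lazy unless followed by ?
Regex.run(~r/<.+>/, "<a><b>")        # ["<a><b>"]
Regex.run(~r/<.+>/U, "<a><b>")       # ["<a>"]
```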
The two you will use most are i and u. The x modifier is a hidden gem for any regex longer than three groups:
phone = ~r/
  ^
  \+?          # optional plus
  (\d{1,3})    # country code
  [\s\-]?      # optional separator
  (\d{3})      # area code
  [\s\-]?      # optional separator
  (\d{3,4})    # local part
  $
/x
Without x, this would be ~r/^\+?(\d{1,3})[\s\-]?(\d{3})[\s\-]?(\d{3,4})$/, which is fine for a one-time regex but punishing in a config file you have to maintain.
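To check that the extended layout is only a formatting change, a quick sketch can run both forms against the same input. PhoneDemo and the sample number are hypothetical names used for this comparison.

```elixir
defmodule PhoneDemo do
  # Both functions return an equivalent pattern; only the layout differs.
  def compact, do: ~r/^\+?(\d{1,3})[\s\-]?(\d{3})[\s\-]?(\d{3,4})$/

  def extended do
    ~r/
      ^
      \+?          # optional plus
      (\d{1,3})    # country code
      [\s\-]?      # optional separator
      (\d{3})      # area code
      [\s\-]?      # optional separator
      (\d{3,4})    # local part
      $
    /x
  end
end

number = "+1 555-0199"

Regex.run(PhoneDemo.compact(), number)
# ["+1 555-0199", "1", "555", "0199"]

Regex.run(PhoneDemo.compact(), number) == Regex.run(PhoneDemo.extended(), number)
# true
```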
Working with the Regex Module
A handful of functions matter:
Regex.match?(~r/foo/, "food") # true
Regex.run(~r/(\w)(\w)/, "ab") # ["ab", "a", "b"]
Regex.scan(~r/\d+/, "a1 b22 c333")
# [["1"], ["22"], ["333"]]
Regex.replace(~r/\s+/, "  hello   world  ", " ")
# " hello world "
String.replace/3, String.split/2, and String.match?/2 also accept a regex as their pattern argument, so you rarely have to call Regex functions directly. The pipe-friendly form is usually:
"  hello   WORLD  "
|> String.trim()
|> String.replace(~r/\s+/, " ")
|> String.downcase()
# "hello world"
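A few more String calls that take a regex, as a sketch with made-up inputs. The replacement argument can be a string with backreferences or a function that receives the matched text.

```elixir
# Split on commas or semicolons, swallowing surrounding whitespace.
String.split("a, b;c ,d", ~r/\s*[,;]\s*/)
# ["a", "b", "c", "d"]

# \\1, \\2, \\3 in the replacement refer to the captured groups.
String.replace("2026-05-12", ~r/(\d+)-(\d+)-(\d+)/, "\\3/\\2/\\1")
# "12/05/2026"

# A one-argument function receives the whole match.
String.replace("id 7 and id 42", ~r/\d+/, fn digits -> "<#{digits}>" end)
# "id <7> and id <42>"

String.match?("user_42", ~r/^[a-z]+_\d+$/)
# true
```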
A Real Use Case
Log parsing is the canonical regex-in-Elixir example. Discord, a well-known production user of Elixir, has written publicly about running it at very large scale. The shape of a log-parsing module looks like this:
defmodule LogParser do
  @line ~r/
    ^
    \[(?<timestamp>[^\]]+)\]\s+
    (?<level>DEBUG|INFO|WARN|ERROR)\s+
    (?<service>[\w\-]+):\s+
    (?<message>.*)
    $
  /x

  def parse(line) do
    case Regex.named_captures(@line, line) do
      nil -> {:error, :malformed}
      captures -> {:ok, captures}
    end
  end
end
Compile the regex once as a module attribute, use named captures, lean on x mode for readability. This pattern shows up in every log shipper, event consumer, and analytics ingestor written in Elixir.
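To show what a consumer of such a parser might do with the captures, here is a minimal sketch that tallies lines by level. LogTally, the sample lines, and the "MALFORMED" bucket are assumptions for illustration. The pattern is returned from a private function here only to keep the sketch self-contained.

```elixir
defmodule LogTally do
  # Same shape as the log pattern discussed above, written on one line.
  defp line_pattern do
    ~r/^\[(?<timestamp>[^\]]+)\]\s+(?<level>DEBUG|INFO|WARN|ERROR)\s+(?<service>[\w\-]+):\s+(?<message>.*)$/
  end

  # Counts lines per level; unparseable lines go under "MALFORMED".
  def count_levels(lines) do
    Enum.reduce(lines, %{}, fn line, acc ->
      key =
        case Regex.named_captures(line_pattern(), line) do
          %{"level" => level} -> level
          nil -> "MALFORMED"
        end

      Map.update(acc, key, 1, &(&1 + 1))
    end)
  end
end

lines = [
  "[2026-05-12T10:30:00Z] ERROR billing-api: card declined",
  "[2026-05-12T10:30:01Z] INFO auth: login ok",
  "not a log line",
  "[2026-05-12T10:30:02Z] ERROR auth: token expired"
]

LogTally.count_levels(lines)
# %{"ERROR" => 2, "INFO" => 1, "MALFORMED" => 1}
```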
~w and Its Modifiers
~w is a word list. Out of the box it produces strings:
~w(alpha beta gamma)
# ["alpha", "beta", "gamma"]
The modifier picks the type of each element:
- s: strings (the default).
- a: atoms.
- c: charlists.
~w(read write admin)s
# ["read", "write", "admin"]
~w(read write admin)a
# [:read, :write, :admin]
~w(read write admin)c
# [~c"read", ~c"write", ~c"admin"]
In real Elixir code, you will see ~w()a constantly and the others almost never. Ecto changesets, Phoenix permitted-params lists, GenServer state keys — all atoms. The c variant exists for Erlang interop and is rarely used.
A common pattern in Ecto:
def changeset(user, attrs) do
  user
  |> cast(attrs, ~w(email name password role)a)
  |> validate_required(~w(email password)a)
  |> unique_constraint(:email)
end
Without ~w, you would write [:email, :name, :password, :role], which is fine for four fields but gets noisy at twenty. The sigil keeps the list dense and scannable.
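The same idea works outside Ecto. As a sketch, an atom word list can serve as an allow-list for any atom-keyed map. ParamFilter and its field names are hypothetical.

```elixir
defmodule ParamFilter do
  # Hypothetical allow-list written with ~w()a.
  def permitted, do: ~w(email name role)a

  # Keeps only the permitted keys from an atom-keyed map.
  def filter(params), do: Map.take(params, permitted())
end

ParamFilter.filter(%{email: "a@b.co", role: :admin, is_superuser: true})
# %{email: "a@b.co", role: :admin}

:role in ParamFilter.permitted()
# true
```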
The Calendar Sigils
Elixir ships with four calendar sigils corresponding to its four built-in calendar types:
- ~D: Date. Year, month, day.
- ~T: Time. Hour, minute, second, microsecond.
- ~N: NaiveDateTime. Date plus time with no timezone.
- ~U: DateTime in UTC. Date plus time plus a UTC marker; the literal must end in Z (or carry a zero offset such as +00:00).
~D[2026-05-12]
~T[10:30:00]
~T[10:30:00.123456]
~N[2026-05-12 10:30:00]
~U[2026-05-12 10:30:00Z]
~U[2026-05-12 10:30:00.123Z]
These are real structs, not strings. You can do arithmetic, comparison, and formatting:
Date.diff(~D[2026-05-12], ~D[2026-01-01]) # 131
Date.compare(~D[2026-05-12], ~D[2026-05-12]) # :eq
DateTime.add(~U[2026-05-12 10:00:00Z], 3600, :second)
# ~U[2026-05-12 11:00:00Z]
The standard library ships only a UTC time zone database (Calendar.UTCOnlyTimeZoneDatabase), so out of the box DateTime cannot resolve any other zone. For that, you pull in a time zone database library such as tz or tzdata and configure it; DateTime.now/2 and DateTime.shift_zone/3 then work with arbitrary zones. Phoenix and Ecto interop is built around UTC DateTime values; the sigil reflects that.
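A few more operations on sigil-built values, as a sketch using only standard library functions and arbitrary sample dates:

```elixir
# Date.add/2 moves by whole days and crosses month boundaries.
Date.add(~D[2026-01-30], 3)
# ~D[2026-02-02]

# Date.range/2 is enumerable.
Enum.to_list(Date.range(~D[2026-05-30], ~D[2026-06-01]))
# [~D[2026-05-30], ~D[2026-05-31], ~D[2026-06-01]]

# Passing the Date module to Enum.sort/2 sorts chronologically.
Enum.sort([~D[2026-05-12], ~D[2025-12-31], ~D[2026-01-01]], Date)
# [~D[2025-12-31], ~D[2026-01-01], ~D[2026-05-12]]

# DateTime.diff/2 returns seconds by default.
DateTime.diff(~U[2026-05-12 11:00:00Z], ~U[2026-05-12 10:00:00Z])
# 3600

# A NaiveDateTime becomes a UTC DateTime by naming the zone explicitly.
DateTime.from_naive!(~N[2026-05-12 10:30:00], "Etc/UTC")
# ~U[2026-05-12 10:30:00Z]
```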
A real use case: writing test fixtures.
test "calculates trial expiry 14 days after signup" do
  user = %User{signed_up_at: ~U[2026-05-01 00:00:00Z]}
  assert Billing.trial_expiry(user) == ~U[2026-05-15 00:00:00Z]
end
Without sigils, this test would be cluttered with DateTime.from_iso8601/1 calls and pattern matches on their {:ok, datetime, offset} results. With them, the dates read as dates.
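For completeness, one way the function under test could look. This is a hypothetical sketch: the Billing module body is not shown in the text, and the 14-day constant and the map-based argument are assumptions taken from the test name.

```elixir
defmodule Billing do
  # Assumed trial length, taken from the test description.
  @trial_days 14

  # Accepts any map or struct that carries a UTC :signed_up_at DateTime.
  def trial_expiry(%{signed_up_at: %DateTime{} = signed_up_at}) do
    DateTime.add(signed_up_at, @trial_days * 24 * 60 * 60, :second)
  end
end

Billing.trial_expiry(%{signed_up_at: ~U[2026-05-01 00:00:00Z]})
# ~U[2026-05-15 00:00:00Z]
```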
Heredocs
Triple-quoted sigils give you heredocs — multiline strings with consistent leading-whitespace handling. Any sigil can be a heredoc by using """ or ''' as the delimiter.
~s"""
This is a multiline string.
The first newline after the opening quotes is stripped.
Trailing whitespace on each line is preserved.
"""
The most common use is module documentation:
defmodule MyApp.Accounts do
  @moduledoc """
  The Accounts context handles user registration, authentication,
  and profile management.
  See `register_user/1` and `authenticate/2` for the main entry points.
  """
end
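The leading-whitespace handling is worth seeing once: the indentation of the closing delimiter decides how much is stripped from every line. A small sketch, with HeredocDemo as a hypothetical module name:

```elixir
defmodule HeredocDemo do
  def text do
    # The closing delimiter sits four spaces in, so four spaces are
    # removed from each line; extra indentation beyond that is kept.
    ~s"""
    first
      second
    third
    """
  end
end

HeredocDemo.text()
# "first\n  second\nthird\n"
```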
@moduledoc and @doc accept any string, but the convention is heredoc-with-uppercase-S when the content has backslashes or interpolation-like syntax you do not want processed:
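@doc ~S"""
Parses a string of the form "user@host" into a tuple.

    iex> MyMod.parse("alice@example.com")
    {:ok, "alice", "example.com"}
"""
def parse(string) do
  # Illustrative body; the :error fallback is an assumption.
  case String.split(string, "@") do
    [user, host] -> {:ok, user, host}
    _ -> :error
  end
end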
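The ~S matters whenever the documentation contains #{...} sequences or backslashes (escape sequences in examples, regex patterns, Windows paths) that should reach the reader verbatim. Plain text such as iex> prompts and {:ok, ...} tuples is never interpolated, so this particular doctest would also work in a regular heredoc; using ~S consistently just removes the need to check.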
Heredocs also work with regex (~r"""..."""), word lists (~w"""..."""), and any other sigil. In practice, only ~s/~S heredocs come up regularly.
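The difference between the lowercase and uppercase heredoc forms shows up as soon as the body contains interpolation or an escape. A minimal sketch with a made-up variable:

```elixir
name = "world"

# Lowercase: #{...} is interpolated and \t becomes a tab.
interpolated = ~s"""
hello #{name}\tdone
"""

# Uppercase: the body is kept exactly as written.
raw = ~S"""
hello #{name}\tdone
"""

interpolated == "hello world\tdone\n"
# true
raw == "hello \#{name}\\tdone\n"
# true
```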
A Real Pattern: Config Validation
Tying several sigils together, here is a validation function you might write for a config-loading module:
defmodule MyApp.Config do
  @required_keys ~w(database_url secret_key_base port)a
  @port_pattern ~r/^\d{4,5}$/
  @url_pattern ~r{^postgres(?:ql)?://[\w\-\.]+(?::\d+)?/\w+$}

  def validate(config) do
    with :ok <- check_required(config),
         :ok <- check_port(config),
         :ok <- check_database_url(config) do
      :ok
    end
  end

  defp check_required(config) do
    missing = @required_keys -- Map.keys(config)
    if missing == [], do: :ok, else: {:error, {:missing, missing}}
  end

  defp check_port(config) do
    if Regex.match?(@port_pattern, to_string(config.port)),
      do: :ok,
      else: {:error, :invalid_port}
  end

  defp check_database_url(config) do
    if Regex.match?(@url_pattern, config.database_url),
      do: :ok,
      else: {:error, :invalid_database_url}
  end
end
The ~w()a gives you a scannable list of required keys. The two ~r patterns are compiled once as module attributes. The curly-brace delimiter on the URL regex avoids escaping the forward slashes. This is the kind of code that shows up in runtime.exs validation or in an Application.start/2 callback.
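To see how the with chain short-circuits, here is a condensed, hypothetical variant of the same idea. ConfigCheck, its two required keys, and the sample configs are assumptions made to keep the sketch short. The first failing check becomes the return value.

```elixir
defmodule ConfigCheck do
  # Condensed sketch: two checks instead of three.
  def validate(config) do
    with :ok <- check_required(config),
         :ok <- check_port(config) do
      :ok
    end
  end

  defp check_required(config) do
    case ~w(database_url port)a -- Map.keys(config) do
      [] -> :ok
      missing -> {:error, {:missing, missing}}
    end
  end

  defp check_port(config) do
    if Regex.match?(~r/^\d{4,5}$/, to_string(config.port)),
      do: :ok,
      else: {:error, :invalid_port}
  end
end

ConfigCheck.validate(%{database_url: "postgres://db/app", port: 4000})
# :ok
ConfigCheck.validate(%{port: 4000})
# {:error, {:missing, [:database_url]}}
ConfigCheck.validate(%{database_url: "postgres://db/app", port: "http"})
# {:error, :invalid_port}
```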
Common Pitfalls
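Compiling regex inside hot paths. A literal ~r/.../ with no interpolation has traditionally been compiled when the surrounding code is compiled, wherever it appears, so def find(line), do: Regex.run(~r/^\w+/, line) does not pay a per-call cost. The cost appears when the pattern is built at runtime: a ~r containing #{...} interpolation, or a call to Regex.compile/2, compiles on every invocation. When the pattern depends on runtime data, compile it once and pass the resulting Regex around. Binding a fixed pattern to a @module_attribute, as in the examples above, keeps it in one named place. Where exactly compilation happens has shifted between Elixir and Erlang/OTP releases, so check the Regex documentation for the version you run.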
Forgetting the u modifier for non-ASCII text. ~r/\w+/ only matches [A-Za-z0-9_]. To match accented characters, Cyrillic, CJK, or anything Unicode, you need ~r/\w+/u. This is a silent correctness bug — your code works fine until a customer with a name outside ASCII shows up.
Comparing calendar structs with == or <. Struct equality works field by field, so ~D[2026-05-12] == ~D[2026-05-12] is true. But values that are equivalent in time but stored differently, say a NaiveDateTime with microseconds and one without, compare as not equal. The ordering operators are worse: < and > follow structural term ordering rather than chronology. Use Date.compare/2, NaiveDateTime.compare/2, DateTime.compare/2, etc.
Putting interpolation in a sigil heredoc and expecting ~S to interpolate. Uppercase sigils never interpolate, even in heredoc form. If you want #{name} to expand inside a """...""" block, use ~s (lowercase), or just write a plain """...""" string.
Reaching for ~r when String.contains?/2 would do. A literal substring match does not need a regex. String.contains?(line, "ERROR") is faster and clearer than Regex.match?(~r/ERROR/, line). Use regex for patterns, not for substring search.
Using ~w for a list with quoted strings or spaces. ~w(hello world) splits on whitespace, so an entry containing a space breaks the abstraction. ~w("hello world" "foo bar") gives you ["\"hello", "world\"", "\"foo", "bar\""], not what you want. Use a regular list literal for anything that contains spaces.
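Several of these pitfalls can be reproduced in a few lines. The following sketch uses made-up inputs and only standard library calls:

```elixir
# Without u, a name with an accented letter fails a whole-string \w+ check.
Regex.match?(~r/^\w+$/, "José")     # false
Regex.match?(~r/^\w+$/u, "José")    # true

# Same instant, different microsecond precision.
~N[2026-05-12 10:30:00] == ~N[2026-05-12 10:30:00.000]
# false
NaiveDateTime.compare(~N[2026-05-12 10:30:00], ~N[2026-05-12 10:30:00.000])
# :eq

# A literal substring search needs no regex.
String.contains?("[x] ERROR boom", "ERROR")
# true

# ~w splits on whitespace and keeps the quote characters.
~w("hello world" "foo bar")
# ["\"hello", "world\"", "\"foo", "bar\""]
```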
Key Takeaways
- ~r is for regex: modifiers i, u, x carry the most weight; named captures pay for themselves on anything nontrivial.
- ~w produces a list of strings by default; the a modifier gives atoms and is the right call for schema field lists and permission sets.
- ~D, ~T, ~N, ~U are the calendar sigils and produce real struct values you can compute with. Comparisons should go through the Date, Time, NaiveDateTime, or DateTime modules, not ==.
- Heredoc sigils with """ are the standard form for @moduledoc and @doc, with ~S preferred when the content includes doctest examples or backslashes.
- Compile regex once as module attributes when they appear in hot paths.