Hash Functions and MACs

Cryptographic hash functions are one-way functions that map arbitrary-length inputs to fixed-length outputs. They are used in digital signatures, integrity verification, password storage, and many protocols.

Cryptographic Hash Properties

A cryptographic hash function H must satisfy:

Preimage resistance (one-way): Given h, it is infeasible to find m such that H(m) = h. Cost: O(2^n) for an n-bit hash.

Second preimage resistance: Given m1, it is infeasible to find m2 != m1 such that H(m1) = H(m2). Cost: O(2^n).

Collision resistance: It is infeasible to find any pair (m1, m2) with m1 != m2 such that H(m1) = H(m2). Cost: O(2^(n/2)) due to the birthday paradox.

Additional properties for practical use:

Deterministic: Same input always produces same output.
Avalanche effect: A single-bit change in input flips ~50% of output bits.
Fast to compute: Efficient for large inputs.

Hash Function Constructions

Merkle-Damgard Construction

Used by MD5, SHA-1, SHA-2. Processes input in fixed-size blocks:

m = m_1 || m_2 || ... || m_L  (padded to block boundary)

H_0 = IV (fixed initialization vector)
H_i = f(H_{i-1}, m_i)    (compression function)
Hash = H_L

Length extension vulnerability: Given H(m) and len(m), an attacker can compute H(m || padding || m') without knowing m. Affects MD5, SHA-1, SHA-256. Does not affect SHA-3, BLAKE2, or truncated hashes (SHA-512/256).

Sponge Construction

Used by SHA-3 (Keccak). State is split into rate (r) and capacity (c):

Absorb: XOR input blocks into the rate portion, apply permutation
Squeeze: Read output from the rate portion, apply permutation

No length extension vulnerability. Capacity c determines security level (collision resistance = c/2 bits).

Hash Functions

MD5 (128-bit, BROKEN)

Designed 1991. Merkle-Damgard with 512-bit blocks, 64 rounds. Collisions found in seconds (Wang, 2004). Practical chosen-prefix collisions used to forge CA certificates (2008). Never use for security. Still seen in legacy checksums.

SHA-1 (160-bit, BROKEN)

Designed by NSA, 1995. Theoretical break in 2005 (Wang). Practical collision (SHAttered) in 2017 by Google/CWI. Deprecated by browsers (2017) and NIST. Do not use for new systems.

SHA-2 Family

Variant	Output	Block Size	Rounds	Security (collision)
SHA-224	224-bit	512-bit	64	112-bit
SHA-256	256-bit	512-bit	64	128-bit
SHA-384	384-bit	1024-bit	80	192-bit
SHA-512	512-bit	1024-bit	80	256-bit
SHA-512/256	256-bit	1024-bit	80	128-bit

SHA-512/256 is preferred over SHA-256 on 64-bit platforms (faster, immune to length extension).

SHA-3 (Keccak)

NIST standard 2015. Sponge construction with Keccak-f[1600] permutation (5x5x64-bit state). Completely different design from SHA-2 -- provides algorithm diversity.

SHA3-256: 256-bit output, 128-bit collision resistance.
SHAKE128/SHAKE256: Extendable-output functions (XOFs) -- variable-length output.

BLAKE2

Designed 2012, based on ChaCha. Faster than MD5 while being secure.

BLAKE2b: Optimized for 64-bit platforms, 1-64 byte output.
BLAKE2s: Optimized for 32-bit platforms, 1-32 byte output.
Supports keyed hashing (MAC), personalization, tree hashing.

BLAKE3

Designed 2020. Uses a Merkle tree structure internally, enabling unlimited parallelism. Single algorithm for hashing, MAC, KDF, and XOF. ~6x faster than BLAKE2b on multi-core systems.

// Using BLAKE3
hash ← BLAKE3_HASH("input data")
PRINT(hash AS hex string)

// Keyed MAC
key ← array of 32 zero bytes  // Use a proper key
mac ← BLAKE3_KEYED_HASH(key, "message")

// KDF
context ← "myapp 2024-01-01 session key"
output ← BLAKE3_DERIVE_KEY(context, "input keying material", output_length ← 32)

Attacks on Hash Functions

Birthday Attack

Based on the birthday paradox: in a group of 23 people, there is a >50% chance two share a birthday.

For an n-bit hash, a collision can be found in O(2^(n/2)) evaluations. This is why 128-bit hashes provide only 64-bit collision resistance.

Implication: SHA-256 has 128-bit collision resistance, not 256-bit.

Length Extension Attack

Given H(m) and the length of m (but not m itself), compute H(m || padding || suffix) for any suffix.

Affected: MD5, SHA-1, SHA-256, SHA-512. Not affected: SHA-3, BLAKE2/3, SHA-512/256, HMAC.

This is why H(secret || message) is not a secure MAC -- use HMAC instead.

Multicollision Attack (Joux)

Finding 2^k messages that all hash to the same value requires only k x 2^(n/2) work (not 2^(k*n/2) as naively expected). This weakens iterated hash constructions like H(H(m)).

Message Authentication Codes (MAC)

A MAC provides integrity and authentication (but not non-repudiation, since both parties share the key).

HMAC

Hash-based MAC. Works with any hash function H:

HMAC(K, m) = H((K' XOR opad) || H((K' XOR ipad) || m))

where K' is the key padded to block size, opad = 0x5c...5c, ipad = 0x36...36.

Properties: Provably secure if the underlying hash is a PRF. Resistant to length extension attacks (even when built on MD5 or SHA-1). Used in TLS, IPsec, JWT, OAuth.

mac ← HMAC_SHA256_INIT(key)
HMAC_UPDATE(mac, "message to authenticate")
tag ← HMAC_FINALIZE(mac)

// Verify (constant-time comparison)
mac ← HMAC_SHA256_INIT(key)
HMAC_UPDATE(mac, "message to authenticate")
IF NOT HMAC_VERIFY(mac, tag) THEN
    FAIL("MAC verification failed")

Poly1305

One-time authenticator by Bernstein. Evaluates a polynomial over a prime field (2^130 - 5). Must use a unique key per message (paired with ChaCha20 or AES to generate per-message keys).

Very fast: ~3 cycles/byte on modern hardware.

GMAC

The authentication component of AES-GCM. Polynomial evaluation over GF(2^128) using carry-less multiplication.

Merkle Trees

A binary tree of hashes. Each leaf is the hash of a data block. Each internal node is the hash of its two children.

        H(H01 || H23)         ← Root hash
       /              \
   H(H0||H1)       H(H2||H3)
   /      \         /      \
 H(D0)  H(D1)   H(D2)  H(D3)   ← Leaf hashes
  D0      D1      D2      D3    ← Data blocks

Membership proof: To prove D2 is in the tree, provide H(D3) and H(H0||H1). The verifier computes up to the root using O(log n) hashes.

Applications:

Git: Every commit is a Merkle tree of file hashes.
Bitcoin/Ethereum: Transaction Merkle trees in each block.
Certificate Transparency: Append-only Merkle tree of certificates.
IPFS: Content-addressed storage using Merkle DAGs.
ZFS/Btrfs: Data integrity verification.

Stateless: No need to track which keys have been used (unlike XMSS).
Conservative security: Relies only on hash function security.
Trade-off: Larger signatures (~17-50 KB) compared to lattice-based schemes.
NIST PQC standard (2024).

Practical Guidelines

Integrity checking: SHA-256 or BLAKE3 for file checksums and data integrity.
Password hashing: Use Argon2id, not raw hash functions (see topic 02).
Message authentication: HMAC-SHA-256 or use an AEAD cipher (which includes a MAC).
Digital signatures: Use the signature scheme's built-in hash (Ed25519 uses SHA-512 internally).
Constant-time comparison: Always use constant-time equality checks for MACs and hashes to prevent timing attacks.