Hash Functions and MACs
Cryptographic hash functions are one-way functions that map arbitrary-length inputs to fixed-length outputs. They are used in digital signatures, integrity verification, password storage, and many protocols.
Cryptographic Hash Properties
A cryptographic hash function H must satisfy:
Preimage resistance (one-way): Given h, it is infeasible to find m such that H(m) = h. Cost: O(2^n) for an n-bit hash.
Second preimage resistance: Given m1, it is infeasible to find m2 != m1 such that H(m1) = H(m2). Cost: O(2^n).
Collision resistance: It is infeasible to find any pair (m1, m2) with m1 != m2 such that H(m1) = H(m2). Cost: O(2^(n/2)) due to the birthday paradox.
Additional properties for practical use:
- Deterministic: Same input always produces same output.
- Avalanche effect: A single-bit change in input flips ~50% of output bits.
- Fast to compute: Efficient for large inputs.
Hash Function Constructions
Merkle-Damgard Construction
Used by MD5, SHA-1, SHA-2. Processes input in fixed-size blocks:
m = m_1 || m_2 || ... || m_L (padded to block boundary)
H_0 = IV (fixed initialization vector)
H_i = f(H_{i-1}, m_i) (compression function)
Hash = H_L
Length extension vulnerability: Given H(m) and len(m), an attacker can compute H(m || padding || m') without knowing m. Affects MD5, SHA-1, SHA-256. Does not affect SHA-3, BLAKE2, or truncated hashes (SHA-512/256).
Sponge Construction
Used by SHA-3 (Keccak). State is split into rate (r) and capacity (c):
Absorb: XOR input blocks into the rate portion, apply permutation
Squeeze: Read output from the rate portion, apply permutation
No length extension vulnerability. Capacity c determines security level (collision resistance = c/2 bits).
Hash Functions
MD5 (128-bit, BROKEN)
Designed 1991. Merkle-Damgard with 512-bit blocks, 64 rounds. Collisions found in seconds (Wang, 2004). Practical chosen-prefix collisions used to forge CA certificates (2008). Never use for security. Still seen in legacy checksums.
SHA-1 (160-bit, BROKEN)
Designed by NSA, 1995. Theoretical break in 2005 (Wang). Practical collision (SHAttered) in 2017 by Google/CWI. Deprecated by browsers (2017) and NIST. Do not use for new systems.
SHA-2 Family
Designed by NSA, 2001. Merkle-Damgard. Still considered secure.
| Variant | Output | Block Size | Rounds | Security (collision) | |---------|--------|------------|--------|---------------------| | SHA-224 | 224-bit | 512-bit | 64 | 112-bit | | SHA-256 | 256-bit | 512-bit | 64 | 128-bit | | SHA-384 | 384-bit | 1024-bit | 80 | 192-bit | | SHA-512 | 512-bit | 1024-bit | 80 | 256-bit | | SHA-512/256 | 256-bit | 1024-bit | 80 | 128-bit |
SHA-512/256 is preferred over SHA-256 on 64-bit platforms (faster, immune to length extension).
SHA-3 (Keccak)
NIST standard 2015. Sponge construction with Keccak-f[1600] permutation (5x5x64-bit state). Completely different design from SHA-2 -- provides algorithm diversity.
- SHA3-256: 256-bit output, 128-bit collision resistance.
- SHAKE128/SHAKE256: Extendable-output functions (XOFs) -- variable-length output.
BLAKE2
Designed 2012, based on ChaCha. Faster than MD5 while being secure.
- BLAKE2b: Optimized for 64-bit platforms, 1-64 byte output.
- BLAKE2s: Optimized for 32-bit platforms, 1-32 byte output.
- Supports keyed hashing (MAC), personalization, tree hashing.
BLAKE3
Designed 2020. Uses a Merkle tree structure internally, enabling unlimited parallelism. Single algorithm for hashing, MAC, KDF, and XOF. ~6x faster than BLAKE2b on multi-core systems.
// Using BLAKE3
hash ← BLAKE3_HASH("input data")
PRINT(hash AS hex string)
// Keyed MAC
key ← array of 32 zero bytes // Use a proper key
mac ← BLAKE3_KEYED_HASH(key, "message")
// KDF
context ← "myapp 2024-01-01 session key"
output ← BLAKE3_DERIVE_KEY(context, "input keying material", output_length ← 32)
Attacks on Hash Functions
Birthday Attack
Based on the birthday paradox: in a group of 23 people, there is a >50% chance two share a birthday.
For an n-bit hash, a collision can be found in O(2^(n/2)) evaluations. This is why 128-bit hashes provide only 64-bit collision resistance.
Implication: SHA-256 has 128-bit collision resistance, not 256-bit.
Length Extension Attack
Given H(m) and the length of m (but not m itself), compute H(m || padding || suffix) for any suffix.
Affected: MD5, SHA-1, SHA-256, SHA-512. Not affected: SHA-3, BLAKE2/3, SHA-512/256, HMAC.
This is why H(secret || message) is not a secure MAC -- use HMAC instead.
Multicollision Attack (Joux)
Finding 2^k messages that all hash to the same value requires only k x 2^(n/2) work (not 2^(k*n/2) as naively expected). This weakens iterated hash constructions like H(H(m)).
Message Authentication Codes (MAC)
A MAC provides integrity and authentication (but not non-repudiation, since both parties share the key).
HMAC
Hash-based MAC. Works with any hash function H:
HMAC(K, m) = H((K' XOR opad) || H((K' XOR ipad) || m))
where K' is the key padded to block size, opad = 0x5c...5c, ipad = 0x36...36.
Properties: Provably secure if the underlying hash is a PRF. Resistant to length extension attacks (even when built on MD5 or SHA-1). Used in TLS, IPsec, JWT, OAuth.
mac ← HMAC_SHA256_INIT(key)
HMAC_UPDATE(mac, "message to authenticate")
tag ← HMAC_FINALIZE(mac)
// Verify (constant-time comparison)
mac ← HMAC_SHA256_INIT(key)
HMAC_UPDATE(mac, "message to authenticate")
IF NOT HMAC_VERIFY(mac, tag) THEN
FAIL("MAC verification failed")
Poly1305
One-time authenticator by Bernstein. Evaluates a polynomial over a prime field (2^130 - 5). Must use a unique key per message (paired with ChaCha20 or AES to generate per-message keys).
Very fast: ~3 cycles/byte on modern hardware.
GMAC
The authentication component of AES-GCM. Polynomial evaluation over GF(2^128) using carry-less multiplication.
Merkle Trees
A binary tree of hashes. Each leaf is the hash of a data block. Each internal node is the hash of its two children.
H(H01 || H23) ← Root hash
/ \
H(H0||H1) H(H2||H3)
/ \ / \
H(D0) H(D1) H(D2) H(D3) ← Leaf hashes
D0 D1 D2 D3 ← Data blocks
Membership proof: To prove D2 is in the tree, provide H(D3) and H(H0||H1). The verifier computes up to the root using O(log n) hashes.
Applications:
- Git: Every commit is a Merkle tree of file hashes.
- Bitcoin/Ethereum: Transaction Merkle trees in each block.
- Certificate Transparency: Append-only Merkle tree of certificates.
- IPFS: Content-addressed storage using Merkle DAGs.
- ZFS/Btrfs: Data integrity verification.
Hash-Based Signatures
Digital signatures using only hash functions (no number theory). Quantum-resistant since hash security is not affected by Shor's algorithm.
Lamport One-Time Signatures
For an n-bit hash, generate 2n random values as private key. Publish their hashes as public key. To sign, reveal one value per bit based on message hash. One-time use only -- signing two messages reveals the full private key.
SPHINCS+ (SLH-DSA)
Stateless hash-based signature scheme. Uses a hypertree of many-time signature schemes (XMSS trees) with FORS (few-time) signatures at the leaves.
- Stateless: No need to track which keys have been used (unlike XMSS).
- Conservative security: Relies only on hash function security.
- Trade-off: Larger signatures (~17-50 KB) compared to lattice-based schemes.
- NIST PQC standard (2024).
Practical Guidelines
- Integrity checking: SHA-256 or BLAKE3 for file checksums and data integrity.
- Password hashing: Use Argon2id, not raw hash functions (see topic 02).
- Message authentication: HMAC-SHA-256 or use an AEAD cipher (which includes a MAC).
- Digital signatures: Use the signature scheme's built-in hash (Ed25519 uses SHA-512 internally).
- Constant-time comparison: Always use constant-time equality checks for MACs and hashes to prevent timing attacks.