Channel Coding
Channel Capacity

The channel capacity of a discrete memoryless channel (DMC) is the maximum rate at which information can be transmitted with arbitrarily low error probability:
C = max_{p(x)} I(X; Y)
where the maximization is over input distributions p(x). This is a convex optimization problem (mutual information is concave in p(x) for a fixed channel). The Blahut-Arimoto algorithm computes C iteratively via alternating maximization.
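The alternating-maximization update can be sketched in a few lines of pure Python; this is a minimal illustration (function names are my own), checked here against the BSC, whose capacity 1 - H(0.1) ≈ 0.531 bits is known in closed form:

```python
import math

def blahut_arimoto(W, iters=500):
    """Capacity (in bits) of a DMC with transition matrix W[x][y] = p(y|x)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx                          # start from the uniform input
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c(x) = exp D(W(.|x) || q): divergence of channel output under x from the average output
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        z = sum(p[x] * c[x] for x in range(nx))
        p = [p[x] * c[x] / z for x in range(nx)]  # alternating-maximization step
    # mutual information at the final input distribution, in bits
    q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(nx) for y in range(ny) if W[x][y] > 0)

# BSC with crossover 0.1: capacity is 1 - H(0.1), about 0.531 bits
bsc_cap = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
```

For symmetric channels the uniform input is already optimal, so the iteration converges immediately; the algorithm earns its keep on asymmetric channels.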
For a channel with input X, output Y, and transition probabilities p(y|x):
- 0 ≤ C ≤ min(log|X|, log|Y|), with C = 0 iff X and Y are independent for all input distributions
- Feedback does not increase capacity for DMCs (but simplifies coding)
Shannon's Channel Coding Theorem (Second Theorem)
Achievability: For any rate R < C, there exists a sequence of (2^{nR}, n) codes with probability of error P_e^(n) → 0 as n → ∞.
Converse: For any rate R > C, any sequence of (2^{nR}, n) codes has P_e^(n) bounded away from 0.
Error exponent (reliability function): for R < C, the best achievable error probability decays as P_e ≈ e^{-n E(R)} where E(R) > 0. The random coding exponent E_r(R) provides a lower bound; the sphere-packing exponent E_sp(R) provides an upper bound. They coincide at high rates (above the critical rate R_cr).
Shannon's proof used random coding: generate codewords i.i.d. from the capacity-achieving input distribution, decode via joint typicality. Non-constructive but establishes the fundamental limit.
Binary Symmetric Channel (BSC)
Flips each bit independently with probability p (crossover probability). Capacity:
C = 1 - H(p), where H(p) = -p log2(p) - (1-p) log2(1-p)
Achieved by the uniform input distribution. At p = 0: C = 1 bit. At p = 1/2: C = 0 (completely noisy).
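The BSC capacity C = 1 - H(p) is easy to evaluate numerically; a minimal helper (names are my own):

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """BSC capacity C = 1 - H(p), achieved by the uniform input."""
    return 1.0 - h2(p)

# bsc_capacity(0) = 1 bit; bsc_capacity(0.5) = 0; bsc_capacity(0.11) is very close to 1/2
```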
Binary Erasure Channel (BEC)
Each bit is either received correctly or erased (marked as unknown) with probability ε. Capacity:
C = 1 - ε
The BEC is analytically tractable and serves as the primary channel model for analyzing LDPC and polar codes. Capacity is achieved by the uniform input distribution. ML decoding reduces to solving linear equations over GF(2).
Additive White Gaussian Noise (AWGN) Channel
Continuous-valued: Y = X + Z where Z ~ N(0, N), with average power constraint E[X^2] ≤ P. Capacity:
C = (1/2) log2(1 + P/N) bits per channel use
With bandwidth W and noise power spectral density N_0/2, this becomes the Shannon-Hartley theorem:
C = W log2(1 + P/(N_0 W)) bits per second
In the wideband limit (W → ∞): C → (P/N_0) log2(e) (power-limited regime). The capacity-achieving input distribution is Gaussian.
Shannon limit: the minimum E_b/N_0 for reliable communication is ln 2 ≈ -1.59 dB, achieved as the spectral efficiency → 0.
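A short numeric check of the Shannon-Hartley expression, the wideband limit, and the -1.59 dB Shannon limit (function names are illustrative):

```python
import math

def awgn_capacity_bps(bandwidth_hz, power, n0):
    """Shannon-Hartley: C = W log2(1 + P / (N0 W)) in bits per second."""
    return bandwidth_hz * math.log2(1 + power / (n0 * bandwidth_hz))

# As W grows with P/N0 fixed at 1, C approaches (P/N0) log2(e) ~= 1.4427 bits/s
wideband = awgn_capacity_bps(1e9, 1.0, 1.0)

# Minimum Eb/N0 for reliable communication: ln 2 in linear units, about -1.59 dB
ebn0_min_db = 10 * math.log10(math.log(2))
```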
Linear Block Codes
An (n, k) linear code over GF(q) is a k-dimensional subspace of GF(q)^n. Rate R = k/n.
- Generator matrix G (k × n): codeword c = mG for message m
- Parity-check matrix H ((n-k) × n): Hc^T = 0 for all codewords c
- Minimum distance d_min: minimum Hamming weight of the nonzero codewords
- Can detect d_min - 1 errors and correct t = ⌊(d_min - 1)/2⌋ errors
- Singleton bound: d_min ≤ n - k + 1. Codes achieving equality are MDS (maximum distance separable)
Syndrome decoding: compute s = Hr^T for the received word r; the syndrome determines the error pattern (for errors within the correction capability).
Hamming Codes
Parameters: (2^m - 1, 2^m - 1 - m, 3) for any m ≥ 2. Rate R = (2^m - 1 - m)/(2^m - 1) → 1 as m grows.
- Corrects any single-bit error
- Perfect code: every vector is within Hamming distance 1 of exactly one codeword (meets the Hamming/sphere-packing bound)
- Parity-check matrix columns are all 2^m - 1 nonzero m-bit vectors
- Extended Hamming: append an overall parity bit, yielding (2^m, 2^m - 1 - m, 4), used in ECC memory (SECDED)
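Encoding and syndrome decoding of the (7,4) Hamming code fit in a few lines. This is a sketch in systematic form (the particular parity matrix P below is one valid choice, not the only one); the key property is that the seven columns of H are the seven distinct nonzero 3-bit vectors, so the syndrome directly names the flipped position:

```python
# Hamming(7,4) in systematic form: G = [I | P], H = [P^T | I]
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]   # 4x3 parity part

def encode(msg):
    """4 message bits -> 7-bit codeword [msg | parity]."""
    parity = [sum(msg[i] * P[i][j] for i in range(4)) % 2 for j in range(3)]
    return msg + parity

def syndrome(word):
    """3-bit syndrome s = H r^T; all zeros for a valid codeword."""
    return [(sum(word[i] * P[i][j] for i in range(4)) + word[4 + j]) % 2
            for j in range(3)]

def correct(word):
    """Fix any single-bit error: the syndrome equals the H-column of the flipped bit."""
    s = syndrome(word)
    if s == [0, 0, 0]:
        return word
    cols = [P[i] for i in range(4)] + [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    w = word[:]
    w[cols.index(s)] ^= 1
    return w
```

Exhaustively checking all 16 messages against all 7 single-bit flips confirms the single-error-correcting claim for this construction.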
Reed-Solomon Codes
MDS codes over GF(q) with n = q - 1, typically q = 2^8, n = 255. Minimum distance d_min = n - k + 1.
- Corrects up to t = ⌊(n - k)/2⌋ symbol errors
- Particularly effective against burst errors (each symbol is m bits, where q = 2^m)
- Encoding: evaluate the message polynomial at n points, or systematic encoding via polynomial division by the generator polynomial
- Decoding: Berlekamp-Massey algorithm (finds the error-locator polynomial), then Chien search (finds error locations), then Forney's formula (finds error values). Complexity O(n^2)
Applications: CDs, DVDs, QR codes, deep-space communication, RAID-6, digital television. Often concatenated with an inner code (e.g., convolutional): the inner code corrects random errors, and the outer RS code cleans up residual burst errors.
Convolutional Codes
Encode a continuous stream of bits using shift registers. Characterized by:
- Rate R = k/n (input/output bits per time step)
- Constraint length K: number of shift register stages + 1
- Memory m = K - 1
- Generator polynomials: define the connections from the shift register to each output
The code has 2^m states, naturally represented as a trellis. State transitions define a finite-state machine.
Viterbi Algorithm
Maximum-likelihood sequence decoding via dynamic programming on the trellis.
- Complexity: O(2^K) operations per decoded bit (exponential in constraint length)
- Optimal (ML) for the given code
- Practical constraint lengths: K ≈ 7-9 (state space 64-256)
- Traceback: store surviving paths, output decisions after a traceback depth of roughly 5K
- Soft-decision Viterbi: uses channel LLRs instead of hard bits, gaining ~2 dB
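A hard-decision sketch for the classic rate-1/2, K=3 code with generators (7, 5) in octal (function names are my own; a production decoder would use soft metrics and bounded traceback rather than storing full paths):

```python
def conv_encode(bits, tail=True):
    """Rate-1/2, K=3 convolutional encoder, generators g1 = 1+D+D^2 (7), g2 = 1+D^2 (5)."""
    s1 = s0 = 0                                  # shift register: s1 newest, s0 oldest
    out = []
    for u in list(bits) + ([0, 0] if tail else []):   # two zero tail bits terminate the trellis
        out += [u ^ s1 ^ s0, u ^ s0]
        s1, s0 = u, s1
    return out

def viterbi_decode(rx, nbits):
    """Hard-decision ML sequence decoding on the 4-state trellis (expects 2 tail bits)."""
    INF = float("inf")
    metric = [0.0, INF, INF, INF]                # start in the all-zero state
    paths = [[], [], [], []]
    for t in range(nbits + 2):                   # info bits + tail
        r0, r1 = rx[2 * t], rx[2 * t + 1]
        new_metric = [INF] * 4
        new_paths = [None] * 4
        for s in range(4):
            if metric[s] == INF:
                continue
            b1, b0 = s >> 1, s & 1
            for u in (0, 1):
                o0, o1 = u ^ b1 ^ b0, u ^ b0     # branch output bits
                d = (o0 != r0) + (o1 != r1)      # Hamming branch metric
                ns = (u << 1) | b1               # next state after shifting in u
                if metric[s] + d < new_metric[ns]:   # add-compare-select
                    new_metric[ns] = metric[s] + d
                    new_paths[ns] = paths[s] + [u]
        metric, paths = new_metric, new_paths
    return paths[0][:nbits]                      # survivor ending in the zero state, tail dropped
```

Since this code has free distance 5, any single channel-bit error leaves the transmitted path strictly closest, so the decoder recovers the message exactly.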
BCJR Algorithm
Computes a posteriori probabilities (APP) for each bit via a forward-backward pass on the trellis. Essential for iterative (turbo) decoding. Complexity is similar to Viterbi (roughly a constant factor higher), but it computes per-bit marginals rather than the single most likely sequence.
Turbo Codes
Berrou, Glavieux, Thitimajshima (1993). Two parallel concatenated convolutional codes with a random interleaver between them.
Encoder: systematic bits + parity from encoder 1 + parity from encoder 2 (fed the interleaved data). Rate 1/3 before puncturing.
Iterative decoding: two BCJR decoders exchange extrinsic information (soft bit estimates) iteratively. Each decoder uses the other's output as prior information. Typically 6-18 iterations.
Performance:
- Approach capacity within ~0.5 dB on the AWGN channel at moderate block lengths (~10^3-10^4 bits)
- First practical near-capacity codes
- Used in 3G/4G cellular (UMTS, LTE), deep-space (CCSDS)
- Error floor: residual errors at high SNR due to low-weight codewords from specific interleaver patterns
LDPC Codes
Low-density parity-check codes (Gallager, 1962; rediscovered by MacKay & Neal, 1996).
Defined by a sparse parity-check matrix with few 1s per row/column. Represented by a Tanner graph (bipartite: variable nodes and check nodes).
Regular LDPC: each variable node has degree d_v, each check node has degree d_c. Rate R ≥ 1 - d_v/d_c (with equality when H has full rank).
Irregular LDPC: variable and check node degrees follow optimized distributions λ(x), ρ(x). Density evolution (Richardson-Urbanke) tracks message distributions through iterations to find the decoding threshold. Optimized irregular codes come within 0.0045 dB of the Shannon limit on the binary-input AWGN channel.
Belief Propagation Decoding
Message-passing on the Tanner graph:
- Variable-to-check messages: channel LLR plus the sum of incoming check-to-variable messages, excluding the edge being updated (in the LLR domain)
- Check-to-variable messages: the tanh rule tanh(m_out/2) = ∏ tanh(m_i/2) over the other incoming edges, or the min-sum approximation
- Iterate until convergence or max iterations (typically 50-100)
Exact on trees (cycle-free graphs); approximate on graphs with cycles. Min-sum and offset min-sum are practical low-complexity approximations.
Decoding complexity: O(n · d · I), where d is the average node degree and I is the iteration count. Highly parallelizable.
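A flooding min-sum sketch (my own helper names). For compactness it uses the (7,4) Hamming parity-check matrix as a stand-in Tanner graph; a real LDPC code would use a much larger, sparser H, but the message-passing mechanics are identical:

```python
def minsum_decode(H, llr, max_iter=50):
    """Min-sum belief propagation on the Tanner graph of H; llr[j] > 0 means bit j is likely 0."""
    m, n = len(H), len(llr)
    nbrs = [[j for j in range(n) if H[i][j]] for i in range(m)]
    c2v = [[0.0] * n for _ in range(m)]              # check-to-variable messages
    bits = [1 if l < 0 else 0 for l in llr]
    for _ in range(max_iter):
        # variable-to-check: total LLR minus this edge's incoming message (extrinsic only)
        total = [llr[j] + sum(c2v[i][j] for i in range(m) if H[i][j]) for j in range(n)]
        v2c = [[total[j] - c2v[i][j] for j in range(n)] for i in range(m)]
        # check-to-variable: sign product times min magnitude over the *other* edges
        for i in range(m):
            for j in nbrs[i]:
                others = [v2c[i][k] for k in nbrs[i] if k != j]
                sgn = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                c2v[i][j] = sgn * min(abs(x) for x in others)
        # tentative hard decision; stop early once every parity check is satisfied
        total = [llr[j] + sum(c2v[i][j] for i in range(m) if H[i][j]) for j in range(n)]
        bits = [1 if t < 0 else 0 for t in total]
        if all(sum(bits[j] for j in nbrs[i]) % 2 == 0 for i in range(m)):
            break
    return bits

# All-zero codeword sent; bit 0 received with a wrong but weak LLR (-1 vs +3 elsewhere)
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
decoded = minsum_decode(H, [-1.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0])
```

Here the three checks each push a +3-magnitude correction onto the unreliable bit, flipping its decision back to 0 in a single iteration.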
Used in: DVB-S2, 802.11n/ac/ax (WiFi), 5G NR (data channels), 10GBASE-T Ethernet.
Polar Codes
Arikan (2009). First provably capacity-achieving codes with explicit construction and O(N log N) encoding/decoding complexity.
Channel polarization: apply the recursive butterfly transform G_N = F^{⊗n}, where F = [[1,0],[1,1]] and N = 2^n. As N → ∞, the synthetic bit-channels polarize: a fraction I(W) become near-perfect (capacity → 1) and a fraction 1 - I(W) become useless (capacity → 0).
Encoding: place information bits on the most reliable synthetic channels, fix remaining positions to known values ("frozen bits").
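The transform x = u F^{⊗n} has a clean recursive form. A sketch (helper names mine; the frozen set below is illustrative — a real construction ranks the synthetic channels by density evolution or a Gaussian approximation):

```python
def polar_transform(u):
    """x = u F^{⊗n} over GF(2), F = [[1,0],[1,1]]; len(u) must be a power of 2."""
    if len(u) == 1:
        return u[:]
    half = len(u) // 2
    # butterfly: top half carries u_top XOR u_bottom, bottom half carries u_bottom
    return (polar_transform([a ^ b for a, b in zip(u[:half], u[half:])])
            + polar_transform(u[half:]))

def polar_encode(info_bits, frozen_positions, n):
    """Place info bits on the non-frozen positions (frozen bits fixed to 0), then transform."""
    u, it = [0] * n, iter(info_bits)
    for i in range(n):
        if i not in frozen_positions:
            u[i] = next(it)
    return polar_transform(u)

# N = 8, 4 info bits; frozen set {0, 1, 2, 4} is a plausible low-reliability set
x = polar_encode([1, 0, 1, 1], {0, 1, 2, 4}, 8)
```

Since F^2 = I over GF(2), the transform is its own inverse: applying polar_transform to a codeword returns the u vector, which is a handy sanity check.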
Successive cancellation (SC) decoding: decode bits u_1, ..., u_N sequentially in order, each decision using the previously decoded bits. Complexity O(N log N).
SC List (SCL) decoding: maintain L candidate paths, pruning to the best L after each bit decision; complexity O(L N log N). With CRC-aided selection (CA-SCL), performance matches or exceeds turbo/LDPC at short-to-moderate block lengths. Used in 5G NR control channels.
Polarization-adjusted convolutional (PAC) codes: concatenate convolutional pre-transform with polar transform, achieving near-optimal performance at short block lengths.
Capacity-Achieving Code Families
| Code Family | Capacity-Achieving? | Complexity (Encoding) | Complexity (Decoding) | Practical Gap to Capacity |
|---|---|---|---|---|
| Random codes | Yes (Shannon) | Exponential | Exponential | Theoretical only |
| LDPC (irregular) | Yes (BEC; near for AWGN) | O(n) (with structured H) | O(n) per iteration | < 0.1 dB |
| Polar | Yes (any symmetric DMC) | O(N log N) | O(N log N) (SC) | ~0.5 dB (SC), < 0.1 dB (SCL) |
| Turbo | Empirically near | O(n) | O(n) per iteration | ~0.3-0.5 dB |
| Spatially coupled LDPC | Yes (threshold saturation) | O(n) | O(n) per iteration | < 0.05 dB |
Threshold saturation (spatially coupled LDPC): coupling regular LDPC codes in a chain causes the BP threshold to equal the MAP threshold, which equals capacity for BMS channels.
Coded Modulation
For bandwidth-limited channels, combine coding with higher-order modulation:
- Trellis-coded modulation (TCM): Ungerboeck (1982). Combine convolutional code with set-partitioning of signal constellation. Achieves coding gain without bandwidth expansion
- Bit-interleaved coded modulation (BICM): separate binary code from modulation via bit-level interleaving. Simpler, flexible, near-optimal with iterative demapping (BICM-ID)
- Multilevel coding (MLC): protect different bit levels of modulation with different-rate codes. Optimal with multistage decoding
Rateless Codes
Transmit a potentially infinite stream of coded symbols; the receiver collects enough of them to decode.
- LT codes (Luby, 2002): first practical rateless codes. Random sparse bipartite graph, peeling decoder. Require k + O(√k · ln²(k/δ)) received symbols to recover k information symbols with probability 1 - δ. The Robust Soliton distribution optimizes the degree profile
- Raptor codes (Shokrollahi, 2006): concatenate high-rate LDPC pre-code with LT code. Linear-time encoding/decoding, approach capacity of BEC. Standardized in 3GPP MBMS, ATSC 3.0
- Fountain codes: general term for rateless codes. Particularly useful for broadcast/multicast (no feedback needed)
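The peeling decoder used by LT/Raptor codes is a few lines: find a degree-1 received symbol, reveal its source symbol, XOR it out of everything else, repeat. This sketch uses hand-built index sets so the trace is deterministic; a real LT encoder draws the sets randomly with Robust Soliton degrees:

```python
def peel_decode(symbols, k):
    """Peeling decoder: resolve degree-1 symbols, substitute, repeat until stuck or done."""
    syms = [[set(idx), val] for idx, val in symbols]
    out = [None] * k
    progress = True
    while progress and any(v is None for v in out):
        progress = False
        for idx, val in syms:
            if len(idx) == 1:                    # degree-1: directly reveals one source symbol
                i = next(iter(idx))
                if out[i] is None:
                    out[i] = val
                    progress = True
        for s in syms:                           # substitute recovered symbols into the rest
            for i in [j for j in s[0] if out[j] is not None]:
                s[0].discard(i)
                s[1] ^= out[i]
    return out

# Received symbols: (index set, XOR of those source symbols) for data [5, 1, 7, 2]
rx = [({0}, 5), ({0, 1}, 5 ^ 1), ({1, 2}, 1 ^ 7), ({2, 3}, 7 ^ 2), ({1, 3}, 1 ^ 2)]
recovered = peel_decode(rx, 4)
```

The decode order here is the ripple in action: {0} seeds the process, each substitution creates a new degree-1 symbol, and the chain unwinds until all four source symbols are back.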
Sphere Packing and Fundamental Limits
The sphere-packing (Hamming) bound: A_q(n, d) ≤ q^n / V_q(n, t), where V_q(n, t) is the volume of a Hamming sphere of radius t = ⌊(d-1)/2⌋. Perfect codes meet this bound with equality: Hamming codes, Golay codes, trivial codes.
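The bound is easy to evaluate, and exact division flags the parameter sets where a perfect code can exist; a small sketch (helper names mine) confirms the Hamming(7,4) and binary Golay(23,12) cases:

```python
from math import comb

def sphere_volume(n, t, q=2):
    """V_q(n, t): number of words within Hamming distance t of a fixed word in GF(q)^n."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))

def hamming_bound(n, t, q=2):
    """(max codebook size q^n // V, whether the division is exact, i.e. spheres tile the space)."""
    vol = sphere_volume(n, t, q)
    return q ** n // vol, q ** n % vol == 0

# Hamming(7,4), t=1: 2^7 / 8 = 16 = 2^4 codewords, exact -> perfect
# Golay(23,12), t=3: 2^23 / 2048 = 4096 = 2^12 codewords, exact -> perfect
```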
The Gilbert-Varshamov bound shows good codes exist: there exists an (n, k, d) code if q^k ≤ q^n / V_q(n, d - 1).
Plotkin bound, Elias-Bassalygo bound, and the linear programming bound (Delsarte) provide tighter limits in different parameter regimes.