
Channel Coding

Channel Capacity

Communication Channel Model: Source to Sink

The channel capacity of a discrete memoryless channel (DMC) is the maximum rate at which information can be transmitted with arbitrarily low error probability:

$$C = \max_{p(x)} I(X; Y)$$

where the maximization is over input distributions. This is a convex optimization problem (mutual information is concave in $p(x)$ for a fixed channel). The Blahut-Arimoto algorithm computes $C$ iteratively via alternating maximization.
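The alternating maximization can be run in a few lines of NumPy. The following is an illustrative sketch (function name, iteration count, and numerical guards are my own choices), sanity-checked against a BSC whose capacity is known in closed form:

```python
import numpy as np

def blahut_arimoto(W, iters=200):
    """Estimate DMC capacity (in bits) for a channel matrix W[x, y] = p(y|x).

    Alternating maximization: fix the input distribution r, compute the
    posterior p(x|y), then re-optimize r in closed form.
    """
    nx = W.shape[0]
    r = np.full(nx, 1.0 / nx)                  # start from the uniform input
    for _ in range(iters):
        q = r[:, None] * W                     # joint p(x, y)
        q /= q.sum(axis=0, keepdims=True)      # posterior p(x | y)
        # update: r(x) proportional to exp( sum_y p(y|x) log q(x|y) )
        log_r = (W * np.log(q + 1e-300)).sum(axis=1)
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
    q = r[:, None] * W                         # evaluate I(X;Y) at the final r
    py = q.sum(axis=0)
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / np.outer(r, py)[mask])))

# BSC with crossover 0.1: should match 1 - H_b(0.1), about 0.531 bits
W = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(W))
```

For the BSC the uniform input is already optimal, so the iteration converges immediately; the algorithm earns its keep on asymmetric channels where the optimal input is not obvious.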

For a channel with input $X$, output $Y$, and transition probabilities $p(y|x)$:

  • $C \geq 0$, with $C = 0$ iff $X$ and $Y$ are independent under every input distribution
  • $C \leq \min(\log|\mathcal{X}|, \log|\mathcal{Y}|)$
  • Feedback does not increase capacity for DMCs (but simplifies coding)

Shannon's Channel Coding Theorem (Second Theorem)

Achievability: For any rate $R < C$, there exists a sequence of $(2^{nR}, n)$ codes with probability of error $P_e^{(n)} \to 0$ as $n \to \infty$.

Converse: For any rate $R > C$, any sequence of codes has $P_e^{(n)} \to 1$.

Error exponent (reliability function): for $R < C$, the best achievable error probability decays as $P_e^{(n)} \sim 2^{-nE(R)}$ with $E(R) > 0$. The random coding exponent provides a lower bound on $E(R)$; the sphere-packing exponent provides an upper bound. They coincide at high rates (above the critical rate).

Shannon's proof used random coding: generate $2^{nR}$ codewords i.i.d. from the capacity-achieving input distribution and decode via joint typicality. Non-constructive, but it establishes the fundamental limit.

Binary Symmetric Channel (BSC)

Flips each bit independently with probability $p$ (the crossover probability).

$$C_{\text{BSC}} = 1 - H_b(p)$$

Achieved by the uniform input distribution. At $p = 0$: $C = 1$ bit. At $p = 1/2$: $C = 0$ (completely noisy).
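A quick numeric check of these endpoints (function names are illustrative):

```python
import math

def h_b(p):
    """Binary entropy in bits, with H_b(0) = H_b(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def c_bsc(p):
    """BSC capacity 1 - H_b(p); symmetric about p = 1/2."""
    return 1.0 - h_b(p)

print(c_bsc(0.0))    # 1.0 (noiseless)
print(c_bsc(0.5))    # 0.0 (completely noisy)
print(c_bsc(0.11))   # roughly 0.5: about half a bit survives per channel use
```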

Binary Erasure Channel (BEC)

Each bit is either received correctly or erased (marked as unknown) with probability $\epsilon$.

$$C_{\text{BEC}} = 1 - \epsilon$$

The BEC is analytically tractable and serves as the primary channel model for analyzing LDPC and polar codes. Capacity is achieved by uniform input. ML decoding reduces to solving linear equations over GF(2).

Additive White Gaussian Noise (AWGN) Channel

Continuous-valued: $Y = X + Z$ where $Z \sim \mathcal{N}(0, N)$, with average power constraint $\frac{1}{n}\sum x_i^2 \leq P$.

$$C_{\text{AWGN}} = \frac{1}{2}\log\left(1 + \frac{P}{N}\right) \quad \text{bits/channel use}$$

With bandwidth $W$ and noise power spectral density $N_0/2$, this becomes the Shannon-Hartley theorem:

$$C = W \log_2\left(1 + \frac{P}{N_0 W}\right) \quad \text{bits/second}$$

In the wideband limit ($W \to \infty$): $C \to \frac{P}{N_0 \ln 2}$ (the power-limited regime). The capacity-achieving input distribution is Gaussian.

Shannon limit: the minimum $E_b/N_0$ for reliable communication is $-1.59$ dB, achieved as $R \to 0$.
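Both limits are easy to check numerically; a short sketch (the parameter values are arbitrary):

```python
import math

def c_bandlimited(P, N0, W):
    """Shannon-Hartley capacity in bits/second."""
    return W * math.log2(1 + P / (N0 * W))

P, N0 = 1.0, 1.0
for W in (1.0, 10.0, 1e3, 1e6):
    print(W, c_bandlimited(P, N0, W))   # increases toward the wideband limit
print(P / (N0 * math.log(2)))           # wideband limit P / (N0 ln 2)

# Shannon limit: minimum Eb/N0 is ln 2, i.e. about -1.59 dB
print(10 * math.log10(math.log(2)))
```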

Linear Block Codes

An $(n, k)$ linear code over GF($q$) is a $k$-dimensional subspace of GF($q$)$^n$. Rate $R = k/n$.

  • Generator matrix $G$: $k \times n$; codeword $c = uG$ for message $u$
  • Parity-check matrix $H$: $(n-k) \times n$; $Hc^T = 0$ for all codewords
  • Minimum distance $d_{\min}$: minimum Hamming weight of a nonzero codeword
  • Can detect $d_{\min} - 1$ errors and correct $\lfloor(d_{\min}-1)/2\rfloor$ errors
  • Singleton bound: $d_{\min} \leq n - k + 1$. Codes achieving equality are MDS (maximum distance separable)

Syndrome decoding: compute $s = Hy^T$; the syndrome determines the error pattern (for errors within the correction capability).
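As a worked example, here is syndrome decoding for the (7,4) Hamming code. The column ordering of $H$ is one common convention (my choice here), picked so that the syndrome, read as a binary number, is the position of the flipped bit:

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: columns are the
# nonzero 3-bit vectors, in the order 1..7 written in binary.
H = np.array([[int(b) for b in format(j, "03b")] for j in range(1, 8)]).T  # 3 x 7

def syndrome_decode(y):
    """Correct up to one bit error using the syndrome s = H y^T (mod 2)."""
    s = H @ y % 2
    if s.any():
        # With this column ordering the syndrome, read as a binary
        # number, is exactly the (1-based) index of the flipped bit.
        pos = int("".join(map(str, s)), 2) - 1
        y = y.copy()
        y[pos] ^= 1
    return y

c = np.zeros(7, dtype=int)           # the all-zero codeword
r = c.copy(); r[4] ^= 1              # flip one bit
print(syndrome_decode(r))            # recovers the all-zero codeword
```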

Hamming Codes

Parameters: $(2^r - 1, 2^r - 1 - r, 3)$ for any $r \geq 2$. Rate $R = 1 - r/(2^r - 1) \to 1$.

  • Corrects any single-bit error
  • Perfect code: every vector is within Hamming distance 1 of exactly one codeword (meets Hamming/sphere-packing bound)
  • Parity-check matrix columns are all the nonzero $r$-bit vectors
  • Extended Hamming: append an overall parity bit, yielding $(2^r, 2^r - 1 - r, 4)$; used in ECC memory (SECDED)

Reed-Solomon Codes

$(n, k, n-k+1)$ MDS codes over GF($q$) with $n \leq q$; typically $q = 2^m$ and $n = q - 1 = 2^m - 1$.

  • Corrects up to $t = \lfloor(n-k)/2\rfloor$ symbol errors
  • Particularly effective against burst errors (each symbol is $m$ bits)
  • Encoding: evaluate the message polynomial at $n$ points, or systematic via polynomial division
  • Decoding: Berlekamp-Massey algorithm (finds the error-locator polynomial), then Chien search (finds error locations), then Forney's formula (finds error values). Complexity $O(nt)$

Applications: CDs, DVDs, QR codes, deep-space communication, RAID-6, digital television. Often concatenated with an inner code (e.g., convolutional) for compound error correction.

Convolutional Codes

Encode a continuous stream of bits using shift registers. Characterized by:

  • Rate $R = k/n$ (input/output bits per time step)
  • Constraint length $K$: number of shift register stages + 1
  • Memory $m = K - 1$
  • Generator polynomials: define connections from shift register to output

The code has $2^m$ states, naturally represented as a trellis. State transitions define a finite-state machine.
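A minimal encoder sketch for the classic rate-1/2, $K = 3$ code with generators $(7, 5)$ in octal. This particular code is a standard textbook example, not one specified above:

```python
def conv_encode(bits, gens=(0b111, 0b101), K=3):
    """Rate-1/2 feedforward convolutional encoder (illustrative sketch).

    gens are the generator polynomials with MSB = current input bit;
    (7,5) octal is the classic K=3 example. State = previous K-1 bits.
    """
    state = 0
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state      # current bit + K-1 memory bits
        for g in gens:
            out.append(bin(reg & g).count("1") % 2)   # tap parity
        state = reg >> 1                  # shift register advances
    return out

print(conv_encode([1, 0, 1, 1]))   # [1, 1, 1, 0, 0, 0, 0, 1]
```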

Viterbi Algorithm

Maximum-likelihood sequence decoding via dynamic programming on the trellis.

  • Complexity: $O(2^m)$ per decoded bit (exponential in constraint length)
  • Optimal (ML) sequence decoding for the given code
  • Practical constraint lengths: $K \leq 10$ (state space $2^m \leq 512$)
  • Traceback: store surviving paths, output a decision after a traceback depth of $\approx 5K$
  • Soft-decision Viterbi: uses channel LLRs instead of hard bits, gaining ~2 dB
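The survivor-path dynamic programming can be sketched compactly. This is a hard-decision decoder for the standard rate-1/2, $K = 3$ code with generators $(7, 5)$ octal (my choice of example code; it assumes the encoder starts in the all-zero state):

```python
def viterbi_decode(received, gens=(0b111, 0b101), K=3):
    """Hard-decision Viterbi decoding on the 2^(K-1)-state trellis."""
    n_states = 1 << (K - 1)
    INF = float("inf")

    def branch(state, b):
        """Next state and output bits for input b from a given state."""
        reg = (b << (K - 1)) | state
        out = [bin(reg & g).count("1") % 2 for g in gens]
        return reg >> 1, out

    metric = [0] + [INF] * (n_states - 1)     # start in the all-zero state
    paths = [[] for _ in range(n_states)]
    for t in range(len(received) // 2):
        r = received[2 * t: 2 * t + 2]
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                ns, out = branch(s, b)
                m = metric[s] + sum(x != y for x, y in zip(out, r))
                if m < new_metric[ns]:        # keep the survivor
                    new_metric[ns] = m
                    new_paths[ns] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(range(n_states), key=lambda s: metric[s])
    return paths[best]

coded = [1, 1, 1, 0, 0, 0, 0, 1]   # encoding of the message [1, 0, 1, 1]
coded[2] ^= 1                      # inject one channel error
print(viterbi_decode(coded))       # → [1, 0, 1, 1]
```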

BCJR Algorithm

Computes a posteriori probabilities (APP) for each bit via a forward-backward pass on the trellis. Essential for iterative (turbo) decoding. Complexity is similar to Viterbi, but BCJR computes per-bit marginals rather than the single most likely sequence.

Turbo Codes

Berrou, Glavieux, Thitimajshima (1993). Two parallel concatenated convolutional codes with a random interleaver between them.

Encoder: systematic bits + parity from encoder 1 + parity from encoder 2 (which is fed the interleaved data). Rate $\approx 1/3$ before puncturing.

Iterative decoding: two BCJR decoders exchange extrinsic information (soft bit estimates) iteratively. Each decoder uses the other's output as prior information. Typically 6-18 iterations.

Performance:

  • Approach capacity within 0.5 dB on the AWGN channel at moderate block lengths ($\sim 10^4$)
  • First practical near-capacity codes
  • Used in 3G/4G cellular (UMTS, LTE), deep-space (CCSDS)
  • Error floor: residual errors at high SNR due to low-weight codewords from specific interleaver patterns

LDPC Codes

Low-density parity-check codes (Gallager, 1962; rediscovered by MacKay & Neal, 1996).

Defined by a sparse parity-check matrix HH with few 1s per row/column. Represented by a Tanner graph (bipartite: variable nodes and check nodes).

Regular $(d_v, d_c)$ LDPC: each variable node has degree $d_v$, each check node has degree $d_c$. Rate $R \geq 1 - d_v/d_c$.

Irregular LDPC: variable and check node degrees follow optimized distributions $(\lambda(x), \rho(x))$. Density evolution (Richardson-Urbanke) tracks message distributions through iterations to find the decoding threshold. Optimized irregular codes come within 0.0045 dB of capacity on the AWGN channel.
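On the BEC, density evolution collapses to a one-dimensional recursion, so thresholds can be found by bisection. A sketch for the regular $(3, 6)$ ensemble (the iteration cap and tolerance are my own choices):

```python
def bec_de_converges(eps, dv=3, dc=6, iters=20000, tol=1e-9):
    """Density evolution for a regular (dv, dc) LDPC ensemble on the BEC.

    On the BEC the 'density' is a single number: the erasure probability
    x of a variable-to-check message, with the standard recursion
        x <- eps * (1 - (1 - x)**(dc - 1)) ** (dv - 1).
    Returns True if x -> 0, i.e. eps is below the BP threshold.
    """
    x = eps
    for _ in range(iters):
        x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
        if x < tol:
            return True
    return False

# Bisect for the (3,6) BP threshold (known to be about 0.4294)
lo, hi = 0.0, 1.0
for _ in range(30):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if bec_de_converges(mid) else (lo, mid)
print(round(lo, 4))   # ≈ 0.4294
```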

Belief Propagation Decoding

Message-passing on the Tanner graph:

  1. Variable-to-check messages: channel LLR plus the sum of incoming check-to-variable messages (excluding the target check)
  2. Check-to-variable messages: $2\tanh^{-1}\left(\prod \tanh(m/2)\right)$ over the incoming messages, or the min-sum approximation
  3. Iterate until convergence or a maximum iteration count (typically 50-100)

Exact on trees (cycle-free graphs); approximate on graphs with cycles. Min-sum and offset min-sum are practical low-complexity approximations.

Decoding complexity: $O(n \cdot \bar{d} \cdot I)$, where $\bar{d}$ is the average degree and $I$ is the iteration count. Highly parallelizable.

Used in: DVB-S2, 802.11n/ac/ax (WiFi), 5G NR (data channels), 10GBASE-T Ethernet.
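On the BEC, belief propagation degenerates to a peeling decoder: repeatedly find a check with exactly one erased neighbor and solve it. A small sketch (the parity-check matrix and codeword here are illustrative):

```python
import numpy as np

def peel_bec(H, y):
    """Peeling decoder for the BEC: y uses -1 to mark an erasure,
    H is a 0/1 parity-check matrix. Each satisfied check with exactly
    one erased bit determines that bit (checks sum to 0 mod 2).
    Returns the decoded word, or None if decoding stalls."""
    y = y.copy()
    progress = True
    while progress and (y == -1).any():
        progress = False
        for row in H:
            idx = np.flatnonzero(row)
            erased = [j for j in idx if y[j] == -1]
            if len(erased) == 1:
                j = erased[0]
                y[j] = sum(y[k] for k in idx if k != j) % 2
                progress = True
    return y if (y != -1).all() else None

# Tiny example: (7,4) Hamming parity checks, codeword with 2 erasures
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
c = np.array([1, 0, 1, 1, 0, 1, 0])    # a codeword: H c = 0 (mod 2)
y = c.copy(); y[0] = y[5] = -1         # erase two bits
print(peel_bec(H, y))                  # recovers the codeword
```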

Polar Codes

Arikan (2009). First provably capacity-achieving codes with an explicit construction and $O(n \log n)$ encoding/decoding complexity.

Channel polarization: apply the recursive butterfly transform $G_N = B_N F^{\otimes n}$, where $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and $B_N$ is the bit-reversal permutation. As $N = 2^n \to \infty$, the synthetic channels polarize: a fraction $C$ become perfect (capacity 1), and a fraction $1-C$ become useless (capacity 0).

Encoding: place the information bits on the $K$ most reliable synthetic channels and fix the remaining $N-K$ positions to known values ("frozen bits").
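Encoding is just the butterfly recursion. A sketch that omits the bit-reversal permutation $B_N$ (which only reorders outputs) and uses a hand-picked frozen set rather than a computed reliability ordering:

```python
def polar_transform(u):
    """Apply the polar transform F^{(x)n} (without bit reversal) to a
    length-2^n 0/1 vector, via O(N log N) XOR butterfly stages."""
    x = list(u)
    N = len(x)
    step = 1
    while step < N:
        for i in range(0, N, 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]   # (a, b) -> (a XOR b, b)
        step *= 2
    return x

def polar_encode(info_bits, N, info_positions):
    """Place information bits on the chosen (assumed most reliable)
    positions, freeze the rest to 0, then apply the transform."""
    u = [0] * N
    for pos, b in zip(info_positions, info_bits):
        u[pos] = b
    return polar_transform(u)

# Toy (N=8, K=4) example; this frozen set is illustrative, not an
# optimized reliability ordering.
print(polar_encode([1, 0, 1, 1], 8, [3, 5, 6, 7]))
```

A useful sanity check: over GF(2) the transform is its own inverse, so applying `polar_transform` twice returns the input.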

Successive cancellation (SC) decoding: decode the bits sequentially in the order $u_1, u_2, \ldots, u_N$, each decision using the previously decoded bits. Complexity $O(N \log N)$.

SC List (SCL) decoding: maintain $L$ candidate paths, pruning after each bit decision. With CRC-aided selection (CA-SCL), performance matches or exceeds turbo/LDPC codes at short-to-moderate block lengths. Used in 5G NR control channels.

Polarization-adjusted convolutional (PAC) codes: concatenate convolutional pre-transform with polar transform, achieving near-optimal performance at short block lengths.

Capacity-Achieving Code Families

| Code Family | Capacity-Achieving? | Complexity (Encoding) | Complexity (Decoding) | Practical Gap to Capacity |
|---|---|---|---|---|
| Random codes | Yes (Shannon) | Exponential | Exponential | Theoretical only |
| LDPC (irregular) | Yes (BEC; near for AWGN) | $O(n)$ | $O(n \cdot I)$ | < 0.1 dB |
| Polar | Yes (any symmetric DMC) | $O(n \log n)$ | $O(n \log n)$ (SC) | ~0.5 dB (SC), < 0.1 dB (SCL) |
| Turbo | Empirically near | $O(n)$ | $O(n \cdot I)$ | ~0.3-0.5 dB |
| Spatially coupled LDPC | Yes (threshold saturation) | $O(n)$ | $O(n \cdot I)$ | < 0.05 dB |

Threshold saturation (spatially coupled LDPC): coupling regular LDPC codes in a chain causes the BP threshold to equal the MAP threshold, which equals capacity for BMS channels.

Coded Modulation

For bandwidth-limited channels, combine coding with higher-order modulation:

  • Trellis-coded modulation (TCM): Ungerboeck (1982). Combine convolutional code with set-partitioning of signal constellation. Achieves coding gain without bandwidth expansion
  • Bit-interleaved coded modulation (BICM): separate binary code from modulation via bit-level interleaving. Simpler, flexible, near-optimal with iterative demapping (BICM-ID)
  • Multilevel coding (MLC): protect different bit levels of modulation with different-rate codes. Optimal with multistage decoding

Rateless Codes

Transmit potentially infinite stream of coded symbols; receiver collects enough for decoding.

  • LT codes (Luby, 2002): first practical rateless codes. Random sparse bipartite graph, peeling decoder. Require $(1+\epsilon)k$ received symbols to recover $k$ information symbols. The Robust Soliton distribution optimizes the degree profile
  • Raptor codes (Shokrollahi, 2006): concatenate high-rate LDPC pre-code with LT code. Linear-time encoding/decoding, approach capacity of BEC. Standardized in 3GPP MBMS, ATSC 3.0
  • Fountain codes: general term for rateless codes. Particularly useful for broadcast/multicast (no feedback needed)

Sphere Packing and Fundamental Limits

The sphere-packing (Hamming) bound: $M \cdot V(n, t) \leq 2^n$, where $V(n,t) = \sum_{i=0}^t \binom{n}{i}$ is the volume of a Hamming sphere of radius $t$. Perfect codes meet this bound with equality: Hamming codes, Golay codes, and the trivial codes.
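Perfect codes can be checked directly from the bound; a two-line sketch:

```python
from math import comb

def hamming_sphere(n, t):
    """V(n, t): number of binary vectors within Hamming distance t of a point."""
    return sum(comb(n, i) for i in range(t + 1))

# The (7,4) Hamming code is perfect: 2^4 spheres of radius 1 tile GF(2)^7
print(2**4 * hamming_sphere(7, 1) == 2**7)      # True

# The binary Golay (23,12,7) code is also perfect (t = 3)
print(2**12 * hamming_sphere(23, 3) == 2**23)   # True
```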

The Gilbert-Varshamov bound shows that good codes exist: an $(n, k, d)$ code exists whenever $\sum_{i=0}^{d-2}\binom{n-1}{i} < 2^{n-k}$.

Plotkin bound, Elias-Bassalygo bound, and the linear programming bound (Delsarte) provide tighter limits in different parameter regimes.