Image Filtering
Convolution
1D Convolution
The discrete convolution of signal f with kernel h:
(f * h)[n] = sum_{k=-inf}^{inf} f[k] * h[n - k]
Key properties:
- Commutative:
f * h = h * f - Associative:
f * (g * h) = (f * g) * h-- chain filters by convolving kernels - Distributive:
f * (g + h) = f * g + f * h
2D Convolution
For image I and kernel K of size (2a+1) x (2b+1):
(I * K)[x, y] = sum_{i=-a}^{a} sum_{j=-b}^{b} I[x-i, y-j] * K[i, j]
Correlation differs from convolution by not flipping the kernel. For symmetric kernels they are identical.
Boundary Handling
| Method | Description | |--------|-------------| | Zero padding | Extend with zeros (darkens borders) | | Replicate | Extend with nearest edge pixel | | Reflect | Mirror pixels at boundary | | Wrap | Periodic extension (assumes tiled image) |
Separable Filters
A 2D kernel K is separable if K = k_col * k_row^T. Reduces complexity from O(m*n*k^2) to O(m*n*2k) where k is kernel width. Gaussian kernels are separable.
Linear Filters
Box (Mean) Filter
K_box = (1/k^2) * ones(k, k)
Simple averaging. Fast with integral images (summed area table) in O(1) per pixel regardless of kernel size. Produces boxy artifacts.
Gaussian Filter
G(x, y) = (1 / (2*pi*sigma^2)) * exp(-(x^2 + y^2) / (2*sigma^2))
- Kernel size typically
6*sigma + 1(captures 99.7% of distribution) - Separable: apply 1D Gaussian along rows then columns
- Removes high-frequency noise while preserving low-frequency structure
- Cascade property:
G(sigma_1) * G(sigma_2) = G(sqrt(sigma_1^2 + sigma_2^2))
Sharpening
Unsharp mask: enhance edges by subtracting a blurred version:
I_sharp = I + alpha * (I - G * I) = (1 + alpha) * I - alpha * G * I
Laplacian sharpening: add the Laplacian (second derivative) to the original:
I_sharp = I - alpha * nabla^2(I)
Laplacian kernel (4-connected):
[ 0 1 0]
[ 1 -4 1]
[ 0 1 0]
Non-Linear Filters
Median Filter
Replaces each pixel with the median of its neighborhood. Excellent for salt-and-pepper noise removal while preserving edges. Not separable in general but fast approximations exist.
Bilateral Filter
Smooths while preserving edges by weighting based on both spatial and intensity distance:
BF[I](x) = (1/W) * sum_{y in N(x)} G_s(||x-y||) * G_r(|I(x)-I(y)|) * I(y)
G_s: spatial Gaussian (sigma_s controls neighborhood size)G_r: range Gaussian (sigma_r controls edge sensitivity)W: normalization factor
Complexity: O(n * k^2) per pixel. Approximations: grid-based (Paris & Durand), permutohedral lattice.
Guided Filter
Uses a guidance image G to filter input I. Assumes a local linear model:
q_i = a_k * G_i + b_k, for all i in window w_k
Coefficients solved by minimizing sum(q_i - I_i)^2 + epsilon * a_k^2. Properties:
- O(n) complexity regardless of window size (using integral images)
- Edge-preserving when guidance = input
- Transfer structure from guidance to input (e.g., flash/no-flash photography)
Non-Local Means (NLM)
Averages pixels with similar patches anywhere in the image:
NLM(x) = (1/Z(x)) * sum_y w(x, y) * I(y)
w(x, y) = exp(-||P(x) - P(y)||^2 / h^2)
where P(x) is the patch centered at x. Superior denoising at high computational cost. BM3D extends this idea with 3D collaborative filtering.
Edge Detection
Image Gradients
First derivatives approximate edges. For discrete images:
G_x = dI/dx ~ I(x+1,y) - I(x-1,y) (central difference)
G_y = dI/dy ~ I(x,y+1) - I(x,y-1)
Gradient magnitude: |G| = sqrt(G_x^2 + G_y^2)
Gradient direction: theta = atan2(G_y, G_x)
Sobel Operator
Combines smoothing with differentiation:
S_x = [-1 0 1] S_y = [-1 -2 -1]
[-2 0 2] [ 0 0 0]
[-1 0 1] [ 1 2 1]
Separable: S_x = [1; 2; 1] * [-1, 0, 1]. Scharr operator provides better rotational symmetry.
Canny Edge Detector
The gold standard multi-stage edge detector:
- Gaussian smoothing: reduce noise
- Gradient computation: magnitude and direction (Sobel)
- Non-maximum suppression (NMS): thin edges to 1-pixel width by suppressing pixels that are not local maxima along the gradient direction
- Double thresholding: classify pixels as strong (
> T_high), weak (T_low < . < T_high), or suppressed (< T_low) - Hysteresis: weak pixels connected to strong pixels are kept; isolated weak pixels are removed
Typical ratio: T_high / T_low ~ 2:1 or 3:1.
Laplacian of Gaussian (LoG)
Second-derivative operator that detects edges at zero crossings:
LoG(x,y) = -(1/(pi*sigma^4)) * (1 - (x^2+y^2)/(2*sigma^2)) * exp(-(x^2+y^2)/(2*sigma^2))
Approximated by Difference of Gaussians (DoG): LoG ~ G(k*sigma) - G(sigma).
Morphological Operations
Operations on binary (or grayscale) images using a structuring element B.
Binary Morphology
| Operation | Definition | Effect |
|-----------|------------|--------|
| Erosion | (I ⊖ B)(x) = 1 iff B ⊆ I at x | Shrinks foreground, removes small objects |
| Dilation | (I ⊕ B)(x) = 1 iff B ∩ I ≠ ∅ at x | Expands foreground, fills small holes |
| Opening | I ∘ B = (I ⊖ B) ⊕ B | Removes small protrusions |
| Closing | I • B = (I ⊕ B) ⊖ B | Fills small gaps |
Derived Operations
- Morphological gradient:
(I ⊕ B) - (I ⊖ B)-- edge detection - Top-hat:
I - (I ∘ B)-- extracts bright details on dark background - Black-hat:
(I • B) - I-- extracts dark details on bright background - Hit-or-miss: matches specific patterns using two structuring elements
Grayscale Morphology
Extends to grayscale by replacing set operations with min/max:
- Erosion: local minimum (darkens)
- Dilation: local maximum (brightens)
Structuring Elements
Common shapes: disk, square, cross, diamond. Size controls the scale of features affected. Decomposable structuring elements (e.g., large disks approximated by lines) reduce computation.
Frequency Domain Filtering
Convolution Theorem
F{f * g} = F{f} . F{g}
Convolution in spatial domain equals element-wise multiplication in frequency domain. For large kernels, FFT-based filtering is faster: O(n log n) vs O(n * k^2).
Low-Pass and High-Pass
- Ideal low-pass: sharp cutoff at frequency
D_0-- causes ringing (Gibbs phenomenon) - Butterworth: smooth rolloff, parameterized by order
nH(u,v) = 1 / (1 + (D(u,v)/D_0)^(2n)) - Gaussian low-pass:
H = exp(-D^2 / (2*D_0^2))-- no ringing - High-pass:
H_hp = 1 - H_lp
Band-Pass and Notch Filters
Used for periodic noise removal (e.g., stripe patterns from sensors). Identify noise spikes in the Fourier spectrum and suppress them with notch reject filters.
Practical Considerations
- Gaussian blur before edge detection to suppress noise (Canny does this internally)
- Bilateral filter is the default choice for edge-preserving smoothing in photography apps
- Morphological operations require careful structuring element selection -- too large destroys features
- For real-time applications, separable and integral image methods are essential
- Non-local means and BM3D remain competitive with deep learning denoisers on many benchmarks
Key Takeaways
| Concept | Core Idea | |---------|-----------| | Convolution | Foundation of filtering; spatial domain sliding window | | Gaussian filter | Optimal smoothing; separable, cascade property | | Bilateral filter | Edge-preserving smoothing via joint spatial-range weighting | | Canny | Multi-stage edge detection with hysteresis | | Morphology | Set-theoretic operations for shape analysis | | Frequency domain | Convolution theorem enables fast large-kernel filtering |