Nonparametric Methods

Nonparametric methods make minimal assumptions about the underlying distribution. They are essential when data doesn't meet parametric assumptions (normality, homoscedasticity) or when the distribution is unknown.

Why Nonparametric?

Parametric methods assume data follows a specific distribution (e.g., normal). When assumptions are violated:

  • Tests may give incorrect p-values
  • Confidence intervals may have wrong coverage
  • Conclusions may be misleading

Nonparametric methods:

  • Make fewer assumptions (often only symmetry or continuity)
  • Work with ranks instead of raw values
  • Are robust to outliers
  • Work with ordinal data
  • Sacrifice some power when parametric assumptions actually hold

Tests for One Sample

Sign Test

Tests whether the median equals a hypothesized value m₀.

Count observations above m₀ (successes) and below m₀, dropping ties. Under H₀, the count above m₀ follows Binomial(n, 0.5), where n is the number of non-tied observations.

  • Very few assumptions (only continuity)
  • Less powerful than Wilcoxon signed-rank
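A minimal sketch of the sign test using scipy.stats.binomtest; the data here are synthetic, purely for illustration:

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical skewed sample; H0: median = 10
rng = np.random.default_rng(0)
x = rng.exponential(scale=12.0, size=40) + 5

m0 = 10.0
above = int(np.sum(x > m0))
below = int(np.sum(x < m0))  # observations equal to m0 are dropped

# Under H0, the count above m0 is Binomial(above + below, 0.5)
p_value = binomtest(above, n=above + below, p=0.5).pvalue
```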

Wilcoxon Signed-Rank Test

Tests whether the distribution is symmetric about m₀.

  1. Compute |xᵢ - m₀| for each observation.
  2. Rank the absolute differences.
  3. W⁺ = sum of ranks where xᵢ > m₀.
  4. W⁻ = sum of ranks where xᵢ < m₀.
  5. Test statistic: W = min(W⁺, W⁻).

Under H₀ (symmetric about m₀), E[W⁺] = n(n+1)/4.

More powerful than the sign test (uses magnitude information, not just direction).

Paired data: Apply to differences dᵢ = xᵢ - yᵢ. Nonparametric alternative to paired t-test.
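For the paired case, scipy.stats.wilcoxon applies the signed-rank test directly to the two samples (equivalently, to their differences). A sketch on hypothetical before/after measurements:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired measurements: scores before/after an intervention
rng = np.random.default_rng(1)
before = rng.normal(100, 15, size=30)
after = before + rng.normal(2, 5, size=30)

# Tests symmetry of the differences d_i = after_i - before_i about 0;
# for the two-sided test scipy reports W = min(W+, W-)
w_stat, p_value = wilcoxon(after, before)
```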

Tests for Two Independent Samples

Mann-Whitney U Test (Wilcoxon Rank-Sum)

Tests whether two populations have the same distribution.

  1. Combine and rank all observations.
  2. U₁ = (sum of ranks in sample 1) - n₁(n₁+1)/2; similarly U₂ = n₁n₂ - U₁.
  3. U = min(U₁, U₂).

Equivalent to testing P(X > Y) = 0.5 for continuous distributions (with ties, P(X > Y) + ½P(X = Y) = 0.5).

Alternative to: Two-sample t-test (when normality is violated).

For large samples, U is approximately normal: Z = (U - n₁n₂/2) / √(n₁n₂(n₁+n₂+1)/12).
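A short sketch with scipy.stats.mannwhitneyu on synthetic skewed data, cross-checking the rank-sum formula from step 2 (scipy's statistic convention is the U of one of the two samples, depending on version):

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

# Hypothetical skewed samples (e.g. latencies)
rng = np.random.default_rng(2)
a = rng.lognormal(0.0, 1.0, size=50)
b = rng.lognormal(0.3, 1.0, size=60)

u_stat, p_value = mannwhitneyu(a, b, alternative="two-sided")

# U1 from the rank-sum formula in step 2
ranks = rankdata(np.concatenate([a, b]))
u1 = ranks[: len(a)].sum() - len(a) * (len(a) + 1) / 2
```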

Kolmogorov-Smirnov Two-Sample Test

Tests whether two samples come from the same continuous distribution.

D = max_x |F̂₁(x) - F̂₂(x)|

Maximum distance between the two empirical CDFs.

Sensitive to differences in location, scale, and shape.
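A sketch with scipy.stats.ks_2samp on synthetic samples that share a location but differ in scale, which a mean-comparison test would miss:

```python
import numpy as np
from scipy.stats import ks_2samp

# Same mean, different spread: KS detects scale/shape differences too
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=200)
y = rng.normal(0.0, 2.0, size=200)

d_stat, p_value = ks_2samp(x, y)  # D = max |F1_hat - F2_hat|
```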

Tests for Multiple Groups

Kruskal-Wallis Test

Nonparametric alternative to one-way ANOVA.

Rank all observations across groups:

H = (12/(N(N+1))) Σ nⱼR̄ⱼ² - 3(N+1)

where R̄ⱼ is the mean rank in group j.

Under H₀: H ~ χ²(k-1) approximately.

If significant, follow up with pairwise comparisons (Dunn's test with Bonferroni correction).
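A minimal Kruskal-Wallis sketch with scipy.stats.kruskal on three synthetic groups:

```python
import numpy as np
from scipy.stats import kruskal

# Hypothetical groups with shifted locations
rng = np.random.default_rng(4)
g1 = rng.normal(0.0, 1.0, size=30)
g2 = rng.normal(0.5, 1.0, size=30)
g3 = rng.normal(1.0, 1.0, size=30)

h_stat, p_value = kruskal(g1, g2, g3)  # H compared to chi2 with k-1 = 2 df
```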

Friedman Test

Nonparametric alternative to repeated-measures ANOVA.

Rank within each block (subject), then compare rank sums across treatments.
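A sketch with scipy.stats.friedmanchisquare, simulating repeated measures with a per-subject (block) effect; the numbers are illustrative:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical repeated measures: 20 subjects, 3 treatments
rng = np.random.default_rng(5)
block = rng.normal(0.0, 2.0, size=20)  # per-subject effect
t1 = block + rng.normal(0.0, 1.0, size=20)
t2 = block + rng.normal(0.5, 1.0, size=20)
t3 = block + rng.normal(1.0, 1.0, size=20)

chi2_stat, p_value = friedmanchisquare(t1, t2, t3)
```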

Goodness-of-Fit Tests

Chi-Squared Goodness-of-Fit

Tests whether observed frequencies match expected frequencies.

χ² = Σ (Oᵢ - Eᵢ)² / Eᵢ

df = k - 1 - (number of estimated parameters).

Requires: expected frequencies ≥ 5 (merge categories if needed).
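A sketch using scipy.stats.chisquare with hypothetical die-roll counts against a uniform null:

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical die-fairness check: observed counts for 6 faces
observed = np.array([18, 22, 16, 25, 20, 19])
expected = np.full(6, observed.sum() / 6)  # uniform null; all cells >= 5

chi2_stat, p_value = chisquare(observed, f_exp=expected)  # df = 6 - 1 = 5
```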

Kolmogorov-Smirnov One-Sample Test

Tests whether a sample comes from a specified continuous distribution.

D = max_x |F̂(x) - F₀(x)|

Maximum distance between empirical CDF and theoretical CDF.

Lilliefors test: Modified KS test specifically for testing normality (adjusts critical values since parameters are estimated from data).
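A sketch with scipy.stats.kstest against a fully specified null (synthetic data). Plugging in a mean and standard deviation estimated from the same sample would invalidate the plain KS p-value; that is the Lilliefors situation noted above:

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(6)
x = rng.normal(5.0, 2.0, size=200)

# Null distribution fully specified: F0 = Normal(5, 2)
d_stat, p_value = kstest(x, norm(loc=5.0, scale=2.0).cdf)
```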

Anderson-Darling Test

Like KS but gives more weight to the tails. More powerful for detecting tail deviations.

Shapiro-Wilk Test

Specifically tests normality. Generally most powerful test for normality for small to moderate samples.

Bootstrap Methods

The bootstrap resamples from the observed data to estimate the sampling distribution of a statistic.

Nonparametric Bootstrap

  1. From sample of size n, draw B bootstrap samples (sample with replacement, size n each).
  2. Compute the statistic θ̂ for each bootstrap sample: θ̂₁*, θ̂₂*, ..., θ̂_B*.
  3. Use the distribution of θ̂* to estimate standard error, bias, or confidence intervals.
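The three steps above can be sketched in a few lines of numpy; here the statistic is the median of a hypothetical skewed sample:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=3.0, size=100)  # hypothetical skewed sample

B = 2000
boot_medians = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=len(x), replace=True)  # step 1
    boot_medians[b] = np.median(resample)                # step 2

se = boot_medians.std(ddof=1)              # step 3: bootstrap standard error
bias = boot_medians.mean() - np.median(x)  # step 3: bootstrap bias estimate
```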

Bootstrap Confidence Intervals

Percentile method: Use the α/2 and 1-α/2 quantiles of the bootstrap distribution.

BCa (Bias-Corrected and Accelerated): Adjusts for bias and skewness. More accurate.

Basic bootstrap (reverse percentile): [2θ̂ - q_{1-α/2}, 2θ̂ - q_{α/2}], where q denotes a quantile of the bootstrap distribution.
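A sketch contrasting the percentile and basic intervals for the mean of a synthetic sample (BCa is more involved; scipy.stats.bootstrap implements it):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=3.0, size=100)  # hypothetical sample
theta_hat = x.mean()

boot = np.array([rng.choice(x, size=len(x), replace=True).mean()
                 for _ in range(4000)])

alpha = 0.05
q_lo, q_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])

percentile_ci = (q_lo, q_hi)                                  # percentile method
basic_ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)       # basic / reverse percentile
```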

When Bootstrap Works

  • Works well for smooth statistics (mean, median, variance, correlation)
  • Fails for extreme order statistics (min, max)
  • Requires the sample to be representative of the population
  • Typical B = 1000-10000 resamples

Parametric Bootstrap

  1. Fit a parametric model to the data.
  2. Simulate B datasets from the fitted model.
  3. Compute the statistic on each simulated dataset.

Useful when you believe the parametric model but want better interval estimates.
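A minimal parametric-bootstrap sketch, assuming an exponential model (whose MLE for the scale is just the sample mean) on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.exponential(scale=2.0, size=80)  # hypothetical data

# Step 1: fit the parametric model (exponential MLE: scale = sample mean)
scale_hat = x.mean()

# Steps 2-3: simulate B datasets from the fitted model, recompute the statistic
B = 2000
boot_scales = np.array([rng.exponential(scale_hat, size=len(x)).mean()
                        for _ in range(B)])

se_scale = boot_scales.std(ddof=1)  # standard error of the scale estimate
```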

Kernel Density Estimation

Estimate the PDF nonparametrically:

f̂(x) = (1/nh) Σᵢ K((x - xᵢ)/h)

Kernel choices: Gaussian (most common), Epanechnikov (optimal MSE), uniform (box).

Bandwidth h:

  • Too small: noisy, overfitting (captures noise)
  • Too large: over-smoothed, underfitting (misses features)
  • Silverman's rule of thumb: h = 1.06 · s · n^(-1/5) (optimal for normal data)
  • Cross-validation: Select h minimizing integrated squared error
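The KDE formula with a Gaussian kernel and the rule-of-thumb bandwidth can be implemented directly; a sketch on synthetic data (scipy.stats.gaussian_kde does the same job in practice):

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(0.0, 1.0, size=300)
n = len(x)

# Rule-of-thumb bandwidth: h = 1.06 * s * n^(-1/5)
h = 1.06 * x.std(ddof=1) * n ** (-1 / 5)

def kde(grid, data, h):
    # f_hat(x) = (1/(n h)) * sum_i K((x - x_i) / h), Gaussian kernel K
    u = (grid[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (len(data) * h)

grid = np.linspace(-4.0, 4.0, 200)
density = kde(grid, x, h)  # nonnegative, integrates to ~1 over the grid
```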

Multivariate KDE

f̂(x) = (1/n) Σᵢ K_H(x - xᵢ)

where K_H uses a bandwidth matrix H. Curse of dimensionality makes this impractical beyond ~6 dimensions.

Permutation Tests

Idea: Under H₀, the labels are exchangeable. Generate the null distribution by permuting labels.

Algorithm

  1. Compute test statistic T on observed data.
  2. For b = 1 to B:
    • Randomly permute the group labels.
    • Compute T on permuted data: T_b*.
  3. p-value = proportion of T_b* ≥ T (or as extreme as T).

Exact permutation test: Enumerate all C(n₁+n₂, n₁) permutations. Feasible only for small samples.

Advantages: Exact (no distributional assumptions), works for any test statistic, easy to implement.

Example: Test if treatment group has higher mean than control. Permute treatment/control labels 10000 times, compute mean difference each time, see how often permuted difference exceeds observed difference.
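The example above can be sketched as follows (synthetic treatment/control outcomes for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
treatment = rng.normal(1.0, 2.0, size=40)  # hypothetical outcomes
control = rng.normal(0.0, 2.0, size=40)

observed = treatment.mean() - control.mean()
pooled = np.concatenate([treatment, control])
n_t = len(treatment)

B = 10_000
count = 0
for _ in range(B):
    perm = rng.permutation(pooled)               # shuffle group labels
    diff = perm[:n_t].mean() - perm[n_t:].mean()
    count += diff >= observed

# One-sided p-value; the +1 counts the observed statistic itself
p_value = (count + 1) / (B + 1)
```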

Rank-Based Methods Summary

| Parametric Test | Nonparametric Alternative |
|---|---|
| One-sample t-test | Wilcoxon signed-rank, sign test |
| Paired t-test | Wilcoxon signed-rank on differences |
| Two-sample t-test | Mann-Whitney U |
| One-way ANOVA | Kruskal-Wallis |
| Repeated-measures ANOVA | Friedman test |
| Pearson correlation | Spearman, Kendall |

Applications in CS

  • A/B testing: Permutation tests and bootstrap CIs for non-normal metrics (revenue, time-on-page). Bootstrap for ratio metrics.
  • Performance testing: Latency distributions are often skewed/heavy-tailed. Use Mann-Whitney for comparisons, bootstrap for CIs.
  • Anomaly detection: KDE for density estimation. Points in low-density regions are anomalies.
  • ML evaluation: Bootstrap for confidence intervals on accuracy, AUC. Permutation tests for feature importance.
  • Data visualization: KDE plots in seaborn/matplotlib. Violin plots use KDE.
  • Simulation validation: KS test to check if simulated data matches theoretical distributions.
  • Network analysis: Permutation tests for community structure significance.