
Bias & Fairness

Overview

AI systems amplify the biases present in their training data. A hiring algorithm trained on a decade of hiring decisions learns not just what makes a good candidate but also which candidates the company historically preferred, including any discriminatory patterns. A loan approval model trained on historical approvals inherits redlining practices from decades past. A facial recognition system trained predominantly on light-skinned faces performs dramatically worse on dark-skinned faces.

These are not hypothetical scenarios. They are documented failures from real systems deployed by real companies. The root cause is always the same: the model learned what the data taught it, and the data reflected a biased world.

How Bias Enters AI Systems

Training Data Bias

Training data is the most common source of bias. A model can only learn from the data it sees:

def demonstrate_training_data_bias():
    """
    A resume screening model trained on 10 years of hiring data.
    
    The data: 85% of hired engineers were male (reflecting
    industry demographics, not candidate quality).
    
    The model learns: male-associated features (names, 
    hobbies, universities) correlate with 'hired' label.
    
    Result: The model scores female candidates lower,
    not because they are less qualified, but because
    the historical data contained fewer examples of
    hired women.
    """
    # The model never sees "gender" directly, but it picks up
    # proxy signals that correlate with it
    proxy_features = {
        "played lacrosse": "male-correlated",
        "women's chess club": "female-correlated",
        "captain of team": "male-correlated",
        "volunteer coordinator": "female-correlated",
    }
    # These features become proxies for gender: the model rewards the
    # male-correlated ones and penalizes the female-correlated ones
    # without any explicit gender input
    return proxy_features
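
One way to check whether a feature behaves like a proxy is to compare how often it appears in each group. The sketch below is illustrative only: the helper `proxy_gap`, the record layout, and the toy resumes are assumptions made for this example, not data from a real system.

```python
def proxy_gap(records, feature, group_key="gender"):
    """Return the share of records containing `feature`, per group.

    A large gap between groups suggests the feature can stand in for
    the group attribute even when that attribute has been removed.
    """
    counts = {}
    for record in records:
        group = record[group_key]
        seen, total = counts.get(group, (0, 0))
        has_feature = feature in record["keywords"]
        counts[group] = (seen + int(has_feature), total + 1)
    return {group: seen / total for group, (seen, total) in counts.items()}


# Hypothetical toy resumes, made up for illustration
resumes = [
    {"gender": "F", "keywords": {"women's chess club", "python"}},
    {"gender": "F", "keywords": {"volunteer coordinator"}},
    {"gender": "M", "keywords": {"played lacrosse", "python"}},
    {"gender": "M", "keywords": {"captain of team"}},
]

print(proxy_gap(resumes, "women's chess club"))  # differs by group
print(proxy_gap(resumes, "python"))              # same in both groups
```

A feature with near-identical prevalence across groups carries little group information on its own, although combinations of such features can still act as a proxy.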

Representation Bias

Some groups are underrepresented in training data:

ImageNet (widely used image dataset):
  45% of images come from the United States
  Images from China and India (37% of world population) 
  are significantly underrepresented
  
  Result: Models trained on ImageNet are less accurate
  for objects, scenes, and people from underrepresented regions

Common Crawl (widely used text dataset):
  ~56% English content
  Languages spoken by billions (Hindi, Swahili, Bengali)
  have minimal representation
  
  Result: LLMs trained on this data are dramatically
  less capable in underrepresented languages
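
A simple way to quantify representation gaps like these is to divide each group's share of the dataset by its share of a reference population. This is a minimal sketch: the function name `representation_ratios`, the group labels, and the numbers are hypothetical and do not describe ImageNet or Common Crawl.

```python
from collections import Counter


def representation_ratios(sample_groups, population_share):
    """Ratio of each group's dataset share to its population share.

    1.0 means proportional representation; values well below 1.0
    mean the group is underrepresented in the dataset.
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    return {
        group: (counts[group] / total) / share
        for group, share in population_share.items()
    }


# Hypothetical dataset: 90% of examples come from region_a
sample = ["region_a"] * 9 + ["region_b"] * 1
# Hypothetical reference population: both regions are the same size
population = {"region_a": 0.5, "region_b": 0.5}

for group, ratio in representation_ratios(sample, population).items():
    print(f"{group}: {ratio:.2f}")
```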

Measurement Bias

How you define and measure the target variable introduces bias:

def demonstrate_measurement_bias():
    """
    Predicting 'employee performance' using performance reviews.
    
    Problem: Performance reviews themselves are biased.
    Studies show that women and minorities receive
    systematically different feedback language.
    
    Women: 'collaborative', 'supportive', 'helpful'
    Men: 'analytical', 'competent', 'independent'
    
    Training a model on these reviews doesn't predict
    actual performance -- it predicts how managers
    perceive performance through a biased lens.
    """
    pass

Feedback Loop Bias

The model's predictions influence future data, which trains the next model:

Predictive policing example:

1. Model predicts high crime in neighborhood A
2. More police deployed to neighborhood A
3. More arrests made in neighborhood A
4. Arrest data confirms neighborhood A has high crime
5. Next model version doubles down on neighborhood A

The model created the data that "proves" it was right.
This is a self-fulfilling prophecy, not evidence of
actual crime distribution.
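
The loop above can be reproduced with a toy simulation. The sketch below assumes a deliberately simplified, greedy policy (all patrols go to the neighborhood with the most recorded arrests) and identical underlying crime in both neighborhoods. The function name, parameters, and starting counts are hypothetical.

```python
def simulate_patrol_feedback(rounds=5, arrests_per_patrolled_round=10):
    """Simulate a greedy patrol policy on two neighborhoods that have
    the SAME underlying crime level.

    Arrests are only recorded where patrols are sent, so the recorded
    data reflects patrol placement rather than actual crime.
    """
    recorded = {"A": 11, "B": 10}  # a small, arbitrary initial gap
    history = [dict(recorded)]
    for _ in range(rounds):
        # The "model": predict the hotspot from recorded arrests
        target = max(recorded, key=recorded.get)
        # Patrols go to the predicted hotspot and record arrests there
        recorded[target] += arrests_per_patrolled_round
        history.append(dict(recorded))
    return history


for step, counts in enumerate(simulate_patrol_feedback()):
    print(step, counts)
```

Neighborhood B's count never changes because no one is sent to observe it, so the one-arrest initial gap grows every round even though the two neighborhoods are identical by construction.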

Real-World Failures

Amazon's Hiring Algorithm (2018)

Amazon built a resume screening tool trained on 10 years of hiring decisions. The model systematically penalized resumes containing the word "women's" (as in "women's chess club") and downgraded graduates of two all-women's colleges. Amazon scrapped the tool.

COMPAS Recidivism Prediction

The COMPAS system, used by US courts to predict recidivism, was found in a 2016 ProPublica analysis to be nearly twice as likely to falsely label Black defendants as high-risk compared to white defendants. Among Black defendants who did not reoffend, 45% had been scored as high-risk, compared to 23% of white defendants who did not reoffend.
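
The disparity described here is a gap in false positive rates: the share of people who did not reoffend but were still labeled high-risk. A minimal sketch of that metric follows. The function name and the toy data are hypothetical and are not the COMPAS data.

```python
def false_positive_rates(predictions, labels, groups):
    """False positive rate per group: P(Y_hat=1 | Y=0, A=a).

    Groups with no actual negatives get 0.0, which is a simplification.
    """
    stats = {}
    for pred, label, group in zip(predictions, labels, groups):
        false_pos, negatives = stats.get(group, (0, 0))
        if label == 0:
            false_pos += int(pred == 1)
            negatives += 1
        stats[group] = (false_pos, negatives)
    return {
        group: (false_pos / negatives if negatives else 0.0)
        for group, (false_pos, negatives) in stats.items()
    }


# Hypothetical toy data: 1 = labeled high-risk / did reoffend
predictions = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1]
labels      = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups      = ["a"] * 5 + ["b"] * 5

print(false_positive_rates(predictions, labels, groups))
```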

Healthcare Algorithm Bias (2019)

A widely used healthcare algorithm determined which patients needed extra care. It used healthcare spending as a proxy for health needs. Because Black patients historically had less access to healthcare (and therefore lower spending), the algorithm systematically underestimated their health needs. It affected an estimated 70 million patients.

Measuring Fairness

Demographic Parity

Equal prediction rates across groups:

def demographic_parity(predictions, sensitive_attribute):
    """Check if positive prediction rates are equal across groups.
    
    Demographic parity: P(Y_hat=1 | A=a) = P(Y_hat=1 | A=b)
    
    In plain English: the approval rate should be the same
    regardless of group membership.
    """
    groups = {}
    for pred, attr in zip(predictions, sensitive_attribute):
        if attr not in groups:
            groups[attr] = {"total": 0, "positive": 0}
        groups[attr]["total"] += 1
        if pred == 1:
            groups[attr]["positive"] += 1
    
    rates = {
        group: data["positive"] / data["total"]
        for group, data in groups.items()
    }
    
    max_rate = max(rates.values())
    min_rate = min(rates.values())
    disparity = max_rate - min_rate
    
    print("Positive prediction rates by group:")
    for group, rate in rates.items():
        print(f"  {group}: {rate:.3f}")
    print(f"  Disparity: {disparity:.3f}")
    
    # The 80% rule (EEOC guideline): the lowest rate should be
    # at least 80% of the highest rate
    ratio = min_rate / max_rate if max_rate > 0 else 0
    print(f"  80% rule ratio: {ratio:.3f} ({'PASS' if ratio >= 0.8 else 'FAIL'})")
    
    return rates, disparity

Equal Opportunity

Equal true positive rates across groups:

def equal_opportunity(predictions, labels, sensitive_attribute):
    """Check if true positive rates are equal across groups.
    
    Equal opportunity: P(Y_hat=1 | Y=1, A=a) = P(Y_hat=1 | Y=1, A=b)
    
    In plain English: among people who actually deserve approval,
    the approval rate should be the same regardless of group.
    """
    groups = {}
    for pred, label, attr in zip(predictions, labels, sensitive_attribute):
        if attr not in groups:
            groups[attr] = {"true_positive": 0, "actual_positive": 0}
        if label == 1:
            groups[attr]["actual_positive"] += 1
            if pred == 1:
                groups[attr]["true_positive"] += 1
    
    tpr = {
        group: (data["true_positive"] / data["actual_positive"]
                if data["actual_positive"] > 0 else 0)
        for group, data in groups.items()
    }
    
    print("True positive rates by group:")
    for group, rate in tpr.items():
        print(f"  {group}: {rate:.3f}")
    
    return tpr

Calibration

Calibration checks whether predicted probabilities match observed outcomes within every group. If the model assigns a 70% chance of default, then about 70% of those people should actually default, in every demographic group. To test this, bin predictions into ranges (0.0-0.1, 0.1-0.2, and so on) and compare the mean predicted rate with the observed rate per bin and per group. As a rule of thumb, a gap of more than 10 percentage points signals a calibration problem.
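
The binning procedure can be sketched in the same style as the functions above. This is one possible implementation under simple assumptions (equal-width bins, binary labels). The function name and return layout are illustrative.

```python
def calibration_by_group(probabilities, labels, groups, n_bins=10):
    """Compare mean predicted probability with the observed positive
    rate inside each probability bin, separately for every group.

    Returns {group: {bin_index: (mean_predicted, observed_rate, count)}}.
    """
    sums = {}
    for prob, label, group in zip(probabilities, labels, groups):
        # A probability of exactly 1.0 falls into the top bin
        bin_index = min(int(prob * n_bins), n_bins - 1)
        cell = sums.setdefault(group, {}).setdefault(bin_index, [0.0, 0, 0])
        cell[0] += prob   # running sum of predicted probabilities
        cell[1] += label  # running count of actual positives
        cell[2] += 1      # number of examples in this bin
    return {
        group: {
            b: (prob_sum / n, positives / n, n)
            for b, (prob_sum, positives, n) in sorted(bins.items())
        }
        for group, bins in sums.items()
    }


# Hypothetical toy data: same predicted risk, different observed rates
probabilities = [0.75] * 8
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4

for group, bins in calibration_by_group(probabilities, labels, groups).items():
    for b, (predicted, observed, n) in bins.items():
        print(f"{group} bin {b}: predicted={predicted:.2f} observed={observed:.2f} n={n}")
```

In this toy data the model is well calibrated for group a and overestimates risk for group b by 50 percentage points in the same bin.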

The Impossibility of Fairness

Fairness trade-offs (Chouldechova 2017, Kleinberg et al. 2016):

You cannot simultaneously satisfy all fairness criteria
unless the base rates are equal across groups.

If Group A has a 10% base rate and Group B has a 20% base rate:
  - Demographic parity: equal approval rates (ignores base rates)
  - Equal opportunity: equal TPR (may produce unequal approval rates)
  - Calibration: accurate probabilities (may produce unequal TPR)

You must choose which fairness definition matters most
for your specific application. There is no universal answer.

Hiring: equal opportunity often preferred (equal chance for 
        qualified candidates)
Lending: calibration often preferred (accurate risk assessment)
Criminal justice: demographic parity often argued 
        (equal treatment under the law)
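
The conflict can be checked with a few lines of arithmetic. The sketch below reuses the 10% and 20% base rates from the example above. The true positive rate of 0.8 and false positive rate of 0.2 are assumed values, and positive predictive value (PPV) is used here as a simple stand-in for calibration at a single decision threshold.

```python
def positive_rate(base_rate, tpr, fpr):
    """Share of the group that receives a positive prediction."""
    return tpr * base_rate + fpr * (1 - base_rate)


def positive_predictive_value(base_rate, tpr, fpr):
    """PPV = P(Y=1 | Y_hat=1), from Bayes' rule."""
    return tpr * base_rate / positive_rate(base_rate, tpr, fpr)


# Both groups get identical error rates (assumed values)
TPR, FPR = 0.8, 0.2

for name, base_rate in [("Group A", 0.10), ("Group B", 0.20)]:
    rate = positive_rate(base_rate, TPR, FPR)
    ppv = positive_predictive_value(base_rate, TPR, FPR)
    print(f"{name}: positive rate={rate:.2f}, PPV={ppv:.2f}")
```

With equal TPR and FPR, the two groups still end up with different positive rates (so demographic parity fails) and different PPVs (so a positive prediction means something different in each group).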

Mitigation Strategies

Pre-processing: Fix the Data

import pandas as pd

def rebalance_dataset(dataset, sensitive_col, label_col):
    """Downsample so each demographic group has similar representation
    and a similar label distribution.
    """
    groups = dataset.groupby(sensitive_col)

    # Every group is downsampled toward the size of the smallest group
    target_size = min(len(g) for _, g in groups)

    balanced_dfs = []
    for group_name, group_df in groups:
        # Split the per-group budget evenly across the labels present
        per_label = target_size // group_df[label_col].nunique()
        for _, label_df in group_df.groupby(label_col):
            # A label with fewer rows than the budget is kept whole
            balanced_dfs.append(
                label_df.sample(n=min(len(label_df), per_label), random_state=42)
            )

    return pd.concat(balanced_dfs, ignore_index=True)

In-processing: Constrained Training

Add a fairness penalty to the loss function: loss = task_loss + lambda * fairness_penalty. The weight lambda controls the trade-off: a larger value gives up more task accuracy in exchange for a smaller fairness gap.
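
As a minimal sketch of such a loss, the function below combines log loss with a demographic-parity penalty. The choice of penalty (the squared gap between the highest and lowest group mean score) is an assumption made for illustration, as are the function and parameter names. A real training setup would compute this inside a framework that can differentiate it.

```python
import math


def penalized_loss(scores, labels, groups, lam=1.0):
    """Log loss plus lam times a demographic-parity penalty."""
    eps = 1e-12  # guards against log(0)
    task_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(scores, labels)
    ) / len(scores)

    # Mean predicted score per group
    by_group = {}
    for p, g in zip(scores, groups):
        by_group.setdefault(g, []).append(p)
    means = [sum(values) / len(values) for values in by_group.values()]

    # Penalty: squared gap between the highest and lowest group mean
    fairness_penalty = (max(means) - min(means)) ** 2

    return task_loss + lam * fairness_penalty


# Hypothetical toy scores: group a is scored much higher than group b
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
groups = ["a", "a", "b", "b"]

print(penalized_loss(scores, labels, groups, lam=0.0))  # task loss only
print(penalized_loss(scores, labels, groups, lam=1.0))  # with penalty
```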

Post-processing: Adjust Outputs

Use different classification thresholds per group. If Group A needs a threshold of 0.3 and Group B needs 0.5 to achieve the same positive rate, use group-specific thresholds. This is controversial because it explicitly treats groups differently, but it can achieve demographic parity.
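
One way to find such thresholds is to rank each group's scores and cut at the point that approves the target share. This is a sketch under simple assumptions: the function name is hypothetical, the toy scores are chosen to reproduce the 0.3 and 0.5 thresholds mentioned above, and tied scores can push the realized rate above the target.

```python
def group_thresholds(scores, groups, target_rate):
    """Pick, per group, the score threshold that approves roughly
    `target_rate` of that group (positive when score >= threshold).
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    thresholds = {}
    for group, values in by_group.items():
        ranked = sorted(values, reverse=True)
        # Number of people to approve in this group (at least one)
        n_approve = max(1, round(target_rate * len(ranked)))
        # The lowest approved score becomes the threshold
        thresholds[group] = ranked[n_approve - 1]
    return thresholds


# Hypothetical toy scores: group A is scored lower overall
scores = [0.1, 0.2, 0.3, 0.6, 0.2, 0.4, 0.5, 0.8]
groups = ["A"] * 4 + ["B"] * 4

print(group_thresholds(scores, groups, target_rate=0.5))
```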

Fairness Auditing in Practice

Run a comprehensive fairness audit before deployment: compute demographic parity, equal opportunity, and calibration metrics for every sensitive attribute. Apply the 80% rule (EEOC guideline): the lowest group's positive rate should be at least 80% of the highest group's rate. Flag any attribute that fails.
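
The 80% rule part of such an audit might look like the sketch below, which loops over several sensitive attributes at once. It covers only the positive-rate check, not equal opportunity or calibration. The function name, attribute names, and toy data are hypothetical.

```python
def four_fifths_audit(predictions, attributes):
    """Apply the 80% rule to every sensitive attribute.

    `attributes` maps an attribute name to a list of group labels
    aligned with `predictions`. Returns {attribute: (ratio, passed)}.
    """
    report = {}
    for name, values in attributes.items():
        totals, positives = {}, {}
        for pred, group in zip(predictions, values):
            totals[group] = totals.get(group, 0) + 1
            positives[group] = positives.get(group, 0) + int(pred == 1)
        rates = [positives[g] / totals[g] for g in totals]
        # Lowest positive rate divided by highest positive rate
        ratio = min(rates) / max(rates) if max(rates) > 0 else 0.0
        report[name] = (ratio, ratio >= 0.8)
    return report


# Hypothetical toy data
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
attributes = {
    "gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
    "age_band": ["y", "y", "o", "o", "o", "y", "y", "o"],
}

for name, (ratio, passed) in four_fifths_audit(predictions, attributes).items():
    print(f"{name}: ratio={ratio:.2f} {'PASS' if passed else 'FAIL'}")
```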

Common Pitfalls

  • Assuming removing protected attributes fixes bias: Models learn proxy features. Removing "gender" doesn't help if "played on women's lacrosse team" is still in the data.
  • Optimizing for one fairness metric while ignoring others: Achieving demographic parity can worsen calibration. Understand the trade-offs for your specific use case.
  • Treating fairness as a one-time check: Bias can emerge over time as data distributions shift. Monitor fairness metrics continuously in production.
  • Ignoring intersectional bias: A model might be fair for women overall and fair for Black applicants overall, but unfair for Black women specifically. Check subgroup intersections.
  • Using biased data to define "ground truth": If your labels come from a biased process (e.g., biased performance reviews), a "fair" model trained on those labels still encodes the original bias.
  • Believing AI is objective: AI is as objective as the data and decisions that created it. The appearance of mathematical objectivity makes bias harder to detect, not less present.
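
The intersectional pitfall in the list above is easy to miss because per-attribute checks can pass while a subgroup is treated very differently. The sketch below uses a hypothetical helper and deliberately constructed toy data to show positive rates that are equal by gender and equal by race, yet unequal for the combinations.

```python
def intersectional_rates(predictions, *attribute_columns):
    """Positive prediction rate for every combination of attribute
    values. Keys are tuples with one entry per attribute column.
    """
    totals, positives = {}, {}
    for pred, key in zip(predictions, zip(*attribute_columns)):
        totals[key] = totals.get(key, 0) + 1
        positives[key] = positives.get(key, 0) + int(pred == 1)
    return {key: positives[key] / totals[key] for key in totals}


# Toy data constructed so that the marginal rates are all 0.5
predictions = [0, 0, 1, 1, 1, 1, 0, 0]
gender = ["woman"] * 4 + ["man"] * 4
race = ["Black", "Black", "white", "white"] * 2

print(intersectional_rates(predictions, gender))        # equal by gender
print(intersectional_rates(predictions, race))          # equal by race
print(intersectional_rates(predictions, gender, race))  # unequal subgroups
```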

Key Takeaways

  • AI amplifies biases in training data. A model trained on biased hiring decisions will make biased hiring recommendations.
  • Bias enters through training data, measurement processes, representation gaps, and feedback loops. All four must be addressed.
  • There are multiple definitions of fairness (demographic parity, equal opportunity, calibration), and they cannot all be satisfied simultaneously unless base rates are equal across groups. Choose the definition that best fits your use case.
  • Fairness auditing should happen before deployment and continuously in production. Use quantitative metrics, not assumptions.
  • Mitigation strategies exist at every stage: fix the data, constrain the training, or adjust the outputs. Each approach has trade-offs.
  • Removing protected attributes does not remove bias. Models learn proxy features that correlate with the attributes you removed.