Bias & Fairness
Overview
AI systems amplify the biases present in their training data. A hiring algorithm trained on a decade of hiring decisions learns not just what makes a good candidate but also which candidates the company historically preferred, including any discriminatory patterns. A loan approval model trained on historical approvals inherits redlining practices from decades past. A facial recognition system trained predominantly on light-skinned faces performs dramatically worse on dark-skinned faces.
These are not hypothetical scenarios. They are documented failures from real systems deployed by real companies. In each case the root cause is the same: the model learned what the data taught it, and the data reflected a biased world.
How Bias Enters AI Systems
Training Data Bias
The most common source. Your model can only learn from the data it sees:
def demonstrate_training_data_bias():
    """
    A resume screening model trained on 10 years of hiring data.
    The data: 85% of hired engineers were male (reflecting
    industry demographics, not candidate quality).
    The model learns: male-associated features (names,
    hobbies, universities) correlate with 'hired' label.
    Result: The model scores female candidates lower,
    not because they are less qualified, but because
    the historical data contained fewer examples of
    hired women.
    """
    # The model doesn't see "gender" directly
    # But it picks up proxy signals
    proxy_features = {
        "played lacrosse": "male-correlated",
        "women's chess club": "female-correlated",
        "captain of team": "male-correlated",
        "volunteer coordinator": "female-correlated",
    }
    # These features become proxies for gender
    # The model penalizes them without explicit gender input
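A quick way to look for proxies like these is to check, for each feature, how far its presence shifts the share of a protected group away from the overall base rate. The sketch below is a simple conditional-rate screen over hypothetical toy rows. The function name `proxy_strength` and the data are illustrative assumptions. Real audits often use stronger tests, such as training a classifier to predict the protected attribute from the remaining features.

```python
def proxy_strength(rows, feature_keys, sensitive_key, value):
    """For each binary feature, return how far P(sensitive == value | feature present)
    drifts from the overall base rate. Large drifts suggest the feature is a proxy."""
    base = sum(r[sensitive_key] == value for r in rows) / len(rows)
    drift = {}
    for feature in feature_keys:
        present = [r for r in rows if r[feature]]
        if not present:
            continue  # feature never appears, so there is nothing to measure
        share = sum(r[sensitive_key] == value for r in present) / len(present)
        drift[feature] = share - base
    return base, drift


# Hypothetical toy data: gender is kept only for the audit, not as a model input
rows = [
    {"gender": "F", "women's chess club": 1, "played lacrosse": 0},
    {"gender": "F", "women's chess club": 1, "played lacrosse": 0},
    {"gender": "M", "women's chess club": 0, "played lacrosse": 1},
    {"gender": "M", "women's chess club": 0, "played lacrosse": 0},
]
base, drift = proxy_strength(rows, ["women's chess club", "played lacrosse"], "gender", "F")
print(base, drift)
```

Here both features drift 0.5 away from the 0.5 base rate, in opposite directions. Either feature alone would let a model partially reconstruct gender even after the gender column is removed.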
Representation Bias
Some groups are underrepresented in training data:
ImageNet (widely used image dataset):
45% of images come from the United States
Images from China and India (37% of world population)
are significantly underrepresented
Result: Models trained on ImageNet are less accurate
for objects, scenes, and people from underrepresented regions
Common Crawl (widely used text dataset):
~56% English content
Widely spoken languages (Hindi, Swahili, Bengali)
have minimal representation
Result: LLMs trained on this data are dramatically
less capable in underrepresented languages
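One simple way to quantify gaps like these is to compare each group's share of the dataset with a reference share, such as its share of world population. The sketch below uses made-up counts and reference shares, not real ImageNet or Common Crawl figures. The 0.5 flagging threshold is an arbitrary assumption, not a standard.

```python
def representation_report(dataset_counts, reference_shares, min_ratio=0.5):
    """Compare each group's share of a dataset with a reference share.

    ratio = dataset share / reference share. A ratio below min_ratio flags the
    group as under-represented (min_ratio=0.5 is an illustrative assumption).
    """
    total = sum(dataset_counts.values())
    report = {}
    for group, reference in reference_shares.items():
        share = dataset_counts.get(group, 0) / total
        ratio = share / reference
        report[group] = {"share": share, "ratio": ratio, "under": ratio < min_ratio}
    return report


# Hypothetical counts and reference shares
counts = {"region_1": 700, "region_2": 200, "region_3": 100}
reference = {"region_1": 0.2, "region_2": 0.3, "region_3": 0.5}
for group, row in representation_report(counts, reference).items():
    print(f"{group}: share={row['share']:.2f} ratio={row['ratio']:.2f} under={row['under']}")
```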
Measurement Bias
How you define and measure the target variable introduces bias:
def demonstrate_measurement_bias():
    """
    Predicting 'employee performance' using performance reviews.
    Problem: Performance reviews themselves are biased.
    Studies show that women and minorities receive
    systematically different feedback language.
    Women: 'collaborative', 'supportive', 'helpful'
    Men: 'analytical', 'competent', 'independent'
    Training a model on these reviews doesn't predict
    actual performance -- it predicts how managers
    perceive performance through a biased lens.
    """
    pass
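Measurement bias can be illustrated with a toy ranking. In this hypothetical sketch, two groups have identical true need. However, the quantity actually recorded, a proxy such as healthcare spending, is halved for group B. The groups, the values, and the 0.5 factor are all assumptions chosen for illustration.

```python
def top_k(people, k, score_key):
    """Select the k people with the highest score (Python's sort keeps ties stable)."""
    return sorted(people, key=lambda p: p[score_key], reverse=True)[:k]


# Hypothetical population: identical true need (1..10) in both groups, but the
# measured proxy is scaled by an access factor (1.0 for A, 0.5 for B, an assumption).
people = [
    {"group": group, "need": need, "spending": need * access}
    for group, access in [("A", 1.0), ("B", 0.5)]
    for need in range(1, 11)
]

by_need = top_k(people, 5, "need")          # ranking on the true target
by_spending = top_k(people, 5, "spending")  # ranking on the biased proxy label
share_b_need = sum(p["group"] == "B" for p in by_need) / 5
share_b_spending = sum(p["group"] == "B" for p in by_spending) / 5
print(share_b_need, share_b_spending)
```

Ranking by true need puts 2 of the top 5 in group B. Ranking by the proxy puts none there, even though the two groups' needs are identical. A model trained to predict the proxy would learn to reproduce the second ranking.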
Feedback Loop Bias
The model's predictions influence future data, which trains the next model:
Predictive policing example:
1. Model predicts high crime in neighborhood A
2. More police deployed to neighborhood A
3. More arrests made in neighborhood A
4. Arrest data confirms neighborhood A has high crime
5. Next model version doubles down on neighborhood A
The model created the data that "proves" it was right.
This is a self-fulfilling prophecy, not evidence of
actual crime distribution.
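The loop above can be reproduced with a toy simulation. Under the following assumptions, two neighborhoods have the same true crime rate. Each round the neighborhood with more recorded arrests gets a fixed 80 of 100 patrols, and recorded arrests scale with patrol presence. The 80/20 split, the rate, and the starting counts are hypothetical. This is not a model of any real deployment.

```python
def simulate_feedback_loop(rounds=5, hot_spot_patrols=80, other_patrols=20,
                           crime_rate=0.1, initial_arrests=(11, 10)):
    """Two neighborhoods with the SAME underlying crime rate.

    Each round, the neighborhood with more recorded arrests is treated as the
    'hot spot' and receives most patrols. Recorded arrests grow with patrols,
    so the records diverge even though true crime does not.
    """
    arrests = {"A": initial_arrests[0], "B": initial_arrests[1]}
    history = [dict(arrests)]
    for _ in range(rounds):
        hot = max(arrests, key=arrests.get)  # the "prediction": most recorded arrests
        for hood in arrests:
            patrols = hot_spot_patrols if hood == hot else other_patrols
            # Arrests recorded = patrol presence x underlying crime rate (equal for both)
            arrests[hood] += patrols * crime_rate
        history.append(dict(arrests))
    return history


history = simulate_feedback_loop()
print(history[0], history[-1])
```

Starting from an 11 vs 10 gap in recorded arrests, the records read 51 vs 20 after five rounds. The underlying crime rate is identical in both neighborhoods throughout.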
Real-World Failures
Amazon's Hiring Algorithm (2018)
Amazon built a resume screening tool trained on 10 years of hiring decisions. The model systematically penalized resumes containing the word "women's" (as in "women's chess club captain") and downgraded graduates of two all-women's colleges. Amazon scrapped the tool.
COMPAS Recidivism Prediction
The COMPAS system is used by US courts to predict recidivism. A 2016 ProPublica analysis found it was nearly twice as likely to falsely label Black defendants as high-risk compared to white defendants. Among defendants who did not reoffend, 45% of Black defendants had been labeled higher risk, compared with 23% of white defendants.
Healthcare Algorithm Bias (2019)
A widely used healthcare algorithm determined which patients needed extra care. It used healthcare spending as a proxy for health needs. Because Black patients historically had less access to healthcare (and therefore lower spending), the algorithm systematically underestimated their health needs. It affected an estimated 70 million patients.
Measuring Fairness
Demographic Parity
Equal prediction rates across groups:
def demographic_parity(predictions, sensitive_attribute):
    """Check if positive prediction rates are equal across groups.
    Demographic parity: P(Y_hat=1 | A=a) = P(Y_hat=1 | A=b)
    In plain English: the approval rate should be the same
    regardless of group membership.
    """
    groups = {}
    for pred, attr in zip(predictions, sensitive_attribute):
        if attr not in groups:
            groups[attr] = {"total": 0, "positive": 0}
        groups[attr]["total"] += 1
        if pred == 1:
            groups[attr]["positive"] += 1
    rates = {
        group: data["positive"] / data["total"]
        for group, data in groups.items()
    }
    max_rate = max(rates.values())
    min_rate = min(rates.values())
    disparity = max_rate - min_rate
    print("Positive prediction rates by group:")
    for group, rate in rates.items():
        print(f" {group}: {rate:.3f}")
    print(f" Disparity: {disparity:.3f}")
    # The four-fifths (80%) rule (EEOC guideline): the lowest rate
    # should be at least 80% of the highest rate.
    # If every group's rate is 0, there is no disparity, so the ratio is 1.0.
    ratio = min_rate / max_rate if max_rate > 0 else 1.0
    print(f" 80% rule ratio: {ratio:.3f} ({'PASS' if ratio >= 0.8 else 'FAIL'})")
    return rates, disparity
Equal Opportunity
Equal true positive rates across groups:
def equal_opportunity(predictions, labels, sensitive_attribute):
    """Check if true positive rates are equal across groups.
    Equal opportunity: P(Y_hat=1 | Y=1, A=a) = P(Y_hat=1 | Y=1, A=b)
    In plain English: among people who actually qualify (Y=1),
    the approval rate should be the same regardless of group.
    """
    groups = {}
    for pred, label, attr in zip(predictions, labels, sensitive_attribute):
        if attr not in groups:
            groups[attr] = {"true_positive": 0, "actual_positive": 0}
        if label == 1:
            groups[attr]["actual_positive"] += 1
            # A true positive requires both a positive label and a positive prediction
            if pred == 1:
                groups[attr]["true_positive"] += 1
    tpr = {
        # TPR is undefined (NaN, not 0) for a group with no actual positives
        group: (data["true_positive"] / data["actual_positive"]
                if data["actual_positive"] > 0 else float("nan"))
        for group, data in groups.items()
    }
    print("True positive rates by group:")
    for group, rate in tpr.items():
        print(f" {group}: {rate:.3f}")
    return tpr
Calibration
Calibration checks whether predicted probabilities match actual outcomes within every group. If the model says 70% chance of default, then about 70% of those people should actually default, for every demographic group. Bin predictions into ranges (0.0-0.1, 0.1-0.2, etc.). For each group and bin, compare the mean predicted probability with the observed outcome rate. Gaps larger than about 10 percentage points signal calibration problems.
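The binned check just described can be sketched as follows. Assumptions: probabilities lie in [0, 1], labels are 0/1, and the "10%" threshold is read as an absolute gap of 0.10 between mean predicted probability and observed rate. The function names are hypothetical.

```python
from collections import defaultdict


def calibration_by_group(probabilities, labels, sensitive_attribute, n_bins=10):
    """Per group and probability bin: (mean predicted prob, observed rate, abs gap)."""
    # cells[group][bin_index] = [sum of probabilities, number of positives, count]
    cells = defaultdict(lambda: defaultdict(lambda: [0.0, 0, 0]))
    for p, y, g in zip(probabilities, labels, sensitive_attribute):
        # min() keeps p == 1.0 in the last bin instead of creating an extra one
        idx = min(int(p * n_bins), n_bins - 1)
        cell = cells[g][idx]
        cell[0] += p
        cell[1] += y
        cell[2] += 1
    report = {}
    for g, group_cells in cells.items():
        report[g] = {}
        for idx in sorted(group_cells):
            prob_sum, positives, count = group_cells[idx]
            mean_pred = prob_sum / count
            observed = positives / count
            report[g][idx] = (mean_pred, observed, abs(mean_pred - observed))
    return report


def calibration_flags(report, max_gap=0.10):
    """(group, bin) cells whose predicted-vs-observed gap exceeds max_gap."""
    return [(g, idx) for g, group_cells in report.items()
            for idx, (_, _, gap) in group_cells.items() if gap > max_gap]


# Hypothetical scores: both groups get 0.75, but only group A's outcomes match it
probs = [0.75] * 8
labels = [1, 1, 1, 0] + [1, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
print(calibration_flags(calibration_by_group(probs, labels, groups)))
```

With few people per bin the observed rates are noisy, so in practice the bins should hold enough examples before a gap is treated as meaningful.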
The Impossibility of Fairness
Fairness trade-offs (Chouldechova 2017, Kleinberg et al. 2016):
You cannot simultaneously satisfy all fairness criteria
(in particular, calibration together with equal error rates)
unless the base rates are equal across groups or the model
predicts perfectly.
If Group A has a 10% base rate and Group B has a 20% base rate:
- Demographic parity: equal approval rates (ignores base rates)
- Equal opportunity: equal TPR (may produce unequal approval rates)
- Calibration: accurate probabilities (may produce unequal TPR)
You must choose which fairness definition matters most
for your specific application. There is no universal answer.
Hiring: equal opportunity often preferred (equal chance for
qualified candidates)
Lending: calibration often preferred (accurate risk assessment)
Criminal justice: demographic parity often argued
(equal treatment under the law)
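The conflict between demographic parity and equal opportunity can be checked directly with the 10% and 20% base rates above. The following is a hypothetical worked example with 100 people per group and binary approve/reject decisions.

```python
def approval_and_tpr(preds, labels):
    """Return (approval rate, true positive rate) for one group."""
    approval = sum(preds) / len(preds)
    tpr = sum(p for p, y in zip(preds, labels) if y == 1) / sum(labels)
    return approval, tpr


# Hypothetical groups of 100 people, using the base rates from the text
labels_a = [1] * 10 + [0] * 90   # Group A: 10% qualified
labels_b = [1] * 20 + [0] * 80   # Group B: 20% qualified

# Option 1: a perfect classifier (predict the true label).
# Equal opportunity holds and the 0/1 outputs are trivially calibrated,
# but approval rates differ, so demographic parity fails.
print(approval_and_tpr(labels_a, labels_a))  # (0.1, 1.0)
print(approval_and_tpr(labels_b, labels_b))  # (0.2, 1.0)

# Option 2: force demographic parity at 15% approval in both groups.
# Group A must approve 5 unqualified people; Group B must reject 5 qualified ones.
parity_a = [1] * 15 + [0] * 85
parity_b = [1] * 15 + [0] * 85
print(approval_and_tpr(parity_a, labels_a))  # (0.15, 1.0)
print(approval_and_tpr(parity_b, labels_b))  # (0.15, 0.75)
```

Equalizing approval rates drops Group B's true positive rate to 0.75 while Group A's stays at 1.0. With unequal base rates, neither option satisfies both definitions at once.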
Mitigation Strategies
Pre-processing: Fix the Data
import pandas as pd

def rebalance_dataset(dataset, sensitive_col, label_col):
    """Balance the dataset across demographic groups.
    Ensure each group has similar representation
    and similar label distributions.
    Note: every group is downsampled toward the size of the
    smallest group, so this discards data.
    """
    groups = dataset.groupby(sensitive_col)
    target_size = min(len(g) for _, g in groups)
    balanced_dfs = []
    for _, group_df in groups:
        # Within each group, take up to an equal number of rows per label
        per_label = target_size // group_df[label_col].nunique()
        for _, label_df in group_df.groupby(label_col):
            balanced_dfs.append(
                label_df.sample(n=min(len(label_df), per_label), random_state=42)
            )
    return pd.concat(balanced_dfs, ignore_index=True)
In-processing: Constrained Training
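Add a fairness penalty to the loss function: loss = task_loss + lambda * fairness_penalty, where fairness_penalty measures a disparity between groups (for example, the gap in positive-prediction rates). This trades some accuracy for fairness. The weight lambda controls the trade-off, and lambda = 0 recovers ordinary training.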
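Below is one concrete instance of this formula, sketched under stated assumptions. The model is a logistic regression and task_loss is mean binary cross-entropy. fairness_penalty is the squared gap between the two groups' mean predicted scores, a soft demographic-parity penalty. Many other penalties are possible; this is one choice, trained with plain gradient descent in NumPy. Group labels are assumed to be 0/1 and enter only the penalty, not the features. The function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def parity_gap(X, w, b, group):
    """Absolute gap in mean predicted score between group 0 and group 1."""
    p = sigmoid(X @ w + b)
    return abs(p[group == 0].mean() - p[group == 1].mean())


def train_fair_logreg(X, y, group, lam=0.0, lr=0.1, epochs=2000):
    """Minimise mean BCE + lam * gap**2 by gradient descent.

    gap = mean score in group 0 - mean score in group 1.
    `group` holds 0/1 labels and is used only in the penalty term.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    in_g0, in_g1 = group == 0, group == 1
    # d(gap)/d(p_i): +1/|G0| for group-0 rows, -1/|G1| for group-1 rows
    dgap_dp = np.where(in_g0, 1.0 / in_g0.sum(), -1.0 / in_g1.sum())
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad_z = (p - y) / n  # gradient of mean BCE w.r.t. the logits
        gap = p[in_g0].mean() - p[in_g1].mean()
        # Chain rule for lam * gap**2 through the sigmoid: dp/dz = p * (1 - p)
        grad_z = grad_z + lam * 2.0 * gap * dgap_dp * p * (1.0 - p)
        w -= lr * (X.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b
```

Raising `lam` should shrink the parity gap at some cost in cross-entropy, while `lam=0.0` is ordinary logistic regression. The learning rate and epoch count are untuned defaults.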
Post-processing: Adjust Outputs
Use different classification thresholds per group. If Group A needs a threshold of 0.3 and Group B needs 0.5 to achieve the same positive rate, use group-specific thresholds. This is controversial because it explicitly treats groups differently, but it can achieve demographic parity.
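A minimal sketch of choosing group-specific thresholds follows. For each group, it picks the cutoff that approves approximately the same target fraction of that group. The function names and the ranking rule are assumptions, and target_rate is assumed to lie in [0, 1]. Ties at the cutoff can make the realised rate slightly exceed the target.

```python
from collections import defaultdict


def thresholds_for_positive_rate(scores, groups, target_rate):
    """Per-group cutoff so that roughly target_rate of each group is approved."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    thresholds = {}
    for group, group_scores in by_group.items():
        ranked = sorted(group_scores, reverse=True)
        k = round(target_rate * len(ranked))  # how many to approve in this group
        # Approve scores >= the k-th highest score; infinity approves nobody
        thresholds[group] = ranked[k - 1] if k > 0 else float("inf")
    return thresholds


def apply_thresholds(scores, groups, thresholds):
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Hypothetical scores: group B's scores run lower than group A's
scores_a = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95]
scores_b = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5, 0.55, 0.6, 0.9]
scores = scores_a + scores_b
groups = ["A"] * 10 + ["B"] * 10
thresholds = thresholds_for_positive_rate(scores, groups, target_rate=0.3)
preds = apply_thresholds(scores, groups, thresholds)
print(thresholds)
```

With a 30% target, group A's cutoff lands at 0.8 and group B's at 0.55, and each group has 3 of 10 approved.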
Fairness Auditing in Practice
Run a comprehensive fairness audit before deployment: compute demographic parity, equal opportunity, and calibration metrics for every sensitive attribute. Apply the 80% rule (EEOC guideline): the lowest group's positive rate should be at least 80% of the highest group's rate. Flag any sensitive attribute that fails.
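A minimal audit loop is sketched below, assuming rows are stored as dictionaries with 0/1 `pred` and `label` fields. For each sensitive attribute, it computes the selection-rate ratio against the 80% rule and the gap in true positive rates. The 0.1 TPR-gap tolerance is a hypothetical choice, not a guideline. Calibration can be added per attribute with a binned check as described in the Calibration section.

```python
from collections import defaultdict


def fairness_audit(rows, sensitive_keys, pred_key="pred", label_key="label",
                   min_ratio=0.8, max_tpr_gap=0.1):
    """Per attribute: selection rates, 80%-rule ratio, TPR gap, and a flag."""
    report = {}
    for key in sensitive_keys:
        # stats[group] = [predicted positives, total, true positives, actual positives]
        stats = defaultdict(lambda: [0, 0, 0, 0])
        for row in rows:
            s = stats[row[key]]
            s[0] += row[pred_key]
            s[1] += 1
            if row[label_key] == 1:
                s[3] += 1
                s[2] += row[pred_key]
        selection = {g: s[0] / s[1] for g, s in stats.items()}
        # Groups with no actual positives have no defined TPR, so skip them
        tpr = {g: s[2] / s[3] for g, s in stats.items() if s[3] > 0}
        highest = max(selection.values())
        ratio = min(selection.values()) / highest if highest > 0 else 1.0
        tpr_gap = max(tpr.values()) - min(tpr.values()) if tpr else 0.0
        report[key] = {
            "selection_rates": selection,
            "ratio": ratio,
            "tpr_gap": tpr_gap,
            "flagged": ratio < min_ratio or tpr_gap > max_tpr_gap,
        }
    return report


# Hypothetical predictions and labels for two groups
rows = (
    [{"gender": "F", "pred": p, "label": y} for p, y in [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0)]]
    + [{"gender": "M", "pred": p, "label": y} for p, y in [(1, 1), (0, 1), (0, 0), (0, 0), (0, 0)]]
)
print(fairness_audit(rows, ["gender"]))
```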
Common Pitfalls
- Assuming removing protected attributes fixes bias: Models learn proxy features. Removing "gender" doesn't help if "played on women's lacrosse team" is still in the data.
- Optimizing for one fairness metric while ignoring others: Achieving demographic parity can worsen calibration. Understand the trade-offs for your specific use case.
- Treating fairness as a one-time check: Bias can emerge over time as data distributions shift. Monitor fairness metrics continuously in production.
- Ignoring intersectional bias: A model might be fair for women overall and fair for Black applicants overall, but unfair for Black women specifically. Check subgroup intersections.
- Using biased data to define "ground truth": If your labels come from a biased process (e.g., biased performance reviews), a "fair" model trained on those labels still encodes the original bias.
- Believing AI is objective: AI is as objective as the data and decisions that created it. The appearance of mathematical objectivity makes bias harder to detect, not less present.
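The intersectional pitfall can be checked by computing selection rates on every combination of sensitive attributes, not just each attribute alone. The dataset below is hypothetical. Each attribute passes the 80% rule on its own, but the combined gender-and-race grouping fails. The function names and data are assumptions, and the 0.8 threshold is borrowed from the 80% rule.

```python
from collections import defaultdict
from itertools import combinations


def selection_rates(rows, keys, pred_key="pred"):
    """Selection rate for each combination of values of the given keys."""
    counts = defaultdict(lambda: [0, 0])  # group tuple -> [approved, total]
    for row in rows:
        group = tuple(row[k] for k in keys)
        counts[group][0] += row[pred_key]
        counts[group][1] += 1
    return {g: approved / total for g, (approved, total) in counts.items()}


def intersectional_audit(rows, sensitive_keys, pred_key="pred", threshold=0.8):
    """80%-rule ratio and pass/fail for every non-empty subset of attributes."""
    results = {}
    for r in range(1, len(sensitive_keys) + 1):
        for keys in combinations(sensitive_keys, r):
            rates = selection_rates(rows, keys, pred_key)
            highest = max(rates.values())
            ratio = min(rates.values()) / highest if highest > 0 else 1.0
            results[keys] = (ratio, ratio >= threshold)
    return results


def make_rows(gender, race, n, approved):
    return [{"gender": gender, "race": race, "pred": int(i < approved)} for i in range(n)]


# Hypothetical data: every marginal rate is 40%, but subgroup rates are 60% or 20%
rows = (make_rows("F", "White", 10, 6) + make_rows("F", "Black", 10, 2)
        + make_rows("M", "White", 10, 2) + make_rows("M", "Black", 10, 6))
for keys, (ratio, passed) in intersectional_audit(rows, ["gender", "race"]).items():
    print(keys, f"{ratio:.3f}", "PASS" if passed else "FAIL")
```

Very small intersectional subgroups produce noisy rates, so it may be worth requiring a minimum subgroup size before acting on a flag.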
Key Takeaways
- AI amplifies biases in training data. A model trained on biased hiring decisions will make biased hiring recommendations.
- Bias enters through training data, measurement processes, representation gaps, and feedback loops. All four must be addressed.
- There are multiple definitions of fairness (demographic parity, equal opportunity, calibration), and when base rates differ across groups they cannot all be satisfied simultaneously. Choose the definition that best fits your use case.
- Fairness auditing should happen before deployment and continuously in production. Use quantitative metrics, not assumptions.
- Mitigation strategies exist at every stage: fix the data, constrain the training, or adjust the outputs. Each approach has trade-offs.
- Removing protected attributes does not remove bias. Models learn proxy features that correlate with the attributes you removed.