When to Use AI
Overview
AI is a powerful tool, but it is not the right tool for every problem. Engineers who understand when AI adds value and when it adds unnecessary complexity make better architectural decisions. This document provides a practical decision framework for evaluating whether AI belongs in your solution.
The core question is not "can AI do this?" but "should AI do this?" A language model can multiply two numbers, but a calculator is faster, cheaper, and always correct.
Good Fits for AI
Natural Language Processing
Any task that involves understanding, generating, or transforming human language is a strong candidate for AI. Language is inherently ambiguous, context-dependent, and impossible to handle with deterministic rules at scale.
```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Sentiment analysis: a strong AI use case.
# Language is ambiguous. "This product is sick!" could be positive or negative.
# Rules can't cover the full range of human expression.
def analyze_customer_feedback(feedback: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Analyze the sentiment of this customer feedback. Return JSON with 'sentiment' (positive/negative/neutral/mixed), 'confidence' (0-1), and 'key_issues' (list of strings)."},
            {"role": "user", "content": feedback},
        ],
        temperature=0,
        # JSON mode makes the reply parse as JSON; it does not enforce this schema
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Illustrative output (actual values will vary):
# analyze_customer_feedback("The app works great but crashes when I upload large files")
# -> {"sentiment": "mixed", "confidence": 0.85, "key_issues": ["crash on large file upload"]}
```
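`response_format={"type": "json_object"}` makes the reply parse as JSON, but nothing forces the model to use the requested fields or values. A cheap deterministic check before the result reaches downstream code catches that. The following is a minimal sketch that assumes the four labels positive, negative, neutral, and mixed. `validate_sentiment_result` is a hypothetical helper and is not part of any SDK.

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral", "mixed"}


def validate_sentiment_result(result: dict) -> dict:
    """Deterministically check a model's sentiment JSON before using it.

    Raises ValueError on a missing or out-of-range field, so bad model
    output fails loudly instead of flowing into downstream systems.
    """
    sentiment = result.get("sentiment")
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")

    confidence = result.get("confidence")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"confidence must be a number, got {confidence!r}")
    if not 0 <= confidence <= 1:
        raise ValueError(f"confidence out of range: {confidence}")

    issues = result.get("key_issues")
    if not isinstance(issues, list) or not all(isinstance(i, str) for i in issues):
        raise ValueError("key_issues must be a list of strings")

    return result
```

This follows the hybrid pattern from later in this document: the model handles the ambiguous language, and plain code decides whether its answer is usable.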
Strong NLP use cases:
- Text classification: Spam detection, ticket routing, content moderation
- Information extraction: Pulling structured data from unstructured text (names, dates, amounts from emails)
- Summarization: Condensing long documents into key points
- Translation: Converting between languages, including domain-specific terminology
- Conversational interfaces: Chatbots, virtual assistants, natural language database queries
Classification & Pattern Recognition
When the decision space is large and the patterns are complex, ML outperforms hand-written rules.
Good classification problems for AI:
- Medical imaging: Is this X-ray showing a fracture? Thousands of possible patterns, subtle differences.
- Fraud detection: Is this transaction fraudulent? Combines dozens of signals (amount, location, time, merchant type, user history).
- Content moderation: Does this image violate policies? Policies are nuanced and context-dependent.
- Code review: Does this code have potential bugs? Patterns are complex and language-specific.
Recommendation & Personalization
Recommendations work by finding patterns in user behavior that are too complex for explicit rules. When you have enough user interaction data, ML-based recommendations significantly outperform rule-based approaches.
Recommendation use cases:
- E-commerce: "Customers who bought X also bought Y"
- Content platforms: "Videos you might enjoy based on watch history"
- Search ranking: Personalized ordering of search results
- Email prioritization: Which emails should be at the top of the inbox
- Ad targeting: Which users are most likely to click
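The simplest data-driven version of "customers who bought X also bought Y" is item-to-item co-occurrence counting. The sketch below is only an illustration: the orders are made up and both function names are hypothetical. Production recommenders typically learn from far more interaction data and correct for item popularity, but the core idea of mining patterns from behavior instead of writing rules is the same.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(orders: list[set[str]]) -> dict[str, Counter]:
    """Count how often each pair of items appears in the same order."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for order in orders:
        # sorted() gives a stable pair order; each pair is counted in both directions
        for a, b in combinations(sorted(order), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(item: str, counts: dict[str, Counter], k: int = 3) -> list[str]:
    """Up to k items most often bought together with `item`, most frequent first."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


orders = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"stove", "fuel"},
]
counts = build_co_purchase_counts(orders)
print(also_bought("tent", counts))
```

No rule mentions tents or sleeping bags. The association comes entirely from the order history, which is why these approaches improve as interaction data grows.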
Search & Retrieval
Semantic search understands meaning, not just keywords. "How do I cancel my subscription?" matches "Steps to end your membership" even though they share no words.
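```python
import math

# Semantic search: AI finds meaning, not just keywords.
# Assumes get_embedding(text) -> list[float] exists (e.g. an embeddings API call)
# and that each document stores an "embedding" produced by the same model.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    return dot / norms if norms else 0.0

def search_knowledge_base(query: str, documents: list[dict], min_score: float = 0.75) -> list[dict]:
    """Search documents by meaning, not just keywords.
    "password reset" matches "how to change my login credentials"
    because the meaning is similar, even though no words overlap.
    """
    query_embedding = get_embedding(query)
    results = []
    for doc in documents:
        similarity = cosine_similarity(query_embedding, doc["embedding"])
        if similarity > min_score:  # 0.75 is a starting point; tune it per embedding model
            results.append({"document": doc, "score": similarity})
    return sorted(results, key=lambda x: x["score"], reverse=True)
```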
Anomaly Detection
Anomaly detection finds unusual patterns in large datasets where "unusual" is hard to define explicitly.
- Network intrusion detection
- Manufacturing quality control
- Infrastructure monitoring (unusual CPU/memory patterns)
- Financial transaction monitoring
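For a single metric such as CPU usage, a simple statistical baseline is often enough. ML-based detectors become worthwhile when "unusual" depends on many signals at once. The sketch below uses the modified z-score (median and median absolute deviation, with the commonly used 0.6745 constant and 3.5 cutoff) instead of any ML model. The CPU samples are made up for illustration.

```python
import statistics


def find_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Flag outliers with the modified z-score (median/MAD based).

    Median and MAD are used instead of mean and standard deviation so that
    one extreme spike cannot inflate the spread estimate and hide itself.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # Most values equal the median; the score is undefined, so this sketch reports nothing
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - median) / mad > threshold]


cpu_percent = [22, 25, 24, 23, 26, 24, 25, 23, 97, 24]  # made-up samples
print(find_anomalies(cpu_percent))  # → [8]
```

A plain mean and standard deviation check with a cutoff of 3 would miss the spike here. With only ten points, the spike inflates the standard deviation enough to keep its own z-score just under 3. This is one reason robust statistics, or learned models, are used for this kind of problem.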
Bad Fits for AI
Deterministic Logic
If there is exactly one correct answer and the rules to compute it are known, do not use AI. AI introduces uncertainty where none is needed.
```python
from openai import OpenAI

client = OpenAI()

# BAD: Using AI for deterministic logic
def calculate_shipping_cost_bad(weight_kg: float, zone: str) -> float:
    """Don't do this. The model might hallucinate a price."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"What is the shipping cost for {weight_kg}kg to zone {zone}?"}],
    )
    # Unreliable: the reply may be wrong, or may not even parse as a number
    return float(response.choices[0].message.content)

# GOOD: Deterministic lookup
SHIPPING_RATES = {  # zone -> (base fee, per-kg rate)
    "domestic": (5.00, 1.50),
    "europe": (12.00, 3.00),
    "international": (20.00, 5.00),
}

def calculate_shipping_cost_good(weight_kg: float, zone: str) -> float:
    """Lookup table: same input, same answer, no API call needed."""
    base_fee, per_kg = SHIPPING_RATES[zone]  # unknown zone raises KeyError instead of guessing
    return base_fee + weight_kg * per_kg
```
More examples of deterministic logic that should not use AI:
- Tax calculations
- Currency conversion (use current exchange rates)
- Date/time arithmetic
- Access control checks (does user X have permission Y?)
- Data format transformations (CSV to JSON)
- Mathematical computations
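For comparison, here are two items from this list done deterministically with the Python standard library. The exchange rate is a made-up placeholder; in practice it would come from a rates source. The rounding rule (half-up to cents here) is an assumption that should match your business requirements.

```python
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP


def convert_currency(amount: Decimal, rate: Decimal) -> Decimal:
    """Convert with an exchange rate fetched elsewhere, then round to cents."""
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def due_date(invoice_date: date, net_days: int = 30) -> date:
    """Calendar arithmetic that handles month ends and leap years."""
    return invoice_date + timedelta(days=net_days)


print(convert_currency(Decimal("19.99"), Decimal("1.0837")))  # → 21.66
print(due_date(date(2024, 2, 15)))  # → 2024-03-16 (2024 is a leap year)
```

`Decimal` is used instead of `float` so that amounts like 19.99 are represented exactly and the result is the same on every run.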
Small Datasets
ML needs patterns, and patterns need data. If you have 20 labeled examples, you do not have enough signal to train or fine-tune a model meaningfully.
Rough data requirements:

| Task | Minimum examples | Ideal examples |
|---|---|---|
| Binary classification | 500 | 5,000+ |
| Multi-class (10 classes) | 1,000 | 10,000+ |
| Named entity recognition | 2,000 | 20,000+ |
| Custom fine-tuning | 500 | 5,000+ |
| Few-shot with LLM API | 3-10 | 3-10 (in prompt) |

Exception: Modern LLMs with few-shot prompting can work with just 3-10 examples in the prompt. This is the approach to use when you have very little data.
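Here is a minimal sketch of what few-shot prompting looks like in practice. The labeled examples are sent inside the request as earlier chat turns; they are not used to train a model. The helper name and ticket labels are hypothetical. The resulting list follows the role/content message shape used in the sentiment example earlier and could be passed as `messages=` to a chat completion call.

```python
def build_few_shot_messages(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> list[dict]:
    """Turn a handful of labeled examples into chat messages for an LLM API.

    Each example becomes a user/assistant pair so the model can infer the
    pattern; the real input goes last.
    """
    messages = [{"role": "system", "content": instruction}]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("My card was charged twice", "billing"),
    ("The app crashes on startup", "bug"),
    ("Can you add dark mode?", "feature_request"),
]
messages = build_few_shot_messages(
    "Classify the support ticket as billing, bug, or feature_request. Reply with the label only.",
    examples,
    "I was billed after cancelling",
)
print(len(messages))  # → 8 (1 system + 3 example pairs + 1 query)
```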
Regulatory & Compliance Requirements
Some domains require explainability, auditability, or deterministic behavior that ML systems cannot guarantee.
Regulatory considerations:
- Healthcare diagnosis: Regulators may require explainable decisions. "The model said so" is not an acceptable justification for a treatment decision.
- Financial lending: Fair lending laws require you to explain why a loan was denied. Black-box models struggle here.
- Criminal justice: Risk assessment models have documented bias issues. Legal challenges are ongoing.
- Safety-critical systems: Autonomous vehicles, medical devices, aviation systems. "Usually correct" is not good enough when lives are at stake.
This does not mean AI cannot be used in these domains. It means AI should augment human decision-making, not replace it. A radiologist using AI to highlight potential issues is good. An AI system making treatment decisions without human oversight is not.
When "Sometimes Wrong" Is Unacceptable
Tasks where errors have severe consequences:
- Financial settlement: Sending $1M to the wrong account
- Medication dosing: A 10x dose error could be fatal
- Legal document filing: Wrong jurisdiction, wrong court
- Cryptographic operations: One wrong bit and everything breaks
- Database migrations: Incorrect schema changes corrupt data

These tasks require deterministic correctness. AI should not be the decision-maker here.
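Deterministic safeguards belong in front of high-stakes actions whether or not AI is involved upstream. One illustration for the financial-settlement case is the sketch below, which validates an IBAN's check digits with the standard mod-97 rule (ISO 13616) before a transfer is attempted. It is a format check only. It catches typos such as a single mistyped digit, but it cannot confirm that the account belongs to the intended recipient.

```python
def is_valid_iban(iban: str) -> bool:
    """Validate IBAN check digits with the mod-97 rule."""
    s = iban.replace(" ", "").upper()
    # Basic shape: 15-34 ASCII letters/digits, country code, then two check digits
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    # Map each character to digits: "0"-"9" stay as-is, "A"=10 ... "Z"=35
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))  # → True
print(is_valid_iban("GB82 WEST 1234 5698 7654 33"))  # → False (one digit changed)
```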
The Decision Framework
```text
Step 1: Define the problem precisely
┌─────────────────────────────────────────────────┐
│ What is the input? What is the expected output? │
│ What does "correct" mean? How will you measure? │
└─────────────────────────────────────────────────┘
                         ↓
Step 2: Can you solve it with rules?
┌─────────────────────────────────────────────────┐
│ YES → Write rules. They are faster, cheaper,    │
│       more predictable, and easier to debug.    │
│ NO → Continue to Step 3.                        │
└─────────────────────────────────────────────────┘
                         ↓
Step 3: What is the cost of errors?
┌─────────────────────────────────────────────────┐
│ HIGH → Use AI as an assistant, not the decider. │
│        Human-in-the-loop for final decisions.   │
│ LOW → AI can operate autonomously.              │
└─────────────────────────────────────────────────┘
                         ↓
Step 4: Do you have data or access to a capable API?
┌─────────────────────────────────────────────────┐
│ NO DATA + NO API → You cannot do ML. Stop here. │
│ API AVAILABLE → Use the API (most common).      │
│ DATA AVAILABLE → Consider fine-tuning or RAG.   │
└─────────────────────────────────────────────────┘
                         ↓
Step 5: What are the latency and cost constraints?
┌─────────────────────────────────────────────────┐
│ < 50ms required → Self-host a small model or    │
│                   use embeddings + fast search. │
│ < 500ms required → A fast API call may work.    │
│ Seconds OK → Any approach works.                │
└─────────────────────────────────────────────────┘
```
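Steps 2 through 4 are mechanical enough to write down as code, which can be a handy way to record the reasoning behind an architecture decision. This is a sketch only. The function, enum, and return strings are illustrative and not a standard API. Steps 1 and 5 are left out because they need human judgment.

```python
from enum import Enum


class ErrorCost(Enum):
    LOW = "low"
    HIGH = "high"


def recommend_approach(rules_suffice: bool, error_cost: ErrorCost,
                       has_api: bool, has_data: bool) -> str:
    """Encode Steps 2-4 of the decision framework."""
    if rules_suffice:                     # Step 2: rules win when they work
        return "Write rules."
    if not has_api and not has_data:      # Step 4: nothing to build on
        return "Stop: without data or a capable API, ML is not an option."
    source = "a hosted API" if has_api else "a fine-tuned model or RAG"
    if error_cost is ErrorCost.HIGH:      # Step 3: high error cost keeps a human in the loop
        return f"Use AI as an assistant via {source}; a human makes the final decision."
    return f"Let AI operate autonomously via {source}."


print(recommend_approach(False, ErrorCost.HIGH, has_api=True, has_data=False))
```

The no-data, no-API check runs before the error-cost check because it ends the evaluation regardless of the answer to Step 3.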
Hybrid Approaches
The best solutions often combine AI with deterministic logic.
```python
def process_insurance_claim(claim: dict) -> dict:
    """Hybrid approach: AI extracts, rules validate, humans decide.

    ai_extract_claim_details() and ai_classify_complexity() are model-backed
    helpers assumed to be defined elsewhere; dates are assumed to be
    comparable datetime.date values.
    """
    # Step 1: AI extracts information from claim documents
    extracted = ai_extract_claim_details(claim["documents"])

    # Step 2: Deterministic rules check for obvious issues
    validation_errors = []
    if extracted["claim_amount"] > claim["policy_limit"]:
        validation_errors.append("Claim exceeds policy limit")
    if extracted["incident_date"] < claim["policy_start_date"]:
        validation_errors.append("Incident before policy start")
    if validation_errors:
        return {"status": "rejected", "reasons": validation_errors}

    # Step 3: AI classifies complexity (only for claims that passed validation)
    complexity = ai_classify_complexity(extracted)

    # Step 4: Route based on deterministic rules
    if complexity == "simple" and extracted["claim_amount"] < 1000:
        return {"status": "auto_approved", "amount": extracted["claim_amount"]}
    return {"status": "needs_review", "assigned_to": "claims_team"}
```
This pattern appears everywhere in production AI systems:
- AI handles the ambiguous parts (extraction, classification, generation)
- Rules handle the deterministic parts (validation, routing, calculations)
- Humans handle the high-stakes decisions
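The same division of labor appears in a very common routing pattern. The system acts on a model's output automatically only when the stakes are low and the confidence is high, and it sends everything else to a person. Here is a minimal sketch. The labels and the 0.9 threshold are hypothetical, and it assumes the model's confidence score is meaningful. Self-reported LLM confidence is often poorly calibrated, so the threshold should be tuned against real evaluation data.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to be a meaningful score in [0, 1]


def route(pred: Prediction, high_stakes_labels: set[str], min_confidence: float = 0.9) -> str:
    """AI proposes; deterministic rules decide where the proposal goes."""
    if pred.label in high_stakes_labels:
        return "human_review"        # high stakes: a person always decides
    if pred.confidence >= min_confidence:
        return f"auto:{pred.label}"  # low stakes and confident: automate
    return "human_review"            # uncertain: fall back to a person


HIGH_STAKES = {"legal_dispute", "large_refund"}
print(route(Prediction("password_reset", 0.97), HIGH_STAKES))  # → auto:password_reset
```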
Common Pitfalls
- Reaching for AI when a regex would work: If you need to extract email addresses from a fixed-format report, a regular expression is faster, cheaper, and deterministic. AI is for when patterns are complex and variable.
- Using AI because it's trendy: "We should add AI to this" is not a requirement. Start with the user problem, then evaluate solutions. Sometimes the best solution is a well-designed form.
- Underestimating the "sometimes wrong" problem: A model that is 95% accurate sounds good until you realize that means 1 in 20 users gets a wrong answer. At 100,000 requests per day, that is 5,000 wrong answers every day.
- Not defining success criteria upfront: "We want AI to make this better" is not measurable. Define specific metrics: "AI should correctly classify 90% of tickets within 2 seconds."
- Ignoring the maintenance burden: AI systems need ongoing monitoring, evaluation dataset updates, prompt tuning, and model upgrades. They are not "set and forget."
- All-or-nothing thinking: AI does not have to make the final decision. Using AI to assist humans (highlight, suggest, pre-fill) captures most of the value with much less risk.
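For the regex pitfall, a fixed-format report needs nothing more than Python's standard `re` module. The report text is illustrative. The pattern is intentionally simple and does not try to cover every address the email standards allow. It is adequate only because the input format is known and controlled.

```python
import re

# Simple pattern for a known, fixed-format report (not a full RFC 5322 parser)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(report: str) -> list[str]:
    return EMAIL_RE.findall(report)


report = """\
Owner: alice@example.com
Escalation: ops-team@example.org
Status: OK
"""
print(extract_emails(report))  # → ['alice@example.com', 'ops-team@example.org']
```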
Key Takeaways
- AI excels at tasks involving natural language, pattern recognition, recommendations, semantic search, and anomaly detection. These are problems where rules-based approaches fail.
- AI is a poor fit for deterministic logic, small datasets, regulatory-heavy domains, and any task where "sometimes wrong" has severe consequences.
- Use the decision framework: define the problem, check if rules work, assess error cost, evaluate data availability, then consider constraints.
- Hybrid approaches (AI + rules + human oversight) are the most common and most effective pattern in production systems.
- Always define measurable success criteria before starting. "Add AI" is not a goal; "reduce ticket routing errors by 60%" is.
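Once a criterion like "correctly classify 90% of tickets within 2 seconds" (the example from Common Pitfalls) is written down, checking it against a labeled evaluation set takes only a few lines of code. This is a sketch. The function name and inputs are hypothetical, and it treats "within 2 seconds" as applying to every prediction rather than to a latency percentile.

```python
def meets_success_criteria(
    predictions: list[str],
    labels: list[str],
    latencies_s: list[float],
    min_accuracy: float = 0.90,
    max_latency_s: float = 2.0,
) -> bool:
    """Check one evaluation run against fixed accuracy and latency targets."""
    if not (len(predictions) == len(labels) == len(latencies_s)) or not labels:
        raise ValueError("need equal-length, non-empty inputs")
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    all_fast_enough = max(latencies_s) <= max_latency_s
    return accuracy >= min_accuracy and all_fast_enough


labels      = ["billing", "bug", "bug", "billing", "feature_request"]
predictions = ["billing", "bug", "billing", "billing", "feature_request"]
latencies_s = [0.8, 1.1, 0.9, 1.4, 0.7]
print(meets_success_criteria(predictions, labels, latencies_s))  # → False (80% accuracy)
```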