Applied AI vs ML Theory
Overview
Computer science coursework covers the math behind machine learning: gradient descent, loss functions, backpropagation, regularization. That knowledge matters. But knowing how a neural network learns its weights does not tell you how to build an AI-powered product that users rely on every day.
Applied AI is the discipline of taking ML capabilities and turning them into reliable, maintainable software. The gap between "I trained a model in a Jupyter notebook" and "this system handles 10,000 requests per minute with 99.9% uptime" is enormous. This document covers that gap.
The Theory-Practice Gap
What ML Theory Teaches You
- How gradient descent optimizes a loss function
- Why convolutional layers work for images
- The bias-variance tradeoff
- How attention mechanisms enable transformers
- Mathematical properties of different activation functions
What Applied AI Requires
- Choosing between building a model and calling an API
- Handling malformed input gracefully at inference time
- Monitoring model performance in production and detecting drift
- Managing costs when every API call costs money
- Building fallback paths when the model fails or returns nonsense
- Versioning prompts, models, and evaluation datasets together
The theory gives you the "why." Applied AI gives you the "how" and the "what happens when things go wrong."
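Here is one concrete example of the "what happens when things go wrong" side: detecting drift. It can start very simply. This is a minimal sketch, and it assumes you log the label of every production prediction (for example, support-ticket categories). It compares the label mix in a recent window against a baseline window using total variation distance and flags a shift above a threshold. The function names and the 0.1 threshold are hypothetical choices for illustration, not a standard.

```python
from collections import Counter


def label_distribution(labels: list[str]) -> dict[str, float]:
    """Return the fraction of predictions that fall into each label."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two distributions: 0 = identical, 1 = no overlap."""
    all_labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in all_labels)


def label_drift_detected(baseline: list[str], recent: list[str], threshold: float = 0.1) -> bool:
    """True if the recent prediction mix moved more than `threshold` away from the baseline."""
    distance = total_variation_distance(label_distribution(baseline), label_distribution(recent))
    return distance > threshold
```

A shift in the prediction mix does not prove that accuracy dropped. It only tells you something changed and is worth investigating. Pair it with periodic spot checks against human labels.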
The 90% That Isn't Training Models
Most AI work in production systems has nothing to do with training. An illustrative breakdown of effort in an AI-powered feature (the exact split varies by project):
- Data collection & cleaning: 25%
- Integration & infrastructure: 25%
- Evaluation & testing: 20%
- Monitoring & maintenance: 15%
- Prompt engineering / modeling: 10%
- Actual model training: 5%
This surprises people who come from an academic background. In a research lab, model architecture and training dominate. In production, they are a small piece.
Data Work
The most time-consuming part of any AI project is getting the data right. This means:
- Collecting representative examples of the problem you are solving
- Cleaning data: removing duplicates, fixing encoding issues, handling missing fields
- Labeling data: often requires domain experts, not just engineers
- Building pipelines that keep data fresh as your product evolves
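The cleaning steps above can be sketched in a few lines. This is a minimal sketch, and it assumes each example is a dict with hypothetical `id`, `text`, and `label` fields. It drops incomplete records, normalizes Unicode and whitespace, and removes case-insensitive duplicates. NFKC normalization is only a partial fix for encoding issues: it folds compatibility characters such as non-breaking spaces, but it does not repair text that was decoded with the wrong codec.

```python
import unicodedata

# Hypothetical schema for illustration: every example needs these fields.
REQUIRED_FIELDS = ("id", "text", "label")


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    # NFKC folds compatibility characters (e.g. non-breaking spaces, full-width letters)
    # into their plain forms; it does not repair mojibake from a wrong decoding.
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def clean_examples(records: list[dict]) -> list[dict]:
    """Drop incomplete records, normalize text, and remove duplicates."""
    seen: set[str] = set()
    cleaned: list[dict] = []
    for record in records:
        # Missing fields: skip any record where a required value is absent or empty.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        text = normalize_text(str(record["text"]))
        if not text:
            continue  # whitespace-only text carries no signal
        # Duplicates: compare case-insensitively and keep the first occurrence.
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**record, "text": text})
    return cleaned
```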
Integration Work
The model is one component in a larger system. Integration includes:
- Building API endpoints that wrap model inference
- Handling authentication, rate limiting, and request validation
- Managing timeouts (LLM calls can take 5-30 seconds)
- Implementing caching to avoid redundant expensive calls
- Building retry logic with exponential backoff
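Retry logic with exponential backoff is small enough to show in full. This is a minimal, library-agnostic sketch. The function name and defaults are hypothetical, and the built-in `TimeoutError`/`ConnectionError` stand in for whatever transient errors your client actually raises. The delay ceiling doubles on each attempt, and "full jitter" randomizes the wait so many clients don't retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[Exception], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call fn(), retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller's fallback path take over
            # Backoff ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2**attempt)
            # Full jitter: sleep a random time up to the ceiling.
            time.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Only retry errors that can succeed on a second try, such as timeouts, rate limits, and dropped connections. A malformed request will fail every time. Also check whether your client library already retries, so you don't stack two retry layers; the OpenAI Python client, for example, accepts `max_retries` and `timeout` options. For caching, one in-process option is to wrap a deterministic (temperature 0) call in `functools.lru_cache`. A shared cache is a common next step once you run more than one process.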
Evaluation Work
You cannot ship what you cannot measure. Evaluation includes:
- Building test datasets that represent real user inputs
- Defining metrics that correlate with user satisfaction
- Running automated evals on every prompt or model change
- Comparing model versions before deploying updates
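An automated eval can start as a loop over a labeled test set plus a deployment gate. This is a minimal sketch. It uses exact-match accuracy as a stand-in for "metrics that correlate with user satisfaction", and the names `run_eval` and `should_deploy` are hypothetical. `predict` can be any function from input text to a label, such as a prompt-based classifier.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalResult:
    accuracy: float
    # Each failure is (input, expected, got), kept so you can read the mistakes, not just count them.
    failures: list[tuple[str, str, str]] = field(default_factory=list)


def run_eval(predict: Callable[[str], str], test_set: list[tuple[str, str]]) -> EvalResult:
    """Score a predictor against a fixed set of (input, expected_label) pairs."""
    if not test_set:
        raise ValueError("test_set is empty")
    failures = []
    for text, expected in test_set:
        got = predict(text)
        if got != expected:
            failures.append((text, expected, got))
    accuracy = (len(test_set) - len(failures)) / len(test_set)
    return EvalResult(accuracy=accuracy, failures=failures)


def should_deploy(candidate: EvalResult, current: EvalResult, tolerance: float = 0.0) -> bool:
    """Gate a prompt or model change: the candidate must not score worse than what's live."""
    return candidate.accuracy >= current.accuracy - tolerance
```

Running this in CI on every prompt or model change makes the comparison between versions routine rather than a manual judgment call.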
When ML Is the Right Tool
Good Fits for ML
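```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = {"billing", "technical", "fraud", "general"}


# Natural language understanding: ML excels here
def classify_support_ticket(ticket_text: str) -> str:
    """Route a support ticket to the right team.

    Why ML: Language is ambiguous. "My payment didn't go through"
    could be billing, fraud, or technical. Rules can't cover all cases.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Classify this support ticket into one of: billing, technical, fraud, general. Return only the category."},
            {"role": "user", "content": ticket_text},
        ],
        temperature=0,
    )
    category = (response.choices[0].message.content or "").strip().lower()
    # The model can still return something unexpected; fall back to a safe default.
    return category if category in CATEGORIES else "general"
```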
Good fits include:
- Natural language tasks: classification, extraction, summarization, translation
- Recommendation: "users who liked X also liked Y"
- Search & retrieval: semantic search, ranking results by relevance
- Image/audio processing: object detection, transcription, generation
- Anomaly detection: fraud, intrusion detection, quality control
Bad Fits for ML
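```python
from decimal import ROUND_HALF_UP, Decimal

# Tax calculation: don't use ML for this
# Simplified for illustration: state-level base rates only; local rates are ignored.
TAX_RATES = {
    "CA": Decimal("0.0725"),
    "TX": Decimal("0.0625"),
    "NY": Decimal("0.04"),
    "OR": Decimal("0"),  # Oregon has no sales tax
}


def calculate_sales_tax(amount: Decimal, state: str) -> Decimal:
    """Calculate sales tax. This is deterministic.

    Why NOT ML: Tax rates are defined by law. There is exactly one
    correct answer. An ML model that is "usually right" about tax
    calculations will get you audited.
    """
    if state not in TAX_RATES:
        # Fail loudly rather than silently charging 0% for an unknown state.
        raise ValueError(f"No tax rate configured for {state!r}")
    # Decimal avoids float rounding errors; round to cents (half-up here, rules vary).
    return (amount * TAX_RATES[state]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```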
Bad fits include:
- Deterministic logic: tax calculations, access control rules, data transformations
- Small, well-defined rule sets: if you can write it as a lookup table, do that
- Regulatory requirements: anywhere "sometimes wrong" is unacceptable
- Low-data domains: if you only have 50 examples, you don't have enough signal to train a model
- Real-time deterministic control: safety-critical systems, financial settlement
The Decision Framework
```
Should you use AI/ML?

1. Is the problem well-defined with deterministic rules?
     YES → Write rules. Skip ML.
     NO  → Continue.
2. Do you have data (or can you generate/buy it)?
     NO  → You can't do ML. Use rules or heuristics.
     YES → Continue.
3. Is "sometimes wrong" acceptable?
     NO  → ML is risky. Consider ML + human review.
     YES → Continue.
4. Can a human do this task in under 10 seconds?
     YES → ML can probably do it too.
     NO  → ML might struggle. Break the task down.
5. Does a pre-trained model or API already solve this?
     YES → Use the API. Don't train anything.
     NO  → Consider fine-tuning or building.
```
Real-World Example: Document Processing
A company needs to extract key fields from invoices (vendor name, total amount, due date, line items).
The theory-only approach: Train a custom document understanding model. Collect 10,000 labeled invoices. Fine-tune a layout-aware transformer. Build training infrastructure. Three months of work.
The applied approach:
- Start with an LLM API call: send the invoice text to GPT-4o with extraction instructions
- Validate the output against expected formats (is the amount a valid number? is the date parseable?)
- Build a test set of 100 invoices with known correct answers
- Measure accuracy: 94% on first attempt
- Add few-shot examples for the edge cases: accuracy goes to 97%
- Add a human review queue for low-confidence extractions
- Ship it. Total development time: two weeks
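The validation and review-queue steps above are plain code around the model call. This is a minimal sketch under stated assumptions. It assumes the extraction prompt asks for a JSON object with hypothetical `vendor_name`, `total_amount`, and `due_date` keys, with the date in ISO 8601 format. It treats "failed a format check" as a simple stand-in for "low confidence". Line items are not checked here.

```python
import json
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical output schema requested in the extraction prompt.
REQUIRED_KEYS = ("vendor_name", "total_amount", "due_date")


def validate_invoice_fields(raw_output: str) -> tuple[Optional[dict], list[str]]:
    """Parse the model's JSON output and run format checks on each field.

    Returns (fields, problems). An empty problems list means every check passed.
    """
    try:
        fields = json.loads(raw_output)
    except json.JSONDecodeError:
        return None, ["output is not valid JSON"]
    if not isinstance(fields, dict):
        return None, ["output is not a JSON object"]

    problems = [f"missing {key}" for key in REQUIRED_KEYS if fields.get(key) in (None, "")]

    # Is the amount a valid, non-negative number?
    try:
        amount = Decimal(str(fields.get("total_amount")))
        if not amount.is_finite() or amount < 0:
            problems.append("total_amount is not a valid amount")
    except InvalidOperation:
        problems.append("total_amount is not a number")

    # Is the date parseable?
    try:
        date.fromisoformat(str(fields.get("due_date")))
    except ValueError:
        problems.append("due_date is not an ISO date")

    return fields, problems


def needs_human_review(problems: list[str]) -> bool:
    """Route anything that failed a format check to the review queue."""
    return bool(problems)
```

The checks are deliberately strict. For example, an amount written as "1,250.00" fails and goes to review rather than being guessed at. Each reviewed correction is also a labeled example for a later fine-tuning run.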
The applied approach is not less sophisticated. It is more pragmatic. If accuracy needs to reach 99.5%, you can invest in fine-tuning later, with a working system already in production generating the training data you need.
The Build vs Integrate Decision
Almost always start here:
```
API call to a foundation model (OpenAI, Anthropic, Google)
    ↓ Not good enough?
Better prompts + few-shot examples
    ↓ Still not good enough?
RAG (retrieve relevant context)
    ↓ Still not good enough?
Fine-tune an existing model
    ↓ Still not good enough?
Train a custom model from scratch (you probably don't need this)
```
Each step up this ladder is roughly 5-10x more expensive in time and money than the one below it. Most production AI features live in the first two levels.
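Moving from the first rung to the second often means only changing the messages you send. This is a minimal sketch that reuses the support-ticket categories from the classifier earlier in this document. It supplies few-shot examples as prior user/assistant turns, which is one common convention for chat-style APIs. The example tickets and the `build_messages` name are hypothetical. In practice you would pick examples from the failures your evals surface.

```python
# Hypothetical labeled examples, e.g. chosen from edge cases your evals flagged.
FEW_SHOT_EXAMPLES = [
    ("I was charged twice this month", "billing"),
    ("Someone used my card without permission", "fraud"),
    ("The export button does nothing", "technical"),
]

SYSTEM_PROMPT = (
    "Classify this support ticket into one of: billing, technical, fraud, general. "
    "Return only the category."
)


def build_messages(
    ticket_text: str,
    examples: list[tuple[str, str]] = FEW_SHOT_EXAMPLES,
) -> list[dict[str, str]]:
    """Build a chat message list with few-shot examples as prior user/assistant turns."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example_text, label in examples:
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": label})
    # The real ticket goes last, so the model answers it in the same format as the examples.
    messages.append({"role": "user", "content": ticket_text})
    return messages
```

The returned list can be passed as `messages=` to the same `client.chat.completions.create` call used by the classifier. Re-run your evals to confirm the examples actually help.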
Common Pitfalls
- Starting with training instead of prompting: The most common mistake. Try the simplest approach first. You can always add complexity later; removing it is harder.
- Treating ML as a black box: "We'll throw ML at it" is not a plan. You need to define what success looks like, how you'll measure it, and what happens when the model is wrong.
- Ignoring the data pipeline: Model training is roughly 5% of the effort. If your data pipeline is fragile, your AI feature is fragile.
- Optimizing for the wrong metric: High accuracy on a test set means nothing if users hate the output. Measure what matters to users.
- No fallback path: Every ML system needs a graceful degradation strategy. What happens when the API is down? When the model returns garbage? When latency spikes to 30 seconds?
- Skipping evaluation: If you don't have automated evals, you are shipping blind. Every prompt change, every model upgrade, every data change should be tested against a known-good dataset.
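A fallback path can be as simple as "if the model call fails, times out, or returns something invalid, use rules instead". This is a minimal sketch built on hypothetical keyword rules and names. `model_classify` stands in for any model-backed function, such as the ticket classifier earlier in this document. The function returns the category together with its source, so you can monitor how often the fallback fires.

```python
import concurrent.futures
from typing import Callable

# Hypothetical keyword rules, used only when the model path fails.
FALLBACK_KEYWORDS = {
    "refund": "billing", "invoice": "billing", "charge": "billing",
    "fraud": "fraud", "stolen": "fraud",
    "error": "technical", "crash": "technical",
}
VALID_CATEGORIES = {"billing", "technical", "fraud", "general"}

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def rule_based_classify(text: str) -> str:
    """Crude but predictable: first matching keyword wins, else 'general'."""
    lowered = text.lower()
    for keyword, category in FALLBACK_KEYWORDS.items():
        if keyword in lowered:
            return category
    return "general"


def classify_with_fallback(
    text: str, model_classify: Callable[[str], str], timeout_s: float = 10.0
) -> tuple[str, str]:
    """Return (category, source); degrade to rules on timeout, error, or invalid output."""
    future = _executor.submit(model_classify, text)
    try:
        category = future.result(timeout=timeout_s).strip().lower()
    except concurrent.futures.TimeoutError:
        return rule_based_classify(text), "fallback:timeout"
    except Exception:
        # API down, network error, or a non-string response.
        return rule_based_classify(text), "fallback:error"
    if category not in VALID_CATEGORIES:
        return rule_based_classify(text), "fallback:invalid_output"
    return category, "model"
```

A timed-out call keeps running in its worker thread. In production you would also set the client's own request timeout so slow calls are actually cancelled.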
Key Takeaways
- Applied AI is about building reliable systems, not training models. The gap between theory and production is where most of the work lives.
- Most AI work (about 85% in the breakdown above) is data, integration, evaluation, and monitoring. Model training and prompt writing are a small fraction.
- Use ML when the problem involves ambiguity, natural language, or pattern recognition. Use rules when the logic is deterministic.
- Always start with the simplest approach: an API call with a good prompt. Escalate complexity only when measurement proves you need it.
- Every AI feature needs evaluation, monitoring, and a fallback path. These are not optional extras.