Applied AI vs ML Theory

Overview

A computer science education covers the math behind machine learning: gradient descent, loss functions, backpropagation, regularization. That knowledge matters. But knowing how a neural network learns its weights does not tell you how to build an AI-powered product that users rely on every day.

Applied AI is the discipline of taking ML capabilities and turning them into reliable, maintainable software. The gap between "I trained a model in a Jupyter notebook" and "this system handles 10,000 requests per minute with 99.9% uptime" is enormous. This document covers that gap.

The Theory-Practice Gap

What ML Theory Teaches You

  • How gradient descent optimizes a loss function
  • Why convolutional layers work for images
  • The bias-variance tradeoff
  • How attention mechanisms enable transformers
  • Mathematical properties of different activation functions

What Applied AI Requires

  • Choosing between building a model and calling an API
  • Handling malformed input gracefully at inference time
  • Monitoring model performance in production and detecting drift
  • Managing costs when every API call costs money
  • Building fallback paths when the model fails or returns nonsense
  • Versioning prompts, models, and evaluation datasets together

The theory gives you the "why." Applied AI gives you the "how" and the "what happens when things go wrong."
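Of the items above, detecting drift is the least self-explanatory. One simple signal is to compare the distribution of the model's outputs in a recent window against a baseline window. The sketch below uses the population stability index (PSI) and assumes categorical outputs such as support-ticket labels. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def label_distribution(labels: Iterable[str], categories: list[str]) -> list[float]:
    """Share of each category among the labels (labels outside `categories` are ignored)."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    return [counts[c] / total if total else 0.0 for c in categories]


def population_stability_index(
    baseline: Iterable[str],
    recent: Iterable[str],
    categories: list[str],
    eps: float = 1e-4,
) -> float:
    """PSI = sum over categories of (recent - baseline) * ln(recent / baseline).

    0.0 means the two label mixes are identical; larger values mean more drift.
    """
    b = label_distribution(baseline, categories)
    r = label_distribution(recent, categories)
    psi = 0.0
    for pb, pr in zip(b, r):
        # Clamp empty categories to a tiny share so the log stays finite.
        pb, pr = max(pb, eps), max(pr, eps)
        psi += (pr - pb) * math.log(pr / pb)
    return psi
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the threshold should be tuned to your own traffic. Output drift is only a proxy. It tells you the input mix or the model's behavior changed, not whether the answers are still correct.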

The 95% That Isn't Training Models

Most AI work in production systems has nothing to do with training. A typical breakdown of effort in an AI-powered feature:

Data collection & cleaning:     25%
Integration & infrastructure:   25%
Evaluation & testing:           20%
Monitoring & maintenance:       15%
Prompt engineering / modeling:  10%
Actual model training:           5%

This surprises people who come from an academic background. In a research lab, model architecture and training dominate. In production, they are a small piece.

Data Work

The most time-consuming part of any AI project is getting the data right. This means:

  • Collecting representative examples of the problem you are solving
  • Cleaning data: removing duplicates, fixing encoding issues, handling missing fields
  • Labeling data: often requires domain experts, not just engineers
  • Building pipelines that keep data fresh as your product evolves
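Here is a minimal sketch of the cleaning step. It assumes each example is a dict with `text` and `label` keys, which are hypothetical field names. The function drops records with missing fields, normalizes Unicode to NFKC, collapses whitespace, and removes case-insensitive duplicates. It does not repair mojibake (text decoded with the wrong codec). That needs a dedicated tool or a fix at the source.

```python
import unicodedata
from typing import Iterable


def clean_examples(
    records: Iterable[dict],
    required: tuple[str, ...] = ("text", "label"),
) -> list[dict]:
    """Drop incomplete records, normalize text, and remove duplicates."""
    seen: set[str] = set()
    cleaned: list[dict] = []
    for record in records:
        # Missing or empty required field: skip it (or queue it for relabeling).
        if any(record.get(field) in (None, "") for field in required):
            continue
        # NFKC folds compatibility characters such as non-breaking spaces
        # and full-width letters into their plain equivalents.
        text = unicodedata.normalize("NFKC", record["text"])
        text = " ".join(text.split())  # collapse runs of whitespace
        key = text.casefold()  # case-insensitive duplicate key
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**record, "text": text})
    return cleaned
```

The function keeps the first occurrence of each duplicate. If later records are more trustworthy, for example because they were relabeled by a domain expert, reverse the input or key the deduplication on a record ID instead.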

Integration Work

The model is one component in a larger system. Integration includes:

  • Building API endpoints that wrap model inference
  • Handling authentication, rate limiting, and request validation
  • Managing timeouts (LLM calls can take 5-30 seconds)
  • Implementing caching to avoid redundant expensive calls
  • Building retry logic with exponential backoff
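Retries and caching take little code, but they are easy to get subtly wrong. The sketch below has a few assumptions. It treats transient failures as `TimeoutError` or `ConnectionError`; real SDKs raise their own exception types, so adjust `retry_on`. It also uses a hypothetical `expensive_model_call` as a stand-in for the API request. Many client libraries ship their own retry settings, so check those before layering another loop on top.

```python
import functools
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on transient errors, doubling the maximum wait each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": a random wait up to `delay` spreads out retries
            # so many clients don't hammer the API in lockstep.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


def expensive_model_call(text: str) -> str:
    # Hypothetical stand-in for a slow, billed API request.
    return "general"


@functools.lru_cache(maxsize=1024)
def cached_classify(text: str) -> str:
    """Memoize identical inputs so repeated requests skip the expensive call."""
    return retry_with_backoff(lambda: expensive_model_call(text))
```

The `sleep` parameter exists so tests can skip real waiting. `lru_cache` only caches within one process, so a shared cache such as Redis is the usual next step. Caching only makes sense when returning the same answer for the same input is acceptable.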

Evaluation Work

You cannot ship what you cannot measure. Evaluation includes:

  • Building test datasets that represent real user inputs
  • Defining metrics that correlate with user satisfaction
  • Running automated evals on every prompt or model change
  • Comparing model versions before deploying updates
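The core of an automated eval is small: run each version over the same labeled examples and compare the scores. The sketch below assumes a classification task where exact-match accuracy is a reasonable metric; generative tasks usually need fuzzier scoring. `accuracy` and `safe_to_deploy` are hypothetical names.

```python
from typing import Callable, Sequence

Predictor = Callable[[str], str]


def accuracy(predict: Predictor, dataset: Sequence[tuple[str, str]]) -> float:
    """Fraction of (input, expected_label) pairs the predictor gets exactly right."""
    if not dataset:
        raise ValueError("eval set is empty")
    correct = sum(predict(text) == expected for text, expected in dataset)
    return correct / len(dataset)


def safe_to_deploy(
    current: Predictor,
    candidate: Predictor,
    dataset: Sequence[tuple[str, str]],
    tolerance: float = 0.0,
) -> bool:
    """Gate a prompt or model change: the candidate must not score worse than current."""
    return accuracy(candidate, dataset) >= accuracy(current, dataset) - tolerance
```

Running this gate in CI on every prompt or model change turns "comparing model versions" from a manual step into a check that can block a regression.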

When ML Is the Right Tool

Good Fits for ML

# Natural language understanding: ML excels here
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_support_ticket(ticket_text: str) -> str:
    """Route a support ticket to the right team.

    Why ML: Language is ambiguous. "My payment didn't go through"
    could be billing, fraud, or technical. Rules can't cover all cases.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Classify this support ticket into one of: billing, technical, fraud, general. Return only the category."},
            {"role": "user", "content": ticket_text},
        ],
        temperature=0,  # reduce randomness so the same ticket tends to get the same label
    )
    # message.content can be None, so guard before calling string methods
    return (response.choices[0].message.content or "").strip().lower()
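The function above returns whatever text the model produces. A reply like "Billing." or an unexpected label therefore passes straight through to routing. The sketch below constrains the output. It assumes the four categories from the system prompt and sends anything unrecognized to the `general` queue. `parse_category` is a hypothetical helper name.

```python
import re
from typing import Optional

ALLOWED_CATEGORIES = {"billing", "technical", "fraud", "general"}
FALLBACK_CATEGORY = "general"  # assumption: unknown labels go to the general queue


def parse_category(raw: Optional[str]) -> str:
    """Map a raw model reply onto one of the allowed categories.

    Lowercases the reply and strips everything except letters, so "Billing."
    becomes "billing". Anything still not allowed falls back to the general queue.
    """
    if not raw:
        return FALLBACK_CATEGORY
    cleaned = re.sub(r"[^a-z]", "", raw.strip().lower())
    return cleaned if cleaned in ALLOWED_CATEGORIES else FALLBACK_CATEGORY
```

Inside `classify_support_ticket`, you would return `parse_category(response.choices[0].message.content)`. Falling back to a catch-all queue keeps tickets flowing instead of raising an error. Counting how often the fallback fires gives you an early signal that the prompt needs work.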

Good fits include:

  • Natural language tasks: classification, extraction, summarization, translation
  • Recommendation: "users who liked X also liked Y"
  • Search & retrieval: semantic search, ranking results by relevance
  • Image/audio processing: object detection, transcription, generation
  • Anomaly detection: fraud, intrusion detection, quality control

Bad Fits for ML

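# Tax calculation: don't use ML for this
def calculate_sales_tax(amount: float, state: str) -> float:
    """Calculate state-level sales tax. This is deterministic.

    Why NOT ML: Tax rates are defined by law. There is exactly one
    correct answer. An ML model that is "usually right" about tax
    calculations will get you audited.
    """
    # State-level base rates only. Many states add local (city/county)
    # taxes on top, which a real implementation must also handle.
    tax_rates = {
        "CA": 0.0725,
        "TX": 0.0625,
        "NY": 0.04,
        "OR": 0.0,  # Oregon has no sales tax
    }
    if state not in tax_rates:
        # Fail loudly: silently assuming 0% would be exactly the
        # "usually right" behavior this example argues against.
        raise ValueError(f"No sales tax rate configured for state {state!r}")
    return amount * tax_rates[state]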
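The example uses `float` to stay short. If the "exactly one correct answer" property has to hold down to the cent, a common approach is Python's `decimal` module, which represents values like 0.0725 exactly. The sketch below assumes half-up rounding to whole cents; actual rounding rules vary by jurisdiction, so check yours. `sales_tax_decimal` is a hypothetical name.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


def sales_tax_decimal(amount: str, rate: str) -> Decimal:
    """Compute tax with exact decimal arithmetic, rounded half-up to cents.

    Inputs are strings so the values never pass through binary floating point.
    """
    tax = Decimal(amount) * Decimal(rate)
    return tax.quantize(CENT, rounding=ROUND_HALF_UP)
```

For example, `sales_tax_decimal("10.00", "0.0725")` computes the exact product 0.725 and rounds it half-up to `Decimal("0.73")`.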

Bad fits include:

  • Deterministic logic: tax calculations, access control rules, data transformations
  • Small, well-defined rule sets: if you can write it as a lookup table, do that
  • Regulatory requirements: anywhere "sometimes wrong" is unacceptable
  • Low-data domains: if you have 50 examples, you don't have enough signal to train a model
  • Real-time deterministic control: safety-critical systems, financial settlement

The Decision Framework

Should you use AI/ML?

1. Is the problem well-defined with deterministic rules?
   YES → Write rules. Skip ML.
   NO  → Continue.

2. Do you have data (or can you generate/buy it)?
   NO  → You can't do ML. Use rules or heuristics.
   YES → Continue.

3. Is "sometimes wrong" acceptable?
   NO  → ML is risky. Consider ML + human review.
   YES → Continue.

4. Can a human do this task in under 10 seconds?
   YES → ML can probably do it too.
   NO  → ML might struggle. Break the task down.

5. Does a pre-trained model or API already solve this?
   YES → Use the API. Don't train anything.
   NO  → Consider fine-tuning or building.
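Each question either stops the process or adds a caveat, so the framework translates directly into code. The sketch below assumes you answer each question as a yes/no flag. The `Problem` fields and the `recommend` function are hypothetical names. As in the text, questions 3 and 4 add advice rather than ending the process.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical yes/no answers to the five questions above.
    has_deterministic_rules: bool  # Q1
    has_data: bool                 # Q2 (including data you can generate or buy)
    errors_acceptable: bool        # Q3: is "sometimes wrong" OK?
    quick_for_humans: bool         # Q4: under ~10 seconds for a person?
    api_already_solves_it: bool    # Q5


def recommend(p: Problem) -> list[str]:
    """Walk the questions in order; Q1 and Q2 can stop early, Q3-Q5 add advice."""
    if p.has_deterministic_rules:
        return ["Write rules. Skip ML."]
    if not p.has_data:
        return ["Use rules or heuristics."]
    advice = []
    if not p.errors_acceptable:
        advice.append("ML is risky. Consider ML + human review.")
    if not p.quick_for_humans:
        advice.append("ML might struggle. Break the task down.")
    if p.api_already_solves_it:
        advice.append("Use the API. Don't train anything.")
    else:
        advice.append("Consider fine-tuning or building.")
    return advice
```

Writing the framework down this way makes the ordering explicit. The cheap, disqualifying questions come first, so you never reach the build-or-buy question for a problem that rules could solve.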

Real-World Example: Document Processing

A company needs to extract key fields from invoices (vendor name, total amount, due date, line items).

The theory-only approach: Train a custom document understanding model. Collect 10,000 labeled invoices. Fine-tune a layout-aware transformer. Build training infrastructure. Three months of work.

The applied approach:

  1. Start with an LLM API call: send the invoice text to GPT-4o with extraction instructions
  2. Validate the output against expected formats (is the amount a valid number? is the date parseable?)
  3. Build a test set of 100 invoices with known correct answers
  4. Measure accuracy: 94% on first attempt
  5. Add few-shot examples for the edge cases: accuracy goes to 97%
  6. Add a human review queue for low-confidence extractions
  7. Ship it. Total development time: two weeks

The applied approach is not less sophisticated. It is more pragmatic. If accuracy needs to reach 99.5%, you can invest in fine-tuning later, with a working system already in production generating the training data you need.
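Steps 2 and 6 of the applied approach can be sketched as follows. The sketch assumes the LLM returns a dict with string-valued `vendor_name`, `total_amount`, and `due_date` fields, with dates in ISO format; these field names and formats are assumptions. It uses validation failure as a stand-in for "low confidence". A model-reported confidence score, if you have one, could feed the same routing decision. `validate_invoice` and `route` are hypothetical names.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("vendor_name", "total_amount", "due_date")


def validate_invoice(fields: dict) -> list[str]:
    """Return a list of problems with extracted invoice fields (empty means it looks valid)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if fields.get(name) in (None, ""):
            problems.append(f"missing {name}")
    amount = fields.get("total_amount")
    if amount not in (None, ""):
        try:
            # Strip thousands separators such as "1,234.50" before parsing.
            if Decimal(str(amount).replace(",", "")) < 0:
                problems.append("negative total_amount")
        except InvalidOperation:
            problems.append("total_amount is not a number")
    due = fields.get("due_date")
    if due not in (None, ""):
        try:
            date.fromisoformat(str(due))
        except ValueError:
            problems.append("due_date is not an ISO date (YYYY-MM-DD)")
    return problems


def route(fields: dict) -> str:
    """Send anything that fails validation to the human review queue."""
    return "human_review" if validate_invoice(fields) else "auto_accept"
```

Returning a list of problems rather than a boolean lets reviewers see why an invoice landed in their queue. The same messages also serve as labels when you later build the fine-tuning dataset mentioned above.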

The Build vs Integrate Decision

Almost always start here:
  API call to a foundation model (OpenAI, Anthropic, Google)
  ↓ Not good enough?
  Better prompts + few-shot examples
  ↓ Still not good enough?
  RAG (retrieve relevant context)
  ↓ Still not good enough?
  Fine-tune an existing model
  ↓ Still not good enough?
  Train a custom model from scratch (you probably don't need this)

Each step up this ladder is 5-10x more expensive in time and money. Most production AI features live in the first two levels.
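RAG is the first rung that changes the system rather than just the prompt: before calling the model, you fetch relevant text and include it in the prompt. The sketch below shows that shape. Production systems usually rank with embeddings and a vector index. Plain word overlap is used here only as a stand-in so the example runs without dependencies. `retrieve` and `build_prompt` are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share the most words with the query."""
    query_counts = Counter(tokenize(query))

    def score(doc: str) -> int:
        doc_counts = Counter(tokenize(doc))
        return sum(min(n, doc_counts[word]) for word, n in query_counts.items())

    ranked = sorted(documents, key=score, reverse=True)
    # Drop documents with no overlap rather than padding the context with noise.
    return [doc for doc in ranked[:k] if score(doc) > 0]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Swapping the overlap score for embedding similarity changes only `retrieve`; the prompt assembly and the model call stay the same. That is one reason RAG is a comparatively cheap rung to climb.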

Common Pitfalls

  • Starting with training instead of prompting: The most common mistake. Try the simplest approach first. You can always add complexity later; removing it is harder.
  • Treating ML as a black box: "We'll throw ML at it" is not a plan. You need to define what success looks like, how you'll measure it, and what happens when the model is wrong.
  • Ignoring the data pipeline: The model is 5% of the system. If your data pipeline is fragile, your AI feature is fragile.
  • Optimizing for the wrong metric: High accuracy on a test set means nothing if users hate the output. Measure what matters to users.
  • No fallback path: Every ML system needs a graceful degradation strategy. What happens when the API is down? When the model returns garbage? When latency spikes to 30 seconds?
  • Skipping evaluation: If you don't have automated evals, you are shipping blind. Every prompt change, every model upgrade, every data change should be tested against a known-good dataset.
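The "no fallback path" pitfall is cheap to avoid once the pattern is explicit. Use the model's answer only if the call succeeds and the output passes validation; otherwise degrade to something predictable. The sketch below assumes the fallback is a fixed default or a rule-based function. `with_fallback` is a hypothetical name.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def with_fallback(
    primary: Callable[[], Optional[T]],
    fallback: Callable[[], T],
    is_valid: Callable[[T], bool],
) -> T:
    """Return the primary (model) result if it arrives and is valid; otherwise degrade."""
    try:
        result = primary()
    except Exception:  # API down, timeout, malformed response, ...
        log.warning("primary call failed; using fallback", exc_info=True)
        return fallback()
    if result is None or not is_valid(result):
        log.warning("primary returned unusable output; using fallback")
        return fallback()
    return result
```

Logging every fallback matters as much as the fallback itself. A rising fallback rate is often the first visible sign that the API is degrading or that a prompt change broke the output format.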

Key Takeaways

  • Applied AI is about building reliable systems, not training models. The gap between theory and production is where most of the work lives.
  • Roughly 85% of AI work is data, integration, evaluation, and monitoring. Model training and prompt writing together are a small fraction.
  • Use ML when the problem involves ambiguity, natural language, or pattern recognition. Use rules when the logic is deterministic.
  • Always start with the simplest approach: an API call with a good prompt. Escalate complexity only when measurement proves you need it.
  • Every AI feature needs evaluation, monitoring, and a fallback path. These are not optional extras.