The AI Landscape for Engineers

Overview

The AI landscape changes fast, but the categories of tools are stable. As an engineer building AI-powered features, you need to know what is available, what each category is good at, what it costs, and how to make the build-vs-buy decision. This document is a practical map of the terrain.

Foundation Models

Foundation models are large language models trained on massive datasets. They are general-purpose: you give them text, they give you text back. The quality, speed, and cost vary significantly.

Proprietary Models

Model Family     Provider      Strengths                          Cost (approx per 1M tokens)
─────────────────────────────────────────────────────────────────────────────────────────────
GPT-4o           OpenAI        Strong all-around, fast            $2.50 input / $10 output
GPT-4o-mini      OpenAI        Good for simple tasks, cheap       $0.15 input / $0.60 output
Claude Opus      Anthropic     Excellent reasoning, long context  $15 input / $75 output
Claude Sonnet    Anthropic     Strong balance of quality/cost     $3 input / $15 output
Claude Haiku     Anthropic     Fast, cheap, good for routing      $0.25 input / $1.25 output
Gemini Pro       Google        Multimodal, large context          $1.25 input / $5 output
Gemini Flash     Google        Very fast, cost-effective          $0.075 input / $0.30 output

Prices change frequently. The trend is consistently downward. What cost $60/1M tokens in 2023 costs $3 in 2025.
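
To turn the table into a per-request estimate, multiply token counts by the per-million rates. The sketch below copies three rows from the table above as illustrative values. The `PRICES_PER_1M` dictionary and the `estimate_cost` helper are hypothetical names, and the rates are a snapshot that should be refreshed from each provider's pricing page before being relied on.

```python
# Approximate USD prices per 1M tokens, copied from the table above.
# These are a snapshot and will go stale; treat them as placeholders.
PRICES_PER_1M = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    price = PRICES_PER_1M[model]
    return (
        input_tokens / 1_000_000 * price["input"]
        + output_tokens / 1_000_000 * price["output"]
    )


# Compare one request (2,000 input tokens, 500 output tokens) across models.
for name in PRICES_PER_1M:
    print(f"{name}: ${estimate_cost(name, 2_000, 500):.5f}")
```

Output tokens are several times more expensive than input tokens in every row, so capping response length is often the cheapest optimization available.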

Open Source Models

Model            Parameters    Strengths                     License
────────────────────────────────────────────────────────────────────────
Llama 3.1        8B-405B       Strong general purpose         Meta license
Llama 3.2        1B-90B        Multimodal, small variants     Meta license
Mistral Large    123B          Strong reasoning               Mistral Research License
Mixtral 8x22B    141B (MoE)    Fast, efficient architecture   Apache 2.0
Qwen 2.5         0.5B-72B      Multilingual, code             Apache 2.0
DeepSeek V3      671B (MoE)    Competitive with proprietary   MIT
Gemma 2          2B-27B        Good for fine-tuning           Google license
Phi-3            3.8B-14B      Small but capable              MIT

Open source models are useful when you need:

  • Data privacy: Data never leaves your infrastructure
  • Cost control at scale: No per-token API fees once deployed
  • Customization: Full control over fine-tuning and serving
  • Offline/edge deployment: No internet connection required

The tradeoff: you manage the infrastructure yourself, including GPU hosting, model serving, scaling, and monitoring. This is significant operational overhead.

When to Choose What

Use proprietary APIs when:
  - You are prototyping or in early stages
  - Your volume is low to moderate (below the cost crossover, typically 50K-200K requests/day)
  - You need the best quality available
  - You don't have GPU infrastructure expertise

Use open source models when:
  - Data cannot leave your infrastructure (healthcare, finance, government)
  - You need to fine-tune for a specific domain
  - Your volume is very high and cost matters
  - You need sub-50ms latency (self-hosted, optimized)
  - You need to run offline or at the edge

Embedding Models

Embedding models convert text into numerical vectors that capture meaning. Two sentences with similar meaning will have similar vectors. This enables semantic search, clustering, and similarity comparison.

from openai import OpenAI

client = OpenAI()

# Generate embeddings for semantic search
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?"
)

vector = response.data[0].embedding  # List of 1536 floats
# Similar questions will have vectors that are close together
# "I forgot my login credentials" -> similar vector
# "What is the weather today?" -> very different vector
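
"Close together" is usually measured with cosine similarity. The sketch below shows the calculation and a simple ranking step in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings, and `rank_by_similarity` is a hypothetical helper, not part of any SDK.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], docs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.
query = [0.9, 0.1, 0.0]  # "How do I reset my password?"
docs = {
    "forgot-credentials": [0.8, 0.2, 0.1],
    "weather-today": [0.0, 0.1, 0.9],
}
ranking = rank_by_similarity(query, docs)
print(ranking[0][0])  # the most similar document ID
```

At production scale a vector database or approximate nearest-neighbor index would replace the linear scan, but the scoring idea is the same.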

Key Embedding Models

Model                      Dimensions    Cost/1M tokens    Notes
───────────────────────────────────────────────────────────────────────
text-embedding-3-small     1536          $0.02             Best value for most use cases
text-embedding-3-large     3072          $0.13             Higher quality, 2x storage
Cohere Embed v3            1024          $0.10             Multilingual strength
Voyage AI                  1024          $0.12             Code-specific variants
BGE (open source)          768-1024      Free (self-host)  Strong for its size
E5 (open source)           768-1024      Free (self-host)  Microsoft, multilingual
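
The "2x storage" note follows directly from the dimension count. As a rough sketch, assuming vectors are stored as uncompressed float32 values (4 bytes per dimension) and ignoring index overhead, the footprint can be estimated like this. The `index_size_gb` helper is a hypothetical name.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32


def index_size_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector storage in gigabytes (10^9 bytes), excluding index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


# One million document chunks at the two OpenAI dimension sizes from the table.
small = index_size_gb(1_000_000, 1536)
large = index_size_gb(1_000_000, 3072)
print(f"1536 dims: {small:.2f} GB, 3072 dims: {large:.2f} GB")
```

Storage and search cost both grow with dimension count, which is part of why the smaller model is listed as the best value.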

Embeddings are the foundation of RAG (Retrieval-Augmented Generation), semantic search, and recommendation systems. Almost every AI application that works with documents needs an embedding model.

Image Generation

Model              Provider     Strengths                          Cost per image
─────────────────────────────────────────────────────────────────────────────────
DALL-E 3           OpenAI       Good text rendering, safe          $0.04-$0.12
Stable Diffusion   Stability    Open source, customizable          Free (self-host)
Midjourney         Midjourney   Highest aesthetic quality           Subscription
Flux               Black Forest Strong prompt adherence             Varies
Imagen 3           Google       Photorealistic, good composition   API pricing

Image generation is mature for creative and marketing use cases. It is less reliable for precise, technical images. If you need exact diagrams or charts, use a drawing library.

Speech & Audio

Task                Model/Service           Cost                  Notes
──────────────────────────────────────────────────────────────────────────
Speech-to-text      Whisper (OpenAI)        $0.006/minute         Open source, self-hostable
Speech-to-text      Deepgram                $0.0043/minute        Real-time streaming
Speech-to-text      AssemblyAI              $0.01/minute          Speaker diarization
Text-to-speech      OpenAI TTS              $15/1M chars          6 voices, natural
Text-to-speech      ElevenLabs              $0.18/1K chars        Voice cloning, emotional
Text-to-speech      Coqui (open source)     Free (self-host)      Good quality, customizable

Whisper is the default choice for speech-to-text. It handles accents, background noise, and multiple languages well. For real-time transcription (live calls, meetings), Deepgram or AssemblyAI are better because they support streaming.

The Build vs Buy Decision

This is the most important decision framework for an engineer working with AI.

Almost Always Buy (Use an API)

# This is the right approach for 90% of AI features
import anthropic

client = anthropic.Anthropic()

def summarize_article(article_text: str) -> str:
    """Summarize an article. Uses Claude API.
    
    Why API and not custom model:
    - Works immediately, no training needed
    - Handles any topic without domain-specific data
    - Maintained and improved by Anthropic
    - Cost: ~$0.003 per article (negligible)
    """
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        messages=[
            {"role": "user", "content": f"Summarize this article in 3 bullet points:\n\n{article_text}"}
        ]
    )
    return message.content[0].text

Use an API when:

  • The task is general (summarization, classification, extraction, translation)
  • You are building an MVP or prototype
  • You don't have labeled training data
  • You need to ship in days, not months
  • The API quality meets your requirements

Consider Building When

  • Data privacy is non-negotiable: Regulated industries where data cannot leave your infrastructure
  • Latency requirements are extreme: Sub-10ms inference that API round-trips cannot meet
  • Cost at scale is prohibitive: Processing millions of requests per day where API costs exceed infrastructure costs
  • You need deep customization: The model needs to behave in ways that prompting alone cannot achieve

The Cost Crossover

API costs scale linearly with request volume. Self-hosted infrastructure costs are mostly fixed, rising in steps as you add GPUs.

  Requests/day    API cost/month     Self-hosted cost/month
  ─────────────────────────────────────────────────────────
  1,000           $30                $2,000 (GPU instance)
  10,000          $300               $2,000
  100,000         $3,000             $2,000-$4,000
  1,000,000       $30,000            $4,000-$8,000
  
  The crossover point is typically 50K-200K requests/day,
  depending on the specific model and task complexity.

Below the crossover, APIs win on total cost because you pay nothing for infrastructure, ops, or model maintenance. Above it, self-hosting can be dramatically cheaper, but only if you have the team to operate it.
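
The crossover can be reproduced with simple arithmetic. The sketch below assumes $0.001 per request (implied by the table: $30/month at 1,000 requests/day), a 30-day month, and the table's self-hosted figures as fixed costs. The function names are hypothetical, and staff time for operating the infrastructure is not included.

```python
DAYS_PER_MONTH = 30
COST_PER_REQUEST = 0.001  # implied by the table: $30/month at 1,000 requests/day


def api_cost_per_month(requests_per_day: float, cost_per_request: float) -> float:
    """Linear API spend for a month at a steady daily volume."""
    return requests_per_day * DAYS_PER_MONTH * cost_per_request


def break_even_requests_per_day(self_hosted_monthly: float, cost_per_request: float) -> float:
    """Daily volume at which API spend equals a fixed self-hosted bill."""
    return self_hosted_monthly / (cost_per_request * DAYS_PER_MONTH)


# Break-even volume for the low and high ends of a $2,000-$4,000 GPU bill.
for monthly in (2_000, 4_000):
    volume = break_even_requests_per_day(monthly, COST_PER_REQUEST)
    print(f"${monthly}/month self-hosted breaks even near {volume:,.0f} requests/day")
```

Under these assumptions the break-even lands between roughly 67K and 133K requests/day, inside the 50K-200K range quoted above. A more expensive request (longer prompts, larger model) moves the crossover lower.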

Putting It Together

A realistic AI-powered application might use multiple models:

Customer Support Bot Architecture:

  User message
    ↓
  Intent classification (GPT-4o-mini, cheap & fast)
    ↓
  Route to handler:
    ├─ FAQ → RAG with embeddings (text-embedding-3-small)
    │        + generation (Claude Haiku, fast & cheap)
    ├─ Complex issue → Full reasoning (Claude Sonnet)
    ├─ Voice call → Transcription (Whisper)
    │               + same pipeline as text
    └─ Image attachment → Vision model (GPT-4o)
    
  Total cost per conversation: $0.01-$0.05

Use cheap, fast models for routing and simple tasks. Use expensive, capable models only when the task demands it. This tiered approach can reduce costs by 80% compared to using the best model for everything.
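
A minimal sketch of the routing step, under stated assumptions: the keyword heuristic in `classify_intent` is a stand-in for the cheap classifier call so the example runs offline, the values in `ROUTES` are labels rather than exact API model IDs, and the 90% traffic share in the cost calculation is an illustrative guess, not a measured figure.

```python
from dataclasses import dataclass

# Model tiers named after the diagram above (labels, not exact API model IDs).
ROUTES = {
    "faq": "claude-haiku",
    "complex": "claude-sonnet",
    "image": "gpt-4o",
}


@dataclass
class Message:
    text: str
    has_image: bool = False


def classify_intent(message: Message) -> str:
    """Stand-in for the cheap classifier call (GPT-4o-mini in the diagram)."""
    if message.has_image:
        return "image"
    if len(message.text.split()) > 40 or "refund" in message.text.lower():
        return "complex"
    return "faq"


def pick_model(message: Message) -> str:
    """Map a message to the cheapest tier that can handle its intent."""
    return ROUTES[classify_intent(message)]


def blended_cost_ratio(cheap_share: float, cheap_price: float, expensive_price: float) -> float:
    """Cost of tiered routing relative to sending everything to the expensive model."""
    return cheap_share * (cheap_price / expensive_price) + (1 - cheap_share)


print(pick_model(Message("How do I reset my password?")))
# Assumed mix: 90% of traffic handled by Haiku ($0.25 input) instead of Sonnet ($3 input).
ratio = blended_cost_ratio(0.9, 0.25, 3.0)
print(f"Tiered cost is {ratio:.1%} of the single-model cost")
```

With that assumed mix the tiered setup costs about 17.5% of the single-model baseline, which is in line with the 80% reduction mentioned above. The real saving depends on how much traffic the cheap tier can actually handle.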

Common Pitfalls

  • Using the biggest model for everything: GPT-4o or Claude Opus for simple classification is like using a sledgehammer to hang a picture. Use the smallest model that gets the job done.
  • Ignoring open source: For narrow, well-defined tasks, a fine-tuned 8B-parameter open source model can match or outperform a general-purpose API model at a fraction of the cost.
  • Not accounting for latency: API calls add 200ms-5s of latency. If your feature is latency-sensitive, factor this into the design from the start.
  • Vendor lock-in without abstraction: Wrap your AI calls behind an interface. Switching from OpenAI to Anthropic should be a configuration change, not a rewrite.
  • Ignoring rate limits: Every API has rate limits. At scale, you need queuing, retries, and backoff. Design for this from day one.
  • Assuming prices are fixed: AI API pricing drops 30-50% per year. A "too expensive" model today may be affordable in six months. Revisit cost assumptions quarterly.
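
The rate-limit advice above can be sketched as a retry wrapper with exponential backoff and jitter. `RateLimitError` here is a hypothetical stand-in for whatever exception your provider's SDK raises, and `call_with_backoff` is an illustrative helper. Provider SDKs often include their own retry settings, so check those before adding a wrapper like this.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider SDK's rate-limit exception."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))


# Demo: a function that is rate limited twice, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


# A no-op sleep is passed so the demo does not actually wait.
result = call_with_backoff(flaky_call, sleep=lambda _seconds: None)
print(result, attempts["count"])
```

For sustained high volume, a queue in front of the API calls smooths bursts better than retries alone.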

Key Takeaways

  • The AI landscape has clear categories: foundation models (proprietary and open source), embedding models, image generation, and speech/audio. Know what each category does.
  • Almost always start with a proprietary API. It is faster to integrate, requires no infrastructure, and quality is high. Build custom only when you have a specific reason.
  • Use tiered model strategies: cheap models for simple tasks, expensive models for complex ones. This optimizes both cost and latency.
  • Embedding models are the foundation of semantic search and RAG. They are cheap and essential for most AI applications.
  • Wrap AI calls behind abstractions to avoid vendor lock-in. The landscape shifts fast and you need the ability to switch providers.
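
One way to sketch that abstraction is a small interface plus a provider registry. Everything named here (`TextModel`, `EchoModel`, `PROVIDERS`, `get_model`) is hypothetical. Real adapters wrapping the OpenAI or Anthropic SDK calls would implement the same `complete` method and be registered under their own keys.

```python
from typing import Callable, Protocol


class TextModel(Protocol):
    """Minimal interface the application depends on."""

    def complete(self, prompt: str, max_tokens: int) -> str: ...


class EchoModel:
    """Offline stub for tests and local development; returns a truncated prompt."""

    def complete(self, prompt: str, max_tokens: int) -> str:
        return prompt[:max_tokens]


# Registry of provider factories, keyed by a name that can live in configuration.
PROVIDERS: dict[str, Callable[[], TextModel]] = {
    "echo": EchoModel,
}


def get_model(provider: str) -> TextModel:
    """Build the model adapter registered under the given provider key."""
    return PROVIDERS[provider]()


def summarize(model: TextModel, text: str) -> str:
    """Application code sees only the TextModel interface, never a vendor SDK."""
    return model.complete(f"Summarize in 3 bullet points:\n\n{text}", max_tokens=300)


model = get_model("echo")  # in practice the key would come from configuration
print(summarize(model, "Some article text"))
```

With this shape, switching providers means registering a new adapter and changing one configuration value, while `summarize` and the rest of the application stay untouched.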