The AI Landscape for Engineers
Overview
The AI landscape changes fast, but the categories of tools are stable. As an engineer building AI-powered features, you need to know what is available, what each category is good at, what it costs, and how to make the build-vs-buy decision. This document is a practical map of the terrain.
Foundation Models
Foundation models are large models, most commonly large language models (LLMs), trained on massive datasets. They are general-purpose: you give them text (and, for multimodal models, images or audio), and they give you text back. Quality, speed, and cost vary significantly between models.
Proprietary Models
Model Family    Provider    Strengths                           Cost (approx. per 1M tokens)
─────────────────────────────────────────────────────────────────────────────────────────────
GPT-4o          OpenAI      Strong all-around, fast             $2.50 input / $10 output
GPT-4o-mini     OpenAI      Good for simple tasks, cheap        $0.15 input / $0.60 output
Claude Opus     Anthropic   Excellent reasoning, long context   $15 input / $75 output
Claude Sonnet   Anthropic   Strong balance of quality/cost      $3 input / $15 output
Claude Haiku    Anthropic   Fast, cheap, good for routing       $0.25 input / $1.25 output
Gemini Pro      Google      Multimodal, large context           $1.25 input / $5 output
Gemini Flash    Google      Very fast, cost-effective           $0.075 input / $0.30 output
Prices change frequently, and the trend is consistently downward: a capability that was expensive a year or two earlier costs a fraction of that in 2025.
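To make the per-token prices concrete, here is a minimal sketch that estimates the cost of a single request. The prices are copied from the table above and will drift over time. The dictionary keys are illustrative labels, not official API model identifiers, and the function name `estimate_cost` is an assumption made for this example.

```python
# Approximate USD prices per 1M tokens as (input, output), copied from the table above.
# Treat these as a snapshot; check the provider's pricing page before relying on them.
# The keys are illustrative labels, not official API model IDs.
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-opus": (15.00, 75.00),
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku": (0.25, 1.25),
    "gemini-pro": (1.25, 5.00),
    "gemini-flash": (0.075, 0.30),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
for name in PRICES_PER_MILLION:
    print(f"{name:14s} ${estimate_cost(name, 2_000, 500):.6f}")
```

Output tokens are several times more expensive than input tokens for every model in the table, so capping response length is often the easiest cost lever.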
Open Source Models
Model           Parameters   Strengths                      License
────────────────────────────────────────────────────────────────────────
Llama 3.1       8B-405B      Strong general purpose         Meta license
Llama 3.2       1B-90B       Multimodal, small variants     Meta license
Mistral Large   123B         Strong reasoning               Mistral Research License
Mixtral 8x22B   141B (MoE)   Fast, efficient architecture   Apache 2.0
Qwen 2.5        0.5B-72B     Multilingual, code             Apache 2.0 (most sizes)
DeepSeek V3     685B (MoE)   Competitive with proprietary   MIT
Gemma 2         2B-27B       Good for fine-tuning           Google license
Phi-3           3.8B-14B     Small but capable              MIT
Open source models are useful when you need:
- Data privacy: Data never leaves your infrastructure
- Cost control at scale: No per-token API fees once deployed
- Customization: Full control over fine-tuning and serving
- Offline/edge deployment: No internet connection required
The tradeoff: you manage the infrastructure, including GPU hosting, model serving, scaling, and monitoring. This is significant operational overhead.
When to Choose What
Use proprietary APIs when:
- You are prototyping or in early stages
- Your volume is low to moderate (below the cost crossover, typically 50K-200K requests/day)
- You need the best quality available
- You don't have GPU infrastructure expertise
Use open source models when:
- Data cannot leave your infrastructure (healthcare, finance, government)
- You need to fine-tune for a specific domain
- Your volume is very high and cost matters
- You need sub-50ms latency (self-hosted, optimized)
- You need to run offline or at the edge
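The two checklists above can be sketched as a small decision helper. This is an illustration of the reasoning, not a definitive rule: the `Requirements` fields, the `recommend` function, the 200,000 requests/day default, and the choice to treat fine-tuning and volume as soft factors that only count when the team has GPU expertise are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical project requirements; field names are illustrative."""
    data_must_stay_on_prem: bool = False
    needs_offline_or_edge: bool = False
    needs_fine_tuning: bool = False
    latency_budget_ms: float = 1000.0
    requests_per_day: int = 1_000
    has_gpu_ops_expertise: bool = False


def recommend(req: Requirements, high_volume_threshold: int = 200_000) -> str:
    """Return "open source" or "proprietary API" based on the checklists above."""
    # Hard constraints: an external API cannot satisfy these at all.
    if req.data_must_stay_on_prem or req.needs_offline_or_edge:
        return "open source"
    if req.latency_budget_ms < 50:
        return "open source"
    # Soft factors (assumption): fine-tuning and high volume only justify
    # self-hosting when the team can actually operate GPU infrastructure.
    wants_self_hosting = (
        req.needs_fine_tuning or req.requests_per_day >= high_volume_threshold
    )
    if wants_self_hosting and req.has_gpu_ops_expertise:
        return "open source"
    return "proprietary API"


print(recommend(Requirements()))
print(recommend(Requirements(data_must_stay_on_prem=True)))
```

Separating hard constraints from soft factors keeps the default answer at "proprietary API" unless something specific forces the other choice.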
Embedding Models
Embedding models convert text into numerical vectors that capture meaning. Two sentences with similar meaning will have similar vectors. This enables semantic search, clustering, and similarity comparison.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Generate embeddings for semantic search
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?"
)
vector = response.data[0].embedding  # List of 1536 floats
# Similar questions will have vectors that are close together
# "I forgot my login credentials" -> similar vector
# "What is the weather today?" -> very different vector
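"Close together" is usually measured with cosine similarity. The sketch below computes it in plain Python. The three-dimensional vectors are made-up stand-ins for real 1536-dimensional embeddings, chosen only to illustrate the comparison; they are not actual model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (made up for illustration, not real embeddings).
password_reset = [0.9, 0.1, 0.0]  # "How do I reset my password?"
forgot_login = [0.8, 0.2, 0.1]    # "I forgot my login credentials"
weather = [0.0, 0.1, 0.9]         # "What is the weather today?"

print(cosine_similarity(password_reset, forgot_login))  # high: related questions
print(cosine_similarity(password_reset, weather))       # low: unrelated questions
```

In production you would normally let a vector database or a numerical library do this computation over many vectors at once.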
Key Embedding Models
Model                    Dimensions   Cost/1M tokens     Notes
───────────────────────────────────────────────────────────────────────
text-embedding-3-small   1536         $0.02              Best value for most use cases
text-embedding-3-large   3072         $0.13              Higher quality, 2x storage
Cohere Embed v3          1024         $0.10              Multilingual strength
Voyage AI                1024         $0.12              Code-specific variants
BGE (open source)        768-1024     Free (self-host)   Strong for its size
E5 (open source)         768-1024     Free (self-host)   Microsoft, multilingual
Embeddings are the foundation of RAG (Retrieval-Augmented Generation), semantic search, and recommendation systems. Almost every AI application that works with documents needs an embedding model.
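The "2x storage" note in the table follows directly from the dimension count. A rough sizing sketch, assuming each dimension is stored as a 4-byte float32 and ignoring index overhead and metadata (both assumptions; real vector stores add some of each):

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32 without compression


def raw_vector_storage_gb(num_vectors: int, dimensions: int) -> float:
    """Raw storage for the vectors alone, in gigabytes (10^9 bytes)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


# One million document chunks at each of the two OpenAI model sizes.
small = raw_vector_storage_gb(1_000_000, 1536)
large = raw_vector_storage_gb(1_000_000, 3072)
print(f"1536 dims: {small:.3f} GB, 3072 dims: {large:.3f} GB")
```

Doubling the dimensions doubles both storage and the work per similarity comparison, which is worth weighing against the quality gain.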
Image Generation
Model              Provider            Strengths                          Cost per image
─────────────────────────────────────────────────────────────────────────────────
DALL-E 3           OpenAI              Good text rendering, safe          $0.04-$0.12
Stable Diffusion   Stability AI        Open source, customizable          Free (self-host)
Midjourney         Midjourney          Highest aesthetic quality          Subscription
Flux               Black Forest Labs   Strong prompt adherence            Varies
Imagen 3           Google              Photorealistic, good composition   API pricing
Image generation is mature for creative and marketing use cases. It is less reliable for precise, technical images. If you need exact diagrams or charts, use a drawing library.
Speech & Audio
Task             Model/Service         Cost               Notes
──────────────────────────────────────────────────────────────────────────
Speech-to-text   Whisper (OpenAI)      $0.006/minute      Open source, self-hostable
Speech-to-text   Deepgram              $0.0043/minute     Real-time streaming
Speech-to-text   AssemblyAI            $0.01/minute       Speaker diarization
Text-to-speech   OpenAI TTS            $15/1M chars       6 voices, natural
Text-to-speech   ElevenLabs            $0.18/1K chars     Voice cloning, emotional
Text-to-speech   Coqui (open source)   Free (self-host)   Good quality, customizable
Whisper is the default choice for speech-to-text. It handles accents, background noise, and multiple languages well. For real-time transcription (live calls, meetings), Deepgram or AssemblyAI are better because they support streaming.
The Build vs Buy Decision
This is the most important decision framework for an engineer working with AI.
Almost Always Buy (Use an API)
# This is the right approach for 90% of AI features
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize_article(article_text: str) -> str:
    """Summarize an article. Uses Claude API.

    Why API and not custom model:
    - Works immediately, no training needed
    - Handles any topic without domain-specific data
    - Maintained and improved by Anthropic
    - Cost: on the order of $0.01 per article at Sonnet pricing (negligible)
    """
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        messages=[
            {"role": "user", "content": f"Summarize this article in 3 bullet points:\n\n{article_text}"}
        ]
    )
    # The response is a list of content blocks; the first one holds the summary text
    return message.content[0].text
Use an API when:
- The task is general (summarization, classification, extraction, translation)
- You are building an MVP or prototype
- You don't have labeled training data
- You need to ship in days, not months
- The API quality meets your requirements
Consider Building When
- Data privacy is non-negotiable: Regulated industries where data cannot leave your infrastructure
- Latency requirements are extreme: Sub-10ms inference that API round-trips cannot meet
- Cost at scale is prohibitive: Processing millions of requests per day where API costs exceed infrastructure costs
- You need deep customization: The model needs to behave in ways that prompting alone cannot achieve
The Cost Crossover
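API costs scale linearly with request volume. Infrastructure costs are mostly fixed, rising in steps as you add capacity. The figures below assume roughly $0.001 per API request ($30/month at 1,000 requests/day).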
Requests/day   API cost/month   Self-hosted cost/month
─────────────────────────────────────────────────────────
1,000          $30              $2,000 (GPU instance)
10,000         $300             $2,000
100,000        $3,000           $2,000-$4,000
1,000,000      $30,000          $4,000-$8,000
The crossover point is typically 50K-200K requests/day,
depending on the specific model and task complexity.
Below the crossover, APIs win on total cost because you pay nothing for infrastructure, ops, or model maintenance. Above it, self-hosting can be dramatically cheaper, but only if you have the team to operate it.
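The crossover can be worked out directly from the table. The sketch below assumes about $0.001 per API request, which is the rate the table implies ($30/month at 1,000 requests/day), and a 30-day month. The function names are illustrative, and the calculation ignores engineering time for operating the self-hosted stack.

```python
def monthly_api_cost(requests_per_day: float,
                     cost_per_request: float = 0.001,
                     days_per_month: int = 30) -> float:
    """Linear API cost: every request costs the same amount."""
    return requests_per_day * cost_per_request * days_per_month


def break_even_requests_per_day(self_hosted_monthly_cost: float,
                                cost_per_request: float = 0.001,
                                days_per_month: int = 30) -> float:
    """Daily volume at which the API bill equals the fixed self-hosting bill."""
    return self_hosted_monthly_cost / (cost_per_request * days_per_month)


# A $2,000/month GPU instance breaks even at roughly 67K requests/day;
# a $4,000/month setup breaks even at roughly 133K requests/day.
for infra_cost in (2_000, 4_000):
    volume = break_even_requests_per_day(infra_cost)
    print(f"${infra_cost}/month self-hosted breaks even at {volume:,.0f} requests/day")
```

Both results fall inside the 50K-200K range quoted above. Plugging in your own per-request cost and infrastructure quote moves the crossover accordingly.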
Putting It Together
A realistic AI-powered application might use multiple models:
Customer Support Bot Architecture:
User message
    ↓
Intent classification (GPT-4o-mini, cheap & fast)
    ↓
Route to handler:
    ├─ FAQ → RAG with embeddings (text-embedding-3-small)
    │        + generation (Claude Haiku, fast & cheap)
    ├─ Complex issue → Full reasoning (Claude Sonnet)
    ├─ Voice call → Transcription (Whisper)
    │               + same pipeline as text
    └─ Image attachment → Vision model (GPT-4o)
Total cost per conversation: $0.01-$0.05
Use cheap, fast models for routing and simple tasks. Use expensive, capable models only when the task demands it. This tiered approach can reduce costs by as much as 80% compared to using the best model for everything; the exact saving depends on how much traffic the cheap tier absorbs.
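A minimal sketch of the routing step and the savings arithmetic behind it. The intent names, the `ROUTES` mapping, and the 90/10 traffic split are assumptions made for illustration. The model labels are shorthand rather than official API identifiers, and the savings figure ignores the small cost of the classification call itself.

```python
# Hypothetical intent-to-model routing table mirroring the architecture above.
ROUTES = {
    "faq": "claude-haiku",
    "complex_issue": "claude-sonnet",
    "image_attachment": "gpt-4o",
}
DEFAULT_MODEL = "claude-sonnet"  # assumption: unknown intents get the capable tier


def pick_model(intent: str) -> str:
    """Map a classified intent to the cheapest model expected to handle it."""
    return ROUTES.get(intent, DEFAULT_MODEL)


def tiered_savings(cheap_share: float, cheap_cost: float, expensive_cost: float) -> float:
    """Fraction saved versus sending every request to the expensive model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    blended = cheap_share * cheap_cost + (1.0 - cheap_share) * expensive_cost
    return 1.0 - blended / expensive_cost


# Assumed split: 90% of traffic is FAQ-style and goes to the cheap tier.
# Costs are the input prices per 1M tokens for Haiku ($0.25) and Sonnet ($3).
print(pick_model("faq"))
print(f"{tiered_savings(0.9, 0.25, 3.0):.1%} saved")
```

Under these assumed numbers the saving comes out slightly above 80%. It shrinks quickly as more traffic needs the expensive tier, which is why the quality of the intent classifier matters.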
Common Pitfalls
- Using the biggest model for everything: GPT-4o or Claude Opus for simple classification is like using a sledgehammer to hang a picture. Use the smallest model that gets the job done.
- Ignoring open source: For narrow, well-defined tasks, a fine-tuned 8B parameter open source model can match or outperform a general-purpose API model at a fraction of the cost.
- Not accounting for latency: API calls add 200ms-5s of latency. If your feature is latency-sensitive, factor this into the design from the start.
- Vendor lock-in without abstraction: Wrap your AI calls behind an interface. Switching from OpenAI to Anthropic should be a configuration change, not a rewrite.
- Ignoring rate limits: Every API has rate limits. At scale, you need queuing, retries, and backoff. Design for this from day one.
- Assuming prices are fixed: AI API pricing drops 30-50% per year. A "too expensive" model today may be affordable in six months. Revisit cost assumptions quarterly.
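The rate-limit pitfall above is usually handled with retries and exponential backoff. A minimal sketch follows, assuming a hypothetical `RateLimitError` exception as a stand-in for whatever your provider's SDK raises. The function name, default delays, and the use of random jitter are illustrative choices, not a prescribed implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider SDK's rate-limit exception."""


def call_with_backoff(fn: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt: 0.5s, 1s, 2s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so tests can pass a fake and run instantly. At higher volumes a queue in front of the API is often a better fit than retries alone.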
Key Takeaways
- The AI landscape has clear categories: foundation models (proprietary and open source), embedding models, image generation, and speech/audio. Know what each category does.
- Almost always start with a proprietary API. It is faster to integrate, requires no infrastructure, and quality is high. Build custom only when you have a specific reason.
- Use tiered model strategies: cheap models for simple tasks, expensive models for complex ones. This optimizes both cost and latency.
- Embedding models are the foundation of semantic search and RAG. They are cheap and essential for most AI applications.
- Wrap AI calls behind abstractions to avoid vendor lock-in. The landscape shifts fast and you need the ability to switch providers.
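One way to sketch the abstraction from the last takeaway is shown below. The `TextModel` protocol, the adapter classes, the `PROVIDERS` registry, and the config shape are all hypothetical names chosen for this example. The Anthropic adapter follows the same call as the summarization example earlier in this document, and the OpenAI adapter uses the SDK's chat completions call. `EchoModel` is a fake that lets the sketch run without network access or API keys.

```python
from typing import Callable, Protocol


class TextModel(Protocol):
    """Minimal interface the application depends on (hypothetical)."""

    def complete(self, prompt: str, max_tokens: int = 300) -> str: ...


class AnthropicModel:
    def __init__(self, model: str) -> None:
        import anthropic  # lazy import: only needed when this provider is selected

        self._client = anthropic.Anthropic()
        self._model = model

    def complete(self, prompt: str, max_tokens: int = 300) -> str:
        message = self._client.messages.create(
            model=self._model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text


class OpenAIModel:
    def __init__(self, model: str) -> None:
        from openai import OpenAI  # lazy import, as above

        self._client = OpenAI()
        self._model = model

    def complete(self, prompt: str, max_tokens: int = 300) -> str:
        response = self._client.chat.completions.create(
            model=self._model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content or ""


class EchoModel:
    """Offline fake for tests: returns the prompt, cut to max_tokens characters."""

    def complete(self, prompt: str, max_tokens: int = 300) -> str:
        return prompt[:max_tokens]


# Registry keyed by the "provider" value in the application's config.
PROVIDERS: dict[str, Callable[[dict], TextModel]] = {
    "anthropic": lambda cfg: AnthropicModel(cfg["model"]),
    "openai": lambda cfg: OpenAIModel(cfg["model"]),
    "echo": lambda cfg: EchoModel(),
}


def build_model(config: dict) -> TextModel:
    """Construct the configured provider; switching vendors is a config change."""
    provider = config["provider"]
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return PROVIDERS[provider](config)


def summarize(model: TextModel, article_text: str) -> str:
    """Application code depends only on the TextModel interface."""
    return model.complete(f"Summarize this article in 3 bullet points:\n\n{article_text}")
```

With this shape, `build_model({"provider": "echo"})` serves tests, while a config such as `{"provider": "anthropic", "model": "claude-sonnet-4-20250514"}` selects a real backend without touching `summarize`.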