Use Cases & Patterns
Embeddings are not a solution looking for a problem. They solve a specific class of problems: anything where you need to find things based on meaning rather than exact match. Once you internalize the pattern — embed everything, compare vectors — you start seeing applications everywhere.
Semantic Search
The most common use case. Traditional keyword search fails when users and documents use different words for the same concept. Semantic search with embeddings bridges this vocabulary gap.
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(text):
    """Return the embedding of `text` as a NumPy vector."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return np.array(response.data[0].embedding)

# Index time: embed your documents
documents = [
    "How to optimize PostgreSQL query performance",
    "Debugging memory leaks in Python applications",
    "Setting up CI/CD with GitHub Actions",
]
doc_embeddings = [embed(doc) for doc in documents]

# Query time: embed the query and compute cosine similarity to each document
query = "my database queries are slow"
query_embedding = embed(query)
similarities = [
    np.dot(query_embedding, doc_emb) / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
    for doc_emb in doc_embeddings
]
# "How to optimize PostgreSQL query performance" should rank first,
# even though the only word it shares with the query is a variant of "query"
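The snippet above computes similarities one document at a time and stops short of sorting them. A minimal sketch of the ranking step, assuming the embeddings are already available as NumPy arrays; the `top_k` helper and the toy 3-dimensional vectors are illustrative, not part of any library:

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=3):
    """Return (row index, cosine similarity) pairs for the k most similar rows."""
    # Normalize once so that a plain dot product equals cosine similarity.
    doc_unit = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    sims = doc_unit @ query_unit
    order = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [(int(i), float(sims[i])) for i in order]

# Toy vectors standing in for real embeddings (hypothetical values).
docs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
])
query = np.array([1.0, 0.2, 0.0])
ranked = top_k(query, docs, k=2)
print(ranked)
```

Normalizing at index time and storing unit vectors means query time needs only one matrix-vector product.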
Hybrid Search
In practice, you want both semantic and keyword search. Some queries are conceptual ("how do I make my app faster"), others are exact ("error code ERR_CONNECTION_REFUSED"). Hybrid search combines both signals.
# Hybrid search with pgvector + tsvector in PostgreSQL
# Combine full-text search score with vector similarity
query_sql = """
WITH semantic AS (
    SELECT id, 1 - (embedding <=> %(vec)s::vector) AS vec_score
    FROM documents
    ORDER BY embedding <=> %(vec)s::vector
    LIMIT 50
),
keyword AS (
    SELECT id, ts_rank(tsv, plainto_tsquery(%(query)s)) AS text_score
    FROM documents
    WHERE tsv @@ plainto_tsquery(%(query)s)
    ORDER BY text_score DESC  -- without this, LIMIT keeps an arbitrary 50 matches
    LIMIT 50
)
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(s.vec_score, 0) * 0.7 + COALESCE(k.text_score, 0) * 0.3 AS score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY score DESC
LIMIT 10
"""
The 0.7/0.3 weighting is a starting point. Tune it based on your data and user behavior.
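One caveat with a weighted sum is that `ts_rank` and cosine similarity live on different scales, so the weights also have to absorb a scale correction. A common alternative is reciprocal rank fusion (RRF), which combines rank positions instead of raw scores. This is a swapped-in technique, not the method used in the SQL above; the function name and the conventional constant `k=60` are assumptions for this sketch:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result ids from the semantic and keyword retrievers.
semantic_ids = [3, 1, 7, 9]
keyword_ids = [7, 3, 4]
fused = reciprocal_rank_fusion([semantic_ids, keyword_ids])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, and there is no score scale to tune.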
Recommendation Systems
Recommendations are similarity search in disguise. If a user liked item A, find items whose embeddings are close to A's embedding.
# Content-based recommendations
# Embed product descriptions, find similar products
def get_recommendations(product_id, all_products, all_embeddings, top_k=5):
    product_embedding = all_embeddings[product_id]
    similarities = []
    for pid, emb in all_embeddings.items():
        if pid == product_id:
            continue
        sim = np.dot(product_embedding, emb) / (
            np.linalg.norm(product_embedding) * np.linalg.norm(emb)
        )
        similarities.append((pid, sim))
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:top_k]

# User-item recommendations: embed user behavior as a vector
# Average the embeddings of items a user interacted with
def user_profile_embedding(user_interactions, item_embeddings):
    """Create a user embedding from their interaction history."""
    vectors = [item_embeddings[item_id] for item_id in user_interactions]
    return np.mean(vectors, axis=0)
This approach works surprisingly well as a cold-start recommendation system. You do not need collaborative filtering data to get started — just item descriptions.
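To make the user-profile idea concrete, here is a hedged sketch that ranks unseen items against the averaged profile vector. `recommend_for_user` and the toy 2-dimensional embeddings are hypothetical names and values chosen for illustration:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_for_user(user_interactions, item_embeddings, top_k=2):
    """Rank items the user has not seen by similarity to their mean vector."""
    seen = set(user_interactions)
    profile = np.mean([item_embeddings[i] for i in seen], axis=0)
    candidates = [
        (item_id, cosine(profile, emb))
        for item_id, emb in item_embeddings.items()
        if item_id not in seen
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]

# Toy item embeddings (hypothetical values, not real model output).
item_embeddings = {
    "tent": np.array([1.0, 0.0]),
    "sleeping_bag": np.array([0.9, 0.1]),
    "camp_stove": np.array([0.8, 0.3]),
    "blender": np.array([0.0, 1.0]),
}
recs = recommend_for_user(["tent", "sleeping_bag"], item_embeddings)
print(recs)
```

A plain mean treats every interaction equally. Weighting recent or high-signal interactions more heavily is a natural next step.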
Duplicate & Near-Duplicate Detection
Exact duplicate detection is trivial (hash comparison). Near-duplicate detection is where embeddings shine: finding documents, support tickets, or bug reports that say the same thing differently.
from itertools import combinations

def find_near_duplicates(texts, threshold=0.92):
    """Find pairs of texts that are semantically near-duplicates."""
    embeddings = [embed(t) for t in texts]
    duplicates = []
    for i, j in combinations(range(len(texts)), 2):
        sim = np.dot(embeddings[i], embeddings[j]) / (
            np.linalg.norm(embeddings[i]) * np.linalg.norm(embeddings[j])
        )
        if sim > threshold:
            duplicates.append((i, j, sim))
    return duplicates

# Real-world application: deduplicating support tickets
tickets = [
    "I can't log in to my account",
    "Login page shows error when I enter my password",
    "Unable to access my account after password reset",
    "How do I change my billing address?",
]
dupes = find_near_duplicates(tickets, threshold=0.85)
# Tickets 0, 1, and 2 are the intended near-duplicates. Whether they clear
# the threshold depends on the embedding model, so calibrate it on labeled pairs.
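At scale, do not compare all pairs: the number of comparisons grows quadratically with the number of texts. Use a vector database to retrieve each item's neighbors above a similarity threshold instead, which is much more efficient.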
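A vector database typically answers this kind of query with a nearest-neighbor index. As a stand-in that shows the shape of the computation, the sketch below normalizes the embeddings once and compares them in blocks with matrix multiplication. It is still quadratic work, only vectorized and memory-bounded; `near_duplicate_pairs` and `block_size` are illustrative names, not a library API:

```python
import numpy as np

def near_duplicate_pairs(embeddings, threshold=0.9, block_size=1024):
    """Return (i, j, similarity) for all i < j with cosine similarity > threshold."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    pairs = []
    for start in range(0, len(unit), block_size):
        block = unit[start:start + block_size]
        sims = block @ unit.T  # shape: (rows in block, total rows)
        rows, cols = np.where(sims > threshold)
        for r, c in zip(rows, cols):
            i = start + int(r)
            if i < c:  # keep each pair once and skip self-matches
                pairs.append((i, int(c), float(sims[r, c])))
    return pairs

# Toy vectors: rows 0 and 1 point in nearly the same direction.
vectors = np.array([
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
])
result = near_duplicate_pairs(vectors, threshold=0.95, block_size=2)
print(result)
```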
Clustering & Categorization
Embed your documents, then cluster the embeddings. This discovers natural groupings in your data without predefined categories.
from sklearn.cluster import KMeans

# Embed all documents
texts = load_all_documents()  # your data
embeddings = np.array([embed(t) for t in texts])

# Cluster into groups
kmeans = KMeans(n_clusters=10, random_state=42)
labels = kmeans.fit_predict(embeddings)

# Inspect clusters: look at a sample from each cluster
for cluster_id in range(kmeans.n_clusters):
    cluster_docs = [texts[i] for i, label in enumerate(labels) if label == cluster_id]
    print(f"Cluster {cluster_id}: {cluster_docs[:3]}")
This is useful for:
- Auto-tagging content — cluster, then label each cluster
- Understanding support tickets — what categories of issues exist?
- Content audit — find redundant or orphaned content in a knowledge base
- Customer segmentation — embed customer behavior descriptions, cluster them
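For the auto-tagging case, a cheap first step is to surface the document nearest each cluster's centroid as a candidate label. A sketch assuming you already have `labels` from a clustering run; `cluster_representatives` and the toy data are hypothetical:

```python
import numpy as np

def cluster_representatives(embeddings, labels):
    """Map each cluster id to the index of the row closest to its centroid."""
    representatives = {}
    for cluster_id in np.unique(labels):
        member_idx = np.where(labels == cluster_id)[0]
        centroid = embeddings[member_idx].mean(axis=0)
        # KMeans minimizes Euclidean distance, so use the same metric here.
        distances = np.linalg.norm(embeddings[member_idx] - centroid, axis=1)
        representatives[int(cluster_id)] = int(member_idx[np.argmin(distances)])
    return representatives

# Toy 2-D points in two obvious groups (hypothetical values).
embeddings = np.array([
    [0.0, 0.0], [0.2, 0.0], [1.0, 0.0],
    [5.0, 5.0], [5.0, 6.0], [5.0, 5.4],
])
labels = np.array([0, 0, 0, 1, 1, 1])
reps = cluster_representatives(embeddings, labels)
print(reps)
```

Reading the representative document, plus a few random members, is usually enough to name a cluster by hand.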
Zero-Shot Classification
Instead of clustering, you can classify by comparing an input's embedding to embeddings of category descriptions:
categories = {
    "billing": embed("Questions about payment, invoices, pricing, and subscriptions"),
    "technical": embed("Technical issues, bugs, errors, and integration problems"),
    "feature_request": embed("Requests for new features or improvements"),
    "account": embed("Account management, login, password, and profile settings"),
}

def classify(text):
    text_emb = embed(text)
    scores = {}
    for cat, cat_emb in categories.items():
        scores[cat] = np.dot(text_emb, cat_emb) / (
            np.linalg.norm(text_emb) * np.linalg.norm(cat_emb)
        )
    return max(scores, key=scores.get)

classify("I was charged twice this month")  # "billing"
classify("The API returns a 500 error")  # "technical"
No training data required. Adjust category descriptions to tune accuracy.
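Returning the highest-scoring category always yields an answer, even when nothing fits. One hedged refinement is to return a fallback label when the best score is low or the top two scores are too close. The `min_score` and `min_margin` values below are placeholders to calibrate on your own data, and the toy vectors stand in for real category embeddings:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_with_fallback(text_emb, category_embs, min_score=0.3, min_margin=0.05):
    """Return the best category, or "unknown" when the match is weak or ambiguous."""
    ranked = sorted(
        ((cosine(text_emb, emb), name) for name, emb in category_embs.items()),
        reverse=True,
    )
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else -1.0
    if best_score < min_score or best_score - runner_up < min_margin:
        return "unknown"
    return best_name

# Toy category embeddings (hypothetical values).
category_embs = {
    "billing": np.array([1.0, 0.0, 0.0]),
    "technical": np.array([0.0, 1.0, 0.0]),
}
clear = classify_with_fallback(np.array([0.9, 0.1, 0.0]), category_embs)
ambiguous = classify_with_fallback(np.array([0.5, 0.5, 0.0]), category_embs)
no_match = classify_with_fallback(np.array([0.0, 0.0, 1.0]), category_embs)
print(clear, ambiguous, no_match)
```

Routing "unknown" inputs to a human queue also produces labeled examples you can later use to refine the category descriptions.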
Anomaly Detection
If you have a collection of "normal" items, anything whose embedding is far from all existing embeddings is potentially anomalous.
def detect_anomalies(new_item_embedding, existing_embeddings, threshold=0.5):
    """Flag items that are far from everything in the existing collection."""
    max_similarity = max(
        np.dot(new_item_embedding, existing) / (
            np.linalg.norm(new_item_embedding) * np.linalg.norm(existing)
        )
        for existing in existing_embeddings
    )
    return max_similarity < threshold
# Application: detecting off-topic content in a moderated forum
# Application: flagging unusual support tickets for manual review
# Application: identifying data quality issues in a pipeline
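The default `threshold=0.5` is arbitrary, and typical similarity ranges differ between embedding models. One way to pick a value, sketched here under the assumption that the existing collection is mostly normal, is to measure each item's similarity to its nearest neighbor and take a low percentile. `calibrate_threshold` and the 5th-percentile default are illustrative choices:

```python
import numpy as np

def calibrate_threshold(existing_embeddings, percentile=5.0):
    """Pick a threshold from nearest-neighbor similarities within the collection."""
    norms = np.linalg.norm(existing_embeddings, axis=1, keepdims=True)
    unit = existing_embeddings / norms
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # ignore each item's similarity to itself
    nearest = sims.max(axis=1)
    return float(np.percentile(nearest, percentile))

rng = np.random.default_rng(0)
# Toy "normal" data: small perturbations around one direction (hypothetical).
normal = np.array([1.0, 1.0, 0.0]) + 0.05 * rng.normal(size=(50, 3))
threshold = calibrate_threshold(normal)
print(round(threshold, 3))
```

The result can be passed as `threshold` to `detect_anomalies`. With the 5th percentile, roughly 5% of items resembling the existing collection would still be flagged, so treat the percentile as a review-budget knob.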
Multi-Modal Search
CLIP and similar models embed both text and images into the same vector space. This enables cross-modal search: find images that match a text query, or find text that matches an image.
# Using CLIP for text-to-image search
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")

# Embed images (pass PIL images)
image_paths = ["photo1.jpg", "photo2.jpg", "photo3.jpg"]
image_embeddings = model.encode([Image.open(p) for p in image_paths])

# Embed a text query
text_embedding = model.encode("a dog playing in the snow")

# Compare text embedding to image embeddings
similarities = [
    np.dot(text_embedding, img_emb) / (
        np.linalg.norm(text_embedding) * np.linalg.norm(img_emb)
    )
    for img_emb in image_embeddings
]
# The image most similar to "a dog playing in the snow" has the highest score
Multi-modal search extends to audio (embed podcast transcripts alongside text documents), video (embed frame descriptions), and code (embed code and natural language into the same space for code search).
The Embedding-First Architecture
The most powerful pattern is to embed everything and make similarity search the foundation of your application.
Architecture:
1. Every piece of content gets embedded on ingest
   - Documents, images, user profiles, product listings, support tickets
2. Embeddings stored in a vector database alongside metadata
   - pgvector for most teams, dedicated DB at scale
3. Every query path uses vector search as a primitive
   - Search: embed query, find nearest neighbors
   - Recommendations: embed item, find similar items
   - Classification: embed input, compare to category embeddings
   - Deduplication: embed new item, check for near-neighbors
   - RAG: embed question, retrieve relevant context, send to LLM
4. Combine with traditional systems
   - Vector search for recall, keyword search for precision
   - Vector similarity for ranking, business rules for filtering
   - Embeddings for ML features, SQL for business logic
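The point of step 3 is that one primitive serves several features. A toy in-memory index makes that concrete. It is a sketch for illustration, not a substitute for pgvector or a dedicated vector database, and `TinyVectorIndex` with its methods is a hypothetical interface:

```python
import numpy as np

class TinyVectorIndex:
    """Brute-force cosine index storing ids, unit vectors, and metadata."""

    def __init__(self):
        self.ids, self.vectors, self.metadata = [], [], []

    def add(self, item_id, vector, **metadata):
        vector = np.asarray(vector, dtype=float)
        self.ids.append(item_id)
        self.vectors.append(vector / np.linalg.norm(vector))
        self.metadata.append(metadata)

    def search(self, vector, k=3, where=None):
        """Return up to k (id, similarity) pairs, optionally filtered on metadata."""
        query = np.asarray(vector, dtype=float)
        query = query / np.linalg.norm(query)
        sims = np.stack(self.vectors) @ query
        hits = [
            (self.ids[i], float(sims[i]))
            for i in np.argsort(-sims)
            if where is None or where(self.metadata[i])
        ]
        return hits[:k]

index = TinyVectorIndex()
index.add("doc-1", [1.0, 0.0], kind="document")
index.add("doc-2", [0.9, 0.2], kind="document")
index.add("ticket-1", [0.0, 1.0], kind="ticket")

# Search: nearest neighbors of a query vector.
nearest = index.search([1.0, 0.1], k=2)
# Deduplication: is a new item nearly identical to something already stored?
is_duplicate = index.search([1.0, 0.0], k=1)[0][1] > 0.95
# Business rule as a metadata filter on top of vector ranking.
tickets_only = index.search([1.0, 0.1], k=1, where=lambda m: m["kind"] == "ticket")
print(nearest, is_duplicate, tickets_only)
```

Search, deduplication, and filtered retrieval all go through the same `search` call. Only the query vector, the threshold, and the metadata filter change.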
This architecture is not theoretical. Companies like Spotify (podcast search), Airbnb (listing recommendations), Pinterest (visual search), and Notion (AI-powered search) use embeddings in production for these features.
Common Pitfalls
- Using embeddings for everything. Embeddings excel at fuzzy semantic matching. For exact lookups (user IDs, order numbers, enum values), use traditional indexes. Combining both is the right approach.
- Not evaluating retrieval quality. Build a test set of queries with known-relevant documents. Measure recall@k and precision@k. Without this, you are guessing about quality.
- Stale embeddings. If the source content changes, the embedding must be regenerated. Build this into your update pipeline, not as an afterthought.
- One-size-fits-all chunking. A chunking strategy that works for technical documentation will fail for conversational chat logs. Tune chunk size and overlap per content type.
- Ignoring the cost of embedding at scale. Embedding 10 million documents with OpenAI's API costs real money. Calculate costs upfront. Consider open-source models for large-scale batch embedding.
- Skipping hybrid search. Pure vector search misses exact-match queries. Pure keyword search misses semantic queries. Combine both.
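For the evaluation pitfall, recall@k and precision@k take only a few lines to compute once you have a labeled test set. A minimal sketch; the function names and the toy query results are hypothetical:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    top = retrieved[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top-k."""
    top = retrieved[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / len(relevant)

# One test query: ids returned by the system, and ids a human marked relevant.
retrieved = ["d4", "d1", "d9", "d2", "d7"]
relevant = {"d1", "d2", "d3"}

print(precision_at_k(retrieved, relevant, k=3))
print(recall_at_k(retrieved, relevant, k=3))
```

Average both metrics over all test queries and track them whenever you change the embedding model, the chunking strategy, or the hybrid weights.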
Key Takeaways
- Semantic search is the entry-point use case, but embeddings enable recommendations, deduplication, classification, anomaly detection, and multi-modal search with the same underlying primitive.
- Hybrid search (vector + keyword) often outperforms either approach alone, because each covers the other's blind spot: exact-match queries for vectors, paraphrased queries for keywords.
- Zero-shot classification with embeddings requires no training data and is good enough for many production routing and tagging tasks.
- The embedding-first architecture treats vector similarity as a fundamental building block, not a bolted-on feature.
- Always measure retrieval quality. Build evaluation sets early and track recall and precision as you iterate.
- Multi-modal embeddings (CLIP and successors) unlock cross-modal search, which is one of the most underused capabilities available today.