Effective Prompting

Overview

Prompt engineering is the practice of crafting inputs to language models so that they produce reliable, useful outputs. It is often the most cost-effective way to improve AI system quality: a better prompt costs engineering time, while a larger model costs more on every request.

The core principle is simple: be specific. Vague instructions produce vague outputs. Clear, detailed instructions with explicit format requirements produce consistent, usable results.

System Prompts vs User Prompts

Chat-style LLM APIs separate messages by role, and the two you will use most are system and user. Understanding the difference is essential.

System Prompts

The system prompt sets the model's behavior, persona, constraints, and output format for the entire conversation. It is your instruction manual to the model.

# BAD: Generic; tells the model almost nothing about the task
system_prompt = "You are a helpful assistant."

# GOOD: Specific, actionable instructions
system_prompt = """You are a medical billing code classifier.

Your task: Given a description of a medical procedure, return the most 
likely CPT code and a confidence score.

Rules:
- Return ONLY valid CPT codes (5-digit numeric codes)
- If the description is ambiguous, return the top 3 candidates, most likely first
- If the description does not match any known procedure, return a single entry with "code": "UNKNOWN"
- Never guess: if your best candidate's confidence is below 0.6, return "UNKNOWN" instead of a code

Output format (always a JSON array, even for a single result):
[{"code": "99213", "confidence": 0.92, "description": "Office visit, established patient, low-complexity medical decision making"}]
"""

The system prompt should include:

Role: What the model is (not "helpful assistant" but "senior Python developer reviewing code for security vulnerabilities")
Task: What it should do, precisely
Constraints: What it should not do, edge cases to handle
Output format: Exact structure of the response
Examples: If applicable, one or two examples of ideal output
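The five components map naturally onto a template, which keeps system prompts consistent across tasks. A minimal sketch, assuming you want the same layout as the CPT example above; `build_system_prompt` and its section labels ("Rules:", "Output format:", "Example output:") are hypothetical names chosen for illustration, not part of any library.

```python
def build_system_prompt(role, task, constraints, output_format, examples=()):
    """Assemble a system prompt from role, task, constraints, output format,
    and optional examples. The section labels are an illustrative convention."""
    parts = [f"You are {role}.", "", f"Your task: {task}", "", "Rules:"]
    parts += [f"- {rule}" for rule in constraints]
    parts += ["", "Output format:", output_format]
    for example in examples:
        parts += ["", "Example output:", example]
    return "\n".join(parts)


prompt = build_system_prompt(
    role="a senior Python developer reviewing code for security vulnerabilities",
    task="Report each vulnerability in the submitted code.",
    constraints=[
        "Only report issues you can tie to a specific line",
        "If there are no vulnerabilities, return an empty JSON array",
    ],
    output_format='[{"line": int, "issue": string, "fix": string}]',
)
print(prompt)
```

Keeping the pieces as separate arguments also makes it easy to version or test one component (say, the constraints) without touching the rest.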

User Prompts

The user prompt contains the specific input for this request. Keep the variable content in the user message and the stable instructions in the system message.

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# The system prompt (set once, reused across all requests):
system = """Extract all monetary amounts from the provided text.
Return them as a JSON array of objects with 'amount' (float), 
'currency' (ISO 4217 code), and 'context' (what the amount refers to).
If no amounts are found, return an empty array."""

# The user prompt (changes per request):
user = """Our Q3 revenue was $4.2M, up from EUR 3.1 million in Q2. 
We allocated 500,000 GBP for the London office renovation and 
set aside JPY 50 million for the Tokyo expansion."""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": user}
    ],
    temperature=0
)

print(response.choices[0].message.content)

The Role of Context

Models answer far more accurately when given relevant context: the information the model needs to answer correctly but would not otherwise have.

# WITHOUT context: model guesses or hallucinates
user_prompt = "What is the refund policy?"
# Model: "Typically, refund policies allow returns within 30 days..."
# This is generic and probably wrong for your company.

# WITH context: model answers accurately
user_prompt = """Based on the following refund policy document, answer the 
customer's question.

REFUND POLICY:
- Full refund within 14 days of purchase for unused items
- 50% refund within 14-30 days
- No refunds after 30 days
- Digital products are non-refundable after download
- Shipping costs are non-refundable

CUSTOMER QUESTION: I bought a physical product 20 days ago and haven't 
opened it. Can I get a refund?"""
# Model: "Yes, you can get a 50% refund since your purchase was 20 days 
# ago, which falls within the 14-30 day window..."

What Makes Good Context

Effective context:
  - Directly relevant to the question
  - Specific enough to answer correctly
  - Short enough that the relevant facts are not buried in noise
  - Placed before the question, so the question comes last (a common recommendation, especially for long documents)

Ineffective context:
  - Entire database dumps (too much noise)
  - Irrelevant documents included "just in case"
  - Context placed after the question
  - Contradictory information without prioritization
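The "effective context" rules can be enforced in code before the prompt is built: rank candidate documents by relevance, drop irrelevant ones, respect a size budget, and put the context before the question. A minimal sketch, assuming naive word overlap as the relevance score (production systems usually rank by embedding similarity); `build_grounded_prompt`, `max_chars`, and `top_k` are hypothetical names and values.

```python
import re


def tokenize(text):
    """Lowercase word set; a deliberately naive stand-in for real retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_grounded_prompt(question, documents, max_chars=2000, top_k=2):
    """Select the documents sharing the most words with the question and
    place them BEFORE the question."""
    q_words = tokenize(question)
    # Most overlapping documents first (sorted() is stable for ties).
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    selected, used = [], 0
    for doc in ranked[:top_k]:
        if not q_words & tokenize(doc):
            break  # no overlap: irrelevant, so don't include it "just in case"
        if used + len(doc) > max_chars:
            continue  # too big for the remaining budget; try the next one
        selected.append(doc)
        used += len(doc)
    context = "\n\n".join(selected)
    return f"CONTEXT:\n{context}\n\nQUESTION: {question}"


docs = [
    "Refund policy: full refund within 14 days of purchase for unused items.",
    "Office hours: Monday to Friday, 9am to 5pm.",
    "Shipping costs are non-refundable.",
]
print(build_grounded_prompt("Can I get a refund for an unused item?", docs))
```

Exact word matching is brittle ("refund" does not match "non-refundable" here), which is one reason real retrieval pipelines use embeddings or stemming instead.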

Being Specific

The single most impactful improvement to any prompt is specificity. Compare:

VAGUE: "Summarize this article."
→ Model chooses length, format, focus. Results vary wildly.

SPECIFIC: "Summarize this article in exactly 3 bullet points. 
Each bullet should be one sentence. Focus on actionable findings, 
not background context."
→ Consistent format, predictable output, easy to validate.

VAGUE: "Write a function to process this data."
→ Which language? What processing? What's the input format?

SPECIFIC: "Write a Python function that takes a list of dicts with 
'name' (str) and 'score' (int) keys, filters to scores above 80, 
sorts by score descending, and returns the top 5 names as a list 
of strings."
→ Clear contract. Output can be verified.
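Because the specific prompt reads like a function contract, you can check the model's answer against a reference implementation. One function that satisfies it is sketched below; the name `top_scorers` is our own choice, since the prompt does not name the function, and "above 80" is read as strictly greater than 80.

```python
def top_scorers(records):
    """Return the names of up to 5 records with score > 80, highest score first."""
    passing = [r for r in records if r["score"] > 80]
    passing.sort(key=lambda r: r["score"], reverse=True)
    return [r["name"] for r in passing[:5]]


print(top_scorers([
    {"name": "Ada", "score": 95},
    {"name": "Bo", "score": 80},
    {"name": "Cy", "score": 88},
]))
# ['Ada', 'Cy']
```

Whether a score of exactly 80 counts is the kind of edge case worth stating in the prompt itself ("above 80" vs. "80 or higher").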

Specificity Checklist

When writing a prompt, check that you have specified:

Input format: What does the data look like?
Output format: JSON? Markdown? Plain text? How structured?
Length: How long should the response be?
Focus: What aspects matter most?
Edge cases: What should happen with unexpected input?
Negative constraints: What should the model NOT do?

Temperature & Top-p

These parameters control the randomness of the model's output.

Temperature

Temperature 0.0:
  - Near-deterministic: the model (almost) always picks the most likely next token, so the same input usually produces the same output.
  - Use for: classification, extraction, code generation,
    anything where consistency matters.

Temperature 0.3-0.7:
  - Balanced. Some variety but mostly predictable.
  - Use for: writing assistance, summarization, chat responses.

Temperature 1.0+:
  - Creative, unpredictable. Higher chance of novel (and wrong) output.
  - Use for: brainstorming, creative writing, generating diverse options.
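Mechanically, temperature divides the model's raw token scores (logits) by T before they are turned into probabilities with a softmax. The sketch below uses made-up logits for three candidate tokens; real models apply the same scaling across their whole vocabulary, and `softmax_with_temperature` is an illustrative name.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores (logits) to probabilities at a given temperature.

    temperature must be > 0; APIs treat temperature=0 as (near-)greedy
    decoding, i.e. always taking the highest-scoring token.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Low temperatures concentrate almost all probability on the top token; high temperatures flatten the distribution, so less likely (and more often wrong) tokens get sampled more.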

Top-p (Nucleus Sampling)

Top-p restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p; the next token is then drawn from that set after its probabilities are renormalized.

Top-p 0.1:  Model only considers the very top tokens. Very focused.
Top-p 0.5:  Moderate diversity.
Top-p 1.0:  All tokens considered (default).
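The filtering step can be sketched over a small made-up distribution. Sampling libraries do this vectorized over the full vocabulary; `nucleus_filter` is our own name for illustration.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize so they sum to 1.

    probs maps token -> probability (assumed to sum to 1).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
print(nucleus_filter(probs, 0.1))   # only "the" survives
print(nucleus_filter(probs, 0.75))  # "the" and "a"
print(nucleus_filter(probs, 1.0))   # all four tokens
```

A token is then sampled from the returned distribution, for example with `random.choices(list(d), weights=list(d.values()))`.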

General rule: adjust temperature OR top-p, not both.
For most production use cases, set temperature=0 and leave top-p at 1.0.

Practical Guidance

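from openai import OpenAI

client = OpenAI()

# Classification task: temperature=0, we want consistency
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify as positive or negative: 'Great service!'"}],
    temperature=0  # Most consistent output for the same input (not guaranteed identical)
)

# Creative writing: higher temperature for variety
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a four-line poem about autumn."}],
    temperature=0.8  # More varied, less predictable
)

# Code generation: low temperature for consistency
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    temperature=0.2  # Stays close to the most likely completion; does not by itself make code correct
)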

Prompt Structure Patterns

The Instruction-Context-Input Pattern

This is a reliable, widely used pattern for production prompts:

system_prompt = """
[INSTRUCTION]
You are a data extraction specialist. Extract structured information 
from invoices.

[OUTPUT FORMAT]
Return a JSON object with these fields:
- vendor_name: string
- invoice_number: string  
- date: string (YYYY-MM-DD format)
- line_items: array of {description: string, quantity: int, unit_price: float}
- total: float
- currency: string (ISO 4217)

[RULES]
- If a field is not found in the invoice, set it to null
- Dates should be normalized to YYYY-MM-DD regardless of input format
- Currency should be inferred from symbols ($=USD, €=EUR, £=GBP, etc.)
"""

user_prompt = """[INPUT]
INVOICE #4821
From: Acme Industrial Supply
Date: March 15th, 2025

2x Widget A @ $12.50 each
5x Widget B @ $8.00 each  
1x Shipping

Subtotal: $65.00
Tax: $5.53
Total: $70.53
"""

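Since the prompt fixes an exact schema, the reply can be checked in code before anything downstream uses it. A minimal sketch using only the standard library; `validate_invoice` is a hypothetical helper, and `reply` is a hand-written example of a correct extraction for the invoice above, not actual model output.

```python
import json
import re

FIELDS = {"vendor_name", "invoice_number", "date", "line_items", "total", "currency"}
ITEM_FIELDS = {"description", "quantity", "unit_price"}


def matches(pattern, value):
    """True if value is null (allowed by the prompt's rules) or a string matching pattern."""
    return value is None or (isinstance(value, str) and re.fullmatch(pattern, value) is not None)


def validate_invoice(raw):
    """Parse the model's reply and check it against the schema in the system prompt.

    Returns the parsed dict; raises ValueError on the first problem found.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.keys() != FIELDS:
        raise ValueError(f"fields differ from schema: {sorted(data.keys() ^ FIELDS)}")
    if not matches(r"\d{4}-\d{2}-\d{2}", data["date"]):
        raise ValueError(f"date not in YYYY-MM-DD: {data['date']!r}")
    if not matches(r"[A-Z]{3}", data["currency"]):
        raise ValueError(f"currency is not a 3-letter code: {data['currency']!r}")
    for item in data["line_items"] or []:
        if not isinstance(item, dict) or item.keys() != ITEM_FIELDS:
            raise ValueError(f"unexpected line item: {item!r}")
    return data


# Hand-written example of a correct extraction (not real model output).
reply = """{"vendor_name": "Acme Industrial Supply", "invoice_number": "4821",
  "date": "2025-03-15", "currency": "USD", "total": 70.53,
  "line_items": [
    {"description": "Widget A", "quantity": 2, "unit_price": 12.50},
    {"description": "Widget B", "quantity": 5, "unit_price": 8.00},
    {"description": "Shipping", "quantity": 1, "unit_price": null}]}"""

print(validate_invoice(reply)["date"])
```

Note that the schema has no subtotal or tax field, so the $5.53 tax appears only inside `total`. If you want to cross-check the arithmetic (2 × 12.50 + 5 × 8.00 = 65.00), those fields need to be added to the prompt's output format.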
The Persona Pattern

Assigning a specific persona tends to improve quality on domain-specific tasks. "Answer the user's question about databases" often yields textbook answers, while "You are a database administrator with 15 years of experience managing PostgreSQL at scale" tends to yield practical answers that include specific commands.

Real-World Example: Iterative Prompt Development

A code review assistant, iterated from vague to production-ready:

# V1: Too vague — model chooses what to review and how to format it
v1 = "Review this code."

# V2: Production-ready — specific, structured, constrained
v2 = """Review this Python code for issues. You are a senior Python 
developer conducting a code review.

For each issue found, return a JSON array of objects:
{"line": int, "severity": "critical"|"warning"|"suggestion", 
 "category": "bug"|"security"|"performance"|"style",
 "description": string, "fix": string}

Priority order: security > bugs > performance > style.
Return at most 10 issues, ordered by severity, then by the category priority above.
If no issues: return an empty array [].
Do not comment on code that is correct and well-written."""

The production version specifies output format, priority order, limits, and negative constraints. Its output is machine-parseable and predictable.
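"Machine-parseable" only pays off if the parsing side is strict. A minimal sketch of a consumer for the V2 format; `parse_review` is a hypothetical helper, and sorting by severity and then by category priority is one reading of the prompt's two ordering rules. It enforces the ordering and the 10-issue cap in code rather than trusting the model to follow them.

```python
import json

SEVERITIES = ["critical", "warning", "suggestion"]
CATEGORIES = ["security", "bug", "performance", "style"]  # priority order from the prompt
KEYS = {"line", "severity", "category", "description", "fix"}


def parse_review(raw):
    """Parse and validate the reviewer's JSON array, then sort and cap it."""
    issues = json.loads(raw)
    if not isinstance(issues, list):
        raise ValueError("expected a JSON array")
    for issue in issues:
        if not isinstance(issue, dict) or issue.keys() != KEYS:
            raise ValueError(f"bad issue shape: {issue!r}")
        if issue["severity"] not in SEVERITIES or issue["category"] not in CATEGORIES:
            raise ValueError(f"unknown severity or category: {issue!r}")
        if not isinstance(issue["line"], int):
            raise ValueError(f"line must be an int: {issue!r}")
    issues.sort(key=lambda i: (SEVERITIES.index(i["severity"]),
                               CATEGORIES.index(i["category"])))
    return issues[:10]


sample = """[
  {"line": 3, "severity": "warning", "category": "style", "description": "d", "fix": "f"},
  {"line": 7, "severity": "critical", "category": "security", "description": "d", "fix": "f"}
]"""
print([issue["line"] for issue in parse_review(sample)])
```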

Common Pitfalls

"You are a helpful assistant": This tells the model nothing it does not already know. Every token in your system prompt should add information. If you removed a sentence and the output would be the same, remove it.
Prompt stuffing: Cramming every possible instruction into one prompt. The longer and more crowded the prompt, the more likely the model is to overlook some of its instructions. Keep instructions focused, and split the work across multiple calls if needed.
Not testing with adversarial inputs: Your prompt works great with clean data. What about empty strings, HTML, code injection, or a 100KB wall of text? Test edge cases.
Ignoring token costs: Long system prompts are sent with every request. A 1,000-token system prompt at 100K requests/day is 100 million input tokens per day before any user content is counted. Be concise.
Not versioning prompts: Prompts are code. Version them, test them, review changes. A one-word change in a prompt can dramatically alter behavior.
Assuming determinism at temperature 0: Even at temperature 0, LLM outputs can vary slightly between API calls or model versions. Always validate output format programmatically.
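Treating every reply as untrusted leads naturally to a validate-and-retry loop. A minimal sketch; `call_with_validation` and `call_model` are hypothetical names, where `call_model` stands for your own zero-argument wrapper around the API client. In practice you might also append the validation error to the conversation before retrying.

```python
import json


def call_with_validation(call_model, validate, max_attempts=3):
    """Call the model until its reply passes `validate`, up to max_attempts.

    call_model: zero-argument function returning the model's raw text reply.
    validate:   function that returns the parsed value or raises ValueError.
    """
    last_error = None
    for _ in range(max_attempts):
        raw = call_model()
        try:
            return validate(raw)
        except ValueError as err:  # json.JSONDecodeError is a ValueError too
            last_error = err
    raise RuntimeError(f"no valid reply after {max_attempts} attempts") from last_error


# Simulated replies: the first wraps the JSON in chatty text, the second is clean.
replies = iter(['Sure! Here is the JSON: {"ok": true}', '{"ok": true}'])
result = call_with_validation(lambda: next(replies), json.loads)
print(result)  # {'ok': True}
```

Capping the attempts matters: a prompt that fails validation consistently is a prompt bug, and endless retries would only hide it while multiplying token costs.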

Key Takeaways

Be specific. Every vague word in your prompt is a chance for the model to do something unexpected. Specify format, length, focus, edge cases, and constraints.
System prompts define behavior; user prompts provide input. Keep stable instructions in the system prompt and variable content in the user prompt.
Context is critical. Models with relevant context dramatically outperform models without it. But context must be relevant and concise.
Use temperature 0 for production tasks where consistency matters. Save higher temperatures for creative or exploratory tasks.
Iterate on prompts like you iterate on code. Start simple, test, identify failures, add specificity. Version and track your prompts.