Agents & Tool Use

A chatbot generates text. An agent takes actions. The difference is that an agent can call external tools — search the web, query a database, run code, hit an API — and incorporate the results into its reasoning. This turns an LLM from a text generator into something that can actually do things in the world.

How Tool Use Works

You give the LLM a list of available tools (functions) with descriptions of what they do and what parameters they take. The LLM decides when to call a tool, generates the appropriate parameters, and then incorporates the tool's response into its next step.

# Tool definition for an LLM (OpenAI function calling format)
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search the company knowledge base for relevant articles. Use this when the user asks about company policies, procedures, or product documentation.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    },
                    "category": {
                        "type": "string",
                        "enum": ["policy", "product", "engineering", "hr"],
                        "description": "Optional category filter"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_sql_query",
            "description": "Execute a read-only SQL query against the analytics database. Use this when the user asks about metrics, counts, or data that requires a database lookup.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The SQL query to execute (SELECT only)"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

The model receives these tool definitions alongside the conversation, passed through the API's tools parameter rather than written into the prompt text. When a user asks a question, the model can choose to call a tool, read the result, and optionally call more tools before producing a final answer.

from openai import OpenAI
import json

client = OpenAI()

def run_agent(user_message, tools, tool_implementations):
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Use the provided tools when needed."},
        {"role": "user", "content": user_message}
    ]
    
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        
        message = response.choices[0].message
        messages.append(message)
        
        # If the model wants to call tools
        if message.tool_calls:
            for tool_call in message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)
                
                # Execute the tool
                result = tool_implementations[func_name](**func_args)
                
                # Add the result to the conversation
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })
        else:
            # No more tool calls, return the final response
            return message.content
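
run_agent assumes that tool_implementations is a dict mapping each tool name in the tools list to a Python callable. The mapping itself is not shown above, so the following is only a sketch: the two stub functions and their return values are made up for illustration, and dispatch is a hypothetical helper that isolates the decode-and-call step from the loop so it can run without the API.

```python
import json

# Hypothetical stubs standing in for real tools; the return values are made up.
def search_knowledge_base(query, category=None):
    return {"results": [f"stub article about {query}"], "category": category}

def run_sql_query(query):
    return {"columns": ["n"], "rows": [[1]]}

# Keys must match the "name" fields in the tool definitions.
tool_implementations = {
    "search_knowledge_base": search_knowledge_base,
    "run_sql_query": run_sql_query,
}

def dispatch(func_name, raw_arguments):
    """Mirror the loop body: decode the JSON argument string, then call the tool."""
    func_args = json.loads(raw_arguments)
    return tool_implementations[func_name](**func_args)

# The model supplies arguments as a JSON string, not as a dict.
result = dispatch(
    "search_knowledge_base",
    '{"query": "vacation policy", "category": "hr"}',
)
print(json.dumps(result))
```

Because optional parameters such as category may be omitted by the model, giving them defaults in the Python signature keeps the keyword-argument call from failing.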

The Agent Loop: Observe, Think, Act

An agent is not a single LLM call. It is a loop:

1. OBSERVE:  Receive input (user message + any tool results from previous step)
2. THINK:    Reason about what to do next
3. ACT:      Either call a tool or produce a final response
4. REPEAT:   If a tool was called, go back to OBSERVE with the tool result

This is the ReAct (Reasoning + Acting) pattern. The model alternates between reasoning about the situation and taking action:

User: "How many orders did we ship last week, and what was our on-time rate?"

Agent thinking: I need to query the orders database for last week's shipments.
Agent action: run_sql_query("SELECT COUNT(*) as total, 
    SUM(CASE WHEN shipped_on_time THEN 1 ELSE 0 END) as on_time 
    FROM orders WHERE ship_date >= '2026-04-11' AND ship_date <= '2026-04-17'")
Tool result: {"total": 4832, "on_time": 4516}

Agent thinking: I have the numbers. Let me calculate the on-time rate and respond.
Agent response: "Last week we shipped 4,832 orders with a 93.5% on-time rate 
    (4,516 out of 4,832 shipped on time)."
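
To make the loop concrete without calling a model, here is a small simulation that replays the trace above. Everything in it is a stand-in: scripted_model is a hypothetical replacement for the LLM (it requests one tool call, then answers), and fake_sql_tool returns the figures from the example instead of querying a database.

```python
def fake_sql_tool(query):
    # Stand-in for run_sql_query; returns the figures from the trace above.
    return {"total": 4832, "on_time": 4516}

def scripted_model(observations):
    """Hypothetical stand-in for the LLM: THINK, then choose an action."""
    tool_results = [o for o in observations if o["kind"] == "tool_result"]
    if not tool_results:
        # Nothing has been looked up yet, so ACT by requesting a tool call.
        return {"action": "call_tool", "tool": "run_sql_query",
                "args": {"query": "SELECT COUNT(*) FROM orders"}}
    # A tool result is available: compute the rate and produce the final answer.
    data = tool_results[-1]["content"]
    rate = 100 * data["on_time"] / data["total"]
    return {"action": "respond",
            "text": f"Shipped {data['total']:,} orders, {rate:.1f}% on time."}

def agent_loop(user_message, tools, max_steps=5):
    observations = [{"kind": "user", "content": user_message}]      # OBSERVE
    for _ in range(max_steps):
        decision = scripted_model(observations)                     # THINK
        if decision["action"] == "respond":                         # ACT: final answer
            return decision["text"]
        result = tools[decision["tool"]](**decision["args"])        # ACT: tool call
        observations.append({"kind": "tool_result", "content": result})  # REPEAT
    return "Step limit reached."

answer = agent_loop("How many orders did we ship last week?",
                    {"run_sql_query": fake_sql_tool})
print(answer)  # Shipped 4,832 orders, 93.5% on time.
```

The on-time rate is 4,516 / 4,832 ≈ 0.9346, which rounds to the 93.5% quoted in the trace.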

Practical Tool Implementations

Database Query Tool

import sqlite3

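MAX_ROWS = 50  # cap on rows returned to the model

def run_sql_query(query):
    """Execute a read-only SQL query and return at most MAX_ROWS rows."""
    # Cheap first-line check: a string test, not a security boundary
    if not query.strip().upper().startswith("SELECT"):
        return {"error": "Only SELECT queries are allowed"}
    
    conn = sqlite3.connect("analytics.db")
    try:
        cursor = conn.execute(query)
        columns = [desc[0] for desc in cursor.description]
        # Fetch one extra row to detect truncation without loading
        # the whole result set into memory
        rows = cursor.fetchmany(MAX_ROWS + 1)
        truncated = len(rows) > MAX_ROWS
        # Limit results to prevent overwhelming the LLM context
        rows = rows[:MAX_ROWS]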
        return {
            "columns": columns,
            "rows": rows,
            "row_count": len(rows),
            "truncated": truncated,
        }
    except Exception as e:
        return {"error": str(e)}
    finally:
        conn.close()

Web Search Tool

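import os
import requests

# Assumes the API key is supplied through an environment variable
TAVILY_API_KEY = os.environ["TAVILY_API_KEY"]

def search_web(query, num_results=5):
    """Search the web and return snippets."""
    # Using a search API (Serper, Brave, Tavily, etc.)
    try:
        response = requests.post(
            "https://api.tavily.com/search",
            json={"query": query, "max_results": num_results},
            headers={"Authorization": f"Bearer {TAVILY_API_KEY}"},
            timeout=10,  # don't let a slow search stall the agent loop
        )
        response.raise_for_status()
    except requests.RequestException as e:
        # Report failures as data, matching run_sql_query's error convention
        return {"error": str(e)}
    results = response.json().get("results", [])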
    return [
        {"title": r["title"], "snippet": r["content"], "url": r["url"]}
        for r in results
    ]

When Agents Work

Agents work well when:

- The task has clear, verifiable sub-steps (lookup, calculate, compare)
- Tools return structured, unambiguous results
- The action space is bounded (5-10 tools, not 500)
- Errors are recoverable (retry a failed search, rephrase a query)
- The task requires 1-5 tool calls, not 50

Real-world examples where agents consistently perform well:

Customer support with knowledge base lookup. User asks a question, agent searches docs, answers with citations.
Data analysis. User asks a question about data, agent writes and executes SQL, summarizes results.
Research assistant. User asks about a topic, agent searches multiple sources, synthesizes a summary.
Code generation with execution. Agent writes code, runs it, sees the output, fixes errors.

When Agents Spiral

Agents fail in predictable ways:

1. Infinite loops:     Agent keeps calling the same tool with the same query,
                       getting the same unhelpful result, trying again.

2. Goal drift:         Agent starts solving a sub-problem and forgets the
                       original question.

3. Hallucinated tools: Agent tries to call a tool that does not exist or
                       invents parameters.

4. Cascading errors:   One tool returns an error. Agent misinterprets the
                       error as useful data and builds on it.

5. Over-decomposition: Agent breaks a simple task into 15 sub-tasks,
                       each requiring a tool call, when a single prompt
                       would have worked.
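
The first failure mode, repeating an identical call, can be caught mechanically. A minimal sketch, assuming you can see each tool call's name and arguments before executing it; RepeatGuard and its max_repeats threshold are hypothetical names, not part of any SDK.

```python
import json
from collections import Counter

class RepeatGuard:
    """Counts identical (tool name, arguments) pairs within one agent run."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def should_block(self, func_name, func_args):
        # Serialize with sorted keys so argument order does not matter.
        key = (func_name, json.dumps(func_args, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] > self.max_repeats

guard = RepeatGuard(max_repeats=2)
calls = [
    ("search_web", {"query": "q3 revenue"}),
    ("search_web", {"query": "q3 revenue"}),
    ("search_web", {"query": "q3 revenue"}),       # third identical call
    ("search_web", {"query": "q3 revenue 2025"}),  # rephrased, so allowed
]
decisions = [guard.should_block(name, args) for name, args in calls]
print(decisions)  # [False, False, True, False]
```

When a call is blocked, one reasonable response is to return an error as the tool result telling the model that the call was already tried, which nudges it to rephrase or stop.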

Defensive Patterns

Limit Iterations

MAX_ITERATIONS = 10

def run_agent_safe(user_message, tools, tool_implementations):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message}
    ]
    
    for iteration in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message
        messages.append(message)
        
        if not message.tool_calls:
            return message.content
        
        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            if func_name not in tool_implementations:
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps({"error": f"Unknown tool: {func_name}"})
                })
                continue
            
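            try:
                func_args = json.loads(tool_call.function.arguments)
                result = tool_implementations[func_name](**func_args)
            except Exception as e:
                # Malformed arguments or a failing tool: report the error to
                # the model instead of crashing the loop
                result = {"error": f"{type(e).__name__}: {e}"}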
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })
    
    return "I was unable to complete this request within the allowed number of steps."
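
Validate Tool Inputs

The loop above still passes model-generated arguments straight to the tool. A minimal sketch of checking them against the tool's declared parameters schema first: validate_args is a hypothetical helper that covers only required keys, unknown keys, string types, and enum values, which is all the schemas in this section use. A full JSON Schema validator (for example the third-party jsonschema package) would be the more complete choice.

```python
def validate_args(schema, args):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unknown parameter: {name}")
            continue
        spec = properties[name]
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems

# Schema copied from the search_knowledge_base definition (descriptions omitted).
schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "category": {"type": "string",
                     "enum": ["policy", "product", "engineering", "hr"]},
    },
    "required": ["query"],
}

ok = validate_args(schema, {"query": "expense policy", "category": "policy"})
bad = validate_args(schema, {"category": "finance", "limit": 5})
print(ok)   # []
print(bad)  # three problems: missing query, bad category, unknown limit
```

Returning the problem list to the model as the tool result gives it a chance to correct the call on the next iteration.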

Common Mistakes

Giving agents too many tools. Each tool is a decision the model must make. More tools means more chances for the wrong choice. Start with 3-5 tools and add more only when you have evidence they are needed.
Tool descriptions that are vague. "Search for stuff" is a bad tool description. "Search the company knowledge base for policy documents, product documentation, and engineering guides. Returns the top 5 most relevant results." is much better. The quality of your tool descriptions directly affects how well the agent uses them.
No iteration limit. Without a maximum number of steps, agents can loop indefinitely, consuming tokens and money. Always set a cap.
Not logging the full agent trace. When an agent gives a wrong answer, you need to see every step: what it thought, what tools it called, what results it got, and where it went wrong. Log the entire message history.
Treating agents as reliable. Agents are probabilistic systems. They will sometimes call the wrong tool, misinterpret results, or hallucinate actions. Design your system to handle failures gracefully: confirm destructive actions with the user, validate outputs, and have fallback paths.
Skipping the simple approach. Many tasks that seem to need agents can be solved with a single well-crafted prompt plus a retrieval step. Try the simple version first.
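
For the trace-logging point, one simple approach is to write the message history as JSON Lines, one record per message. A sketch, assuming the messages are plain dicts; SDK message objects would need converting first (the OpenAI Python client's message objects are pydantic models, so model_dump() should work for that). write_trace and the file name are hypothetical.

```python
import json
import os
import tempfile

def write_trace(messages, path):
    """Append each message as one JSON line so a run can be inspected later."""
    with open(path, "a", encoding="utf-8") as f:
        for step, message in enumerate(messages):
            f.write(json.dumps({"step": step, **message}) + "\n")

# A made-up message history shaped like the one the agent loop builds.
messages = [
    {"role": "user", "content": "How many orders shipped last week?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "name": "run_sql_query"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"total": 4832}'},
    {"role": "assistant", "content": "We shipped 4,832 orders."},
]

trace_path = os.path.join(tempfile.mkdtemp(), "agent_trace.jsonl")
write_trace(messages, trace_path)

# Read the trace back to confirm every step was recorded in order.
with open(trace_path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(len(records), [r["role"] for r in records])
```

One line per message keeps the file easy to search and lets a failed run be read step by step.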

Key Takeaways

Agents are LLMs in a loop: observe, think, act, repeat. Tool use is what makes them agents rather than chatbots.
Good tool descriptions can matter as much as model choice. A clear description of when and how to use a tool heads off many wrong-tool and wrong-argument failures.
Limit the iteration count, limit the tool set, validate tool inputs. Defensive patterns are not optional.
Agents work well for bounded tasks with clear sub-steps (lookup, calculate, compare). They struggle with open-ended, multi-step reasoning that requires dozens of actions.
Log the full agent trace. You cannot debug what you cannot see.
Start simple. A retrieval step plus a single LLM call often beats a multi-step agent for straightforward tasks.