LLM Function Calling: The Complete Guide with Examples
Quick answer: Function calling lets LLMs output structured JSON to invoke external functions instead of (or in addition to) generating text. The model decides when to call a function and what arguments to pass, then synthesizes a final response from the function's output. It's the foundation of modern LLM agents.
How function calling works
The flow is:
- You define tools (functions with JSON schema descriptions)
- You send a user message
- The model decides whether to call a tool (or answer directly)
- If it calls a tool, it returns a structured JSON object with the function name and arguments
- You execute the function and return the result
- The model synthesizes a final response using the tool result
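Steps 4 and 5 of this flow are just a dispatch problem: map the function name in the model's JSON to a real Python function and call it with the parsed arguments. A minimal sketch, with a hypothetical `get_weather` implementation and a hand-rolled registry (neither comes from any SDK):

```python
import json

# Hypothetical tool implementation, registered by name
def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temp": 18, "units": units}

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> dict:
    """Execute the function the model asked for (step 5 of the flow)."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    args = json.loads(arguments)  # the model returns arguments as a JSON string
    return fn(**args)

# The model's structured output (step 4) names the tool and its arguments:
result = dispatch_tool_call("get_weather", '{"city": "Tokyo"}')
# result == {"city": "Tokyo", "temp": 18, "units": "celsius"}
```

A registry keyed by name also gives you one obvious place to validate arguments before anything executes.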
Defining tools with OpenAI
```python
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city. Use when the user asks about weather in a specific location.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
}
```
Key insight: The description fields are critical — they're what the model reads to decide whether to call the function and how to call it. Write descriptions as if explaining to a smart person who doesn't know your system.
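One way to keep schemas and code from drifting apart is to derive the tool definition from the function's own signature. A minimal sketch, assuming the `build_tool` helper and its type mapping are your own code, not part of any SDK:

```python
import inspect

# Illustrative mapping from Python annotations to JSON Schema types
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool(fn, description: str, param_docs: dict) -> dict:
    """Derive an OpenAI-style tool definition from a function signature."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {
            "type": PY_TO_JSON.get(param.annotation, "string"),
            "description": param_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required argument
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": description,
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }

def get_weather(city: str, units: str = "celsius") -> dict: ...

tool = build_tool(
    get_weather,
    "Get current weather conditions for a city.",
    {"city": "City name, e.g. 'London'", "units": "Temperature units"},
)
```

This keeps the name and required/optional split accurate automatically; the descriptions still have to be written by hand, because they carry the intent the model reasons over.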
Executing a tool call (OpenAI)
```python
from openai import OpenAI
import json

client = OpenAI()

def get_weather(city: str, units: str = "celsius") -> dict:
    # Your actual weather API call here
    return {"city": city, "temp": 18, "conditions": "partly cloudy", "units": units}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    tools = [weather_tool]

    # First call - model may call a tool
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )

    # If no tool call, return the direct response
    if response.choices[0].finish_reason == "stop":
        return response.choices[0].message.content

    # Execute the tool call
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)

    # Send result back and get final response
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "content": json.dumps(result),
        "tool_call_id": tool_call.id,
    })
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    return final_response.choices[0].message.content

print(run_agent("What's the weather like in Tokyo?"))
```
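`run_agent` handles exactly one tool call. Real agents loop until the model stops asking for tools. A sketch of that loop with the model call injected as a parameter so the control flow can be exercised without an API key; the `call_model` contract and message shapes here are our own simplification, not the OpenAI SDK:

```python
import json

def agent_loop(user_message: str, call_model, tools: dict, max_rounds: int = 5) -> str:
    """Keep executing tool calls until the model returns plain text.

    call_model(messages) returns either {"content": str} for a final answer,
    or {"tool_calls": [{"id": ..., "name": ..., "arguments": json_str}]}.
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        if "tool_calls" not in reply:
            return reply["content"]
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**json.loads(call["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],  # must match the model's call id
                "content": json.dumps(result),
            })
    return "Stopped: too many tool rounds."

# A fake model lets us exercise the loop end to end:
def fake_model(messages):
    if messages[-1]["role"] != "tool":
        return {"tool_calls": [{"id": "c1", "name": "get_weather",
                                "arguments": '{"city": "Tokyo"}'}]}
    return {"content": "18C and partly cloudy in Tokyo."}

answer = agent_loop("Weather in Tokyo?", fake_model,
                    {"get_weather": lambda city: {"temp": 18}})
```

The `max_rounds` cap matters in practice: without it, a model stuck in a tool-calling loop will burn tokens indefinitely.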
Parallel tool calls
Modern LLMs can call multiple tools in parallel in a single response:
```python
import asyncio

async def execute_all(calls):
    # execute_tool is assumed to be your own async dispatcher that maps
    # each call to the matching Python function
    tasks = [execute_tool(call) for call in calls]
    return await asyncio.gather(*tasks)

# Response may contain multiple tool_calls
if response.choices[0].finish_reason == "tool_calls":
    tool_calls = response.choices[0].message.tool_calls
    results = asyncio.run(execute_all(tool_calls))
```

Each result still goes back as its own `"tool"` message with the matching `tool_call_id`, so the model can pair results with the calls it made.
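A self-contained version makes the gather pattern concrete; the async tools here are stand-ins for real network calls, and the registry is our own convention:

```python
import asyncio
import json

async def get_weather(city: str) -> dict:
    await asyncio.sleep(0)  # stand-in for a real network call
    return {"city": city, "temp": 18}

async def get_time(city: str) -> dict:
    await asyncio.sleep(0)
    return {"city": city, "time": "14:00"}

ASYNC_TOOLS = {"get_weather": get_weather, "get_time": get_time}

async def execute_all(calls: list) -> list:
    # gather preserves order, so results line up with the input calls
    tasks = [ASYNC_TOOLS[c["name"]](**json.loads(c["arguments"])) for c in calls]
    return await asyncio.gather(*tasks)

calls = [
    {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    {"name": "get_time", "arguments": '{"city": "Tokyo"}'},
]
results = asyncio.run(execute_all(calls))
# results == [{"city": "Tokyo", "temp": 18}, {"city": "Tokyo", "time": "14:00"}]
```

Because `gather` preserves input order, you can zip `results` back with `tool_calls` to build the tool messages.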
Anthropic tool use syntax
```python
import anthropic
import json

client = anthropic.Anthropic()

tools = [{
    "name": "search_database",
    "description": "Search the product database for items matching a query",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 10}
        },
        "required": ["query"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find me waterproof hiking boots under $200"}],
)

if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")
    result = search_database(**tool_use.input)  # your own implementation

    # Continue the conversation with the result
    response2 = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "Find me waterproof hiking boots under $200"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": json.dumps(result),
            }]},
        ],
    )
```
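The result hand-back is just message bookkeeping: Anthropic expects tool results as a `tool_result` block inside a user-role message. A small convenience helper (our own function, not part of the `anthropic` SDK) makes the pattern reusable:

```python
import json

def tool_result_message(tool_use_id: str, result) -> dict:
    """Build the user-role message that carries a tool result back to Claude."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id on the tool_use block
            "content": json.dumps(result),
        }],
    }

msg = tool_result_message("toolu_123", {"hits": 3})
```

Note the structural difference from OpenAI: there is no dedicated `"tool"` role; the result rides in a content block of an ordinary user message.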
Best practices
- Limit your tool count: Models tend to perform worse as the number of tools grows (roughly beyond 10). Start with 3-5 and add more only when needed.
- Write great descriptions: The description determines when and how the model calls your function. Test edge cases.
- Validate outputs: Always validate tool call arguments before executing. Treat LLM-generated JSON as untrusted input.
- Add error handling: Return meaningful error messages from tools — the model uses them to self-correct.
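The validation and error-handling points combine naturally: check arguments before executing, and return failures as data the model can read and recover from. A minimal hand-rolled check (a JSON Schema validator library would be more thorough):

```python
import json

def validate_args(raw_arguments: str, schema: dict):
    """Return (args, error). Treat model-generated JSON as untrusted input."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return None, {"error": f"arguments were not valid JSON: {e}"}
    if not isinstance(args, dict):
        return None, {"error": "arguments must be a JSON object"}
    for name in schema.get("required", []):
        if name not in args:
            return None, {"error": f"missing required argument: {name}"}
    extra = set(args) - set(schema.get("properties", {}))
    if extra:
        return None, {"error": f"unknown arguments: {sorted(extra)}"}
    return args, None

schema = {"properties": {"city": {}, "units": {}}, "required": ["city"]}
args, err = validate_args('{"units": "celsius"}', schema)
# err == {"error": "missing required argument: city"}
```

Returning the error as a tool result, rather than raising, lets the model see what went wrong and retry with corrected arguments.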
For agentic applications, see best LLMs for automation for the top models ranked by function-calling reliability.