LLM Function Calling: The Complete Guide with Examples
Quick answer: Function calling lets LLMs output structured JSON to invoke external functions instead of (or in addition to) generating text. The model decides when to call a function and what arguments to pass, then synthesizes a final response from the function's output. It's the foundation of modern LLM agents.
How function calling works
The flow is:
- You define tools (functions with JSON schema descriptions)
- You send a user message
- The model decides whether to call a tool (or answer directly)
- If it calls a tool, it returns a structured JSON object with the function name and arguments
- You execute the function and return the result
- The model synthesizes a final response using the tool result
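Steps 4 and 5 of this flow are just a dispatch problem: map the function name in the model's JSON to a real Python function and call it with the parsed arguments. A minimal sketch, with a hypothetical `get_weather` implementation and a hand-rolled registry (neither comes from any SDK):

```python
import json

# Hypothetical tool implementation, registered by name
def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temp": 18, "units": units}

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> dict:
    """Execute the function the model asked for (step 5 of the flow)."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    args = json.loads(arguments)  # the model returns arguments as a JSON string
    return fn(**args)

# The model's structured output (step 4) names the tool and its arguments:
result = dispatch_tool_call("get_weather", '{"city": "Tokyo"}')
# result == {"city": "Tokyo", "temp": 18, "units": "celsius"}
```

A registry keyed by name also gives you one obvious place to validate arguments before anything executes.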
Defining tools with OpenAI
```python
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city. Use when the user asks about weather in a specific location.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
}
```
Key insight: The description fields are critical — they're what the model reads to decide whether to call the function and how to call it. Write descriptions as if explaining to a smart person who doesn't know your system.
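One way to keep schemas and code from drifting apart is to derive the tool definition from the function's own signature. A minimal sketch, assuming the `build_tool` helper and its type mapping are your own code, not part of any SDK:

```python
import inspect

# Illustrative mapping from Python annotations to JSON Schema types
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool(fn, description: str, param_docs: dict) -> dict:
    """Derive an OpenAI-style tool definition from a function signature."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {
            "type": PY_TO_JSON.get(param.annotation, "string"),
            "description": param_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required argument
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": description,
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }

def get_weather(city: str, units: str = "celsius") -> dict: ...

tool = build_tool(
    get_weather,
    "Get current weather conditions for a city.",
    {"city": "City name, e.g. 'London'", "units": "Temperature units"},
)
```

This keeps the name and required/optional split accurate automatically; the descriptions still have to be written by hand, because they carry the intent the model reasons over.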
Executing a tool call (OpenAI)
```python
from openai import OpenAI
import json

client = OpenAI()

def get_weather(city: str, units: str = "celsius") -> dict:
    # Your actual weather API call here
    return {"city": city, "temp": 18, "conditions": "partly cloudy", "units": units}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    tools = [weather_tool]

    # First call - model may call a tool
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )

    # If no tool call, return the direct response
    if response.choices[0].finish_reason == "stop":
        return response.choices[0].message.content

    # Execute the tool call
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)

    # Send result back and get final response
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "content": json.dumps(result),
        "tool_call_id": tool_call.id,
    })
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    return final_response.choices[0].message.content

print(run_agent("What's the weather like in Tokyo?"))
```
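`run_agent` handles exactly one tool call. Real agents loop until the model stops asking for tools. A sketch of that loop with the model call injected as a parameter so the control flow can be exercised without an API key; the `call_model` contract and message shapes here are our own simplification, not the OpenAI SDK:

```python
import json

def agent_loop(user_message: str, call_model, tools: dict, max_rounds: int = 5) -> str:
    """Keep executing tool calls until the model returns plain text.

    call_model(messages) returns either {"content": str} for a final answer,
    or {"tool_calls": [{"id": ..., "name": ..., "arguments": json_str}]}.
    """
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        if "tool_calls" not in reply:
            return reply["content"]
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**json.loads(call["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],  # must match the model's call id
                "content": json.dumps(result),
            })
    return "Stopped: too many tool rounds."

# A fake model lets us exercise the loop end to end:
def fake_model(messages):
    if messages[-1]["role"] != "tool":
        return {"tool_calls": [{"id": "c1", "name": "get_weather",
                                "arguments": '{"city": "Tokyo"}'}]}
    return {"content": "18C and partly cloudy in Tokyo."}

answer = agent_loop("Weather in Tokyo?", fake_model,
                    {"get_weather": lambda city: {"temp": 18}})
```

The `max_rounds` cap matters in practice: without it, a model stuck in a tool-calling loop will burn tokens indefinitely.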
Parallel tool calls
Modern LLMs can call multiple tools in parallel in a single response:
```python
import asyncio

async def execute_all(calls):
    # execute_tool is assumed to be your own async dispatcher that maps
    # each call to the matching Python function
    tasks = [execute_tool(call) for call in calls]
    return await asyncio.gather(*tasks)

# Response may contain multiple tool_calls
if response.choices[0].finish_reason == "tool_calls":
    tool_calls = response.choices[0].message.tool_calls
    results = asyncio.run(execute_all(tool_calls))
```

Each result still goes back as its own `"tool"` message with the matching `tool_call_id`, so the model can pair results with the calls it made.
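A self-contained version makes the gather pattern concrete; the async tools here are stand-ins for real network calls, and the registry is our own convention:

```python
import asyncio
import json

async def get_weather(city: str) -> dict:
    await asyncio.sleep(0)  # stand-in for a real network call
    return {"city": city, "temp": 18}

async def get_time(city: str) -> dict:
    await asyncio.sleep(0)
    return {"city": city, "time": "14:00"}

ASYNC_TOOLS = {"get_weather": get_weather, "get_time": get_time}

async def execute_all(calls: list) -> list:
    # gather preserves order, so results line up with the input calls
    tasks = [ASYNC_TOOLS[c["name"]](**json.loads(c["arguments"])) for c in calls]
    return await asyncio.gather(*tasks)

calls = [
    {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    {"name": "get_time", "arguments": '{"city": "Tokyo"}'},
]
results = asyncio.run(execute_all(calls))
# results == [{"city": "Tokyo", "temp": 18}, {"city": "Tokyo", "time": "14:00"}]
```

Because `gather` preserves input order, you can zip `results` back with `tool_calls` to build the tool messages.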
Anthropic tool use syntax
```python
import anthropic
import json

client = anthropic.Anthropic()

tools = [{
    "name": "search_database",
    "description": "Search the product database for items matching a query",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 10}
        },
        "required": ["query"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find me waterproof hiking boots under $200"}],
)

if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")
    result = search_database(**tool_use.input)  # your own implementation

    # Continue the conversation with the result
    response2 = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "Find me waterproof hiking boots under $200"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": json.dumps(result),
            }]},
        ],
    )
```
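The result hand-back is just message bookkeeping: Anthropic expects tool results as a `tool_result` block inside a user-role message. A small convenience helper (our own function, not part of the `anthropic` SDK) makes the pattern reusable:

```python
import json

def tool_result_message(tool_use_id: str, result) -> dict:
    """Build the user-role message that carries a tool result back to Claude."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id on the tool_use block
            "content": json.dumps(result),
        }],
    }

msg = tool_result_message("toolu_123", {"hits": 3})
```

Note the structural difference from OpenAI: there is no dedicated `"tool"` role; the result rides in a content block of an ordinary user message.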
Best practices
- Limit your tool count: Models tend to perform worse as the number of tools grows (roughly beyond 10). Start with 3-5 and add more only when needed.
- Write great descriptions: The description determines when and how the model calls your function. Test edge cases.
- Validate outputs: Always validate tool call arguments before executing. Treat LLM-generated JSON as untrusted input.
- Add error handling: Return meaningful error messages from tools — the model uses them to self-correct.
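The validation and error-handling points combine naturally: check arguments before executing, and return failures as data the model can read and recover from. A minimal hand-rolled check (a JSON Schema validator library would be more thorough):

```python
import json

def validate_args(raw_arguments: str, schema: dict):
    """Return (args, error). Treat model-generated JSON as untrusted input."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return None, {"error": f"arguments were not valid JSON: {e}"}
    if not isinstance(args, dict):
        return None, {"error": "arguments must be a JSON object"}
    for name in schema.get("required", []):
        if name not in args:
            return None, {"error": f"missing required argument: {name}"}
    extra = set(args) - set(schema.get("properties", {}))
    if extra:
        return None, {"error": f"unknown arguments: {sorted(extra)}"}
    return args, None

schema = {"properties": {"city": {}, "units": {}}, "required": ["city"]}
args, err = validate_args('{"units": "celsius"}', schema)
# err == {"error": "missing required argument: city"}
```

Returning the error as a tool result, rather than raising, lets the model see what went wrong and retry with corrected arguments.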
For agentic applications, see best LLMs for automation for the top models ranked by function-calling reliability.