How to Build a Chatbot with an LLM API: Full Guide for 2026
Quick answer: A production chatbot needs five things the tutorials don't cover: conversation history management (to avoid context bloat), a well-designed system prompt, streaming for responsiveness, error handling with graceful degradation, and a cost model. This guide covers all five.
Architecture overview
A production LLM chatbot has these layers:
- Frontend: Chat UI (React, Vue, or plain HTML)
- API layer: Next.js route handler or Express endpoint
- Conversation manager: Stores and trims message history
- LLM client: Calls the model, handles retries
- Persistence: Database for conversation storage
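As a sketch, the contracts between these layers can be written down as TypeScript interfaces. The names below are illustrative assumptions for this guide, not a standard API:

```typescript
// Illustrative layer contracts; names are assumptions for this sketch.
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

interface ConversationManager {
  // Returns the (possibly trimmed) history to send with the next model call
  getHistory(conversationId: string): Promise<Message[]>;
  append(conversationId: string, message: Message): Promise<void>;
}

interface LLMClient {
  // Streams assistant text chunks for the given system prompt and history
  streamReply(system: string, messages: Message[]): AsyncIterable<string>;
}
```

Keeping these seams explicit makes it easy to swap the persistence layer or the model provider without touching the chat UI.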
The system prompt is your product
The system prompt defines your chatbot's personality, knowledge, constraints, and behavior. Invest time here — it's the highest leverage prompt in your application.
```
You are Aria, a customer support specialist for Acme SaaS.

Your responsibilities:
- Answer questions about Acme's features, pricing, and integrations
- Help users troubleshoot common issues using the knowledge base below
- Escalate to a human agent when: the issue requires account access, the user is frustrated after 3 turns, or the issue is not covered in your knowledge base

Behavior rules:
- Be concise. Maximum 3 sentences per response unless the user asks for detail.
- Never guess. If you don't know, say so and offer to escalate.
- Never discuss competitors by name.

Knowledge base:
[INSERT PRODUCT DOCS HERE]
```
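One way to keep the knowledge-base section current is to splice the product docs into a template at startup rather than hard-coding them. A minimal sketch — the `{{PRODUCT_DOCS}}` placeholder and `buildSystemPrompt` helper are assumptions for illustration:

```typescript
// Hypothetical helper: splices product docs into the system prompt template.
const SYSTEM_TEMPLATE = [
  'You are Aria, a customer support specialist for Acme SaaS.',
  '(...responsibilities and behavior rules as above...)',
  'Knowledge base:',
  '{{PRODUCT_DOCS}}',
].join('\n');

function buildSystemPrompt(productDocs: string): string {
  return SYSTEM_TEMPLATE.replace('{{PRODUCT_DOCS}}', productDocs);
}
```

This also lets you regenerate the prompt whenever the docs change, without redeploying prompt text.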
Conversation history management
A naive implementation appends every message to history indefinitely. This causes:
- Context window overflow after extended conversations
- Increasing cost per turn as history grows
- Degrading quality as old irrelevant context crowds the window
The solution is a conversation manager that trims or summarizes old messages:
```typescript
const MAX_HISTORY_TOKENS = 4000;

// Rough estimate: ~4 characters per token for English text
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + m.content.length / 4, 0);
}

function trimConversation(
  messages: Message[],
  maxTokens: number = MAX_HISTORY_TOKENS
): Message[] {
  // Always try to keep the last N messages
  const KEEP_LAST = 6;
  if (messages.length <= KEEP_LAST) return messages;

  let trimmed = messages.slice(-KEEP_LAST);
  // Drop the oldest kept messages until we fit the token budget,
  // but never trim below the last user/assistant exchange
  while (trimmed.length > 2 && estimateTokens(trimmed) > maxTokens) {
    trimmed = trimmed.slice(1);
  }
  return trimmed;
}
```
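Trimming discards old context outright. The summarization alternative mentioned above replaces older messages with a single synthetic turn instead. A rough sketch, assuming a `summarize` function that would be backed by a cheap model call in production (stubbed here):

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Stub: in production this would be a cheap LLM call (e.g. a Haiku-class model).
async function summarize(messages: Message[]): Promise<string> {
  return `Summary of ${messages.length} earlier messages.`;
}

// Replace everything except the last `keepLast` messages with one summary turn.
async function compactConversation(
  messages: Message[],
  keepLast = 6
): Promise<Message[]> {
  if (messages.length <= keepLast) return messages;
  const old = messages.slice(0, -keepLast);
  const summary = await summarize(old);
  return [
    { role: 'assistant', content: `Earlier conversation summary: ${summary}` },
    ...messages.slice(-keepLast),
  ];
}
```

Summarization costs one extra model call per compaction, so it makes sense for long-lived conversations where the discarded context still matters.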
Streaming API with Next.js
```typescript
// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const response = client.messages.stream({
          model: 'claude-haiku-4',
          max_tokens: 1024,
          system: systemPrompt,
          messages,
        });
        for await (const event of response) {
          if (
            event.type === 'content_block_delta' &&
            event.delta.type === 'text_delta'
          ) {
            controller.enqueue(encoder.encode(event.delta.text));
          }
        }
        controller.close();
      } catch (err) {
        // Surface upstream failures to the client instead of hanging the stream
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
```
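On the frontend, the response body is a plain text stream you can render incrementally. A minimal consumer sketch — the `/api/chat` path matches the route above, while the `onChunk` callback and `appendToChatUI` are assumptions for illustration:

```typescript
// Reads a text stream chunk by chunk, invoking onChunk for each decoded piece.
async function consumeTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  return full;
}

// Usage in the browser (illustrative):
// const res = await fetch('/api/chat', {
//   method: 'POST',
//   body: JSON.stringify({ messages, systemPrompt }),
// });
// await consumeTextStream(res.body!, (text) => appendToChatUI(text));
```

Appending each chunk as it arrives is what makes the bot feel responsive even when the full reply takes several seconds.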
Model selection for chatbots
For most chatbots:
- Customer support (high volume): Claude Haiku 4 or GPT-4.1 Mini — fast, cheap, good enough
- Complex product advisor: Claude Sonnet 4 or GPT-4o — better reasoning for nuanced questions
- Internal tools: Gemini 2.0 Flash — generous free tier for low-volume internal use
See the best LLMs for chatbot development for a full ranked comparison.
Cost model for chatbots
Estimate your chatbot costs before launch:
```
cost_per_conversation =
    (system_prompt_tokens + avg_history_tokens + avg_user_tokens) × input_price_per_token
  + avg_response_tokens × output_price_per_token
```
For a customer support bot with a 1,000-token system prompt, 500-token history, 50-token user message, and 200-token response at Claude Haiku 4 pricing:
- Input: 1,550 tokens × $0.80/1M = $0.00124
- Output: 200 tokens × $4.00/1M = $0.0008
- Per conversation: $0.00204
- At 10,000 conversations/month: $20.40/month
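The formula is easy to encode so you can re-run it as prompt sizes or prices change. A small sketch, using the example numbers above (verify current per-token pricing before relying on any figures):

```typescript
interface CostInputs {
  systemPromptTokens: number;
  avgHistoryTokens: number;
  avgUserTokens: number;
  avgResponseTokens: number;
  inputPricePerMTok: number;  // USD per million input tokens
  outputPricePerMTok: number; // USD per million output tokens
}

function costPerConversation(c: CostInputs): number {
  const inputTokens =
    c.systemPromptTokens + c.avgHistoryTokens + c.avgUserTokens;
  return (
    (inputTokens * c.inputPricePerMTok) / 1_000_000 +
    (c.avgResponseTokens * c.outputPricePerMTok) / 1_000_000
  );
}
```

Plugging in the support-bot numbers from this section reproduces the ~$0.002 per conversation above; multiplying by expected monthly volume gives your budget line.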
Use the LLMversus cost calculator to model your specific chatbot costs and compare across providers.