
How to Build a Chatbot with an LLM API: Full Guide for 2026


Quick answer: A production chatbot needs five things most tutorials skip: conversation-history management (to avoid context bloat), a well-designed system prompt, streaming for responsiveness, error handling with graceful degradation, and a cost model. This guide covers all five.


Architecture overview

A production LLM chatbot has these layers:

  1. Frontend: Chat UI (React, Vue, or plain HTML)
  2. API layer: Next.js route handler or Express endpoint
  3. Conversation manager: Stores and trims message history
  4. LLM client: Calls the model, handles retries
  5. Persistence: Database for conversation storage


The system prompt is your product

The system prompt defines your chatbot's personality, knowledge, constraints, and behavior. Invest time here — it's the highest leverage prompt in your application.

You are Aria, a customer support specialist for Acme SaaS.

Your responsibilities:
- Answer questions about Acme's features, pricing, and integrations
- Help users troubleshoot common issues using the knowledge base below
- Escalate to a human agent when: the issue requires account access, the user is frustrated after 3 turns, or the issue is not covered in your knowledge base

Behavior rules:
- Be concise. Maximum 3 sentences per response unless the user asks for detail.
- Never guess. If you don't know, say so and offer to escalate.
- Never discuss competitors by name.

Knowledge base:
[INSERT PRODUCT DOCS HERE]
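
In practice you assemble this prompt at request time, injecting the product docs as a string. A hedged sketch — buildSystemPrompt and its parameter are hypothetical names, and the prompt body is abbreviated from the example above:

```typescript
// Hypothetical helper that assembles the system prompt per request.
// The function name and parameter are illustrative, not a library API.
function buildSystemPrompt(productDocs: string): string {
  return [
    'You are Aria, a customer support specialist for Acme SaaS.',
    '',
    'Behavior rules:',
    '- Be concise. Maximum 3 sentences per response unless the user asks for detail.',
    "- Never guess. If you don't know, say so and offer to escalate.",
    '- Never discuss competitors by name.',
    '',
    'Knowledge base:',
    productDocs,
  ].join('\n');
}
```

Building the prompt in code (rather than hard-coding one giant string) lets you version the rules and the knowledge base independently.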


Conversation history management

A naive implementation appends every message to history indefinitely. This causes:

  • Context window overflow after extended conversations
  • Increasing cost per turn as history grows
  • Degrading quality as old irrelevant context crowds the window

The solution is a conversation manager that trims or summarizes old messages:

const MAX_HISTORY_TOKENS = 4000;

type Message = { role: 'user' | 'assistant'; content: string };

// Rough token estimate: ~4 characters per token for English text
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + m.content.length / 4, 0);
}

function trimConversation(
  messages: Message[],
  maxTokens: number = MAX_HISTORY_TOKENS
): Message[] {
  // Always keep the last N turns
  const KEEP_LAST = 6;
  if (messages.length <= KEEP_LAST) return messages;

  const recent = messages.slice(-KEEP_LAST);
  if (estimateTokens(recent) <= maxTokens) return recent;

  // Still over budget: fall back to the last 3 messages
  return messages.slice(-3);
}
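
One subtlety with slicing: the trimmed history can start with an assistant message, which some chat APIs reject — the Anthropic Messages API, for example, requires the first message to use the user role. A small guard handles this; the helper name is illustrative:

```typescript
type Message = { role: 'user' | 'assistant'; content: string };

// Drop leading assistant messages so the trimmed history opens with a
// user turn. Helper name is illustrative, not a library API.
function ensureStartsWithUser(messages: Message[]): Message[] {
  const firstUser = messages.findIndex((m) => m.role === 'user');
  return firstUser === -1 ? [] : messages.slice(firstUser);
}
```

Run it on the output of trimConversation before every API call.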


Streaming API with Next.js

// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json();

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        // stream() returns an async-iterable MessageStream; no await needed
        const response = client.messages.stream({
          model: 'claude-haiku-4',
          max_tokens: 1024,
          system: systemPrompt,
          messages,
        });

        for await (const event of response) {
          if (event.type === 'content_block_delta' &&
              event.delta.type === 'text_delta') {
            controller.enqueue(encoder.encode(event.delta.text));
          }
        }
        controller.close();
      } catch (err) {
        // Propagate API failures instead of leaving the stream hanging
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
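
On the frontend, you consume this plain-text stream with fetch and a reader. A minimal sketch — readTextStream is an illustrative helper name, and '/api/chat' assumes the route above:

```typescript
// Accumulate a text stream chunk by chunk, invoking a callback per chunk
// (e.g. to append tokens to the chat UI as they arrive).
async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  return full;
}

// Usage in the chat UI (sketch):
//   const res = await fetch('/api/chat', {
//     method: 'POST',
//     body: JSON.stringify({ messages, systemPrompt }),
//   });
//   await readTextStream(res.body!, appendToChatWindow);
```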


Model selection for chatbots

For most chatbots:

  • Customer support (high volume): Claude Haiku 4 or GPT-4.1 Mini — fast, cheap, good enough
  • Complex product advisor: Claude Sonnet 4 or GPT-4o — better reasoning for nuanced questions
  • Internal tools: Gemini 2.0 Flash — generous free tier for low-volume internal use

See the best LLMs for chatbot development for a full ranked comparison.
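
In code, this mapping can live in a single routing table so the rest of the app never hard-codes a model ID. The model IDs are the ones named above; the tier names are illustrative:

```typescript
// Hypothetical routing table from use case to model ID.
// Tier names are illustrative; model IDs match the recommendations above.
type Tier = 'support' | 'advisor' | 'internal';

const MODEL_FOR_TIER: Record<Tier, string> = {
  support: 'claude-haiku-4',      // high volume, cost-sensitive
  advisor: 'claude-sonnet-4',     // nuanced reasoning
  internal: 'gemini-2.0-flash',   // low-volume internal tools
};

function pickModel(tier: Tier): string {
  return MODEL_FOR_TIER[tier];
}
```

Centralizing the choice makes model upgrades a one-line change.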


Cost model for chatbots

Estimate your chatbot costs before launch:

Cost per conversation = (system_prompt_tokens + avg_history_tokens + avg_user_tokens)
                        × input_price_per_token
                      + avg_response_tokens × output_price_per_token

For a customer support bot with a 1,000-token system prompt, 500-token history, 50-token user message, and 200-token response at Claude Haiku 4 pricing:

  • Input: 1,550 tokens × $0.80/1M = $0.00124
  • Output: 200 tokens × $4.00/1M = $0.0008
  • Per conversation: $0.00204
  • At 10,000 conversations/month: $20.40/month
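
The formula above translates directly into a small function; the parameter names are illustrative, and prices are expressed per million tokens:

```typescript
// Cost-per-conversation formula from above as code.
// Prices are USD per 1M tokens; parameter names are illustrative.
function costPerConversation(params: {
  systemPromptTokens: number;
  avgHistoryTokens: number;
  avgUserTokens: number;
  avgResponseTokens: number;
  inputPricePerMTok: number;
  outputPricePerMTok: number;
}): number {
  const inputTokens =
    params.systemPromptTokens + params.avgHistoryTokens + params.avgUserTokens;
  return (
    (inputTokens * params.inputPricePerMTok +
      params.avgResponseTokens * params.outputPricePerMTok) /
    1_000_000
  );
}
```

Plugging in the worked example (1,000 + 500 + 50 input tokens, 200 output tokens, at $0.80/$4.00 per 1M) reproduces the $0.00204 per-conversation figure.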

Use the LLMversus cost calculator to model your specific chatbot costs and compare across providers.
