Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Chat widgets are easy. Production chatbots that handle real users without breaking are hard. Here's the full guide to building one that actually works.

The demo works beautifully. You show it to your team. Everyone is impressed. Then you put it in front of real users and it falls apart.
Real users ask questions you didn't anticipate. They provide context in unexpected ways. They ask follow-up questions in the middle of different topics. They paste in long text. They send one-word messages. They try to make the chatbot say inappropriate things. They get frustrated when it doesn't understand them.
Building a chatbot that survives this is a different project from building a chatbot that works in your demo. This guide covers the version that works for real users.
What we're building: a production-ready customer support chatbot for a SaaS product, with streaming responses, persistent conversation history, rate limiting and abuse protection, and escalation to human agents.
Tech stack: Next.js for the API route and chat widget, Convex for the database, and the Anthropic Claude API for the model.
Most chatbot tutorials skip architecture and jump straight to code, and that's exactly where they fail. Here's an architecture that scales.
User Browser
     |
     | (streaming SSE)
     v
Next.js API Route -----> Rate Limiter
     |                        |
     v                        v
Claude API              Abuse Detection
     |
     v
Convex Database
     |
     |---> Conversation History
     |---> Message Analytics
     '---> User Sessions
Key principle: separate the stateless AI call from the stateful conversation management. The AI model doesn't hold state. Your database does. The API layer connects them.
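This principle can be sketched as a pure function: on every request, the API layer rebuilds the model's input from stored history plus the new user turn. `Turn` and `buildModelMessages` are illustrative names, not part of any SDK.

```typescript
// The model holds no state; its input is reconstructed from the database
// on every request.
type Turn = { role: "user" | "assistant"; content: string };

function buildModelMessages(
  history: Turn[],
  newUserMessage: string,
  maxTurns = 10
): Turn[] {
  // Keep only the most recent turns so the prompt stays within budget.
  const recent = history.slice(-maxTurns);
  return [...recent, { role: "user", content: newUserMessage }];
}
```

Because the function is pure, the context-window policy (here, "last 10 turns") can be unit-tested without touching the database or the model.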
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  conversations: defineTable({
    sessionId: v.string(),
    userId: v.optional(v.string()),
    createdAt: v.number(),
    updatedAt: v.number(),
    status: v.union(v.literal("active"), v.literal("resolved"), v.literal("escalated")),
    metadata: v.optional(v.object({
      page: v.optional(v.string()),
      userAgent: v.optional(v.string()),
      referrer: v.optional(v.string()),
    })),
  }).index("by_session", ["sessionId"])
    .index("by_user", ["userId"]),

  messages: defineTable({
    conversationId: v.id("conversations"),
    role: v.union(v.literal("user"), v.literal("assistant"), v.literal("system")),
    content: v.string(),
    timestamp: v.number(),
    metadata: v.optional(v.object({
      tokensUsed: v.optional(v.number()),
      model: v.optional(v.string()),
      latencyMs: v.optional(v.number()),
    })),
  }).index("by_conversation", ["conversationId"]),

  rateLimits: defineTable({
    identifier: v.string(), // IP or userId
    messageCount: v.number(),
    windowStart: v.number(),
  }).index("by_identifier", ["identifier"]),
});

// app/api/chat/route.ts
import { NextRequest } from "next/server";
import Anthropic from "@anthropic-ai/sdk";
import { ConvexHttpClient } from "convex/browser";
import { api } from "@/convex/_generated/api";

const anthropic = new Anthropic();
const convex = new ConvexHttpClient(process.env.NEXT_PUBLIC_CONVEX_URL!);

const SYSTEM_PROMPT = `You are a helpful customer support assistant for Acme SaaS.
Your role:
- Answer questions about product features, pricing, and troubleshooting
- Be concise and helpful
- If you don't know something, say so and offer to connect them with the team
- Never make up information about features or pricing
Escalation triggers (respond with [ESCALATE] prefix):
- Customer expresses significant frustration or mentions legal action
- Technical issues you cannot resolve
- Requests for refunds or account modifications
- Any security concerns
Knowledge base context will be provided when available.`;

const RATE_LIMIT = { messages: 20, windowMs: 60000 }; // 20 messages per minute

async function checkRateLimit(identifier: string): Promise<boolean> {
  const now = Date.now();
  const windowStart = now - RATE_LIMIT.windowMs;
  const existing = await convex.query(api.rateLimit.get, { identifier });

  // No record yet, or the previous window has expired: start a fresh one.
  if (!existing || existing.windowStart < windowStart) {
    await convex.mutation(api.rateLimit.set, {
      identifier,
      messageCount: 1,
      windowStart: now,
    });
    return true;
  }

  if (existing.messageCount >= RATE_LIMIT.messages) {
    return false;
  }

  await convex.mutation(api.rateLimit.increment, { identifier });
  return true;
}
export async function POST(req: NextRequest) {
  // x-forwarded-for can be a comma-separated list; use the first hop.
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown";

  // Rate limiting
  const allowed = await checkRateLimit(ip);
  if (!allowed) {
    return new Response("Too many requests. Please wait before sending another message.", {
      status: 429,
    });
  }

  const { message, sessionId, conversationHistory } = await req.json();

  // Input validation
  if (!message || typeof message !== "string" || message.trim().length === 0) {
    return new Response("Invalid message", { status: 400 });
  }
  if (message.length > 2000) {
    return new Response("Message too long. Please keep messages under 2000 characters.", { status: 400 });
  }

  // Get or create conversation
  let conversationId = req.headers.get("x-conversation-id");
  if (!conversationId) {
    conversationId = await convex.mutation(api.conversations.create, {
      sessionId,
      metadata: {
        userAgent: req.headers.get("user-agent") ?? undefined,
        referrer: req.headers.get("referer") ?? undefined,
      },
    });
  }

  // Save user message
  await convex.mutation(api.messages.add, {
    conversationId: conversationId as any,
    role: "user",
    content: message,
    timestamp: Date.now(),
  });

  // Build message history (last 10 messages for context); tolerate a missing array
  const recentHistory = Array.isArray(conversationHistory) ? conversationHistory.slice(-10) : [];
  const startTime = Date.now();

  // Stream the response
  const stream = anthropic.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: [
      ...recentHistory,
      { role: "user", content: message },
    ],
  });
  // Collect full response for storage
  let fullResponse = "";
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          if (
            chunk.type === "content_block_delta" &&
            chunk.delta.type === "text_delta"
          ) {
            const text = chunk.delta.text;
            fullResponse += text;
            controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
          }
        }

        // Save assistant response
        const latencyMs = Date.now() - startTime;
        const usage = (await stream.finalMessage()).usage;
        await convex.mutation(api.messages.add, {
          conversationId: conversationId as any,
          role: "assistant",
          content: fullResponse,
          timestamp: Date.now(),
          metadata: {
            tokensUsed: usage.input_tokens + usage.output_tokens,
            model: "claude-sonnet-4-20250514",
            latencyMs,
          },
        });

        // Check for escalation trigger
        if (fullResponse.startsWith("[ESCALATE]")) {
          await convex.mutation(api.conversations.escalate, {
            conversationId: conversationId as any,
          });
        }

        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ done: true, conversationId })}\n\n`));
        controller.close();
      } catch (error) {
        controller.enqueue(
          encoder.encode(`data: ${JSON.stringify({ error: "Stream error" })}\n\n`)
        );
        controller.close();
      }
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
      "X-Conversation-Id": conversationId,
    },
  });
}

// components/ChatWidget.tsx
"use client";
import { useState, useRef, useEffect } from "react";
import { nanoid } from "nanoid";
interface Message {
role: "user" | "assistant";
content: string;
timestamp: Date;
}
export function ChatWidget() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState("");
const [isStreaming, setIsStreaming] = useState(false);
const [conversationId, setConversationId] = useState<string | null>(null);
const sessionId = useRef(nanoid());
const messagesEndRef = useRef<HTMLDivElement>(null);
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);
async function sendMessage() {
if (!input.trim() || isStreaming) return;
const userMessage: Message = {
role: "user",
content: input.trim(),
timestamp: new Date(),
};
setMessages(prev => [...prev, userMessage]);
setInput("");
setIsStreaming(true);
// Add placeholder assistant message
const assistantMessage: Message = {
role: "assistant",
content: "",
timestamp: new Date(),
};
setMessages(prev => [...prev, assistantMessage]);
    try {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          ...(conversationId ? { "x-conversation-id": conversationId } : {}),
        },
        body: JSON.stringify({
          message: userMessage.content,
          sessionId: sessionId.current,
          conversationHistory: messages.map(m => ({
            role: m.role,
            content: m.content,
          })),
        }),
      });

      if (!response.ok) {
        const errorText = await response.text();
        setMessages(prev => [
          ...prev.slice(0, -1),
          { ...assistantMessage, content: errorText || "Something went wrong. Please try again." },
        ]);
        return;
      }

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let fullContent = "";
      let buffer = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // A network read can end mid-line, so buffer the chunk, parse only
        // complete lines, and keep the remainder for the next read.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const data = JSON.parse(line.slice(6));
          if (data.text) {
            fullContent += data.text;
            setMessages(prev => [
              ...prev.slice(0, -1),
              { ...assistantMessage, content: fullContent },
            ]);
          }
          if (data.done && data.conversationId) {
            setConversationId(data.conversationId);
          }
        }
      }
    } catch (error) {
      setMessages(prev => [
        ...prev.slice(0, -1),
        { ...assistantMessage, content: "Connection error. Please check your internet and try again." },
      ]);
    } finally {
      setIsStreaming(false);
    }
  }
  return (
    <div className="flex flex-col h-[500px] border rounded-lg overflow-hidden">
      <div className="bg-primary p-4">
        <h3 className="text-primary-foreground font-semibold">Support Chat</h3>
      </div>

      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.length === 0 && (
          <p className="text-muted-foreground text-sm text-center">
            How can I help you today?
          </p>
        )}
        {messages.map((msg, i) => (
          <div key={i} className={`flex ${msg.role === "user" ? "justify-end" : "justify-start"}`}>
            <div
              className={`max-w-[80%] rounded-lg px-4 py-2 text-sm ${
                msg.role === "user"
                  ? "bg-primary text-primary-foreground"
                  : "bg-muted"
              }`}
            >
              {msg.content || <span className="animate-pulse">...</span>}
            </div>
          </div>
        ))}
        <div ref={messagesEndRef} />
      </div>

      <div className="border-t p-4 flex gap-2">
        <input
          className="flex-1 text-sm border rounded px-3 py-2 focus:outline-none focus:ring-2 focus:ring-primary"
          value={input}
          onChange={e => setInput(e.target.value)}
          onKeyDown={e => e.key === "Enter" && !e.shiftKey && sendMessage()}
          placeholder="Type your message..."
          disabled={isStreaming}
          maxLength={2000}
        />
        <button
          onClick={sendMessage}
          disabled={isStreaming || !input.trim()}
          className="bg-primary text-primary-foreground px-4 py-2 rounded text-sm disabled:opacity-50"
        >
          Send
        </button>
      </div>
    </div>
  );
}

When the chatbot can't help, it needs to hand off gracefully. The [ESCALATE] prefix in the system prompt triggers a flag in Convex.
Your human support queue should watch for conversations where status is "escalated". The implementation details vary by your support tooling, but the pattern is the same: the agent detects its limitation, sets a flag, and your backend routes the conversation to a human queue.
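One way to handle the marker on the backend is to strip it before the reply reaches the user while recording the flag. This is a sketch; `parseEscalation` is a hypothetical helper, not part of the code above.

```typescript
// Detect the [ESCALATE] marker the system prompt asks the model to emit,
// and strip it so the user never sees the raw prefix.
function parseEscalation(response: string): { escalated: boolean; text: string } {
  const PREFIX = "[ESCALATE]";
  if (response.startsWith(PREFIX)) {
    return { escalated: true, text: response.slice(PREFIX.length).trimStart() };
  }
  return { escalated: false, text: response };
}
```

When `escalated` is true, set the conversation's status to "escalated" in Convex and notify your support tooling.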
Three things separate this from a demo:
Rate limiting. Without it, a single automated script can exhaust your API budget in minutes. The windowed counter above handles normal bursts while cutting off sustained abuse.
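The rate-limit decision can be factored out as a pure function, so the logic is unit-testable without a database. `RateWindow` and `decide` are illustrative names; the database read/write around them follows the checkRateLimit function shown earlier.

```typescript
// Windowed rate limit: a counter per identifier that resets when the
// window expires. Mirrors the logic in checkRateLimit above.
type RateWindow = { messageCount: number; windowStart: number };

function decide(
  existing: RateWindow | null,
  now: number,
  limit = 20,
  windowMs = 60_000
): { allowed: boolean; next: RateWindow } {
  // No record, or the previous window has expired: start fresh.
  if (!existing || existing.windowStart < now - windowMs) {
    return { allowed: true, next: { messageCount: 1, windowStart: now } };
  }
  if (existing.messageCount >= limit) {
    return { allowed: false, next: existing };
  }
  return { allowed: true, next: { ...existing, messageCount: existing.messageCount + 1 } };
}
```

Note that the read-then-write pattern has a small race under concurrent requests from the same identifier; for strict limits, do the check inside a single Convex mutation so it runs transactionally.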
Error boundaries everywhere. Every async operation can fail. The streaming connection can drop. The AI API can timeout. Each failure mode has a graceful fallback that doesn't leave the user staring at a spinner.
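A concrete piece of this is putting a deadline on every awaited call, so a hung upstream request fails fast instead of leaving the user on a spinner. A minimal sketch; `withTimeout` is a hypothetical helper, not from the code above.

```typescript
// Race a promise against a deadline; whichever settles first wins.
function withTimeout<T>(promise: Promise<T>, ms: number, label = "operation"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      err => { clearTimeout(timer); reject(err); }
    );
  });
}
```

Wrap the Convex mutations and the model call with it, and map the timeout error to a friendly message in the catch block.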
Conversation persistence. When the user refreshes the page, their conversation history is still there. This sounds obvious but most chatbot tutorials don't implement it, and users notice immediately when history disappears.
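The client side of persistence is keeping a stable session id across refreshes so the backend can reload the conversation. A sketch with the storage injected, so the same logic works against window.localStorage in the browser and a plain object in tests; `getOrCreateSessionId` is a hypothetical helper.

```typescript
// Minimal key-value interface matching the localStorage methods we need.
type KVStore = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
};

// Reuse the stored session id if one exists; otherwise mint and store one.
function getOrCreateSessionId(
  storage: KVStore,
  makeId: () => string,
  key = "chat-session-id"
): string {
  const existing = storage.getItem(key);
  if (existing) return existing;
  const id = makeId();
  storage.setItem(key, id);
  return id;
}
```

In the widget, replace `useRef(nanoid())` with a call like `getOrCreateSessionId(window.localStorage, nanoid)` inside an effect, then fetch the stored history for that session on mount.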
Q: How do you build a production chatbot?
Build a production chatbot in layers: conversation management (tracking state and context), AI integration (Claude or GPT for understanding and generating responses), knowledge base connection (RAG for domain-specific answers), action capabilities (booking, searching, updating records), and monitoring (tracking quality, escalation rates, and user satisfaction).
Q: What makes a chatbot production-ready vs a demo?
Production chatbots need graceful error handling, conversation context management, escalation to human agents, response quality monitoring, rate limiting, persistent conversation history, multi-language support, and accessibility compliance. Demo chatbots skip these, leading to poor user experiences when things go wrong.
Q: How do you measure chatbot quality?
Measure chatbot quality through resolution rate (percentage of conversations resolved without human escalation), user satisfaction scores, response accuracy, average conversation length, escalation rate, and containment rate (percentage staying in the bot vs leaving the channel). Track these daily and set quality thresholds that trigger alerts.
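The first two of those metrics fall straight out of the status field in the Convex schema. A sketch; `qualityMetrics` is a hypothetical helper.

```typescript
// Mirrors the status union on the conversations table.
type ConversationRecord = { status: "active" | "resolved" | "escalated" };

// Resolution rate and escalation rate over a set of conversations.
function qualityMetrics(conversations: ConversationRecord[]) {
  const total = conversations.length;
  const resolved = conversations.filter(c => c.status === "resolved").length;
  const escalated = conversations.filter(c => c.status === "escalated").length;
  return {
    resolutionRate: total ? resolved / total : 0,
    escalationRate: total ? escalated / total : 0,
  };
}
```

Run this over a daily window of conversations and alert when either rate crosses your threshold.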
