Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Chat widgets are easy. Production chatbots that handle real users without breaking are hard. Here's the full guide to building one that actually works.

The demo works beautifully. You show it to your team. Everyone is impressed. Then you put it in front of real users and it falls apart.
Real users ask questions you didn't anticipate. They provide context in unexpected ways. They ask follow-up questions in the middle of different topics. They paste in long text. They send one-word messages. They try to make the chatbot say inappropriate things. They get frustrated when it doesn't understand them.
Building a chatbot that survives this is a different project from building a chatbot that works in your demo. This guide covers the version that works for real users.
What we're building: a production-ready customer support chatbot for a SaaS product, with streaming responses, persistent conversation history, rate limiting and abuse protection, and escalation to human agents.
Tech stack: Next.js for the API route and chat widget, Convex for the database, and the Anthropic Claude API for the model.
Most chatbot tutorials skip architecture and jump straight to code, and that's exactly where they fail. Here's an architecture that scales.
User Browser
     |
     | (streaming SSE)
     v
Next.js API Route -----> Rate Limiter
     |                        |
     v                        v
Claude API              Abuse Detection
     |
     v
Convex Database
     |
     |---> Conversation History
     |---> Message Analytics
     '---> User Sessions
Key principle: separate the stateless AI call from the stateful conversation management. The AI model doesn't hold state. Your database does. The API layer connects them.
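This principle can be sketched as a pure function: on every request, the API layer rebuilds the model's input from stored history plus the new user turn. `Turn` and `buildModelMessages` are illustrative names, not part of any SDK.

```typescript
// The model holds no state; its input is reconstructed from the database
// on every request.
type Turn = { role: "user" | "assistant"; content: string };

function buildModelMessages(
  history: Turn[],
  newUserMessage: string,
  maxTurns = 10
): Turn[] {
  // Keep only the most recent turns so the prompt stays within budget.
  const recent = history.slice(-maxTurns);
  return [...recent, { role: "user", content: newUserMessage }];
}
```

Because the function is pure, the context-window policy (here, "last 10 turns") can be unit-tested without touching the database or the model.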
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  conversations: defineTable({
    sessionId: v.string(),
    userId: v.optional(v.string()),
    createdAt: v.number(),
    updatedAt: v.number(),
    status: v.union(v.literal("active"), v.literal("resolved"), v.literal("escalated")),
    metadata: v.optional(v.object({
      page: v.optional(v.string()),
      userAgent: v.optional(v.string()),
      referrer: v.optional(v.string()),
    })),
  }).index("by_session", ["sessionId"])
    .index("by_user", ["userId"]),

  messages: defineTable({
    conversationId: v.id("conversations"),
    role: v.union(v.literal("user"), v.literal("assistant"), v.literal("system")),
    content: v.string(),
    timestamp: v.number(),
    metadata: v.optional(v.object({
      tokensUsed: v.optional(v.number()),
      model: v.optional(v.string()),
      latencyMs: v.optional(v.number()),
    })),
  }).index("by_conversation", ["conversationId"]),

  rateLimits: defineTable({
    identifier: v.string(), // IP or userId
    messageCount: v.number(),
    windowStart: v.number(),
  }).index("by_identifier", ["identifier"]),
});

// app/api/chat/route.ts
import { NextRequest } from "next/server";
import Anthropic from "@anthropic-ai/sdk";
import { ConvexHttpClient } from "convex/browser";
import { api } from "@/convex/_generated/api";

const anthropic = new Anthropic();
const convex = new ConvexHttpClient(process.env.NEXT_PUBLIC_CONVEX_URL!);

const SYSTEM_PROMPT = `You are a helpful customer support assistant for Acme SaaS.
Your role:
- Answer questions about product features, pricing, and troubleshooting
- Be concise and helpful
- If you don't know something, say so and offer to connect them with the team
- Never make up information about features or pricing
Escalation triggers (respond with [ESCALATE] prefix):
- Customer expresses significant frustration or mentions legal action
- Technical issues you cannot resolve
- Requests for refunds or account modifications
- Any security concerns
Knowledge base context will be provided when available.`;

const RATE_LIMIT = { messages: 20, windowMs: 60000 }; // 20 messages per minute

async function checkRateLimit(identifier: string): Promise<boolean> {
  const now = Date.now();
  const windowStart = now - RATE_LIMIT.windowMs;
  const existing = await convex.query(api.rateLimit.get, { identifier });

  // No record yet, or the previous window has expired: start a fresh one.
  if (!existing || existing.windowStart < windowStart) {
    await convex.mutation(api.rateLimit.set, {
      identifier,
      messageCount: 1,
      windowStart: now,
    });
    return true;
  }

  if (existing.messageCount >= RATE_LIMIT.messages) {
    return false;
  }

  await convex.mutation(api.rateLimit.increment, { identifier });
  return true;
}
export async function POST(req: NextRequest) {
  // x-forwarded-for can be a comma-separated list; use the first hop.
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown";

  // Rate limiting
  const allowed = await checkRateLimit(ip);
  if (!allowed) {
    return new Response("Too many requests. Please wait before sending another message.", {
      status: 429,
    });
  }

  const { message, sessionId, conversationHistory } = await req.json();

  // Input validation
  if (!message || typeof message !== "string" || message.trim().length === 0) {
    return new Response("Invalid message", { status: 400 });
  }
  if (message.length > 2000) {
    return new Response("Message too long. Please keep messages under 2000 characters.", { status: 400 });
  }

  // Get or create conversation
  let conversationId = req.headers.get("x-conversation-id");
  if (!conversationId) {
    conversationId = await convex.mutation(api.conversations.create, {
      sessionId,
      metadata: {
        userAgent: req.headers.get("user-agent") ?? undefined,
        referrer: req.headers.get("referer") ?? undefined,
      },
    });
  }

  // Save user message
  await convex.mutation(api.messages.add, {
    conversationId: conversationId as any,
    role: "user",
    content: message,
    timestamp: Date.now(),
  });

  // Build message history (last 10 messages for context); tolerate a missing array
  const recentHistory = Array.isArray(conversationHistory) ? conversationHistory.slice(-10) : [];
  const startTime = Date.now();

  // Stream the response
  const stream = anthropic.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: [
      ...recentHistory,
      { role: "user", content: message },
    ],
  });
  // Collect full response for storage
  let fullResponse = "";
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          if (
            chunk.type === "content_block_delta" &&
            chunk.delta.type === "text_delta"
          ) {
            const text = chunk.delta.text;
            fullResponse += text;
            controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
          }
        }

        // Save assistant response
        const latencyMs = Date.now() - startTime;
        const usage = (await stream.finalMessage()).usage;
        await convex.mutation(api.messages.add, {
          conversationId: conversationId as any,
          role: "assistant",
          content: fullResponse,
          timestamp: Date.now(),
          metadata: {
            tokensUsed: usage.input_tokens + usage.output_tokens,
            model: "claude-sonnet-4-20250514",
            latencyMs,
          },
        });

        // Check for escalation trigger
        if (fullResponse.startsWith("[ESCALATE]")) {
          await convex.mutation(api.conversations.escalate, {
            conversationId: conversationId as any,
          });
        }

        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ done: true, conversationId })}\n\n`));
        controller.close();
      } catch (error) {
        controller.enqueue(
          encoder.encode(`data: ${JSON.stringify({ error: "Stream error" })}\n\n`)
        );
        controller.close();
      }
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
      "X-Conversation-Id": conversationId,
    },
  });
}

// components/ChatWidget.tsx
"use client";
import { useState, useRef, useEffect } from "react";
import { nanoid } from "nanoid";
interface Message {
role: "user" | "assistant";
content: string;
timestamp: Date;
}
export function ChatWidget() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState("");
const [isStreaming, setIsStreaming] = useState(false);
const [conversationId, setConversationId] = useState<string | null>(null);
const sessionId = useRef(nanoid());
const messagesEndRef = useRef<HTMLDivElement>(null);
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);
async function sendMessage() {
if (!input.trim() || isStreaming) return;
const userMessage: Message = {
role: "user",
content: input.trim(),
timestamp: new Date(),
};
setMessages(prev => [...prev, userMessage]);
setInput("");
setIsStreaming(true);
// Add placeholder assistant message
const assistantMessage: Message = {
role: "assistant",
content: "",
timestamp: new Date(),
};
setMessages(prev => [...prev, assistantMessage]);
    try {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          ...(conversationId ? { "x-conversation-id": conversationId } : {}),
        },
        body: JSON.stringify({
          message: userMessage.content,
          sessionId: sessionId.current,
          conversationHistory: messages.map(m => ({
            role: m.role,
            content: m.content,
          })),
        }),
      });

      if (!response.ok) {
        const errorText = await response.text();
        setMessages(prev => [
          ...prev.slice(0, -1),
          { ...assistantMessage, content: errorText || "Something went wrong. Please try again." },
        ]);
        return;
      }

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let fullContent = "";
      let buffer = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // A network read can end mid-line, so buffer the chunk, parse only
        // complete lines, and keep the remainder for the next read.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const data = JSON.parse(line.slice(6));
          if (data.text) {
            fullContent += data.text;
            setMessages(prev => [
              ...prev.slice(0, -1),
              { ...assistantMessage, content: fullContent },
            ]);
          }
          if (data.done && data.conversationId) {
            setConversationId(data.conversationId);
          }
        }
      }
    } catch (error) {
      setMessages(prev => [
        ...prev.slice(0, -1),
        { ...assistantMessage, content: "Connection error. Please check your internet and try again." },
      ]);
    } finally {
      setIsStreaming(false);
    }
  }
  return (
    <div className="flex flex-col h-[500px] border rounded-lg overflow-hidden">
      <div className="bg-primary p-4">
        <h3 className="text-primary-foreground font-semibold">Support Chat</h3>
      </div>

      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.length === 0 && (
          <p className="text-muted-foreground text-sm text-center">
            How can I help you today?
          </p>
        )}
        {messages.map((msg, i) => (
          <div key={i} className={`flex ${msg.role === "user" ? "justify-end" : "justify-start"}`}>
            <div
              className={`max-w-[80%] rounded-lg px-4 py-2 text-sm ${
                msg.role === "user"
                  ? "bg-primary text-primary-foreground"
                  : "bg-muted"
              }`}
            >
              {msg.content || <span className="animate-pulse">...</span>}
            </div>
          </div>
        ))}
        <div ref={messagesEndRef} />
      </div>

      <div className="border-t p-4 flex gap-2">
        <input
          className="flex-1 text-sm border rounded px-3 py-2 focus:outline-none focus:ring-2 focus:ring-primary"
          value={input}
          onChange={e => setInput(e.target.value)}
          onKeyDown={e => e.key === "Enter" && !e.shiftKey && sendMessage()}
          placeholder="Type your message..."
          disabled={isStreaming}
          maxLength={2000}
        />
        <button
          onClick={sendMessage}
          disabled={isStreaming || !input.trim()}
          className="bg-primary text-primary-foreground px-4 py-2 rounded text-sm disabled:opacity-50"
        >
          Send
        </button>
      </div>
    </div>
  );
}

When the chatbot can't help, it needs to hand off gracefully. The [ESCALATE] prefix in the system prompt triggers a flag in Convex.
Your human support queue should watch for conversations where status is "escalated". The implementation details vary by your support tooling, but the pattern is the same: the agent detects its limitation, sets a flag, and your backend routes the conversation to a human queue.
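One way to handle the marker on the backend is to strip it before the reply reaches the user while recording the flag. This is a sketch; `parseEscalation` is a hypothetical helper, not part of the code above.

```typescript
// Detect the [ESCALATE] marker the system prompt asks the model to emit,
// and strip it so the user never sees the raw prefix.
function parseEscalation(response: string): { escalated: boolean; text: string } {
  const PREFIX = "[ESCALATE]";
  if (response.startsWith(PREFIX)) {
    return { escalated: true, text: response.slice(PREFIX.length).trimStart() };
  }
  return { escalated: false, text: response };
}
```

When `escalated` is true, set the conversation's status to "escalated" in Convex and notify your support tooling.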
Three things separate this from a demo:
Rate limiting. Without it, a single automated script can exhaust your API budget in minutes. The windowed counter above handles normal bursts while cutting off sustained abuse.
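The rate-limit decision can be factored out as a pure function, so the logic is unit-testable without a database. `RateWindow` and `decide` are illustrative names; the database read/write around them follows the checkRateLimit function shown earlier.

```typescript
// Windowed rate limit: a counter per identifier that resets when the
// window expires. Mirrors the logic in checkRateLimit above.
type RateWindow = { messageCount: number; windowStart: number };

function decide(
  existing: RateWindow | null,
  now: number,
  limit = 20,
  windowMs = 60_000
): { allowed: boolean; next: RateWindow } {
  // No record, or the previous window has expired: start fresh.
  if (!existing || existing.windowStart < now - windowMs) {
    return { allowed: true, next: { messageCount: 1, windowStart: now } };
  }
  if (existing.messageCount >= limit) {
    return { allowed: false, next: existing };
  }
  return { allowed: true, next: { ...existing, messageCount: existing.messageCount + 1 } };
}
```

Note that the read-then-write pattern has a small race under concurrent requests from the same identifier; for strict limits, do the check inside a single Convex mutation so it runs transactionally.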
Error boundaries everywhere. Every async operation can fail. The streaming connection can drop. The AI API can timeout. Each failure mode has a graceful fallback that doesn't leave the user staring at a spinner.
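A concrete piece of this is putting a deadline on every awaited call, so a hung upstream request fails fast instead of leaving the user on a spinner. A minimal sketch; `withTimeout` is a hypothetical helper, not from the code above.

```typescript
// Race a promise against a deadline; whichever settles first wins.
function withTimeout<T>(promise: Promise<T>, ms: number, label = "operation"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      err => { clearTimeout(timer); reject(err); }
    );
  });
}
```

Wrap the Convex mutations and the model call with it, and map the timeout error to a friendly message in the catch block.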
Conversation persistence. When the user refreshes the page, their conversation history is still there. This sounds obvious but most chatbot tutorials don't implement it, and users notice immediately when history disappears.
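The client side of persistence is keeping a stable session id across refreshes so the backend can reload the conversation. A sketch with the storage injected, so the same logic works against window.localStorage in the browser and a plain object in tests; `getOrCreateSessionId` is a hypothetical helper.

```typescript
// Minimal key-value interface matching the localStorage methods we need.
type KVStore = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
};

// Reuse the stored session id if one exists; otherwise mint and store one.
function getOrCreateSessionId(
  storage: KVStore,
  makeId: () => string,
  key = "chat-session-id"
): string {
  const existing = storage.getItem(key);
  if (existing) return existing;
  const id = makeId();
  storage.setItem(key, id);
  return id;
}
```

In the widget, replace `useRef(nanoid())` with a call like `getOrCreateSessionId(window.localStorage, nanoid)` inside an effect, then fetch the stored history for that session on mount.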
Q: How do you build a production chatbot?
Build a production chatbot in layers: conversation management (tracking state and context), AI integration (Claude or GPT for understanding and generating responses), knowledge base connection (RAG for domain-specific answers), action capabilities (booking, searching, updating records), and monitoring (tracking quality, escalation rates, and user satisfaction).
Q: What makes a chatbot production-ready vs a demo?
Production chatbots need graceful error handling, conversation context management, escalation to human agents, response quality monitoring, rate limiting, persistent conversation history, multi-language support, and accessibility compliance. Demo chatbots skip these, leading to poor user experiences when things go wrong.
Q: How do you measure chatbot quality?
Measure chatbot quality through resolution rate (percentage of conversations resolved without human escalation), user satisfaction scores, response accuracy, average conversation length, escalation rate, and containment rate (percentage staying in the bot vs leaving the channel). Track these daily and set quality thresholds that trigger alerts.
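The first two of those metrics fall straight out of the status field in the Convex schema. A sketch; `qualityMetrics` is a hypothetical helper.

```typescript
// Mirrors the status union on the conversations table.
type ConversationRecord = { status: "active" | "resolved" | "escalated" };

// Resolution rate and escalation rate over a set of conversations.
function qualityMetrics(conversations: ConversationRecord[]) {
  const total = conversations.length;
  const resolved = conversations.filter(c => c.status === "resolved").length;
  const escalated = conversations.filter(c => c.status === "escalated").length;
  return {
    resolutionRate: total ? resolved / total : 0,
    escalationRate: total ? escalated / total : 0,
  };
}
```

Run this over a daily window of conversations and alert when either rate crosses your threshold.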
