Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Founder & CEO, Agentik {OS}
Your AI service will go down. Not might. Will. The fallback chain pattern keeps your app running with graceful degradation at every level.

Your AI service will go down. Not might. Will.
This is not pessimism. This is engineering reality. OpenAI has had outages. Anthropic has had outages. Every API on the internet has outages, planned and unplanned. The question you need to answer before your first production deploy is not "if" but "what do users see when the AI goes down?"
If your answer is a frozen UI with a spinner that never resolves, you have failed your users at the most basic level. Your application should continue to function, with reduced capability, even when every AI service is simultaneously unavailable.
Building that resilience is the subject of this article.
Think of your AI features as a stack of progressively simpler alternatives. When the highest tier fails, drop to the next. When that fails, drop further. The user's experience degrades gracefully rather than collapsing entirely.
| Level | Strategy | Speed | Cost | Capability |
|---|---|---|---|---|
| 1 | Primary AI model (Claude Sonnet, GPT-4o) | 1-3s | High | Full |
| 2 | Smaller, faster model (Claude Haiku, GPT-4o-mini) | 200-500ms | Low | 80% of use cases |
| 3 | Semantic cache (similar past responses) | < 50ms | Near-zero | Known queries |
| 4 | Rule-based templates | Instant | Zero | Generic but functional |
| 5 | Honest degradation message | Instant | Zero | User informed |
Each level catches the failure of the level above it. Most users will never reach level 3. Some will. A few will hit level 5. But none will see a broken application.
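Level 3 assumes a semantic cache that can return a past response for a sufficiently similar prompt. Here is a minimal sketch over precomputed embedding vectors; a real implementation would call an embeddings API to produce the vectors, and the names here are illustrative:

```typescript
interface CacheEntry {
  embedding: number[]; // vector for the original prompt
  response: string;    // the AI response stored alongside it
}

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: CacheEntry[] = [];

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }

  // Return the closest entry at or above the threshold, else null
  findSimilar(embedding: number[], threshold: number): CacheEntry | null {
    let best: CacheEntry | null = null;
    let bestScore = threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best;
  }
}
```

A threshold around 0.9 or higher keeps matches conservative; set it too low and users get answers to questions they did not ask.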
```typescript
// Fallback chain implementation
async function generateWithFallback(
  prompt: string,
  context: RequestContext
): Promise<GenerationResult> {
  // Level 1: Primary model
  try {
    return await generateWithClaude(prompt, 'claude-sonnet-4-20250514', context);
  } catch (err) {
    logFailure('primary_model', err, context);
  }

  // Level 2: Faster model
  try {
    return await generateWithClaude(prompt, 'claude-haiku-4-20250514', context);
  } catch (err) {
    logFailure('fallback_model', err, context);
  }

  // Level 3: Semantic cache
  const cached = await semanticCache.findSimilar(prompt, 0.92);
  if (cached) {
    return { content: cached.response, source: 'cache', quality: 'cached' };
  }

  // Level 4: Rule-based default
  const ruleBasedResponse = generateRuleBased(prompt, context.featureId);
  if (ruleBasedResponse) {
    return { content: ruleBasedResponse, source: 'rule-based', quality: 'template' };
  }

  // Level 5: Honest degradation
  return {
    content: null,
    source: 'unavailable',
    quality: 'none',
    message: 'AI features are temporarily unavailable. You can continue manually.',
  };
}
```

A large share of AI errors are caused by bad inputs, not bad models.
A user pastes a 50,000-character document into a field that feeds an AI prompt. The combined prompt exceeds the model's context window. The API returns a 400 error. Without input validation, your user sees an opaque error message and has no idea what went wrong.
Validate everything before it reaches the model:
```typescript
import { z } from 'zod';

// Strict input validation schema
const aiRequestSchema = z.object({
  // Length limits that account for your prompt overhead
  userContent: z.string()
    .min(1, 'Content cannot be empty')
    .max(8000, 'Content exceeds maximum length of 8,000 characters'),

  // Context data similarly bounded
  documentContext: z.string()
    .max(16000, 'Document context exceeds maximum length')
    .optional(),

  // Enum types prevent injection attempts
  tone: z.enum(['professional', 'casual', 'technical', 'friendly']),

  language: z.string()
    .regex(/^[a-z]{2}(-[A-Z]{2})?$/, 'Invalid language code')
    .default('en'),
});

type AIRequest = z.infer<typeof aiRequestSchema>;

async function handleAIRequest(rawInput: unknown): Promise<Response> {
  const parseResult = aiRequestSchema.safeParse(rawInput);

  if (!parseResult.success) {
    return Response.json({
      error: {
        code: 'VALIDATION_ERROR',
        message: 'Invalid request',
        details: parseResult.error.errors.map(e => ({
          field: e.path.join('.'),
          message: e.message,
        })),
      },
    }, { status: 400 });
  }

  // Input is now type-safe and validated
  return processValidatedRequest(parseResult.data);
}
```

Beyond length limits, check for content that is likely to cause problems, such as unexpected encodings, control characters, or embedded markup.
Return helpful, specific error messages. "Your content is 15,247 characters. The maximum for this feature is 8,000 characters. You can summarize your content or use the full document upload feature." is infinitely more useful than "Error: 400 Bad Request."
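A small helper makes that kind of message easy to produce consistently. The helper name and exact wording are illustrative:

```typescript
// Build a specific, actionable length error (wording is an assumption)
function contentLengthError(actual: number, max: number): string {
  return `Your content is ${actual.toLocaleString('en-US')} characters. ` +
    `The maximum for this feature is ${max.toLocaleString('en-US')} characters. ` +
    `You can summarize your content or use the full document upload feature.`;
}
```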
AI model outputs are untrusted input. Treat them with the same suspicion you apply to user form submissions.
This is not hypothetical. AI models sometimes emit HTML where plain text was expected, echo fragments of their system prompt, or produce far more output than you asked for.
Sanitize every AI output before it reaches your users:
```typescript
import DOMPurify from 'isomorphic-dompurify';

interface SanitizedOutput {
  content: string;
  wasModified: boolean;
  violations: string[];
}

// Escape regex metacharacters so keywords are matched literally
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function sanitizeAIOutput(
  rawOutput: string,
  config: SanitizationConfig
): SanitizedOutput {
  const violations: string[] = [];
  let content = rawOutput;

  // Remove any HTML if plain text expected
  if (config.expectedFormat === 'plain-text') {
    const cleaned = DOMPurify.sanitize(rawOutput, { ALLOWED_TAGS: [] });
    if (cleaned !== rawOutput) {
      violations.push('HTML content detected and removed');
      content = cleaned;
    }
  }

  // If HTML is allowed, still sanitize
  if (config.expectedFormat === 'html') {
    content = DOMPurify.sanitize(rawOutput, {
      ALLOWED_TAGS: ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'code', 'pre'],
      ALLOWED_ATTR: [],
    });
  }

  // Check for system prompt leakage
  if (config.systemPromptKeywords) {
    for (const keyword of config.systemPromptKeywords) {
      if (content.toLowerCase().includes(keyword.toLowerCase())) {
        violations.push(`System prompt keyword detected: ${keyword}`);
        content = content.replace(new RegExp(escapeRegExp(keyword), 'gi'), '[redacted]');
      }
    }
  }

  // Enforce max output length
  if (config.maxLength && content.length > config.maxLength) {
    violations.push(`Output truncated from ${content.length} to ${config.maxLength} characters`);
    content = content.substring(0, config.maxLength);
  }

  return {
    content,
    wasModified: violations.length > 0,
    violations,
  };
}
```

Log every violation. A pattern of violations indicates a prompt that needs adjustment, a model that is behaving unexpectedly, or a user attempting prompt injection.
When an AI service starts failing, the worst response is to keep hammering it with requests. Response times increase. Timeout errors accumulate. Users wait longer for responses that never come. Server resources pile up with pending requests.
A circuit breaker detects sustained failures and stops the cascade:
```typescript
class AICircuitBreaker {
  private failureCount = 0;
  private successCount = 0;
  private lastFailureTime = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  private readonly FAILURE_THRESHOLD = 5;
  private readonly RECOVERY_TIMEOUT_MS = 30_000;
  private readonly SUCCESS_THRESHOLD = 2;

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      const timeSinceLastFailure = Date.now() - this.lastFailureTime;
      if (timeSinceLastFailure < this.RECOVERY_TIMEOUT_MS) {
        throw new CircuitOpenError('AI service circuit breaker is open');
      }
      // Transition to half-open: allow one test request
      this.state = 'half-open';
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  private onSuccess() {
    this.failureCount = 0;
    if (this.state === 'half-open') {
      this.successCount++;
      if (this.successCount >= this.SUCCESS_THRESHOLD) {
        this.state = 'closed';
        this.successCount = 0;
      }
    }
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // A failed probe re-opens the circuit immediately
    if (this.state === 'half-open' || this.failureCount >= this.FAILURE_THRESHOLD) {
      this.state = 'open';
      this.successCount = 0;
    }
  }

  getState() { return this.state; }
}
```

When the circuit is open, all requests immediately fail and route to the fallback chain. Every 30 seconds, one probe request goes through. If two consecutive probes succeed, the circuit closes and normal operation resumes.
Without circuit breakers, an AI service outage becomes a full application outage because every request piles up waiting for timeouts.
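Circuit breakers work best alongside a hard per-request deadline, so a hung call fails fast enough to register as a failure at all. A minimal sketch using Promise.race; the 8-second default is an assumption, so tune it to your model's real latency:

```typescript
class TimeoutError extends Error {}

// Reject with TimeoutError if fn() does not settle within deadlineMs
async function withTimeout<T>(
  fn: () => Promise<T>,
  deadlineMs = 8_000
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`AI call exceeded ${deadlineMs}ms deadline`)),
      deadlineMs
    );
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that this abandons the slow call rather than cancelling it; pass an AbortSignal to the underlying HTTP client if you need true cancellation.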
Some AI failures are transient. Rate limiting errors. Brief network interruptions. 503 responses during rolling restarts.
Retrying these can recover gracefully. But naive retry logic makes things worse.
Exponential backoff. Do not retry immediately. Wait before retrying, and wait longer with each attempt. Immediate retries on a rate-limited service just burn through your retry budget faster.
Jitter. If all clients retry at the same exponential intervals, they hit the service simultaneously. Add random jitter to spread retries.
Retry budgets. Retries consume resources. Bound the total time you are willing to spend on retries, not just the number of attempts.
```typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const {
    maxAttempts = 3,
    baseDelayMs = 500,
    maxDelayMs = 10_000,
    budgetMs = 30_000, // total time we are willing to spend, including waits
    retryableErrors = ['rate_limit', 'overloaded', 'timeout'],
  } = options;

  const deadline = Date.now() + budgetMs;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isRetryable = retryableErrors.some(code =>
        err instanceof Error && err.message.includes(code)
      );
      if (!isRetryable || attempt === maxAttempts) throw err;

      // Exponential backoff with jitter
      const exponentialDelay = baseDelayMs * Math.pow(2, attempt - 1);
      const jitter = Math.random() * baseDelayMs;
      const delay = Math.min(exponentialDelay + jitter, maxDelayMs);

      // Respect the retry budget, not just the attempt count
      if (Date.now() + delay > deadline) throw err;

      await sleep(delay);
    }
  }

  throw new Error('Max retry attempts exceeded');
}
```

Every failure your users encounter needs to communicate three things.
What happened. Not "An error occurred." "We could not generate your summary right now."
What they can do. "Try again in a few minutes, or use the manual editor below."
That their data is safe. "Your document has been saved. Nothing was lost."
```typescript
// Typed error messages by failure mode
const AI_ERROR_MESSAGES: Record<string, UserFacingError> = {
  service_unavailable: {
    title: 'AI features temporarily unavailable',
    description: 'Our AI service is experiencing issues. Your work has been saved.',
    action: 'You can continue editing manually, or try again in a few minutes.',
    canRetry: true,
  },
  rate_limit: {
    title: 'Too many requests',
    description: 'You have reached your hourly AI usage limit.',
    action: 'Your limit resets in {resetTime}. Upgrade your plan for higher limits.',
    canRetry: false,
  },
  content_too_long: {
    title: 'Content is too long',
    description: 'Your content exceeds the maximum length for this feature.',
    action: 'Shorten your content to under {maxLength} characters, or use the full document feature.',
    canRetry: false,
  },
  output_invalid: {
    title: 'AI response was invalid',
    description: 'The AI generated an unexpected response. This has been logged.',
    action: 'Please try again. If this keeps happening, contact support.',
    canRetry: true,
  },
};
```

Never show raw error messages from AI APIs. Never show stack traces. Every error message is a product decision. Treat it like one.
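The {resetTime} and {maxLength} placeholders can be filled in at render time. A minimal interpolation sketch; the renderError helper is an assumption, not part of any library:

```typescript
interface UserFacingError {
  title: string;
  description: string;
  action: string;
  canRetry: boolean;
}

// Replace {placeholder} tokens with runtime values; unknown tokens pass through
function renderError(
  error: UserFacingError,
  values: Record<string, string> = {}
): UserFacingError {
  const fill = (s: string) =>
    s.replace(/\{(\w+)\}/g, (match, key) => values[key] ?? match);
  return {
    ...error,
    title: fill(error.title),
    description: fill(error.description),
    action: fill(error.action),
  };
}
```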
Solid error handling pairs naturally with monitoring that catches patterns before they become crises, and deployment automation that rolls back before users are widely affected.
Q: How should AI applications handle errors?
AI applications use a fallback chain: try the primary AI model, fall back to a simpler model if it fails, fall back to cached responses, then fall back to a graceful error message. Every AI call should have timeout handling, retry logic with exponential backoff, and circuit breakers to prevent cascading failures.
Q: What is the fallback chain pattern for AI errors?
The fallback chain progressively degrades: if the primary model (e.g., Claude Sonnet) fails, try a smaller model (e.g., Claude Haiku), then try a cached response, then show a helpful error message. Each level provides a reduced but acceptable user experience. The user should never see a raw error from an AI model failure.
Q: How do you handle AI model timeouts and rate limits?
Handle timeouts with configurable deadline per request, automatic retries with exponential backoff, and fallback to alternative models. Handle rate limits with request queuing, prioritization, and graceful degradation. Pre-cache frequent responses to serve during rate limit periods.
