Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Founder & CEO, Agentik {OS}
Your AI service will go down. Not might. Will. The fallback chain pattern keeps your app running with graceful degradation at every level.

Your AI service will go down. Not might. Will.
This is not pessimism. This is engineering reality. OpenAI has had outages. Anthropic has had outages. Every API on the internet has outages, planned and unplanned. The question you need to answer before your first production deploy is not "if" but "what do users see when the AI goes down?"
If your answer is a frozen UI with a spinner that never resolves, you have failed your users at the most basic level. Your application should continue to function, with reduced capability, even when every AI service is simultaneously unavailable.
Building that resilience is the subject of this article.
Think of your AI features as a stack of progressively simpler alternatives. When the highest tier fails, drop to the next. When that fails, drop further. The user's experience degrades gracefully rather than collapsing entirely.
| Level | Strategy | Speed | Cost | Capability |
|---|---|---|---|---|
| 1 | Primary AI model (Claude Sonnet, GPT-4o) | 1-3s | High | Full |
| 2 | Smaller, faster model (Claude Haiku, GPT-4o-mini) | 200-500ms | Low | 80% of use cases |
| 3 | Semantic cache (similar past responses) | < 50ms | Near-zero | Known queries |
| 4 | Rule-based templates | Instant | Zero | Generic but functional |
| 5 | Honest degradation message | Instant | Zero | User informed |
Each level catches the failure of the level above it. Most users will never reach level 3. Some will. A few will hit level 5. But none will see a broken application.
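Level 3 assumes a semantic cache that can return a past response for a sufficiently similar prompt. Here is a minimal sketch over precomputed embedding vectors; a real implementation would call an embeddings API to produce the vectors, and the names here are illustrative:

```typescript
interface CacheEntry {
  embedding: number[]; // vector for the original prompt
  response: string;    // the AI response stored alongside it
}

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: CacheEntry[] = [];

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }

  // Return the closest entry at or above the threshold, else null
  findSimilar(embedding: number[], threshold: number): CacheEntry | null {
    let best: CacheEntry | null = null;
    let bestScore = threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best;
  }
}
```

A threshold around 0.9 or higher keeps matches conservative; set it too low and users get answers to questions they did not ask.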
```typescript
// Fallback chain implementation
async function generateWithFallback(
  prompt: string,
  context: RequestContext
): Promise<GenerationResult> {
  // Level 1: Primary model
  try {
    return await generateWithClaude(prompt, 'claude-sonnet-4-20250514', context);
  } catch (err) {
    logFailure('primary_model', err, context);
  }

  // Level 2: Faster model
  try {
    return await generateWithClaude(prompt, 'claude-haiku-4-20250514', context);
  } catch (err) {
    logFailure('fallback_model', err, context);
  }

  // Level 3: Semantic cache
  const cached = await semanticCache.findSimilar(prompt, 0.92);
  if (cached) {
    return { content: cached.response, source: 'cache', quality: 'cached' };
  }

  // Level 4: Rule-based default
  const ruleBasedResponse = generateRuleBased(prompt, context.featureId);
  if (ruleBasedResponse) {
    return { content: ruleBasedResponse, source: 'rule-based', quality: 'template' };
  }

  // Level 5: Honest degradation
  return {
    content: null,
    source: 'unavailable',
    quality: 'none',
    message: 'AI features are temporarily unavailable. You can continue manually.',
  };
}
```

A large share of AI errors are caused by bad inputs, not bad models.
A user pastes a 50,000-character document into a field that feeds an AI prompt. The combined prompt exceeds the model's context window. The API returns a 400 error. Without input validation, your user sees an opaque error message and has no idea what went wrong.
Validate everything before it reaches the model:
```typescript
import { z } from 'zod';

// Strict input validation schema
const aiRequestSchema = z.object({
  // Length limits that account for your prompt overhead
  userContent: z.string()
    .min(1, 'Content cannot be empty')
    .max(8000, 'Content exceeds maximum length of 8,000 characters'),

  // Context data similarly bounded
  documentContext: z.string()
    .max(16000, 'Document context exceeds maximum length')
    .optional(),

  // Enum types prevent injection attempts
  tone: z.enum(['professional', 'casual', 'technical', 'friendly']),

  language: z.string()
    .regex(/^[a-z]{2}(-[A-Z]{2})?$/, 'Invalid language code')
    .default('en'),
});

type AIRequest = z.infer<typeof aiRequestSchema>;

async function handleAIRequest(rawInput: unknown): Promise<Response> {
  const parseResult = aiRequestSchema.safeParse(rawInput);

  if (!parseResult.success) {
    return Response.json({
      error: {
        code: 'VALIDATION_ERROR',
        message: 'Invalid request',
        details: parseResult.error.errors.map(e => ({
          field: e.path.join('.'),
          message: e.message,
        })),
      },
    }, { status: 400 });
  }

  // Input is now type-safe and validated
  return processValidatedRequest(parseResult.data);
}
```

Beyond length limits, check for content that is likely to cause problems, such as unexpected encodings, control characters, or embedded markup.
Return helpful, specific error messages. "Your content is 15,247 characters. The maximum for this feature is 8,000 characters. You can summarize your content or use the full document upload feature." is infinitely more useful than "Error: 400 Bad Request."
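A small helper makes that kind of message easy to produce consistently. The helper name and exact wording are illustrative:

```typescript
// Build a specific, actionable length error (wording is an assumption)
function contentLengthError(actual: number, max: number): string {
  return `Your content is ${actual.toLocaleString('en-US')} characters. ` +
    `The maximum for this feature is ${max.toLocaleString('en-US')} characters. ` +
    `You can summarize your content or use the full document upload feature.`;
}
```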
AI model outputs are untrusted input. Treat them with the same suspicion you apply to user form submissions.
This is not hypothetical. AI models sometimes emit HTML where plain text was expected, echo fragments of their system prompt, or produce far more output than you asked for.
Sanitize every AI output before it reaches your users:
```typescript
import DOMPurify from 'isomorphic-dompurify';

interface SanitizedOutput {
  content: string;
  wasModified: boolean;
  violations: string[];
}

// Escape regex metacharacters so keywords are matched literally
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function sanitizeAIOutput(
  rawOutput: string,
  config: SanitizationConfig
): SanitizedOutput {
  const violations: string[] = [];
  let content = rawOutput;

  // Remove any HTML if plain text expected
  if (config.expectedFormat === 'plain-text') {
    const cleaned = DOMPurify.sanitize(rawOutput, { ALLOWED_TAGS: [] });
    if (cleaned !== rawOutput) {
      violations.push('HTML content detected and removed');
      content = cleaned;
    }
  }

  // If HTML is allowed, still sanitize
  if (config.expectedFormat === 'html') {
    content = DOMPurify.sanitize(rawOutput, {
      ALLOWED_TAGS: ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'code', 'pre'],
      ALLOWED_ATTR: [],
    });
  }

  // Check for system prompt leakage
  if (config.systemPromptKeywords) {
    for (const keyword of config.systemPromptKeywords) {
      if (content.toLowerCase().includes(keyword.toLowerCase())) {
        violations.push(`System prompt keyword detected: ${keyword}`);
        content = content.replace(new RegExp(escapeRegExp(keyword), 'gi'), '[redacted]');
      }
    }
  }

  // Enforce max output length
  if (config.maxLength && content.length > config.maxLength) {
    violations.push(`Output truncated from ${content.length} to ${config.maxLength} characters`);
    content = content.substring(0, config.maxLength);
  }

  return {
    content,
    wasModified: violations.length > 0,
    violations,
  };
}
```

Log every violation. A pattern of violations indicates a prompt that needs adjustment, a model that is behaving unexpectedly, or a user attempting prompt injection.
When an AI service starts failing, the worst response is to keep hammering it with requests. Response times increase. Timeout errors accumulate. Users wait longer for responses that never come. Server resources pile up with pending requests.
A circuit breaker detects sustained failures and stops the cascade:
```typescript
class AICircuitBreaker {
  private failureCount = 0;
  private successCount = 0;
  private lastFailureTime = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  private readonly FAILURE_THRESHOLD = 5;
  private readonly RECOVERY_TIMEOUT_MS = 30_000;
  private readonly SUCCESS_THRESHOLD = 2;

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      const timeSinceLastFailure = Date.now() - this.lastFailureTime;
      if (timeSinceLastFailure < this.RECOVERY_TIMEOUT_MS) {
        throw new CircuitOpenError('AI service circuit breaker is open');
      }
      // Transition to half-open: allow one test request
      this.state = 'half-open';
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  private onSuccess() {
    this.failureCount = 0;
    if (this.state === 'half-open') {
      this.successCount++;
      if (this.successCount >= this.SUCCESS_THRESHOLD) {
        this.state = 'closed';
        this.successCount = 0;
      }
    }
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // A failed probe re-opens the circuit immediately
    if (this.state === 'half-open' || this.failureCount >= this.FAILURE_THRESHOLD) {
      this.state = 'open';
      this.successCount = 0;
    }
  }

  getState() { return this.state; }
}
```

When the circuit is open, all requests immediately fail and route to the fallback chain. Every 30 seconds, one probe request goes through. If two consecutive probes succeed, the circuit closes and normal operation resumes.
Without circuit breakers, an AI service outage becomes a full application outage because every request piles up waiting for timeouts.
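Circuit breakers work best alongside a hard per-request deadline, so a hung call fails fast enough to register as a failure at all. A minimal sketch using Promise.race; the 8-second default is an assumption, so tune it to your model's real latency:

```typescript
class TimeoutError extends Error {}

// Reject with TimeoutError if fn() does not settle within deadlineMs
async function withTimeout<T>(
  fn: () => Promise<T>,
  deadlineMs = 8_000
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`AI call exceeded ${deadlineMs}ms deadline`)),
      deadlineMs
    );
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that this abandons the slow call rather than cancelling it; pass an AbortSignal to the underlying HTTP client if you need true cancellation.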
Some AI failures are transient. Rate limiting errors. Brief network interruptions. 503 responses during rolling restarts.
Retrying these can recover gracefully. But naive retry logic makes things worse.
Exponential backoff. Do not retry immediately. Wait before retrying, and wait longer with each attempt. Immediate retries on a rate-limited service just burn through your retry budget faster.
Jitter. If all clients retry at the same exponential intervals, they hit the service simultaneously. Add random jitter to spread retries.
Retry budgets. Retries consume resources. Bound the total time you are willing to spend on retries, not just the number of attempts.
```typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const {
    maxAttempts = 3,
    baseDelayMs = 500,
    maxDelayMs = 10_000,
    budgetMs = 30_000, // total time we are willing to spend, including waits
    retryableErrors = ['rate_limit', 'overloaded', 'timeout'],
  } = options;

  const deadline = Date.now() + budgetMs;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isRetryable = retryableErrors.some(code =>
        err instanceof Error && err.message.includes(code)
      );
      if (!isRetryable || attempt === maxAttempts) throw err;

      // Exponential backoff with jitter
      const exponentialDelay = baseDelayMs * Math.pow(2, attempt - 1);
      const jitter = Math.random() * baseDelayMs;
      const delay = Math.min(exponentialDelay + jitter, maxDelayMs);

      // Respect the retry budget, not just the attempt count
      if (Date.now() + delay > deadline) throw err;

      await sleep(delay);
    }
  }

  throw new Error('Max retry attempts exceeded');
}
```

Every failure your users encounter needs to communicate three things.
What happened. Not "An error occurred." "We could not generate your summary right now."
What they can do. "Try again in a few minutes, or use the manual editor below."
That their data is safe. "Your document has been saved. Nothing was lost."
```typescript
// Typed error messages by failure mode
const AI_ERROR_MESSAGES: Record<string, UserFacingError> = {
  service_unavailable: {
    title: 'AI features temporarily unavailable',
    description: 'Our AI service is experiencing issues. Your work has been saved.',
    action: 'You can continue editing manually, or try again in a few minutes.',
    canRetry: true,
  },
  rate_limit: {
    title: 'Too many requests',
    description: 'You have reached your hourly AI usage limit.',
    action: 'Your limit resets in {resetTime}. Upgrade your plan for higher limits.',
    canRetry: false,
  },
  content_too_long: {
    title: 'Content is too long',
    description: 'Your content exceeds the maximum length for this feature.',
    action: 'Shorten your content to under {maxLength} characters, or use the full document feature.',
    canRetry: false,
  },
  output_invalid: {
    title: 'AI response was invalid',
    description: 'The AI generated an unexpected response. This has been logged.',
    action: 'Please try again. If this keeps happening, contact support.',
    canRetry: true,
  },
};
```

Never show raw error messages from AI APIs. Never show stack traces. Every error message is a product decision. Treat it like one.
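The {resetTime} and {maxLength} placeholders can be filled in at render time. A minimal interpolation sketch; the renderError helper is an assumption, not part of any library:

```typescript
interface UserFacingError {
  title: string;
  description: string;
  action: string;
  canRetry: boolean;
}

// Replace {placeholder} tokens with runtime values; unknown tokens pass through
function renderError(
  error: UserFacingError,
  values: Record<string, string> = {}
): UserFacingError {
  const fill = (s: string) =>
    s.replace(/\{(\w+)\}/g, (match, key) => values[key] ?? match);
  return {
    ...error,
    title: fill(error.title),
    description: fill(error.description),
    action: fill(error.action),
  };
}
```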
Solid error handling pairs naturally with monitoring that catches patterns before they become crises, and deployment automation that rolls back before users are widely affected.
Q: How should AI applications handle errors?
AI applications use a fallback chain: try the primary AI model, fall back to a simpler model if it fails, fall back to cached responses, then fall back to a graceful error message. Every AI call should have timeout handling, retry logic with exponential backoff, and circuit breakers to prevent cascading failures.
Q: What is the fallback chain pattern for AI errors?
The fallback chain progressively degrades: if the primary model (e.g., Claude Sonnet) fails, try a smaller model (e.g., Claude Haiku), then try a cached response, then show a helpful error message. Each level provides a reduced but acceptable user experience. The user should never see a raw error from an AI model failure.
Q: How do you handle AI model timeouts and rate limits?
Handle timeouts with configurable deadline per request, automatic retries with exponential backoff, and fallback to alternative models. Handle rate limits with request queuing, prioritization, and graceful degradation. Pre-cache frequent responses to serve during rate limit periods.
