Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Most multi-agent demos crumble in production. Here's how to build orchestration that survives real workloads, error storms, and 3am failures.

One agent is a tool. Multiple agents working together are a team. And the difference between a productive team and a chaotic pile of competing individuals comes down to one thing: orchestration.
I run multi-agent systems in production. Not demos. Not proofs of concept that only work when the conditions are perfect. Systems that handle real workloads, serve real users, and need to keep running at 3am on a Sunday when nobody is watching and the API is rate-limiting and one of the agents is in a confused state.
The gap between "multi-agent demo that kills on stage" and "multi-agent system that actually works in production" is enormous. This is everything I've learned about closing that gap.
Every multi-agent system starts with a temptation: build one super-agent that does everything. Give it every tool, every piece of context, every capability the system needs. One prompt to rule them all.
This fails. Not always immediately. Usually at scale.
A single agent handling 50 tools makes poor tool selection decisions. The model has to reason about all 50 tools at once when selecting which one to use for each step. Tool selection quality degrades with the number of available tools.
A single agent handling every type of work conflates contexts. When the same agent writes code, reviews it, deploys it, and monitors it, the context from writing bleeds into the reviewing. The review is less critical because the agent knows what the code "should" be doing.
A single agent handling long workflows loses track of state. By the 15th step of a 20-step workflow, the early context has faded and the agent makes decisions without full awareness of what happened at the beginning.
Specialized agents solve all three problems. An agent optimized for code generation has 5-10 relevant tools and a focused system prompt. It makes better decisions because it's reasoning about a smaller, more relevant context. A separate review agent has no emotional attachment to the code it's reviewing because it didn't write it. A dedicated deployment agent doesn't carry the context of the code generation phase.
The right analogy is a software development team, not a superhero developer. Teams produce better output than individuals on complex problems because specialization and independent verification catch what generalism misses.
The orchestrator is the most critical component in a multi-agent system. It's also the component most tutorials gloss over.
A production orchestrator handles four responsibilities:
Take complex requests and break them into executable subtasks. This needs to be systematic, not improvised.
// src/orchestrator/task-decomposer.ts
import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic()
interface Task {
id: string
type: 'code_generation' | 'code_review' | 'testing' | 'documentation' | 'deployment'
description: string
dependencies: string[] // IDs of tasks that must complete first
assignedAgent: string
status: 'pending' | 'in_progress' | 'completed' | 'failed'
result?: unknown
error?: string
}
export async function decomposeRequest(request: string, context: {
projectType: string
existingFeatures: string[]
techStack: string[]
}): Promise<Task[]> {
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 4096, // the Messages API requires an explicit max_tokens
system: `You are a technical project manager who breaks down feature requests into executable subtasks.
Output a JSON array of tasks. Each task has:
- id: unique string
- type: one of [code_generation, code_review, testing, documentation, deployment]
- description: specific, executable description (not vague)
- dependencies: array of task ids that must complete before this task
- assignedAgent: which specialist agent handles this
Rules:
- Be specific. "Generate the user profile schema" not "work on database"
- Make dependencies explicit. A review task always depends on its generation task
- Testing always depends on code generation
- Deployment always depends on testing`,
messages: [
{
role: 'user',
content: `Decompose this feature request into tasks:
${request}
Project context:
- Type: ${context.projectType}
- Tech stack: ${context.techStack.join(', ')}
- Existing features: ${context.existingFeatures.join(', ')}`,
},
],
})
const content = response.content[0]
if (content.type !== 'text') throw new Error('Expected text response')
// Models sometimes wrap JSON in prose or fences; extract the array before parsing
const start = content.text.indexOf('[')
const end = content.text.lastIndexOf(']')
if (start === -1 || end === -1) throw new Error('No JSON array in response')
return JSON.parse(content.text.slice(start, end + 1)) as Task[]
}
Match each subtask to the appropriate specialist agent:
// src/orchestrator/agent-router.ts
const AGENT_REGISTRY = {
code_generation: {
model: 'claude-opus-4-5',
tools: ['read_file', 'write_file', 'search_codebase', 'run_linter'],
systemPrompt: 'You are a senior software engineer. Write clean, tested, production-ready code.',
},
code_review: {
model: 'claude-opus-4-5',
tools: ['read_file', 'search_codebase', 'check_types'],
systemPrompt: 'You are a code reviewer. Be critical. Find bugs, security issues, and design problems. Do not be lenient because the code compiles.',
},
testing: {
model: 'claude-sonnet-4-5',
tools: ['read_file', 'write_file', 'run_tests', 'check_coverage'],
systemPrompt: 'You are a QA engineer. Write comprehensive tests. Aim for 95% coverage. Test happy paths and failure cases equally.',
},
documentation: {
model: 'claude-haiku-3-5', // Cheaper for doc generation
tools: ['read_file', 'write_file'],
systemPrompt: 'You write clear, concise technical documentation for developers.',
},
deployment: {
model: 'claude-sonnet-4-5',
tools: ['run_build', 'run_tests', 'deploy', 'check_health', 'rollback'],
systemPrompt: 'You are a DevOps engineer. Deploy carefully. Verify before and after. Roll back at the first sign of problems.',
},
}
export function routeTask(task: Task) {
const agent = AGENT_REGISTRY[task.type]
if (!agent) throw new Error(`No agent registered for task type: ${task.type}`)
return agent
}
Track progress across all subtasks. This is where most tutorial implementations fail. They don't handle concurrent execution, partial failures, or state reconstruction after interruption.
// src/orchestrator/state-manager.ts
import { Redis } from '@upstash/redis'
const redis = Redis.fromEnv()
interface WorkflowState {
workflowId: string
request: string
tasks: Task[]
startedAt: Date
completedAt?: Date
status: 'running' | 'completed' | 'failed' | 'partial'
}
export class WorkflowStateManager {
async createWorkflow(workflowId: string, request: string, tasks: Task[]): Promise<void> {
const state: WorkflowState = {
workflowId,
request,
tasks,
startedAt: new Date(),
status: 'running',
}
// Persist with 24-hour TTL
await redis.set(`workflow:${workflowId}`, JSON.stringify(state), { ex: 86400 })
}
async updateTaskStatus(
workflowId: string,
taskId: string,
status: Task['status'],
result?: unknown,
error?: string
): Promise<void> {
const state = await this.getWorkflow(workflowId)
if (!state) throw new Error(`Workflow ${workflowId} not found`)
const task = state.tasks.find(t => t.id === taskId)
if (!task) throw new Error(`Task ${taskId} not found in workflow ${workflowId}`)
task.status = status
if (result !== undefined) task.result = result
if (error !== undefined) task.error = error
await redis.set(`workflow:${workflowId}`, JSON.stringify(state), { ex: 86400 })
}
async getReadyTasks(workflowId: string): Promise<Task[]> {
const state = await this.getWorkflow(workflowId)
if (!state) return []
const completedTaskIds = new Set(
state.tasks.filter(t => t.status === 'completed').map(t => t.id)
)
return state.tasks.filter(task =>
task.status === 'pending' &&
task.dependencies.every(depId => completedTaskIds.has(depId))
)
}
private async getWorkflow(workflowId: string): Promise<WorkflowState | null> {
const data = await redis.get<string>(`workflow:${workflowId}`)
return data ? JSON.parse(data) : null
}
}
In a multi-agent system, errors propagate. Agent A fails. Its output is missing. Agent B receives incomplete input. Agent B's output is wrong. Agent C builds on wrong output. By the time you detect the problem, four agents have wasted work and the user has been waiting for 20 minutes.
Three-level error handling prevents cascade failures.
Each agent validates inputs before starting:
// Every agent validates its input before execution
async function validateAgentInput(task: Task, context: Record<string, unknown>): Promise<void> {
const required = getRequiredContextFields(task.type)
for (const field of required) {
if (context[field] === undefined || context[field] === null) {
throw new Error(
`Task ${task.id} (${task.type}) missing required context: ${field}. ` +
`This dependency task may have failed.`
)
}
}
// Validate output quality from dependency tasks
if (task.type === 'code_review' && context.generatedCode) {
const code = context.generatedCode as string
if (code.length < 10) {
throw new Error(`Code generation produced suspiciously short output: ${code}`)
}
}
}
Check each agent's output before passing it downstream:
// Quality gate before passing output to downstream agents
async function validateAgentOutput(task: Task, output: unknown): Promise<boolean> {
switch (task.type) {
case 'code_generation': {
// Code must be non-empty and not contain obvious errors
const code = output as string
if (!code || code.length < 20) return false
if (code.includes('TODO: implement') && !code.includes('// TODO')) return false
return true
}
case 'testing': {
// Tests must exist and pass
const testResult = output as { passed: number; failed: number; coverage: number }
if (testResult.failed > 0) return false
if (testResult.coverage < 80) return false
return true
}
case 'code_review': {
// Review must not contain CRITICAL issues
const review = output as { severity: 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW'; issues: string[] }
return review.severity !== 'CRITICAL'
}
default:
return true
}
}
Circuit breakers remove broken agents from rotation:
// Track agent health and remove unreliable agents
class AgentHealthMonitor {
private failureCount = new Map<string, number>()
private disabledAgents = new Set<string>()
private readonly FAILURE_THRESHOLD = 3
recordFailure(agentType: string): void {
const count = (this.failureCount.get(agentType) ?? 0) + 1
this.failureCount.set(agentType, count)
if (count >= this.FAILURE_THRESHOLD) {
this.disabledAgents.add(agentType)
console.error(`Agent ${agentType} disabled after ${count} consecutive failures`)
this.scheduleReenablement(agentType, 5 * 60 * 1000) // Try again in 5 minutes
}
}
recordSuccess(agentType: string): void {
this.failureCount.set(agentType, 0)
this.disabledAgents.delete(agentType)
}
isAvailable(agentType: string): boolean {
return !this.disabledAgents.has(agentType)
}
private scheduleReenablement(agentType: string, delayMs: number): void {
setTimeout(() => {
this.disabledAgents.delete(agentType)
this.failureCount.set(agentType, 0)
console.log(`Agent ${agentType} re-enabled for testing`)
}, delayMs)
}
}
Five agents running concurrently, each making LLM API calls, executing tools, maintaining context. At 10 concurrent workflows, that's up to 50 simultaneous API calls. Without resource management, you hit rate limits, memory limits, and API quotas in ways that are hard to diagnose.
// src/orchestrator/resource-manager.ts
import pLimit from 'p-limit'
export class ResourceManager {
// Limit concurrent LLM API calls
private apiCallLimiter = pLimit(10)
// Limit concurrent tool executions (they use more memory)
private toolExecutionLimiter = pLimit(5)
// Token budget per workflow (prevents runaway costs)
private tokenBudgets = new Map<string, number>()
private tokenUsage = new Map<string, number>()
async executeWithRateLimit<T>(
fn: () => Promise<T>,
type: 'api_call' | 'tool_execution'
): Promise<T> {
const limiter = type === 'api_call'
? this.apiCallLimiter
: this.toolExecutionLimiter
return limiter(() => fn())
}
setTokenBudget(workflowId: string, budget: number): void {
this.tokenBudgets.set(workflowId, budget)
this.tokenUsage.set(workflowId, 0)
}
recordTokenUsage(workflowId: string, tokens: number): void {
const current = this.tokenUsage.get(workflowId) ?? 0
this.tokenUsage.set(workflowId, current + tokens)
}
isOverBudget(workflowId: string): boolean {
const budget = this.tokenBudgets.get(workflowId) ?? Infinity
const used = this.tokenUsage.get(workflowId) ?? 0
return used > budget
}
}
Multi-agent systems fail in non-obvious ways. An agent produces subtly incorrect output that passes all quality gates. The mistake propagates through three downstream agents. By the time the final output reaches the user, the error source is completely obscured.
Log everything:
// Every agent invocation produces a structured trace
interface AgentTrace {
workflowId: string
taskId: string
agentType: string
startTime: Date
endTime: Date
durationMs: number
inputSummary: string
outputSummary: string
tokenUsage: { input: number; output: number }
success: boolean
error?: string
}
async function traceAgentExecution(
workflowId: string,
task: Task,
fn: () => Promise<unknown>
): Promise<unknown> {
const startTime = new Date()
let result: unknown
let error: Error | undefined
try {
result = await fn()
return result
} catch (e) {
error = e instanceof Error ? e : new Error(String(e))
throw error
} finally {
const trace: AgentTrace = {
workflowId,
taskId: task.id,
agentType: task.type,
startTime,
endTime: new Date(),
durationMs: Date.now() - startTime.getTime(),
inputSummary: summarizeForLog(task.description),
outputSummary: result ? summarizeForLog(result) : 'no output',
tokenUsage: { input: 0, output: 0 }, // populated from LLM response
success: !error,
error: error?.message,
}
await persistTrace(trace)
}
}
}With complete traces, debugging a production issue means following the trace: which agent ran first, what it produced, what the next agent received, where the output diverged from expectations. It's like having a complete runtime replay.
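Reconstructing that replay from persisted traces can be as simple as ordering them by start time and cutting the chain at the first failure. A sketch, assuming trace objects shaped like the AgentTrace above; buildTimeline is a hypothetical helper name, not part of any framework:

```typescript
interface TraceSummary {
  agentType: string
  durationMs: number
  success: boolean
  error?: string
}

// Orders traces chronologically and returns the chain up to and including
// the first failure — usually the point where output diverged.
function buildTimeline(
  traces: { agentType: string; startTime: Date; durationMs: number; success: boolean; error?: string }[]
): TraceSummary[] {
  const ordered = [...traces].sort((a, b) => a.startTime.getTime() - b.startTime.getTime())
  const timeline: TraceSummary[] = []
  for (const t of ordered) {
    timeline.push({ agentType: t.agentType, durationMs: t.durationMs, success: t.success, error: t.error })
    if (!t.success) break // everything after the first failure is suspect
  }
  return timeline
}
```

Everything downstream of the first failed trace is suspect by definition, so truncating there focuses the investigation on the agent that actually diverged.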
Don't launch ten agents on day one. Start with two agents and an orchestrator: one specialist and one reviewer. Get coordination right. Get error handling right. Get observability right.
Add agents one at a time. Each new agent adds orchestration complexity. Validate that the capability justifies the complexity.
The best multi-agent systems I've seen in production run between three and seven agents. Enough specialization to produce high-quality output. Few enough to be debuggable when something goes wrong.
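The two-agent starting point can be sketched as a worker/reviewer loop with bounded retries. This is a sketch only, not the Agentik {OS} implementation; runWorker and runReviewer are hypothetical stand-ins for real agent calls:

```typescript
interface MiniTask {
  id: string
  description: string
}

interface ReviewResult {
  approved: boolean
  feedback: string
}

// Worker produces output, reviewer gates it; a rejection triggers a retry
// up to maxRetries before the pipeline fails loudly.
async function runPipeline(
  tasks: MiniTask[],
  runWorker: (t: MiniTask) => Promise<string>,
  runReviewer: (t: MiniTask, output: string) => Promise<ReviewResult>,
  maxRetries = 2
): Promise<Map<string, string>> {
  const results = new Map<string, string>()
  for (const task of tasks) {
    let attempt = 0
    for (;;) {
      const output = await runWorker(task)
      const review = await runReviewer(task, output)
      if (review.approved) {
        results.set(task.id, output)
        break
      }
      if (++attempt > maxRetries) {
        throw new Error(`Task ${task.id} rejected after ${maxRetries} retries: ${review.feedback}`)
      }
    }
  }
  return results
}
```

Everything discussed above — dependency resolution, circuit breakers, token budgets — can be layered onto this loop once the basic worker/reviewer coordination is solid.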
For the communication protocols between these agents, see agent-to-agent communication patterns, which covers the message passing, shared memory, and negotiation patterns that keep coordinated agents productive rather than chaotic.
Q: What is multi-agent orchestration?
Multi-agent orchestration is the coordination of multiple specialized AI agents working together on complex tasks. Instead of one general-purpose agent, specialized agents handle specific domains (coding, testing, deployment, review) and communicate through defined protocols. An orchestrator manages task distribution, dependency resolution, and conflict handling.
Q: How do you coordinate multiple AI agents in production?
Production multi-agent systems use a central orchestrator that decomposes tasks and assigns them to specialized agents, standardized communication protocols (like MCP) for agent-to-agent messaging, and shared context stores that maintain project state across agent interactions.
Q: What are the common patterns for multi-agent systems?
The five core patterns are prompt chaining (sequential agents with validation gates), routing (directing tasks to the right specialist), parallelization (multiple agents simultaneously), orchestrator-workers (coordinator with specialists), and evaluator-optimizer (self-improving loops with feedback).
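As an illustration, the first of these patterns — prompt chaining with validation gates — fits in a few lines. The ChainStep shape and chain2 helper are hypothetical, not from a specific framework:

```typescript
interface ChainStep<I, O> {
  run: (input: I) => Promise<O>
  gate: (output: O) => boolean // validation gate between steps
}

// Runs two steps in sequence; a failed gate stops the chain instead of
// letting a bad intermediate result propagate downstream.
async function chain2<A, B, C>(
  input: A,
  first: ChainStep<A, B>,
  second: ChainStep<B, C>
): Promise<C> {
  const b = await first.run(input)
  if (!first.gate(b)) throw new Error('Gate failed after step 1')
  const c = await second.run(b)
  if (!second.gate(c)) throw new Error('Gate failed after step 2')
  return c
}
```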
Q: When should you use multi-agent systems vs a single agent?
Use multi-agent systems when tasks require different expertise domains, benefit from parallel execution, or need specialized tool access. A single agent suffices for focused tasks with clear scope. Multi-agent coordination overhead is only justified when task complexity exceeds what one agent can handle.
