Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
267 agents, zero visibility. When agents crash, tasks stall, and costs spike, you find out too late. So we built Nerve -- a Convex-backed real-time nervous system with 40 HTTP endpoints, heartbeat monitoring, dependency-aware task sequencing, and a /debug command that hunts, diagnoses, fixes, and verifies bugs automatically.

We run 267 specialized AI agents across six departments. Not as a demo. As the actual operating system of a cybersecurity consulting firm. And for the first eight months, we had no idea what any of them were doing.
That sounds absurd in retrospect. We had built an entire multi-agent orchestration layer -- 12 Matrix-themed agents coordinated by an ORACLE router, with MORPHEUS executing tasks, KEYMAKER building dependency graphs, SERAPH auditing code quality. The architecture was sophisticated. The observability was nonexistent.
When an agent crashed, we found out because a task never completed. When costs spiked, we found out at the end of the month. When two agents conflicted on the same file, we found out when the build broke. Every failure was discovered after the damage was done.
So we built Nerve.
Running one AI agent is an AI problem. Running 267 agents across parallel workflows is a distributed systems problem. And distributed systems without observability are just expensive random number generators.
Here is what was actually happening before Nerve:
Stale agents consumed resources indefinitely. An agent would hit a rate limit, hang on a retry loop, and sit there burning context window tokens for hours. We had no heartbeat system. No timeout detection. No way to know an agent was stuck until someone manually checked.
Cost tracking was retroactive. We knew roughly what we spent per month. We had no idea what we spent per session, per agent, or per model. A single runaway agent could burn through a hundred dollars before anyone noticed. Our cost data lived in billing dashboards updated daily, not in real-time streams.
Task dependencies were implicit. KEYMAKER would build a plan with five steps. Step 3 depended on step 2. But the dependency was encoded in natural language instructions, not in a machine-readable graph. If step 2 failed, step 3 would start anyway with stale context and produce garbage output that looked plausible enough to pass initial review.
Debugging was archaeological. When something went wrong, the debugging process was: read logs. Lots of logs. Across multiple agents. In multiple tmux sessions. Try to reconstruct what happened, in what order, from fragments of output scattered across the filesystem. This took hours per incident.
We needed a nervous system -- something that could feel what was happening across the entire agent fleet in real time, detect anomalies as they occurred, and react before damage accumulated.
Nerve is a Convex-backed real-time agent nervous system. Every agent registers with Nerve when it starts, sends heartbeats while it runs, reports costs as it incurs them, logs decisions as it makes them, and records failures as they happen. Everything is queryable in real time through HTTP endpoints or Convex subscriptions.
Nerve tracks agent operations across 9 Convex tables:
| Table | Purpose | Key Fields |
|---|---|---|
| agentSessions | Live agent tracking | status, lastHeartbeat, totalCostUsd |
| messages | Inter-agent messaging bus | from, to, type, priority, threadId |
| costEntries | Per-call token and cost tracking | model, inputTokens, outputTokens, costUsd |
| decisions | ORACLE routing audit trail | classification, confidence, agentsChosen, rationale |
| failures | Centralized error registry | errorType, retryCount, autoRetried, resolved |
| systemConfig | Kill switch and global settings | key, value |
| taskSteps | Dependency-aware task sequencing | blockedBy, status, assignedAgent |
| progressEvents | Real-time step streaming | step, totalSteps, percentage, artifacts |
| costAlerts | Threshold breach tracking | alertType, currentCostUsd, thresholdUsd |
Every table is indexed for the queries that matter. Agent sessions are indexed by status, session, agent name, and parent agent. Failures are indexed by type, agent, resolution status, and session. The indexes are not an afterthought -- they define the access patterns that make real-time dashboards possible.
Nerve exposes 40 HTTP endpoints organized into nine domains. Every endpoint is accessible from any CLI tool, any agent, or any monitoring script via simple HTTP calls:
Mail (4 endpoints): /mail/send, /mail/broadcast, /mail/inbox, /mail/read -- Inter-agent messaging with priority levels (low, normal, high, critical), thread support, and read tracking.
Costs (3 endpoints): /costs/track, /costs/session, /costs/dashboard -- Per-call cost tracking with model-level granularity. Every LLM call logs input tokens, output tokens, cache read/write tokens, cost in USD, and duration.
Agents (7 endpoints): /agents/register, /agents/heartbeat, /agents/status, /agents/running, /agents/dashboard, /agents/stale, /agents/auto-clean -- Full lifecycle management from registration to auto-cleanup of stale sessions.
Decisions (3 endpoints): /decisions/log, /decisions/recent, /decisions/stats -- Every ORACLE routing decision is logged with classification (SIMPLE/MEDIUM/COMPLEX/EPIC), confidence score, chosen agents, alternatives considered, and rationale.
Failures (5 endpoints): /failures/log, /failures/unresolved, /failures/stats, /failures/retryable, /failures/retry -- Centralized error tracking with 10 error types, retry counts, and auto-retry support.
Tasks (7 endpoints): /tasks/create-plan, /tasks/start, /tasks/complete, /tasks/fail, /tasks/retry, /tasks/plan-status, /tasks/next-ready -- Dependency-aware task sequencing with blockedBy arrays and automatic unblocking.
Progress (4 endpoints): /progress/emit, /progress/latest, /progress/timeline, /progress/active -- Real-time progress streaming with step counts, percentages, and artifact tracking.
Alerts (4 endpoints): /alerts/check, /alerts/pending, /alerts/notify, /alerts/list -- Cost threshold monitoring with three alert types: session threshold, agent threshold, and daily threshold.
System (3 endpoints): /system/killswitch (GET and POST), /system/dashboard -- Global kill switch and system-wide dashboard.
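Because every endpoint is plain HTTP, a thin helper is enough to call Nerve from a script. The following is a minimal Python sketch, not Nerve's own client: it assumes the base URL used in this article's curl examples and JSON bodies on POST, and `nerve_request` is a hypothetical helper name. It only builds the request so the shape can be inspected without touching the network.

```python
import json
import urllib.request

# Base URL taken from the curl examples in this article.
NERVE_BASE_URL = "https://nerve.convex.site"


def nerve_request(path, payload=None):
    """Build (but do not send) a request for a Nerve endpoint.

    A payload makes it a JSON POST; no payload makes it a GET.
    """
    url = NERVE_BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=10).
req = nerve_request("/agents/auto-clean", {"maxStaleMinutes": 5})
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the helper easy to unit test and lets each caller choose its own timeout and error handling.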
We chose Convex for Nerve for one reason above all others: real-time subscriptions.
Traditional REST APIs require polling. If you want to know whether an agent is still alive, you poll the heartbeat endpoint every N seconds. If you want a live cost dashboard, you poll the cost endpoint. With 267 agents sending heartbeats and cost updates, polling creates a thundering herd problem that scales poorly.
Convex subscriptions push updates to clients the instant data changes. When an agent sends a heartbeat, every dashboard subscribed to that agent's session sees the update immediately. When a cost threshold is breached, the alert appears in real time. No polling interval. No wasted requests. No stale data.
This matters operationally because the difference between knowing an agent is stuck now versus knowing it was stuck five minutes ago is the difference between killing one stale agent and dealing with a cascade of dependent failures.
Every running agent sends a heartbeat to Nerve at regular intervals. The heartbeat updates the lastHeartbeat timestamp on the agent's session record.
A cron-based health monitor checks for stale agents -- any agent whose last heartbeat is older than 5 minutes. Stale agents are flagged and, depending on configuration, auto-killed. The auto-clean endpoint handles this:
```bash
# Check for stale agents
curl https://nerve.convex.site/agents/stale

# Auto-clean stale agents (kill those older than 5 min)
curl -X POST https://nerve.convex.site/agents/auto-clean \
  -H "Content-Type: application/json" \
  -d '{"maxStaleMinutes": 5}'
```

When a stale agent is killed, Nerve sends a kill_signal message through the mail system and logs the event as a failure. ORACLE receives notification and can re-route the stalled task to a fresh agent. The entire recovery loop -- detect stale, kill, notify, re-route -- happens without human intervention.
Before Nerve, a stale agent could sit for hours. Now an agent is flagged at the first health check after its heartbeat has been silent for 5 minutes, and recovery is automatic.
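The staleness rule itself is a timestamp comparison. A minimal Python sketch of that check, assuming session records shaped like the agentSessions fields listed earlier; the `agentName` key and the "running" and "completed" status values are assumptions for illustration, and `find_stale` is a hypothetical name rather than Nerve's actual function.

```python
from datetime import datetime, timedelta, timezone


def find_stale(sessions, now, max_stale_minutes=5):
    """Return names of running agents whose last heartbeat is too old."""
    cutoff = now - timedelta(minutes=max_stale_minutes)
    return [
        s["agentName"]
        for s in sessions
        if s["status"] == "running" and s["lastHeartbeat"] < cutoff
    ]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    {"agentName": "MORPHEUS", "status": "running",
     "lastHeartbeat": now - timedelta(minutes=1)},
    {"agentName": "NIOBE", "status": "running",
     "lastHeartbeat": now - timedelta(minutes=9)},
    {"agentName": "SERAPH", "status": "completed",
     "lastHeartbeat": now - timedelta(minutes=30)},
]
print(find_stale(sessions, now))  # ['NIOBE']
```

Finished sessions are skipped on purpose: an old heartbeat only signals a problem for an agent that is still supposed to be running.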
KEYMAKER builds execution plans as dependency graphs. Each plan consists of ordered steps, where each step can optionally declare a blockedBy array -- a list of step identifiers that must complete before this step can start.
```bash
# Create a plan with dependencies
curl -X POST https://nerve.convex.site/tasks/create-plan \
  -H "Content-Type: application/json" \
  -d '{
    "sessionId": "session-abc",
    "planId": "plan-auth-system",
    "steps": [
      {"stepIndex": 0, "title": "Scaffold auth module", "status": "ready"},
      {"stepIndex": 1, "title": "Implement JWT validation", "blockedBy": ["plan-auth-system:0"], "status": "blocked"},
      {"stepIndex": 2, "title": "Add middleware guards", "blockedBy": ["plan-auth-system:1"], "status": "blocked"},
      {"stepIndex": 3, "title": "Write integration tests", "blockedBy": ["plan-auth-system:2"], "status": "blocked"}
    ]
  }'
```

When step 0 completes, Nerve automatically checks which steps were blocked by it. Step 1's blockedBy resolves, and its status transitions from blocked to ready. The /tasks/next-ready endpoint returns the next step that can be picked up by an available agent.
This eliminates the class of bugs where downstream agents start working with incomplete or stale context from upstream steps that have not finished yet. The dependency graph is machine-readable and enforced, not inferred from natural language instructions.
Steps can also fail and be retried, with retry counts tracked per step. If a step exceeds its retry limit, the plan can be replanned or escalated -- the decision is surfaced through Nerve's failure registry rather than silently swallowed.
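The unblocking rule can be sketched in a few lines of Python. This is an illustration of the behavior described here, not Nerve's Convex code: the in-memory `steps` dictionary, the "completed" status name, and the function names `complete_step` and `next_ready` are assumptions.

```python
def complete_step(steps, step_id):
    """Mark a step completed and promote steps whose blockers are all done.

    `steps` maps step id -> {"status": str, "blockedBy": list of step ids}.
    Returns the ids that moved from "blocked" to "ready".
    """
    steps[step_id]["status"] = "completed"
    unblocked = []
    for sid, step in steps.items():
        if step["status"] != "blocked":
            continue
        if all(steps[dep]["status"] == "completed" for dep in step["blockedBy"]):
            step["status"] = "ready"
            unblocked.append(sid)
    return unblocked


def next_ready(steps):
    """Return the first step id with status "ready", or None."""
    for sid, step in steps.items():
        if step["status"] == "ready":
            return sid
    return None


plan = {
    "plan-auth-system:0": {"status": "ready", "blockedBy": []},
    "plan-auth-system:1": {"status": "blocked", "blockedBy": ["plan-auth-system:0"]},
    "plan-auth-system:2": {"status": "blocked", "blockedBy": ["plan-auth-system:1"]},
}
print(complete_step(plan, "plan-auth-system:0"))  # ['plan-auth-system:1']
print(next_ready(plan))  # plan-auth-system:1
```

A step is promoted only when every entry in its blockedBy list is completed, so a failed upstream step leaves everything downstream blocked instead of letting it start on stale context.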
Every LLM call that runs through our agent system logs its cost to Nerve:
```bash
curl -X POST https://nerve.convex.site/costs/track \
  -H "Content-Type: application/json" \
  -d '{
    "agentName": "MORPHEUS",
    "model": "claude-opus-4",
    "inputTokens": 12450,
    "outputTokens": 3200,
    "cacheReadTokens": 8000,
    "costUsd": 0.23,
    "sessionId": "session-abc",
    "taskDescription": "Implement auth middleware"
  }'
```

The cost dashboard aggregates this data in real time: total spend per session, per agent, per model. You can see exactly which agents are expensive and which models are driving cost.
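The aggregation behind such a dashboard amounts to grouping cost entries by three keys. A minimal Python sketch, assuming entries carry the `sessionId`, `agentName`, `model`, and `costUsd` fields from the /costs/track payload; `aggregate_costs` is a hypothetical name and "example-small-model" is a placeholder model, not one named in this article.

```python
from collections import defaultdict


def aggregate_costs(entries):
    """Sum costUsd per session, per agent, and per model."""
    totals = {
        "sessionId": defaultdict(float),
        "agentName": defaultdict(float),
        "model": defaultdict(float),
    }
    for entry in entries:
        for key, bucket in totals.items():
            bucket[entry[key]] += entry["costUsd"]
    # Convert to plain dicts so callers get ordinary, printable data.
    return {key: dict(bucket) for key, bucket in totals.items()}


entries = [
    {"sessionId": "session-abc", "agentName": "MORPHEUS",
     "model": "claude-opus-4", "costUsd": 0.25},
    {"sessionId": "session-abc", "agentName": "SERAPH",
     "model": "claude-opus-4", "costUsd": 0.5},
    {"sessionId": "session-abc", "agentName": "MORPHEUS",
     "model": "example-small-model", "costUsd": 0.25},
]
totals = aggregate_costs(entries)
print(totals["agentName"])  # {'MORPHEUS': 0.5, 'SERAPH': 0.5}
```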
More importantly, Nerve has a threshold alert system with three tiers:
| Alert Type | Default Threshold | Trigger |
|---|---|---|
| Session threshold | $50 per session | Any single session exceeds $50 |
| Agent threshold | $25 per agent | Any single agent exceeds $25 cumulative |
| Daily threshold | $200 per day | Total daily spend exceeds $200 |
When a threshold is breached, Nerve creates a costAlerts record and sends a cost_threshold message through the mail system. The Telegram integration picks up pending alerts and pushes notifications to the operator. You know about runaway costs within seconds, not at the end of the billing cycle.
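The three tiers reduce to three comparisons against running totals. A minimal Python sketch using the default thresholds from the table above; the `alertType` values ("session", "agent", "daily") and the function name `check_thresholds` are assumptions, since the article does not show the stored values.

```python
# Default thresholds from the table above, in USD.
DEFAULT_THRESHOLDS = {"session": 50.0, "agent": 25.0, "daily": 200.0}


def check_thresholds(session_cost, agent_costs, daily_cost,
                     thresholds=DEFAULT_THRESHOLDS):
    """Return one alert dict per breached threshold."""
    alerts = []
    if session_cost > thresholds["session"]:
        alerts.append({"alertType": "session",
                       "currentCostUsd": session_cost,
                       "thresholdUsd": thresholds["session"]})
    for agent, cost in agent_costs.items():
        if cost > thresholds["agent"]:
            alerts.append({"alertType": "agent", "agentName": agent,
                           "currentCostUsd": cost,
                           "thresholdUsd": thresholds["agent"]})
    if daily_cost > thresholds["daily"]:
        alerts.append({"alertType": "daily",
                       "currentCostUsd": daily_cost,
                       "thresholdUsd": thresholds["daily"]})
    return alerts


alerts = check_thresholds(60.0, {"MORPHEUS": 30.0, "SERAPH": 5.0}, 120.0)
print([a["alertType"] for a in alerts])  # ['session', 'agent']
```

The comparison is strict, matching the table's wording that a threshold is breached when spend exceeds it.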
```bash
# Check all thresholds for a session
curl -X POST https://nerve.convex.site/alerts/check \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session-abc"}'

# Get unnotified alerts
curl https://nerve.convex.site/alerts/pending
```

Build failures, lint errors, and test failures are the most common agent failures by volume. Before Nerve, each one required manual investigation. Now, Nerve tracks retryable failures and supports automatic retry with limits.
When an agent hits a build error, it logs the failure to Nerve with errorType: "build_error". The failure record includes the error message, stack trace, and the current retry count. If retryCount is below maxRetries (default: 3), the orchestrator can automatically retry the failing step.
The /failures/retryable endpoint returns all failures that are eligible for retry -- those that have not been resolved and have not exceeded their retry limit. The orchestrator queries this endpoint, picks up retryable failures, and re-dispatches them.
```bash
# Get retryable failures for a session
curl -X POST https://nerve.convex.site/failures/retryable \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session-abc"}'

# Mark a failure as retried (increments retry count)
curl -X POST https://nerve.convex.site/failures/retry \
  -H "Content-Type: application/json" \
  -d '{"failureId": "failure-xyz"}'
```

The 10 failure types tracked in Nerve cover the full spectrum of agent errors: build_error, runtime_error, timeout, permission, dependency, merge_conflict, api_error, lint_error, test_failure, and unknown. Each type has its own retry behavior and escalation path. Build errors retry up to 3 times. Timeouts escalate to ORACLE for re-routing. Merge conflicts trigger the 4-tier resolution system.
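The eligibility rule for retries (unresolved and under the retry limit) is small enough to sketch directly. The Python below is an illustration under stated assumptions, not Nerve's code: it models failures as dictionaries with the `retryCount` and `resolved` fields from the failures table, and the `failureId` key and both function names are hypothetical.

```python
DEFAULT_MAX_RETRIES = 3  # default limit stated in this article


def retryable_failures(failures, max_retries=DEFAULT_MAX_RETRIES):
    """Return failures that are unresolved and still under the retry limit."""
    return [
        f for f in failures
        if not f["resolved"] and f["retryCount"] < max_retries
    ]


def mark_retried(failure):
    """Return a copy of the failure with its retry count incremented."""
    return {**failure, "retryCount": failure["retryCount"] + 1}


failures = [
    {"failureId": "f1", "errorType": "build_error", "retryCount": 1, "resolved": False},
    {"failureId": "f2", "errorType": "lint_error", "retryCount": 3, "resolved": False},
    {"failureId": "f3", "errorType": "test_failure", "retryCount": 0, "resolved": True},
]
print([f["failureId"] for f in retryable_failures(failures)])  # ['f1']
```

Once a failure's count reaches the limit it drops out of the retryable set, which is the point where escalation or replanning would take over.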
Nerve gives us visibility. The /debug command gives us automated response.
When you run /debug against a project, it triggers a 6-step pipeline that hunts for bugs, diagnoses root causes, plans fixes, implements them, verifies the results, and generates a report. Everything is tracked through Nerve -- every agent involved, every step completed, every failure encountered, every cost incurred.
Step 1: Hunt. Five specialized hunting squads sweep the project simultaneously:
Step 2: Diagnose. All findings from the hunt phase are aggregated and deduplicated. A diagnostic agent correlates symptoms to root causes. A console error in the browser might trace back to a missing API endpoint, which traces back to a dependency that was not installed, which traces back to a merge that dropped a line from package.json. The diagnostic step follows the causal chain, not just the symptom.
Step 3: Plan. KEYMAKER builds a dependency-aware fix plan. Fixes are ordered by dependency -- you fix the database schema before you fix the API route that depends on it. Each fix step has explicit inputs (the diagnosed root cause) and outputs (the expected state after the fix).
Step 4: Fix. 9 specialized fixer agents execute the plan:
Each fixer agent reports progress through Nerve's progress events system. You can watch fixes being applied in real time.
Step 5: Verify. After all fixes are applied, the verification step reruns the same hunting squads from Step 1. Every bug found in the hunt phase is checked against the fixed codebase. New bugs introduced by the fixes are caught here. If verification fails, the pipeline loops back to Step 4 for a second fix pass.
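The loop between Step 4 and Step 5 can be sketched as a bounded fix-and-recheck cycle. This Python sketch uses toy `hunt` and `fix` callables in place of real agents; the cap of two passes is an assumption based on the "second fix pass" mentioned above, and `run_fix_verify` is a hypothetical name.

```python
def run_fix_verify(hunt, fix, max_passes=2):
    """Hunt once, then alternate fix and re-hunt until clean or out of passes.

    `hunt()` returns a set of bug ids; `fix(bugs)` attempts to fix them.
    Returns (passes_used, remaining_bugs).
    """
    bugs = hunt()
    passes = 0
    while bugs and passes < max_passes:
        fix(bugs)
        passes += 1
        bugs = hunt()  # verification reruns the same hunt
    return passes, bugs


# Toy codebase: the first fix pass introduces a regression.
codebase_bugs = {"missing-dep", "type-error"}


def hunt():
    return set(codebase_bugs)


def fix(bugs):
    introduces_regression = "missing-dep" in bugs
    codebase_bugs.difference_update(bugs)
    if introduces_regression:
        codebase_bugs.add("regression")


print(run_fix_verify(hunt, fix))  # (2, set())
```

Bounding the number of passes matters: without a cap, a fixer that keeps introducing regressions would loop indefinitely and keep accruing cost.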
Step 6: Report. A structured report is generated with: bugs found, root causes identified, fixes applied, verification results, remaining issues (if any), and total cost of the debug session. The report is logged to Nerve and optionally sent via Telegram.
Every step of the /debug pipeline creates records in Nerve:
This means you can query Nerve after a debug session and see exactly what happened: which agents ran, how long they took, what they cost, what they found, and what they fixed. The debug process itself is fully observable.
Nerve is the nervous system. The agents are the organs. Here is how they connect.
ORACLE receives every incoming task and classifies it: SIMPLE, MEDIUM, COMPLEX, or EPIC. Based on the classification, ORACLE decides which agents to involve and in what configuration.
A SIMPLE task (fix a typo, rename a variable) goes directly to MORPHEUS. No planning needed. A COMPLEX task (build an auth system) goes through KEYMAKER for planning, then MORPHEUS for execution, then SERAPH for audit. An EPIC task (launch a new product) activates the full pipeline with all relevant agents.
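The routing described here is essentially a lookup from classification to an ordered agent pipeline. A minimal Python sketch covering only the two pipelines spelled out above; MEDIUM and EPIC are deliberately left undefined because their exact agent lists are not given, and `PIPELINES` and `route` are hypothetical names.

```python
# Pipelines stated in this article. MEDIUM and EPIC are left out on purpose:
# their exact agent lists are not spelled out.
PIPELINES = {
    "SIMPLE": ["MORPHEUS"],
    "COMPLEX": ["KEYMAKER", "MORPHEUS", "SERAPH"],
}


def route(classification):
    """Return the ordered agent pipeline for a task classification."""
    try:
        return list(PIPELINES[classification])
    except KeyError:
        raise ValueError(f"no pipeline defined for {classification!r}") from None


print(route("SIMPLE"))   # ['MORPHEUS']
print(route("COMPLEX"))  # ['KEYMAKER', 'MORPHEUS', 'SERAPH']
```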
Every routing decision is logged to Nerve's decisions table with the classification, confidence score, chosen agents, alternatives considered, and rationale. This creates an audit trail that shows not just what ORACLE decided, but why.
```bash
# View recent ORACLE decisions
curl https://nerve.convex.site/decisions/recent
```

Example response:

```json
{
  "classification": "COMPLEX",
  "confidence": 0.87,
  "taskSummary": "Implement OAuth2 with Google and GitHub providers",
  "agentsChosen": ["KEYMAKER", "MORPHEUS", "SERAPH"],
  "alternativesConsidered": ["MORPHEUS-only (rejected: needs planning)"],
  "rationale": "Multi-file auth integration with external providers requires dependency planning and security audit"
}
```

KEYMAKER takes a complex task and produces a dependency graph -- a structured plan where each step declares its inputs, outputs, assigned agent, and blockedBy relationships. The plan is registered in Nerve's taskSteps table, making it machine-readable and trackable.
KEYMAKER does not execute. It plans. This separation is critical. A planner that also executes is constantly context-switching between forward-looking planning and backward-looking error handling. Keeping them separate means KEYMAKER's context stays clean and focused on the task decomposition.
MORPHEUS is the implementation commander. It receives step assignments from the plan, executes them, and reports results back through Nerve. MORPHEUS handles the actual code writing, file modifications, and tool use.
For each step, MORPHEUS registers an agent session, sends heartbeats during execution, logs costs for every LLM call, emits progress events, and updates the task step status upon completion. If a step fails, MORPHEUS logs the failure with full context and the orchestrator decides whether to retry, skip, or escalate.
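The order of those calls for a single step can be laid out as data. The Python sketch below lists the endpoint paths named earlier in this article in the order a step would plausibly hit them. The payload shapes are assumptions, and for simplicity it emits one heartbeat per LLM call rather than on a timer; `step_lifecycle_calls` is a hypothetical name.

```python
def step_lifecycle_calls(session_id, step_id, llm_costs, error=None):
    """List the Nerve calls one step would make, in order, as (path, payload)."""
    agent = "MORPHEUS"
    calls = [
        ("/agents/register", {"sessionId": session_id, "agentName": agent}),
        ("/tasks/start", {"stepId": step_id}),
    ]
    for done, cost in enumerate(llm_costs, start=1):
        calls.append(("/agents/heartbeat",
                      {"sessionId": session_id, "agentName": agent}))
        calls.append(("/costs/track",
                      {"sessionId": session_id, "agentName": agent, "costUsd": cost}))
        calls.append(("/progress/emit",
                      {"sessionId": session_id,
                       "percentage": round(100 * done / len(llm_costs))}))
    if error is None:
        calls.append(("/tasks/complete", {"stepId": step_id}))
    else:
        # On failure, record the error before marking the step failed.
        calls.append(("/failures/log",
                      {"sessionId": session_id, "errorType": error}))
        calls.append(("/tasks/fail", {"stepId": step_id}))
    return calls


paths = [p for p, _ in step_lifecycle_calls("session-abc", "plan-auth-system:1", [0.23])]
print(paths)
```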
SERAPH runs code quality and security audits on completed work. It reviews code changes for TypeScript errors, security vulnerabilities, performance issues, and style violations. SERAPH's findings feed back into the failure registry if issues are found, potentially triggering additional fix cycles.
| Agent | Role | How It Uses Nerve |
|---|---|---|
| NIOBE | Deep parallel research | Registers research sessions, logs costs per search |
| SMITH | Feedback and improvement | Reads failure patterns from Nerve to improve future runs |
| ARCHITECT | System design and analysis | Queries decision history for architectural context |
| MEROVINGIAN | Cross-project knowledge | Reads patterns across multiple session histories |
| NEO | Session health monitoring | Watches heartbeat data, detects anomalies |
| LINK | Telegram bridge | Forwards cost alerts and failure notifications |
| CONSTRUCT | UI component library | Tracks component generation costs and outcomes |
| ZION | System metrics dashboard | Aggregates all Nerve data into operational views |
Agents need to communicate. Not through shared files or implicit state, but through an explicit messaging bus with delivery guarantees.
Nerve's mail system supports 18 message types organized by purpose:
Lifecycle messages: worker_done, task_assign, kill_signal, heartbeat
Coordination messages: merge_ready, merge_conflict, step_unblocked, data_pass, context_update
Alert messages: escalation, blocker, cost_alert, cost_threshold, stale_alert
Status messages: progress_update, ci_retry, health_report, info
Every message has a priority level (low, normal, high, critical), a session ID for scoping, and optional thread IDs for conversation tracking. Messages can be sent to specific agents, to agent groups (@builders, @qa, @leads), or broadcast to all agents (@all).
The inbox query supports filtering by read status, allowing agents to process only unread messages. This prevents agents from re-processing messages they have already handled -- a subtle but important property for idempotent agent operations.
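Recipient resolution and the unread filter can be sketched together. In the Python below, the group memberships are invented for illustration (the article names the groups but not their members), sorting by priority is an assumption about how an agent might drain its inbox, and `resolve_recipients` and `unread_inbox` are hypothetical names.

```python
# Hypothetical group membership; the article names the groups, not their members.
GROUPS = {
    "@builders": ["MORPHEUS", "CONSTRUCT"],
    "@qa": ["SERAPH"],
    "@leads": ["ORACLE", "KEYMAKER"],
}
ALL_AGENTS = sorted({agent for members in GROUPS.values() for agent in members})

# Lower number means more urgent.
PRIORITY_ORDER = {"critical": 0, "high": 1, "normal": 2, "low": 3}


def resolve_recipients(to):
    """Expand @all or a group name into agent names; pass single agents through."""
    if to == "@all":
        return list(ALL_AGENTS)
    if to.startswith("@"):
        return list(GROUPS[to])
    return [to]


def unread_inbox(messages, agent):
    """Unread messages addressed to an agent, most urgent first."""
    mine = [m for m in messages if m["to"] == agent and not m["read"]]
    return sorted(mine, key=lambda m: PRIORITY_ORDER[m["priority"]])


messages = [
    {"to": "MORPHEUS", "type": "info", "priority": "low", "read": False},
    {"to": "MORPHEUS", "type": "kill_signal", "priority": "critical", "read": False},
    {"to": "MORPHEUS", "type": "task_assign", "priority": "normal", "read": True},
    {"to": "SERAPH", "type": "merge_ready", "priority": "high", "read": False},
]
print([m["type"] for m in unread_inbox(messages, "MORPHEUS")])  # ['kill_signal', 'info']
```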
/debug finds, diagnoses, plans, fixes, and verifies without human involvement. Debug sessions that took hours now complete in minutes.
The most significant improvement is not any single feature. It is the shift from reactive to proactive operations. Before Nerve, we responded to problems after they caused damage. Now we detect and resolve them as they happen, often before they affect downstream agents.
Nerve is part of the Agentik OS ecosystem -- the same infrastructure that powers our 267-agent operation. We built it because nothing else existed at the intersection of real-time agent observability, dependency-aware task orchestration, and automated failure recovery.
The agent orchestration space is evolving fast. Most frameworks focus on the agent layer -- how to prompt agents, how to chain them, how to give them tools. Very few address the infrastructure layer -- how to monitor agents at scale, how to track costs in real time, how to enforce task dependencies, how to automate recovery.
That infrastructure layer is where production multi-agent systems succeed or fail. The agents are commodities. The nervous system is the product.
If you are running more than a handful of agents in production, you need something like Nerve. Not necessarily our implementation, but the capabilities it provides: real-time session tracking, heartbeat-based health monitoring, structured inter-agent communication, dependency-aware task sequencing, cost alerting, and failure management with auto-retry.
Building this took us months. Using it saves us hours every day.
Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise. Gareth built Agentik {OS} to prove that one person with the right AI system can outperform an entire traditional development team. He has personally architected and shipped 7+ production applications using AI-first workflows.
