Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
We deployed Paperclip alongside our 200-agent AISB system. Here is what separates management layers from execution engines in production.

TL;DR: We forked Paperclip, deployed it locally on port 3100 with 28 schema migrations, and ran it alongside AISB's 200+ specialized agents for one week. The result: these are not competing tools. Paperclip is a management layer. AISB is an execution engine. Conflating the two is the mistake most teams make, and it costs them dearly.
Seventy-three percent of AI teams cite agent visibility as their primary production challenge, yet most platforms ship observability as an afterthought (Stack Overflow Developer Survey, 2024). We discovered this the hard way. Before deploying Paperclip, our AISB system had 200+ agents running across Claude Code, Codex, and Gemini adapters with no unified dashboard. Incidents were diagnosed in retrospect, not in real time.
The gap is not technical. The data exists. Heartbeat signals, token counts, and task completions are all emitted by modern agent runtimes.
The gap is architectural. Nobody wired those signals into a single pane before the fire started.
We ran this way for months. When AISB Nerve (our Convex-backed observability layer) finally went live, we realized we had been flying blind on cost, throughput, and failure patterns simultaneously.
The visibility problem is not unique to teams that build their own systems. It is endemic to anyone who treats agent monitoring as a phase-two concern.
A production-grade agent platform needs four things: budget enforcement, heartbeat monitoring, multi-team isolation, and a clear separation between who manages agents and who runs them. Most tools deliver one or two. Very few deliver all four without significant configuration overhead.
Budget enforcement sounds simple. It is not. Uncapped agentic workflows consumed 12x more compute than single-agent equivalents in controlled testing (GitHub AI Research Blog, 2025).
That number should stop you cold. A single unguarded workflow loop can erase a month of compute budget in hours.
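The blocking behavior described here can be sketched in a few lines. This is an illustrative model, not Paperclip's actual implementation; the names (`BudgetGuard`, `canStartTask`) are hypothetical.

```typescript
// Hypothetical sketch of threshold-based budget enforcement: record
// spend per agent, and refuse to dispatch new tasks once the cap is hit.

interface BudgetState {
  spentUsd: number; // cost accrued this period
  capUsd: number;   // configured spending threshold
}

class BudgetGuard {
  private budgets = new Map<string, BudgetState>();

  setCap(agentId: string, capUsd: number): void {
    this.budgets.set(agentId, { spentUsd: 0, capUsd });
  }

  record(agentId: string, costUsd: number): void {
    const b = this.budgets.get(agentId);
    if (b) b.spentUsd += costUsd;
  }

  // Called before dispatching a new task; agents with no budget
  // configured are blocked by default.
  canStartTask(agentId: string): boolean {
    const b = this.budgets.get(agentId);
    return b !== undefined && b.spentUsd < b.capUsd;
  }
}
```

The key design choice is that the check runs before dispatch, not after: an unguarded loop cannot burn budget it has already exhausted.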
Heartbeat monitoring requires more than a ping. You need last-seen timestamps, task state, assigned model, and current session cost aggregated in one query. Paperclip ships this out of the box. AISB Nerve provides it via Convex real-time subscriptions. The difference is that Paperclip surfaces it in a UI, while Nerve exposes it as an API.
Multi-team isolation is where most internal tools collapse entirely. We will cover that separately.
The separation between management and execution is the insight that changed how we think about this entire space. Paperclip does not run your agents. It governs them. AISB runs them. Once we stopped asking "which one is better" and started asking "which layer does this belong to," the evaluation became straightforward.
A heartbeat system works by having each agent emit a liveness signal on a fixed interval, typically every 30 to 60 seconds, which the orchestration layer compares against a staleness threshold to determine if an agent is alive, degraded, or dead. The signal itself is cheap. The logic that acts on missed signals is where most implementations diverge.
Paperclip's heartbeat implementation is embedded in its PostgreSQL schema. We observed 28 migration files on first deploy, several of which define the agent session and heartbeat tables directly.
The staleness check runs as a background job. Miss three consecutive heartbeats and the agent is flagged in the dashboard.
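The staleness logic above reduces to a small pure function: count missed intervals against the last-seen timestamp and classify accordingly. This is a sketch of the pattern, assuming a 30-second interval and the three-missed-beats threshold we observed; it is not Paperclip's actual code.

```typescript
// Classify agent health from its last heartbeat timestamp.
// Assumes a fixed 30s interval; three consecutive misses means dead.

type AgentHealth = "alive" | "degraded" | "dead";

const INTERVAL_MS = 30_000; // heartbeat every 30 seconds

function classify(lastSeenMs: number, nowMs: number): AgentHealth {
  const missed = Math.floor((nowMs - lastSeenMs) / INTERVAL_MS);
  if (missed < 1) return "alive";
  if (missed < 3) return "degraded"; // one or two missed beats
  return "dead";                     // three or more: flag the agent
}
```

A background job running this over every agent's last-seen timestamp is all the "staleness check" amounts to; the interesting decisions are the interval and the threshold.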
AISB Nerve uses a different approach. Rather than a scheduled poll, Nerve relies on Convex's real-time mutation system. When an agent calls aisb-nerve agent heartbeat, it writes a timestamp to the cloud database, and any subscriber watching that record sees the update within milliseconds.
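The push model can be illustrated with a generic observer pattern: a heartbeat write immediately notifies every subscriber, with no polling loop. Convex's actual mutation and subscription API is not shown here; this sketch only captures the shape of the behavior, and all names are illustrative.

```typescript
// Generic push-based heartbeat store: writes fan out to subscribers
// immediately, analogous to watching a record via a real-time backend.

type Listener = (agentId: string, ts: number) => void;

class HeartbeatStore {
  private lastSeen = new Map<string, number>();
  private listeners: Listener[] = [];

  subscribe(fn: Listener): void {
    this.listeners.push(fn);
  }

  // Analogous to an agent's heartbeat write: update the record,
  // then push the change to every watcher (no scheduled poll).
  heartbeat(agentId: string, ts: number): void {
    this.lastSeen.set(agentId, ts);
    for (const fn of this.listeners) fn(agentId, ts);
  }
}
```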
The trade-off is infrastructure dependency. Paperclip's approach works with embedded PostgreSQL and no external services. Nerve requires a live Convex deployment.
For teams running fully air-gapped or on-premise, Paperclip's self-contained model is a meaningful advantage. For teams already invested in Convex or similar real-time backends, Nerve's push-based latency is hard to beat.
Both approaches solve the same problem. The choice comes down to your existing stack, not the superiority of one heartbeat algorithm over another.
The build-vs-buy framing is wrong for most agent teams, because it assumes the thing you are evaluating is a single product. In practice, an agent platform is three distinct concerns: agent execution, agent governance, and agent observability. You can buy, build, or fork each layer independently, and the best teams usually do all three.
We forked Paperclip to agentik-os/agentik-team rather than deploying the upstream repository directly. The fork took less than a day.
We needed to add the OpenClaw adapter, adjust the multi-company schema, and wire in our existing Telegram notification hooks. None of that required rewriting the core.
The more precise question is: "Which parts of this are table stakes, and which parts are differentiated?" Budget tracking, heartbeat monitoring, and org charts are table stakes. Every serious team needs them, and building them from scratch is expensive and error-prone.
Agent behavior, specialized domain logic, and execution pipelines are differentiated. AISB's 200+ specialized agents represent months of accumulated prompt engineering, tool definitions, and failure recovery patterns. That is not something you buy.
Teams that separate orchestration from execution reported 40% fewer production incidents in the first 90 days of deployment (McKinsey Technology Trends, 2025). That statistic reflects architectural clarity, not tooling. When your management layer and execution layer have clean boundaries, failures are contained and diagnosable.
In our one-week evaluation, three things stood out about Paperclip: the agent adapter system is genuinely well-designed, the budget enforcement UI is production-ready with minimal configuration, and the multi-company model exposed a real gap in how we had structured AISB's access control. The weaknesses were equally clear: no native real-time API, limited programmatic access to cost data, and no built-in concept of agent specialization.
Paperclip ships with seven agent adapters: Claude Code, Codex, Cursor, Gemini, OpenClaw, OpenCode, and Pi. We used all seven in testing.
The adapter pattern is clean. Adding the OpenClaw adapter to our fork required extending a single interface file and registering the adapter in the configuration.
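The shape of that work can be sketched as follows. Paperclip's real interface will differ; `AgentAdapter`, `registerAdapter`, and the session-id format here are all hypothetical, but "implement one interface, register one entry" is the whole pattern.

```typescript
// Hypothetical adapter registry: each runtime (Claude Code, Codex,
// OpenClaw, ...) implements one interface and registers itself once.

interface AgentAdapter {
  name: string;
  // Launch a session for this runtime and return a session id.
  startSession(agentId: string): string;
}

const adapters = new Map<string, AgentAdapter>();

function registerAdapter(a: AgentAdapter): void {
  adapters.set(a.name, a);
}

// Adding a new adapter is one implementation plus one registration.
registerAdapter({
  name: "openclaw",
  startSession: (agentId) => `openclaw:${agentId}:${Date.now()}`,
});
```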
Paperclip's budget enforcement blocks agents from starting new tasks when a configured spending threshold is reached. This is exactly the behavior we needed. AISB Nerve tracks cost per agent using real-time Convex mutations, with pricing set at $15/$75 per million input/output tokens for Opus, $3/$15 for Sonnet, and $0.80/$4.00 for Haiku. But Nerve does not block execution. It observes and alerts.
Combining both systems gives us blocking enforcement from Paperclip and granular attribution from Nerve. Neither tool alone provides that.
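The attribution arithmetic itself is simple; a minimal sketch using the per-million-token rates quoted above (function and table names are ours, not Nerve's API):

```typescript
// Per-million-token pricing as quoted in the text:
// Opus $15/$75, Sonnet $3/$15, Haiku $0.80/$4.00 (input/output).
const PRICING_PER_MTOK: Record<string, { in: number; out: number }> = {
  opus:   { in: 15,  out: 75 },
  sonnet: { in: 3,   out: 15 },
  haiku:  { in: 0.8, out: 4 },
};

// Cost of one session given its model and token counts.
function sessionCostUsd(model: string, inTok: number, outTok: number): number {
  const p = PRICING_PER_MTOK[model];
  return (inTok * p.in + outTok * p.out) / 1_000_000;
}
```

Attribution is then a matter of summing `sessionCostUsd` per agent, which is exactly the granularity the dashboard-only view lacks.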
The gap in agent specialization is the sharpest limitation. Paperclip treats all agents as generic workers differentiated only by adapter type. AISB treats agents as specialists, with named personas (ORACLE, MORPHEUS, SERAPH, KEYMAKER, and so on), defined responsibilities, and domain-specific tool access. Mapping that specialization into Paperclip's org chart model required custom metadata fields.
Multi-company isolation forces every assumption about shared state to become explicit. When you run a single-company agent system, you can take shortcuts with database schemas, shared queues, and global configuration. The moment you add a second company, every one of those shortcuts becomes a potential data leak or a billing attribution error.
Multi-tenancy is the top unmet requirement for enterprise AI agent deployments, cited by 61% of respondents (Gartner AI Infrastructure, 2025).
Paperclip's schema addresses this directly. The 28 migration files we reviewed include explicit foreign keys tying every agent session, budget record, and heartbeat entry to a company identifier. Queries are scoped by company at the ORM layer.
For Agentik OS, this matters because we run client projects alongside internal tools. DentistryGPT, Gluten-Libre, and Resonant all run in the same VPS environment. Agent costs, task histories, and error logs must not bleed across client boundaries.
AISB Nerve does not currently enforce multi-company isolation at the data layer. All Convex records share a single namespace. We use naming conventions and session prefixes to separate concerns, but that is a convention, not a constraint.
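The difference between convention and constraint is worth making concrete. A sketch of the constraint approach, assuming a scope object that injects the tenant filter into every read so forgetting it is impossible by construction (all names here are illustrative, not Nerve's or Paperclip's API):

```typescript
// Constraint-style isolation: reads go through a company-scoped
// handle, so every query is tenant-filtered before any other predicate.

interface SessionRecord {
  companyId: string;
  agentId: string;
  costUsd: number;
}

class CompanyScope {
  constructor(
    private companyId: string,
    private rows: SessionRecord[],
  ) {}

  // The companyId filter is applied unconditionally; callers can only
  // narrow further, never widen across tenants.
  sessions(pred: (r: SessionRecord) => boolean = () => true): SessionRecord[] {
    return this.rows.filter((r) => r.companyId === this.companyId && pred(r));
  }
}
```

This is effectively what Paperclip's ORM-level company scoping does with foreign keys, and what naming conventions cannot guarantee.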
Migrating Nerve to a multi-tenant model is on the roadmap. Paperclip's approach gives us a clear reference implementation to follow. The irony is that the open-source platform we evaluated ended up teaching us something concrete about our own system's architecture.
Every platform decision carries risks that vendors do not put in their README files. We identified three that are relevant to any team considering Paperclip or a similar management layer alongside an existing execution system. Naming them early is cheaper than discovering them in production.
First: data consistency across two observability tools. Fifty-eight percent of developers working on AI systems reported data consistency issues when using more than one observability tool (Stack Overflow Developer Survey, 2024). Running Paperclip and AISB Nerve in parallel means two sources of cost truth. We mitigated this by designating Nerve as the authoritative cost record and treating Paperclip's budget enforcement as a guardrail layer only.
Second: schema drift. Paperclip is an active open-source project. Our fork will diverge from upstream. We built a simple diff script that runs weekly and flags schema changes in the upstream repository. Staying within two or three migrations of upstream is manageable. Falling six months behind is not.
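The core of such a drift check is a set difference over migration filenames. A minimal version (our actual script also fetches the upstream file list via git, which is out of scope here; both lists are passed in):

```typescript
// Report upstream migration files missing from the fork, sorted,
// so the weekly job can flag how far the fork has drifted.
function migrationDrift(upstream: string[], fork: string[]): string[] {
  const have = new Set(fork);
  return upstream.filter((m) => !have.has(m)).sort();
}
```

An empty result means the fork is current; more than two or three entries means it is time to rebase.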
Third: the specialization gap creates a false sense of uniformity. When Paperclip's dashboard shows all agents as equivalent workers, it obscures the real cost and latency differences between an Opus-backed ORACLE agent and a Haiku-backed ZION metrics agent. AI agent adoption is growing fast, with GitHub Octoverse reporting that AI-assisted coding tools are now used by over 70% of professional developers surveyed (GitHub Octoverse, 2025). But the assumptions baked into that tooling are built for homogeneous agent pools, not specialized agent ecosystems.
If you are running more than ten agents in production and you cannot answer "which agent spent the most this week" in under sixty seconds, you need a management layer before you need more agents. Start by auditing your current visibility, then decide whether to fork, buy, or build. The sequence matters more than the choice.
For teams under ten agents, the overhead of a full platform is probably not justified yet. Build a simple heartbeat table and a cost attribution script first. Complexity earns its place when visibility genuinely fails.
For teams already running specialized agent ecosystems, the Paperclip evaluation taught us that the right question is not "replace or integrate" but "which layer owns which concern." Governance belongs to the management layer. Execution belongs to the execution layer. Keep those boundaries explicit in code, not just in documentation.
We will publish our full Paperclip fork configuration, including the multi-company schema adjustments and the OpenClaw adapter, in the coming weeks. Watch the repository at agentik-os/agentik-team for updates.
For deeper reading on the architectural patterns behind these decisions, start with our guide to multi-agent orchestration in production, then move to agent cost optimization strategies for the budget enforcement patterns that complement what Paperclip provides out of the box. If you want to understand how AISB itself is structured before layering a management platform on top, the AISB orchestration system overview covers the full 200-agent architecture including AISB Nerve's Convex backend and the Matrix-themed agent hierarchy we use in production.
