
I deployed an AI agent on a Monday. By Thursday it had spent $340 in API calls because it got stuck in a loop nobody noticed.
No alerts. No dashboards. No logging beyond console.log.
The agent was retrying a failed tool call. The tool was down. The agent kept trying. For three days. Every retry cost money.
Now I monitor everything. Here's the setup.
Every AI agent needs monitoring on five dimensions:
1. Cost. How much are you spending per request, per user, per day? Is it trending up?
2. Latency. How long does each agent run take? Where's the bottleneck?
3. Quality. Are the outputs actually good? Are users satisfied?
4. Errors. What's failing? How often? Is it getting worse?
5. Behavior. What tools is the agent calling? How many iterations per task? Is it looping?
Most people monitor errors and ignore the other four. Then they get surprised by a $2,000 API bill.
You don't need expensive observability platforms. Here's what I use:
For most solo builders, structured logging plus Telegram alerts is enough.
interface AgentLog {
  timestamp: string;
  runId: string;
  event: "run_start" | "tool_call" | "tool_result" | "llm_call" | "llm_response" | "run_end" | "error";
  data: {
    model?: string;
    toolName?: string;
    inputTokens?: number;
    outputTokens?: number;
    cost?: number;
    latencyMs?: number;
    iterationCount?: number;
    error?: string;
  };
}
class AgentMonitor {
  private runId: string;
  private startTime: number;
  private iterationCount: number = 0;
  private totalCost: number = 0;
  private maxIterations: number;

  constructor(maxIterations: number = 20) {
    this.runId = crypto.randomUUID();
    this.startTime = Date.now();
    this.maxIterations = maxIterations;
  }

  log(event: AgentLog["event"], data: AgentLog["data"] = {}) {
    const entry: AgentLog = {
      timestamp: new Date().toISOString(),
      runId: this.runId,
      event,
      data: {
        ...data,
        iterationCount: this.iterationCount,
      },
    };

    // Write to structured log
    appendToLog(entry);

    // Check for anomalies
    if (event === "tool_call") {
      this.iterationCount++;
      if (this.iterationCount > this.maxIterations) {
        this.alert(`Agent exceeded ${this.maxIterations} iterations. Possible loop.`);
        throw new Error("Max iterations exceeded");
      }
    }

    if (event === "llm_response" && data.cost) {
      this.totalCost += data.cost;
      if (this.totalCost > 1.0) { // $1 per run threshold
        this.alert(`Agent run cost exceeded $1.00 (current: $${this.totalCost.toFixed(2)})`);
      }
    }
  }

  async alert(message: string) {
    // Send to Telegram, Slack, email, whatever
    await sendTelegramAlert(message);
  }
}
Every tool call, every LLM request, every response. Logged with timing and cost.
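appendToLog and sendTelegramAlert are left undefined above. A minimal sketch of both, assuming a local JSONL file and the Telegram Bot API's sendMessage endpoint (the file path and environment variable names are my own placeholders):

import { appendFileSync } from "fs";

const LOG_FILE = "agent-logs.jsonl"; // placeholder path, one JSON object per line

function appendToLog(entry: AgentLog) {
  appendFileSync(LOG_FILE, JSON.stringify(entry) + "\n");
}

async function sendTelegramAlert(message: string) {
  // Telegram Bot API: POST https://api.telegram.org/bot<token>/sendMessage
  const token = process.env.TELEGRAM_BOT_TOKEN; // placeholder env var names
  const chatId = process.env.TELEGRAM_CHAT_ID;
  if (!token || !chatId) return; // a missing alert channel shouldn't crash the agent
  await fetch(`https://api.telegram.org/bot${token}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text: message }),
  });
}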
Calculate cost per request. It's straightforward.
const PRICING = {
  "claude-sonnet-4-20250514": {
    inputPer1M: 3.0,
    outputPer1M: 15.0,
  },
  "claude-3-5-haiku-20241022": {
    inputPer1M: 0.80,
    outputPer1M: 4.0,
  },
};
function calculateCost(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const pricing = PRICING[model as keyof typeof PRICING];
  if (!pricing) return 0;
  return (
    (inputTokens / 1_000_000) * pricing.inputPer1M +
    (outputTokens / 1_000_000) * pricing.outputPer1M
  );
}
Track this per run, per user, per day.
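To show how the pieces fit together, here's a rough sketch of a single LLM call going through the monitor. The anthropic.messages.create call and its usage fields follow the Anthropic TypeScript SDK; swap in whatever client you actually use, and assume userQuery comes from your app:

const monitor = new AgentMonitor();
monitor.log("run_start");

const model = "claude-sonnet-4-20250514";
const response = await anthropic.messages.create({
  model,
  max_tokens: 1024,
  messages: [{ role: "user", content: userQuery }],
});

// Log tokens and cost for this single call
monitor.log("llm_response", {
  model,
  inputTokens: response.usage.input_tokens,
  outputTokens: response.usage.output_tokens,
  cost: calculateCost(model, response.usage.input_tokens, response.usage.output_tokens),
});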
Set budgets. Alert when you're approaching them.
// Daily cost check
const DAILY_BUDGET = 10; // USD per day, set this to whatever your budget is

async function checkDailyCost() {
  const today = new Date().toISOString().split("T")[0];
  const totalCost = await db.query(
    "SELECT SUM(cost) as total FROM agent_logs WHERE date(timestamp) = ? AND event = 'llm_response'",
    [today]
  );
  if (totalCost.total > DAILY_BUDGET) {
    await sendTelegramAlert(
      `Daily AI cost exceeded budget: $${totalCost.total.toFixed(2)} / $${DAILY_BUDGET}`
    );
  }
}
The most dangerous agent bug. A loop that burns money indefinitely.
function detectLoop(recentActions: string[]): boolean {
  // Check for repeated patterns
  if (recentActions.length < 6) return false;
  const last3 = recentActions.slice(-3).join(",");
  const prev3 = recentActions.slice(-6, -3).join(",");
  // Same three actions repeated = probable loop
  return last3 === prev3;
}
When detected: kill the run, alert the operator, log the context for debugging.
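Here's roughly how that plugs into the agent loop. The surrounding loop is schematic; monitor, toolName, and detectLoop come from the code above:

const recentActions: string[] = [];

// ...inside the agent's tool-dispatch step, before executing the tool:
recentActions.push(toolName);
if (detectLoop(recentActions)) {
  const context = recentActions.slice(-6).join(" -> ");
  monitor.log("error", { error: `Loop detected: ${context}` }); // keep the context for debugging
  await monitor.alert(`Agent run killed, repeating actions: ${context}`);
  throw new Error("Loop detected, run aborted"); // kill the run
}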
This is the hardest part. How do you know if the AI's output is good?
Option 1: User feedback. Add a thumbs up/down to agent responses. Track the ratio.
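A minimal sketch of that, assuming a feedback table and the same db placeholder used in the daily cost check above (SQLite-flavored SQL):

// Record a thumbs up (1) or thumbs down (-1) against a run
async function recordFeedback(runId: string, rating: 1 | -1) {
  await db.query(
    "INSERT INTO feedback (run_id, rating, created_at) VALUES (?, ?, ?)",
    [runId, rating, new Date().toISOString()]
  );
}

// Share of positive ratings over the last 7 days
async function feedbackRatio(): Promise<number> {
  const result = await db.query(
    "SELECT AVG(CASE WHEN rating = 1 THEN 1.0 ELSE 0.0 END) as ratio FROM feedback WHERE created_at >= datetime('now', '-7 days')",
    []
  );
  return result.ratio ?? 0;
}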
Option 2: AI-as-judge. Use a cheap model to evaluate the expensive model's output.
async function evaluateResponse(
  userQuery: string,
  agentResponse: string
): Promise<number> {
  const evaluation = await cheapModel.generate({
    prompt: `Rate this AI response on a scale of 1-5.

User asked: ${userQuery}
Agent responded: ${agentResponse}

Criteria:
- Relevance (did it answer the question?)
- Accuracy (is the information correct?)
- Completeness (did it cover everything needed?)
- Clarity (is it easy to understand?)

Respond with just a number 1-5.`,
  });
  return parseInt(evaluation, 10);
}
Option 3: Outcome tracking. Did the agent's action lead to the desired result? If it created a ticket, was the ticket resolved? If it sent an email, did the recipient respond?
I don't look at a dashboard. I set up alerts and forget about them.
Alerts I run:
1. Per-run cost over $1.00 (the threshold in AgentMonitor above).
2. More than 20 iterations in a single run, or a repeated action pattern (probable loop).
3. Daily spend over budget (the checkDailyCost query).
4. Any error event from a tool or the model.
If no alerts fire, everything is fine. If one fires, I look at the structured logs for that time period.
That's the whole monitoring philosophy. Don't watch dashboards. Set thresholds. Get alerted when they're crossed. Spend your attention on building, not watching.
