Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
AI agents fail differently than traditional software. Silent hallucinations. Cost explosions. Loops. The monitoring setup that catches them first.

Traditional software fails loudly: exceptions, 500 errors, timeouts. The monitoring you built for it does not catch most AI agent failures.
AI agents fail silently. An agent that's hallucinating still returns 200 OK. An agent that's entered a reasoning loop still returns responses. An agent whose costs have exploded 10x still appears functional. Your Datadog dashboard is green while your AI system is doing something completely wrong.
This guide covers the monitoring layer you actually need for AI agents in production.
Understanding the failure modes shapes the monitoring strategy.
| Failure Type | Traditional Monitoring Catches It | What Catches It |
|---|---|---|
| API timeout | Yes | Traditional monitoring |
| Rate limit exceeded | Partially | Cost/rate monitoring |
| Hallucination | No | Output quality monitoring |
| Off-topic responses | No | Intent accuracy monitoring |
| Cost explosion | No | Cost monitoring |
| Reasoning loops | Partially | Latency + cost monitoring |
| Context window overflow | No | Input length monitoring |
| Prompt injection | No | Security monitoring |
| Model degradation | No | Quality trend monitoring |
Every "No" or "Partially" in the middle column is a gap. The rightmost column lists the monitoring that closes it.
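Reasoning loops, for instance, only show up indirectly through latency and cost. A cheaper, more direct signal is repetition: an agent that issues the same tool call with the same arguments several times in one turn is almost certainly stuck. A minimal sketch (the `ToolCall` shape and the threshold of 3 are illustrative assumptions, not part of any SDK):

```typescript
// Detect a likely reasoning loop by counting repeated identical tool calls.
interface ToolCall {
  name: string;
  input: unknown;
}

export function detectToolLoop(calls: ToolCall[], threshold = 3): boolean {
  const counts = new Map<string, number>();
  for (const call of calls) {
    // Serialize name + arguments so "same call, same args" collide on one key.
    const key = `${call.name}:${JSON.stringify(call.input)}`;
    const count = (counts.get(key) ?? 0) + 1;
    if (count >= threshold) return true;
    counts.set(key, count);
  }
  return false;
}
```

Run this over the tool calls collected per turn and escalate or abort when it fires, rather than waiting for the cost alert an hour later.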
Every AI agent interaction should be logged with enough data to diagnose failures and improve performance.
```typescript
// lib/ai-telemetry.ts
export interface AgentTelemetry {
  // Identity
  traceId: string;
  sessionId: string;
  agentId: string;
  userId?: string;
  // Input
  inputTokens: number;
  inputLength: number;
  systemPromptLength: number;
  contextDocuments: number; // For RAG agents
  // Processing
  model: string;
  temperature: number;
  latencyMs: number;
  retryCount: number;
  // Output
  outputTokens: number;
  outputLength: number;
  finishReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
  toolsUsed: string[];
  toolCallCount: number;
  // Cost
  inputCostUsd: number;
  outputCostUsd: number;
  totalCostUsd: number;
  // Quality signals
  userFeedback?: "positive" | "negative";
  escalated: boolean;
  errorCode?: string;
  errorMessage?: string;
  // Timestamps
  timestamp: number;
}

// Cost calculation per model
const COST_PER_MILLION_TOKENS = {
  "claude-haiku-4-20250514": { input: 0.25, output: 1.25 },
  "claude-sonnet-4-20250514": { input: 3.0, output: 15.0 },
  "claude-opus-4-20250514": { input: 15.0, output: 75.0 },
} as const;

export function calculateCost(
  model: string,
  inputTokens: number,
  outputTokens: number
): { inputCostUsd: number; outputCostUsd: number; totalCostUsd: number } {
  const costs =
    COST_PER_MILLION_TOKENS[model as keyof typeof COST_PER_MILLION_TOKENS] ??
    { input: 3.0, output: 15.0 }; // Default to Sonnet pricing
  const inputCostUsd = (inputTokens / 1_000_000) * costs.input;
  const outputCostUsd = (outputTokens / 1_000_000) * costs.output;
  return {
    inputCostUsd,
    outputCostUsd,
    totalCostUsd: inputCostUsd + outputCostUsd,
  };
}
```

Wrap your AI calls to automatically collect telemetry:
```typescript
// lib/monitored-agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { nanoid } from "nanoid";
import { calculateCost, AgentTelemetry } from "./ai-telemetry.js";

const anthropic = new Anthropic();

interface CreateMessageOptions {
  model: string;
  system?: string;
  messages: Anthropic.MessageParam[];
  max_tokens: number;
  temperature?: number;
  tools?: Anthropic.Tool[];
  agentId: string;
  sessionId: string;
  userId?: string;
}

export async function createMonitoredMessage(
  options: CreateMessageOptions,
  onTelemetry: (telemetry: AgentTelemetry) => Promise<void>
): Promise<Anthropic.Message> {
  const traceId = nanoid();
  const startTime = Date.now();
  let retryCount = 0;
  let lastError: Error | null = null;
  const { agentId, sessionId, userId, ...claudeOptions } = options;

  while (retryCount < 3) {
    try {
      const response = await anthropic.messages.create(claudeOptions);
      const latencyMs = Date.now() - startTime;
      const costs = calculateCost(
        claudeOptions.model,
        response.usage.input_tokens,
        response.usage.output_tokens
      );

      const toolsUsed: string[] = [];
      let toolCallCount = 0;
      for (const block of response.content) {
        if (block.type === "tool_use") {
          toolsUsed.push(block.name);
          toolCallCount++;
        }
      }

      const telemetry: AgentTelemetry = {
        traceId,
        sessionId,
        agentId,
        userId,
        inputTokens: response.usage.input_tokens,
        inputLength: claudeOptions.messages.reduce(
          (sum, m) => sum + (typeof m.content === "string" ? m.content.length : 0),
          0
        ),
        systemPromptLength: claudeOptions.system?.length ?? 0,
        contextDocuments: 0,
        model: claudeOptions.model,
        temperature: claudeOptions.temperature ?? 1,
        latencyMs,
        retryCount,
        outputTokens: response.usage.output_tokens,
        outputLength: response.content
          .filter(b => b.type === "text")
          .reduce((sum, b) => sum + (b as Anthropic.TextBlock).text.length, 0),
        finishReason: response.stop_reason as AgentTelemetry["finishReason"],
        toolsUsed,
        toolCallCount,
        ...costs,
        escalated: false,
        timestamp: Date.now(),
      };

      await onTelemetry(telemetry);
      return response;
    } catch (error) {
      lastError = error as Error;
      // Don't retry on non-transient errors
      if (error instanceof Anthropic.APIError && error.status === 400) {
        break;
      }
      retryCount++;
      if (retryCount < 3) {
        await new Promise(resolve => setTimeout(resolve, 1000 * retryCount));
      }
    }
  }

  // Log the failed attempt
  await onTelemetry({
    traceId,
    sessionId,
    agentId,
    userId,
    inputTokens: 0,
    inputLength: 0,
    systemPromptLength: 0,
    contextDocuments: 0,
    model: claudeOptions.model,
    temperature: claudeOptions.temperature ?? 1,
    latencyMs: Date.now() - startTime,
    retryCount,
    outputTokens: 0,
    outputLength: 0,
    finishReason: "end_turn",
    toolsUsed: [],
    toolCallCount: 0,
    inputCostUsd: 0,
    outputCostUsd: 0,
    totalCostUsd: 0,
    escalated: false,
    errorCode: lastError?.name,
    errorMessage: lastError?.message,
    timestamp: Date.now(),
  });
  throw lastError;
}
```

Cost explosions are one of the most common production AI failures. Set hard limits:
```typescript
// lib/cost-guard.ts
export interface CostLimits {
  perSessionUsd: number; // Max cost per conversation
  perUserDayUsd: number; // Max daily cost per user
  globalHourUsd: number; // Max hourly total cost
}

const DEFAULT_LIMITS: CostLimits = {
  perSessionUsd: 0.50, // $0.50 per conversation max
  perUserDayUsd: 2.00, // $2.00 per user per day
  globalHourUsd: 50.00, // $50 per hour total
};

export class CostGuard {
  // Note: these maps grow unbounded; evict finished sessions and past days in production.
  private sessionCosts = new Map<string, number>();
  private userDayCosts = new Map<string, number>();
  private globalHourlyCost = 0;
  private hourWindowStart = Date.now();

  check(
    sessionId: string,
    userId: string | undefined,
    cost: number,
    limits: CostLimits = DEFAULT_LIMITS
  ): { allowed: boolean; reason?: string } {
    // Reset hourly window
    if (Date.now() - this.hourWindowStart > 3_600_000) {
      this.globalHourlyCost = 0;
      this.hourWindowStart = Date.now();
    }

    const sessionCost = (this.sessionCosts.get(sessionId) ?? 0) + cost;
    if (sessionCost > limits.perSessionUsd) {
      return { allowed: false, reason: "Session cost limit exceeded" };
    }

    let userKey: string | undefined;
    let userCost = 0;
    if (userId) {
      userKey = `${userId}:${new Date().toDateString()}`;
      userCost = (this.userDayCosts.get(userKey) ?? 0) + cost;
      if (userCost > limits.perUserDayUsd) {
        return { allowed: false, reason: "Daily user cost limit exceeded" };
      }
    }

    if (this.globalHourlyCost + cost > limits.globalHourUsd) {
      return { allowed: false, reason: "Global hourly cost limit exceeded" };
    }

    // Record costs only after every limit check passes, so a request
    // rejected by one limit doesn't still count against the others.
    this.sessionCosts.set(sessionId, sessionCost);
    if (userKey) {
      this.userDayCosts.set(userKey, userCost);
    }
    this.globalHourlyCost += cost;
    return { allowed: true };
  }
}

export const costGuard = new CostGuard();
```

Detect when your agent is going off the rails:
```typescript
// lib/quality-monitor.ts
import Anthropic from "@anthropic-ai/sdk";

export interface QualityCheck {
  passed: boolean;
  issues: string[];
  score: number; // 0-1
}

const anthropic = new Anthropic();

export async function checkResponseQuality(
  userMessage: string,
  agentResponse: string,
  context: {
    agentPurpose: string;
    expectedTopics: string[];
  }
): Promise<QualityCheck> {
  const response = await anthropic.messages.create({
    model: "claude-haiku-4-20250514", // Use cheaper model for monitoring
    max_tokens: 512,
    system: `You are a quality evaluator for AI agent responses. Return JSON only.`,
    messages: [
      {
        role: "user",
        content: `Evaluate this AI response for quality issues.

Agent purpose: ${context.agentPurpose}
Expected topics: ${context.expectedTopics.join(", ")}

User message: ${userMessage}
Agent response: ${agentResponse}

Check for:
1. Is the response on-topic?
2. Does it contain hallucinated facts?
3. Is it appropriate (no harmful content)?
4. Does it answer the user's actual question?
5. Is it coherent?

Return: { passed: boolean, issues: string[], score: 0-1 }`,
      },
    ],
  });

  const text = response.content[0].type === "text" ? response.content[0].text : "{}";
  try {
    return JSON.parse(text);
  } catch {
    return { passed: true, issues: [], score: 0.8 }; // Default to passing on unparseable output
  }
}
```

Build dashboards around these metrics:
| Metric | Alert Threshold | What It Indicates |
|---|---|---|
| Avg response latency | > 5s p95 | Slow model, long prompts, loops |
| Cost per session | > 2x baseline | Prompt injection, loops, misuse |
| Quality score | < 0.7 | Model degradation, prompt issues |
| Error rate | > 1% | API issues, invalid requests |
| Token ratio (output/input) | > 3x | Verbosity issues, loops |
| Escalation rate | > 20% | Agent capability gaps |
| finish_reason: max_tokens | > 5% | Responses being cut off |
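The table's thresholds can be encoded as a single alert pass over a window of recent telemetry records. A sketch, assuming a subset of the telemetry fields defined earlier (the specific thresholds come from the table above and should be tuned to your own baselines):

```typescript
// Evaluate a window of telemetry records against the alert thresholds above.
interface TelemetrySample {
  latencyMs: number;
  totalCostUsd: number;
  inputTokens: number;
  outputTokens: number;
  errorCode?: string;
  finishReason: string;
}

export function evaluateWindow(samples: TelemetrySample[], baselineCostUsd: number): string[] {
  const alerts: string[] = [];
  if (samples.length === 0) return alerts;

  // p95 latency: sort ascending and index into the 95th percentile.
  const latencies = samples.map(s => s.latencyMs).sort((a, b) => a - b);
  const p95 = latencies[Math.min(latencies.length - 1, Math.floor(latencies.length * 0.95))];
  if (p95 > 5000) alerts.push(`p95 latency ${p95}ms exceeds 5s`);

  // Cost per interaction vs baseline.
  const avgCost = samples.reduce((s, t) => s + t.totalCostUsd, 0) / samples.length;
  if (avgCost > 2 * baselineCostUsd) alerts.push(`avg cost $${avgCost.toFixed(4)} is >2x baseline`);

  // Error rate.
  const errorRate = samples.filter(s => s.errorCode).length / samples.length;
  if (errorRate > 0.01) alerts.push(`error rate ${(errorRate * 100).toFixed(1)}% exceeds 1%`);

  // Output/input token ratio across the window.
  const tokenRatio =
    samples.reduce((s, t) => s + t.outputTokens, 0) /
    Math.max(1, samples.reduce((s, t) => s + t.inputTokens, 0));
  if (tokenRatio > 3) alerts.push(`output/input token ratio ${tokenRatio.toFixed(1)} exceeds 3x`);

  // Truncation rate.
  const truncated = samples.filter(s => s.finishReason === "max_tokens").length / samples.length;
  if (truncated > 0.05) alerts.push(`${(truncated * 100).toFixed(1)}% of responses hit max_tokens`);

  return alerts;
}
```

Run it on a schedule over the last hour of records and forward any returned strings to your alerting channel.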
For a simple setup that works immediately:
```typescript
// lib/log-telemetry.ts
// Save telemetry to a file or simple database
import * as fs from "fs/promises";
import { AgentTelemetry } from "./ai-telemetry.js";

export async function logTelemetry(telemetry: AgentTelemetry): Promise<void> {
  // Append to a JSONL file (one JSON object per line).
  // Assumes ./logs exists; create it once with fs.mkdir("./logs", { recursive: true }).
  await fs.appendFile(
    "./logs/agent-telemetry.jsonl",
    JSON.stringify(telemetry) + "\n"
  );

  // Alert on cost anomalies
  if (telemetry.totalCostUsd > 0.10) {
    console.warn(`High cost session: $${telemetry.totalCostUsd.toFixed(4)} for ${telemetry.sessionId}`);
    // Add Slack/email notification here
  }

  // Alert on slow responses
  if (telemetry.latencyMs > 10000) {
    console.warn(`Slow response: ${telemetry.latencyMs}ms for ${telemetry.traceId}`);
  }
}
```

Query the JSONL file with tools like DuckDB for ad-hoc analysis, or pipe it into a time-series database for dashboards.
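If you'd rather stay in Node, the same JSONL file can be aggregated directly. A sketch (the field names follow the telemetry shape used in this guide; the path is whatever you logged to):

```typescript
import * as fs from "fs/promises";

// Aggregate per-session cost and average latency from a JSONL telemetry log.
export async function summarizeTelemetry(path: string) {
  const lines = (await fs.readFile(path, "utf8")).split("\n").filter(Boolean);
  const records = lines.map(
    l => JSON.parse(l) as { sessionId: string; totalCostUsd: number; latencyMs: number }
  );

  const costPerSession = new Map<string, number>();
  let totalLatency = 0;
  for (const r of records) {
    costPerSession.set(r.sessionId, (costPerSession.get(r.sessionId) ?? 0) + r.totalCostUsd);
    totalLatency += r.latencyMs;
  }

  return {
    sessions: costPerSession.size,
    avgLatencyMs: records.length ? totalLatency / records.length : 0,
    // The five most expensive sessions, sorted descending by cost.
    topSessions: [...costPerSession.entries()].sort((a, b) => b[1] - a[1]).slice(0, 5),
  };
}
```

The "most expensive sessions" list is usually the fastest route to finding a loop or a prompt-injection attempt after the fact.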
Q: How do you set up monitoring for AI agents?
Set up monitoring in three layers: infrastructure (uptime, latency, error rates with tools like Vercel Analytics or Datadog), application (user interactions, conversion funnels, feature usage), and AI-specific (model latency, token usage, output quality scores, cost per request). Alert on anomalies in all three layers.
Q: What should you monitor in AI agent applications?
Monitor model response latency, token consumption, cost per interaction, error rates by type, output quality scores, user satisfaction, hallucination rates, and escalation frequency. These AI-specific metrics reveal issues that standard application monitoring misses.
Q: How do you detect AI quality degradation?
Detect degradation through automated quality scoring on a sample of AI outputs, trend analysis on user satisfaction metrics, comparison against baseline benchmarks, alert thresholds for key metrics (latency, error rate, cost), and regular human evaluation of random AI interactions.
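The trend-analysis part of that answer can be made concrete: compare a rolling window of recent quality scores against a baseline window and alert when the mean drops. A minimal sketch (the window sizes and the 10% drop threshold are illustrative assumptions, not recommendations):

```typescript
// Flag quality degradation: recent mean score drops materially below the baseline mean.
export function detectDegradation(
  scores: number[],      // chronological quality scores, 0-1
  baselineSize = 100,    // how many early scores form the baseline
  recentSize = 20,       // how many trailing scores to compare
  maxDropRatio = 0.1     // alert if the recent mean falls >10% below baseline
): boolean {
  if (scores.length < baselineSize + recentSize) return false; // not enough data yet
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const baseline = mean(scores.slice(0, baselineSize));
  const recent = mean(scores.slice(-recentSize));
  return recent < baseline * (1 - maxDropRatio);
}
```

Feed it the `score` values from the quality monitor above, per agent, and you have a degradation alarm that fires on trend rather than on any single bad response.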