Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Stop wrapping ChatGPT in a text box and calling it an agent. Here's how to build real agents with perception, reasoning, tools, and memory.

Stop wrapping ChatGPT in a text box and calling it an agent. I see this constantly. Someone hooks up a form to the OpenAI API, adds a system prompt, and announces they have built an AI agent. They have built a chatbot. A useful chatbot maybe, but nowhere close to an agent.
An agent acts. It perceives its environment, decides what to do, uses tools to do it, observes the results, and decides what to do next. The loop runs until the task is complete. No human in the middle.
I have built agents from scratch, by extending frameworks, and on top of orchestration platforms. Here is what I have learned about what actually works, what breaks in production, and where the real complexity hides.
Strip away the marketing and every functional agent has the same four components. Miss one of them and you do not have an agent. You have something that acts like an agent in demos and fails in production.
The first component is perception. The agent needs to observe its environment. What that means in practice depends on your domain.
For a coding agent: reading files, running commands, checking test results, parsing error messages. For a customer support agent: reading tickets, querying customer data, checking order status. For a research agent: searching the web, reading documents, extracting information.
Perception is implemented through tools. The agent calls tools to gather information. Good tool design at the perception layer makes or breaks the agent's ability to understand its situation.
The critical insight: agents need to know when they have gathered enough information to act. Perception is not endless data collection. It is purposeful information gathering to answer a specific question or complete a specific task.
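To make this concrete, here is a minimal sketch of a read-only perception tool for the customer support example. The `Tool` shape, the `check_order_status` name, and the in-memory order store are illustrative assumptions, not from a specific SDK:

```typescript
// Minimal tool shape for illustration; real SDKs add schemas and metadata.
interface Tool {
  name: string;
  description: string;
  execute: (input: Record<string, string>) => Promise<string>;
}

// Fake order store standing in for a real database query.
const orders: Record<string, string> = { "1001": "shipped", "1002": "processing" };

const checkOrderStatus: Tool = {
  name: "check_order_status",
  description:
    "Look up the current status of an order by ID. Read-only: safe to call " +
    "whenever the agent needs order state before deciding what to do.",
  // Perception tools only gather information; they never change state.
  execute: async ({ orderId }) => orders[orderId] ?? "Order not found",
};
```

Because the tool is read-only, the agent can call it freely while gathering context, without any of the safeguards that action tools require.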
The second component is reasoning, and it is where the LLM lives. Given what the agent has perceived, what should it do next?
Reasoning quality depends heavily on prompt architecture. The system prompt needs to tell the agent what its goal is, which tools it can use, and how to decide between them.
Most agent failures I debug trace back to reasoning problems, not tool problems. The tools work. The agent uses the wrong tool, in the wrong order, or stops too early because the system prompt was not clear enough.
The third component is action. The agent needs to do things. Modify files. Call APIs. Send messages. Update databases. Execute commands.
Action tools are the most dangerous category because they have side effects. A perception mistake means the agent had wrong information. An action mistake means the agent did something wrong in the real world.
Design action tools with reversibility in mind wherever possible. "Create draft" before "send." "Preview changes" before "apply." "Simulate" before "execute." Irreversible actions should require explicit confirmation or have additional safeguards.
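The "create draft before send" pattern can be sketched as a pair of tools: one reversible, one gated behind explicit confirmation. The `EmailDraft` shape and in-memory store below are hypothetical stand-ins for a real email API:

```typescript
interface EmailDraft { id: string; to: string; body: string; sent: boolean }

const drafts = new Map<string, EmailDraft>();
let nextId = 1;

// Reversible: a draft can be reviewed or discarded before anything happens.
function createDraft(to: string, body: string): string {
  const id = String(nextId++);
  drafts.set(id, { id, to, body, sent: false });
  return id;
}

// Irreversible: refuses to act unless the caller passes an explicit confirm flag.
function sendDraft(id: string, confirm: boolean): string {
  const draft = drafts.get(id);
  if (!draft) return "Error: no draft with that id";
  if (!confirm) return "Refusing to send without explicit confirmation";
  draft.sent = true;
  return `Sent draft ${id} to ${draft.to}`;
}
```

Splitting the action in two gives the agent (or a human reviewer) a checkpoint between intent and side effect.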
The fourth component is memory. An agent without memory starts from zero every session. For simple tasks, that is fine. For anything complex or ongoing, it is fatal.
Memory comes in four forms, and you will likely need all four at different timescales:
| Memory Type | Duration | Storage | Use Case |
|---|---|---|---|
| Working memory | Single session | Context window | Current task state |
| Episodic memory | Weeks to months | Vector database | Past interactions and outcomes |
| Semantic memory | Persistent | Vector + structured DB | Domain knowledge |
| Procedural memory | Persistent | Prompt / fine-tuning | How to do things |
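Working and episodic memory can be sketched together in a few lines. This is a toy: `recall` uses keyword matching where a production system would embed the query and search a vector database, and the class names are illustrative:

```typescript
interface Episode { task: string; outcome: string }

class AgentMemory {
  working: string[] = [];           // current-session state, lives in the context window
  private episodes: Episode[] = []; // would persist across sessions in a real system

  // Record what happened so future sessions can learn from it.
  remember(task: string, outcome: string): void {
    this.episodes.push({ task, outcome });
  }

  // Crude keyword recall; swap in embedding similarity for production.
  recall(query: string): Episode[] {
    return this.episodes.filter((e) => e.task.includes(query));
  }
}
```

The split matters: working memory is cheap and ephemeral, while episodic memory pays its storage cost back when a similar task recurs.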
For a deep dive on getting memory right, read the dedicated article on agent memory systems.
Every agent runs a loop. The implementation varies but the structure is consistent.
```typescript
interface AgentState {
  goal: string;
  context: Message[];
  tools: Tool[];
  maxIterations: number;
  currentIteration: number;
}

async function runAgentLoop(state: AgentState): Promise<AgentResult> {
  while (state.currentIteration < state.maxIterations) {
    // 1. Reason: what should I do next?
    const decision = await llm.complete({
      messages: state.context,
      tools: state.tools,
      system: buildSystemPrompt(state.goal),
    });

    // 2. Check for completion
    if (decision.stopReason === "end_turn" && !decision.toolUse) {
      return { success: true, result: decision.content };
    }

    // 3. Execute tool calls
    if (decision.toolUse) {
      const toolResults = await executeTools(decision.toolUse, state.tools);

      // 4. Update context with results
      state.context.push(
        { role: "assistant", content: decision.content },
        { role: "user", content: toolResults }
      );
    }

    state.currentIteration++;
  }

  // Max iterations hit
  return { success: false, reason: "max_iterations_exceeded" };
}
```

This is the skeleton. Production agents add error handling, retry logic, checkpointing, cost tracking, and observability. But the loop structure stays the same.
The max iterations limit is not optional. An agent without an iteration ceiling can run indefinitely, burning compute and money on a task it cannot complete. Set limits. Handle them gracefully.
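An iteration ceiling alone is not enough if individual steps are slow; pairing it with a wall-clock budget catches both failure modes. A sketch of that check, with names that are illustrative rather than from the loop above:

```typescript
interface LoopLimits { maxIterations: number; maxMillis: number }

// Returns a stop reason, or null to keep looping. Called at the top of
// each iteration so the agent stops at whichever limit is hit first.
function shouldStop(
  iteration: number,
  startedAt: number,
  limits: LoopLimits
): string | null {
  if (iteration >= limits.maxIterations) return "max_iterations_exceeded";
  if (Date.now() - startedAt > limits.maxMillis) return "timeout_exceeded";
  return null;
}
```

Returning a named reason instead of a bare boolean means the caller can log and handle the two limits differently.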
Tools are the agent's interface to the world. Every tool has three components: the implementation, the schema, and the description.
The implementation is where you write the actual code. The schema defines the input parameters. The description tells the agent when and how to use it. As with MCP, the description is the most important part.
```typescript
const searchCodebase: Tool = {
  name: "search_codebase",
  description:
    "Search the codebase for files, functions, or code patterns using ripgrep syntax. " +
    "Use this to find where something is defined or used before making changes. " +
    "Returns file paths and line numbers of matches. " +
    "Limit searches to specific directories when possible for performance.",
  inputSchema: {
    type: "object",
    properties: {
      pattern: {
        type: "string",
        description: "Search pattern (ripgrep syntax supported)",
      },
      directory: {
        type: "string",
        description:
          "Directory to search in. Defaults to project root. Use to narrow scope.",
      },
      fileType: {
        type: "string",
        description: "File extension filter (e.g., 'ts', 'py'). Optional.",
      },
    },
    required: ["pattern"],
  },
  execute: async ({ pattern, directory, fileType }) => {
    const args = ["--with-filename", "--line-number"];
    if (fileType) args.push("-t", fileType);
    args.push(pattern);
    if (directory) args.push(directory);
    // execFile passes arguments directly, so agent-supplied patterns
    // are never interpolated into a shell string
    const result = await execFile("rg", args);
    return result.stdout || "No matches found";
  },
};
```

When designing your tool set, resist the temptation to build a Swiss Army knife. Focused tools with clear purposes work better than general-purpose tools that try to do everything. The agent reasons about when to use each tool based on the descriptions. Overlapping descriptions lead to confused decisions.
The system prompt is where most agent builders leave money on the table. A weak system prompt produces an inconsistent agent. A strong system prompt produces predictable, reliable behavior.
A production-quality agent system prompt has five sections:
Identity and goal. Who is the agent, what is it trying to accomplish, and what does success look like. Be specific. "You are a helpful assistant" is useless. "You are a code review agent that analyzes pull requests for security vulnerabilities, performance issues, and adherence to our coding standards" gives the agent a clear mandate.
Available tools. Even though tools are passed separately, explicitly mentioning the key tools and when to use them in the system prompt dramatically improves decision quality. The agent has two sources of tool guidance: the tool descriptions and the system prompt. Use both.
Decision framework. How should the agent reason through ambiguous situations? When should it ask for clarification versus proceed? What are the rules for taking irreversible actions?
Error handling. What should the agent do when tools fail? When information is missing? When the task seems impossible? Explicit guidance here prevents the agent from giving up too early or pushing through situations where it should stop.
Output format. What should the final response look like? Structured JSON? Markdown? Prose? If the agent produces outputs that go into downstream systems, format consistency is critical.
```typescript
const systemPrompt = `You are a security code review agent.

YOUR GOAL: Analyze code changes for security vulnerabilities and produce a structured review.

AVAILABLE TOOLS:
- search_codebase: Find related code before making judgments
- read_file: Read full file contents when search shows partial matches
- check_dependencies: Verify that dependencies have no known CVEs

DECISION FRAMEWORK:
- Always search for context before making security judgments
- If you find a potential vulnerability, verify it is actually exploitable before flagging
- Mark as CRITICAL only if exploitation would have significant impact
- When uncertain, flag as WARNING with detailed explanation

OUTPUT FORMAT:
Return a JSON object with:
- summary: one-sentence overall assessment
- findings: array of {severity, location, description, recommendation}
- passed: boolean
`;
```

Agents that cannot complete a task will keep trying. Without an iteration limit and a timeout, they run forever. Always define what "done" looks like and what to do when the iteration ceiling is reached.
An action tool that accepts free-form SQL strings gives the agent too much power. Constrain inputs to safe patterns. Use enums for categorical choices. Validate inputs before execution. Principle of least privilege applies to tools.
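One way to apply this: expose a fixed menu of named queries instead of accepting SQL. The query names and SQL below are illustrative, and real parameters would be bound by the database driver, never interpolated:

```typescript
// The agent may only choose from this whitelist; it never writes SQL.
const QUERIES: Record<string, string> = {
  orders_by_customer: "SELECT * FROM orders WHERE customer_id = $1",
  revenue_by_month: "SELECT month, SUM(total) FROM orders GROUP BY month",
};

// Unknown names are rejected outright rather than improvised.
function buildQuery(kind: string): string | null {
  return QUERIES[kind] ?? null;
}
```

The agent loses flexibility, but a wrong choice from an enum is recoverable; a wrong free-form `DELETE` is not.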
When a tool fails, return enough information for the agent to understand what went wrong and what to try next. "Error: connection failed" is useless. "Error: database connection failed after 3 retries. Check that the DB_HOST environment variable is set correctly." gives the agent something to work with.
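A consistent error shape makes this easy to enforce across tools. The field names below are a suggested convention, not from any particular SDK:

```typescript
// Every tool failure reports what broke, why, and what to try next.
interface ToolError { tool: string; message: string; remediation: string }

function formatToolError(e: ToolError): string {
  return `Error in ${e.tool}: ${e.message}. Next step: ${e.remediation}`;
}
```

Because the remediation field is mandatory, tool authors cannot ship a bare "connection failed" that leaves the agent with nothing to act on.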
A long-running agent accumulates context. Eventually it hits the context window limit and starts forgetting earlier information or failing entirely. Implement context pruning: summarize old tool results, discard low-value exchanges, maintain only what is relevant to the current step.
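A naive version of that pruning pass: keep the most recent messages verbatim and collapse everything older into one summary line. The `Message` shape and the summarization step are simplified assumptions; a production system would summarize with an LLM call rather than a placeholder string:

```typescript
interface Message { role: string; content: string }

function pruneContext(context: Message[], keepRecent: number): Message[] {
  if (context.length <= keepRecent) return context;
  const old = context.slice(0, context.length - keepRecent);
  const recent = context.slice(context.length - keepRecent);
  // Stand-in for a real summarization step over the old messages.
  const summary: Message = {
    role: "user",
    content: `Summary of ${old.length} earlier messages (details pruned).`,
  };
  return [summary, ...recent];
}
```

The key property is that the pruned context still reads coherently to the model: a summary at the top, then the recent exchange intact.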
A task that takes twenty minutes and crashes at minute nineteen should not start over from scratch. Implement checkpointing. Serialize the agent state to persistent storage at meaningful milestones. Resume from the last checkpoint on failure.
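Checkpointing reduces to serialize-on-milestone plus resume-on-start. The sketch below uses an in-memory map where a real system would write to disk or a database, and `CheckpointState` is a pared-down stand-in for full agent state:

```typescript
interface CheckpointState { goal: string; currentIteration: number }

// Stand-in for durable storage (disk, S3, a database table, ...).
const store = new Map<string, string>();

// Called at meaningful milestones, not on every iteration.
function saveCheckpoint(taskId: string, state: CheckpointState): void {
  store.set(taskId, JSON.stringify(state));
}

// On startup, resume from the last checkpoint if one exists.
function resumeFromCheckpoint(taskId: string): CheckpointState | null {
  const raw = store.get(taskId);
  return raw ? (JSON.parse(raw) as CheckpointState) : null;
}
```

The JSON round-trip is the design constraint worth noticing: anything in agent state that cannot be serialized (open connections, closures) has to be reconstructed on resume rather than checkpointed.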
Testing non-deterministic systems requires a different approach than traditional unit testing. You cannot assert exact outputs. You can assert properties of outputs.
For each tool call, assert properties such as: the tool exists in the registry, the inputs validate against the tool's schema, and destructive tools were only invoked with the required confirmation.
For overall task completion, assert properties such as: the success flag is set, the output parses in the expected format, and the run stayed within its iteration and cost budgets.
Build an evaluation harness that runs a set of canonical tasks and measures success rate, step count, and cost. Run it before every deployment. The agent evaluation frameworks article covers this in depth.
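The aggregation side of such a harness is simple enough to sketch. The `TaskResult` shape is an assumption about what each canonical-task run reports back:

```typescript
interface TaskResult { success: boolean; steps: number; costUsd: number }

// Roll per-task results up into the metrics worth tracking per deployment.
function evaluate(results: TaskResult[]) {
  const successes = results.filter((r) => r.success).length;
  return {
    successRate: successes / results.length,
    avgSteps: results.reduce((s, r) => s + r.steps, 0) / results.length,
    totalCostUsd: results.reduce((s, r) => s + r.costUsd, 0),
  };
}
```

Tracking step count alongside success rate matters: an agent that still succeeds but suddenly takes twice as many steps has regressed, even though a pass/fail metric would not show it.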
Should you build from scratch or use a framework? I get asked this constantly. My honest answer: build your first agent from scratch, then adopt a framework.
Building from scratch forces you to understand what an agent actually is. You cannot hide behind framework abstractions. You feel every piece of complexity. That understanding makes you dramatically more effective when you move to a framework.
For production systems, frameworks like LangGraph offer battle-tested orchestration, built-in observability, and ecosystem integrations that would take months to build yourself. The CrewAI vs AutoGen vs LangGraph comparison covers when each framework shines.
But know what the framework is doing for you. Black boxes break in unpredictable ways. Understanding the underlying agent loop means you can debug framework issues when they arise.
Q: How do you build a custom AI agent from scratch?
Building a custom AI agent involves five core components: a planning module that decomposes tasks, a code/action generation engine, a testing and validation layer, an error recovery system, and a memory module for context. Start with a well-defined loop (plan, execute, validate, fix, repeat) and add capabilities incrementally.
Q: What tools and frameworks are needed to build AI agents?
Essential tools are a large language model (Claude, GPT-4), a tool-calling interface (MCP or function calling), state management for agent memory, testing infrastructure, and guardrails for safety. Frameworks like LangGraph, CrewAI, or the Anthropic Agent SDK can accelerate development.
Q: How do you make AI agents reliable in production?
Reliability comes from layered safeguards: TypeScript strict mode for compile-time checks, automated tests for every action, build verification after each change, human review for critical decisions, retry logic with exponential backoff, and comprehensive logging.