Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Fine-tuning changes behavior. RAG adds knowledge. Most teams choose wrong. Here's the decision framework that saves months of wasted work and thousands of dollars.

Teams waste months on this decision. They pick fine-tuning when they need RAG. They build RAG systems when they need fine-tuning. Then they wonder why the results are mediocre.
The confusion exists because both approaches improve AI output quality, but they improve different things. Mixing them up is like using a screwdriver to drive a nail: you can make it work, sort of, but it was never the right tool.
Here is the framework I use to make this decision quickly and correctly.
Think of it this way:
RAG gives the model access to information it does not have. Your company's internal knowledge base. Your product documentation. Last week's market data. Anything the model was not trained on or needs to reference at query time.
Fine-tuning changes how the model behaves. Its writing style, its reasoning approach, its response format, its domain-specific vocabulary, the persona it adopts. It changes the model itself, not what information it can access.
RAG is about knowledge. Fine-tuning is about behavior. Almost every "should I fine-tune or use RAG?" question becomes straightforward once you internalize that distinction.
Most teams who think they need fine-tuning actually need better prompting plus RAG. Fine-tuning is powerful but expensive, slow, and inflexible. It should be the last resort after you have exhausted simpler options.
RAG (Retrieval Augmented Generation) is the right architecture when:
The model needs access to information that changes. Current product specs, recent news, updated policies, live data. The model's training data has a cutoff date. RAG bridges that gap without retraining.
The information volume exceeds the context window. You have ten thousand support documents. You cannot include them all in every prompt. RAG retrieves only what is relevant for each specific query.
You need source attribution. RAG lets you show users exactly which documents the answer came from. Fine-tuned knowledge is baked in and cannot be attributed to specific sources.
You need the ability to update knowledge without retraining. Add a new product line? Update the vector database. With fine-tuning, you would need to retrain or keep a separate RAG layer anyway.
The task is primarily information retrieval or question answering. Customer support, internal knowledge bases, document search, research assistance.
```typescript
// Classic RAG architecture.
// Assumes an `embed` helper and a Pinecone-style `VectorDB` client
// are defined elsewhere in the codebase.
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function ragResponse(
  query: string,
  knowledgeBase: VectorDB
): Promise<{ answer: string; sources: string[] }> {
  // 1. Embed the query
  const queryVector = await embed(query);

  // 2. Retrieve relevant documents
  const docs = await knowledgeBase.query({
    vector: queryVector,
    topK: 5,
    includeMetadata: true,
  });

  // 3. Build context
  const context = docs.matches
    .map(d => `Source: ${d.metadata?.source}\n${d.metadata?.content}`)
    .join('\n\n---\n\n');
  const sources = docs.matches.map(d => d.metadata?.source as string);

  // 4. Generate grounded response
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: `Answer the following question using ONLY the provided context. If the answer is not in the context, say so.\n\nContext:\n${context}\n\nQuestion: ${query}`,
    }],
  });

  return {
    answer: response.content[0].type === 'text' ? response.content[0].text : '',
    sources,
  };
}
```

RAG requires retrieval to work. If retrieval fails (wrong query formulation, poor chunking strategy, irrelevant search results), the answer quality collapses.
RAG also cannot teach the model how to reason differently. If the model is bad at SQL generation, giving it a database schema via RAG does not make it better at SQL. It gives it the schema but not the skill.
Fine-tuning is appropriate when:
You need to change the model's core behavior. Response format, writing style, reasoning approach, persona. These cannot be reliably achieved through prompting alone at scale.
Domain-specific vocabulary is critical. Medical terminology, legal language, financial jargon, proprietary technical terms. A fine-tuned model internalizes these. A prompted model has to interpret them from context each time.
You have many examples of the exact behavior you want. Fine-tuning learns from examples. Without hundreds or thousands of good examples, you cannot fine-tune effectively.
You need to reduce system prompt length at scale. For very high-volume applications, baking behavior into the model via fine-tuning reduces the system prompt you need to send with every request, cutting costs.
You need to teach the model something that cannot be stated in a prompt. Subtle quality judgments. "Sound like our brand" is a prompt instruction that fine-tuning can make reliable where prompting alone struggles.
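To make "examples of the exact behavior you want" concrete: most providers, OpenAI included, accept chat-format training data as JSONL, one example per line. Here is a minimal sketch of assembling that file; `toJsonl` and the example contents are illustrative, though the `messages` structure follows OpenAI's documented fine-tuning format.

```typescript
// Sketch: converting curated examples into chat-format fine-tuning JSONL,
// one {"messages": [...]} object per line. `toJsonl` is our own helper.
interface TrainingExample {
  input: string;       // what the user asked
  idealOutput: string; // the exact behavior you want the model to learn
}

function toJsonl(systemPrompt: string, examples: TrainingExample[]): string {
  return examples
    .map(ex =>
      JSON.stringify({
        messages: [
          { role: 'system', content: systemPrompt },
          { role: 'user', content: ex.input },
          { role: 'assistant', content: ex.idealOutput },
        ],
      })
    )
    .join('\n');
}

// Usage: write the result to a .jsonl file and upload it for training.
const jsonl = toJsonl('You are a compliance-safe support agent.', [
  { input: 'Is the device waterproof?', idealOutput: 'The device is rated IP67...' },
]);
```

The point of the curation step is that every `idealOutput` must be an answer you would ship as-is: the model learns the examples, flaws included.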
| Factor | Reality |
|---|---|
| Data requirement | 500-1000+ examples minimum; thousands preferred |
| Data quality | Garbage in, garbage out. Curating good examples is expensive labor. |
| Training time | Hours to days depending on dataset size and provider |
| Training cost | $10-$1,000+ depending on model size and data volume |
| Iteration speed | Slow. Each experiment takes hours and costs money. |
| Update speed | Updating knowledge requires retraining |
| Maintenance | Model versions require migration as base models update |
Fine-tuning is not a quick experiment. It is a long-term commitment to a custom model that needs maintenance.
When someone asks "fine-tune or RAG?" I walk through this sequence:
1. Can you solve the problem with prompt engineering alone? Better system prompts, few-shot examples, chain-of-thought instructions. Many teams discover the problem is a prompting issue, not a knowledge or behavior issue. Try this first. It costs almost nothing.
2. Does the problem require access to information the model does not have? If yes, start with RAG. Build a retrieval system and see how far it takes you.
3. Is the problem primarily behavioral? Consistent output format, specific writing style, domain jargon handling. If RAG does not address it, fine-tuning is worth considering.
4. Do you have high-quality training examples at scale? If you cannot produce five hundred or more examples of the exact behavior you want, fine-tuning will disappoint you.
5. Does the answer change frequently? Frequently updating knowledge is incompatible with fine-tuning. Use RAG.
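The five questions above can be collapsed into a small decision helper. This is a sketch of the sequence as pure logic; the field names and the 500-example threshold (taken from step 4) are my own encoding, not any official API.

```typescript
// Sketch: the decision sequence above as a pure function.
// Field names are illustrative.
interface Problem {
  solvableByPrompting: boolean;        // Q1: prompt engineering enough?
  needsExternalKnowledge: boolean;     // Q2: missing information?
  primarilyBehavioral: boolean;        // Q3: format/style/vocabulary?
  trainingExamples: number;            // Q4: curated examples available
  knowledgeChangesFrequently: boolean; // Q5: fast-moving knowledge?
}

type Approach = 'prompting' | 'rag' | 'fine-tuning';

function decideApproach(p: Problem): Approach {
  if (p.solvableByPrompting) return 'prompting';   // cheapest option wins
  if (p.needsExternalKnowledge) return 'rag';      // start with retrieval
  if (p.primarilyBehavioral) {
    if (p.knowledgeChangesFrequently) return 'rag'; // retraining can't keep up
    if (p.trainingExamples >= 500) return 'fine-tuning';
  }
  return 'prompting'; // not enough examples: iterate on prompts instead
}
```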
| Problem Type | Best Solution |
|---|---|
| Accessing company-specific knowledge | RAG |
| Answering questions about current events | RAG |
| Searching a large document corpus | RAG |
| Consistent output format at scale | Fine-tuning (or structured output with prompting) |
| Domain-specific writing style | Fine-tuning |
| Medical/legal/technical vocabulary | Fine-tuning |
| Reducing prompt length at scale | Fine-tuning |
| Behavior that resists prompt engineering | Fine-tuning |
| Mixture of knowledge and behavior | RAG + Fine-tuning |
For complex production systems, the answer is often both.
Fine-tune the model for behavioral consistency: the right writing style, the right response format, the right handling of domain vocabulary. Then use RAG to provide dynamic knowledge at query time.
The fine-tuned model has internalized the "how." RAG provides the "what."
This is more expensive and complex to build and maintain. It is also the highest-quality combination for applications where both behavior and knowledge matter.
A customer support system for a specialized medical device company is a good example. Fine-tuning teaches the model medical device terminology, regulatory language, and the exact communication style compliance requires. RAG provides the current product specifications, known issues, and support procedures that change with each product release.
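The hybrid pattern shows up directly in the prompt you send at runtime: because the fine-tuned model has internalized style and vocabulary, the prompt only needs to carry retrieved facts. A minimal sketch; `buildHybridPrompt` is a hypothetical helper, not a library function.

```typescript
// Sketch of the hybrid pattern: the fine-tuned model supplies the "how"
// (style, format, vocabulary), so the runtime prompt carries only the
// "what" — retrieved context plus the question.
function buildHybridPrompt(retrievedDocs: string[], question: string): string {
  // No long style instructions here — behavior is baked into the
  // fine-tuned model's weights.
  return [
    'Context:',
    retrievedDocs.join('\n---\n'),
    '',
    `Question: ${question}`,
  ].join('\n');
}
```

At call time you would send this prompt to your fine-tuned model id (an `ft:`-prefixed id on OpenAI, for example) instead of a base model; that substitution is the only change relative to the plain RAG pipeline.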
Most teams reading this should:
1. Spend three days optimizing your system prompt. Genuinely optimize it: add examples, tighten the constraints, test against a diverse input set.
2. If you need domain knowledge, build a basic RAG system with pgvector or Chroma. Get it working. Measure retrieval quality.
3. Evaluate the output quality honestly. Is it good enough? If yes, you are done.
4. Only if prompt optimization plus RAG is genuinely insufficient for behavior-related reasons, consider fine-tuning. Start with OpenAI or Anthropic's fine-tuning APIs on a small model. Measure the improvement against the baseline.
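For "measure retrieval quality," a simple starting metric is recall@k over a small labeled set: the fraction of queries whose known-relevant document shows up in the top k results. A sketch, independent of whether the retriever is pgvector or Chroma; the type names are my own.

```typescript
// Sketch: measuring retrieval quality with recall@k.
interface RetrievalCase {
  query: string;
  relevantDocId: string;     // ground-truth document for this query
  retrievedDocIds: string[]; // what your retriever actually returned
}

function recallAtK(cases: RetrievalCase[], k: number): number {
  const hits = cases.filter(c =>
    c.retrievedDocIds.slice(0, k).includes(c.relevantDocId)
  ).length;
  return hits / cases.length;
}
```

If recall@5 is low, fix chunking and query formulation before touching the generation side; no prompt can recover from context that was never retrieved.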
Fine-tuning is seductive because it feels like the serious, "real" solution. It is not inherently more serious than RAG. It is just different. And for the majority of applications, RAG plus good prompting gets you 90% of the value at 20% of the cost and complexity.
Both approaches require systematic evaluation. The comparison needs a ground truth dataset.
```typescript
// Eval harness comparing both approaches on the same test set.
// Assumes getRAGAnswer, getFinetunedAnswer, and scoreAnswer are
// implemented elsewhere.
interface EvalResult {
  question: string;
  expected_answer: string;
  rag_answer: string;
  finetuned_answer: string;
  rag_score: number; // 1-5
  finetuned_score: number;
  human_preference?: 'rag' | 'finetuned' | 'tie';
}

async function compareApproaches(
  testCases: Array<{ question: string; expected: string }>
): Promise<EvalResult[]> {
  return Promise.all(
    testCases.map(async ({ question, expected }) => {
      const [ragAnswer, finetunedAnswer] = await Promise.all([
        getRAGAnswer(question),
        getFinetunedAnswer(question),
      ]);
      const [ragScore, finetunedScore] = await Promise.all([
        scoreAnswer(question, expected, ragAnswer),
        scoreAnswer(question, expected, finetunedAnswer),
      ]);
      return {
        question,
        expected_answer: expected,
        rag_answer: ragAnswer,
        finetuned_answer: finetunedAnswer,
        rag_score: ragScore,
        finetuned_score: finetunedScore,
      };
    })
  );
}
```

Do not make this decision based on vibes. Measure it. The approach that scores better on your specific test cases is the right approach, regardless of what seems more impressive.
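The scoring step in a harness like this is commonly implemented as LLM-as-judge: ask a model to rate the match on a 1-5 scale and parse the digit out defensively. A sketch with the judge call abstracted as a parameter so any provider fits; the prompt wording and helper names are my own.

```typescript
// Sketch of an LLM-as-judge scorer. The actual model call is injected
// as `judge` (e.g. a thin wrapper around your provider's API).
type Judge = (prompt: string) => Promise<string>;

function parseScore(raw: string): number {
  // Judges are asked for a single digit but sometimes add words —
  // extract the first 1-5 digit, defaulting to the lowest score.
  const match = raw.match(/[1-5]/);
  return match ? Number(match[0]) : 1;
}

async function scoreAnswer(
  question: string,
  expected: string,
  actual: string,
  judge: Judge
): Promise<number> {
  const prompt =
    `Rate how well the actual answer matches the expected answer, ` +
    `1 (wrong) to 5 (equivalent). Reply with a single digit only.\n\n` +
    `Question: ${question}\nExpected: ${expected}\nActual: ${actual}`;
  return parseScore(await judge(prompt));
}
```

In the comparison harness you would bind the judge once (via a closure or partial application) so the scorer matches the three-argument call site. Use a different model as judge than the ones being compared, or the judge will favor its own phrasing.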
Q: When should you use fine-tuning vs RAG?
Use RAG when you need to ground AI in specific, frequently changing data (documentation, knowledge bases, customer data). Use fine-tuning when you need to change the model's behavior, tone, or reasoning patterns for a specialized domain. RAG is cheaper, faster, and easier to update. Most production systems use RAG.
Q: What is the difference between fine-tuning and RAG?
Fine-tuning modifies the model's weights through additional training, permanently changing how it responds. RAG adds relevant external information to each prompt at runtime without changing the model. Fine-tuning is like teaching someone a new skill; RAG is like giving them a reference book to consult.
Q: Can you combine fine-tuning and RAG?
Yes, combining both approaches can be powerful: fine-tune for domain-specific tone and reasoning patterns, then use RAG for specific factual grounding. However, this adds complexity and cost. Most applications achieve excellent results with RAG alone, reserving fine-tuning for cases where the base model's behavior fundamentally needs to change.