Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Vector databases search by meaning, not keywords. Here's how embeddings work, which indexing strategy to pick, and when to use Pinecone vs Chroma vs pgvector.

Traditional databases answer the question "does this row contain this exact value?" Vector databases answer a completely different question: "what data is most similar to this query?"
That shift seems small until you realize how many problems become solvable. Search that understands synonyms. Recommendations based on conceptual similarity, not collaborative filtering. Document retrieval that finds relevant content even when none of the keywords match. Anomaly detection based on whether a data point is semantically unusual.
Vector databases are the infrastructure layer behind all of it. Here is how they actually work and how to pick the right one.
Keyword search is brittle. A user searching for "heart attack" does not find documents about "myocardial infarction" unless you explicitly build that synonym mapping. A user searching for "cheap hotels" does not find listings described as "budget-friendly accommodation" unless your search index handles those relationships.
These problems are old. The solutions have always been clumsy: synonym dictionaries maintained by hand, stemming algorithms that handle morphological variations but not semantic ones, relevance tuning that requires constant human intervention.
Embeddings changed the equation. When you convert text to a vector using a model trained on language, semantically similar content ends up close together in that high-dimensional space. "Heart attack" and "myocardial infarction" are neighbors. "Cheap hotels" and "budget accommodation" are neighbors.
Vector databases store those high-dimensional coordinates and, critically, find the nearest neighbors efficiently at scale. The entire retrieval problem becomes geometry.
Without a vector database, finding the most similar vector in a collection of a million items requires computing the distance to every single item. Brute force. Works at small scale. Collapses at production scale.
Vector databases solve this with approximate nearest neighbor (ANN) algorithms that find the closest vectors without checking every one. Faster by orders of magnitude. Slightly less precise, but the trade-off is almost always worth it.
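As a baseline, the brute-force search that ANN indexes replace can be sketched in a few lines (the names here are illustrative, not any library's API):

```typescript
// Brute-force nearest-neighbor search: measure the distance from the
// query to every stored vector, then sort. O(n * d) per query.
// Fine for thousands of items, unusable for millions.
type Entry = { id: string; vector: number[] };

function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

function bruteForceNearest(query: number[], entries: Entry[], k: number): Entry[] {
  return [...entries]
    .sort((x, y) => euclidean(query, x.vector) - euclidean(query, y.vector))
    .slice(0, k);
}
```

Every ANN algorithm below is a strategy for getting close to this result while skipping most of the distance computations.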
An embedding model converts any piece of content (text, image, audio) into a list of numbers, typically 768 to 3072 of them, depending on the model. This list is the vector.
The critical property: content that is semantically similar produces vectors that are mathematically close. "Closeness" is usually measured by cosine similarity (the angle between vectors) or Euclidean distance.
// Generating an embedding with OpenAI
import OpenAI from 'openai';

const client = new OpenAI();

async function getEmbedding(text: string): Promise<number[]> {
  const response = await client.embeddings.create({
    model: 'text-embedding-3-small', // 1536 dimensions
    input: text,
  });
  return response.data[0].embedding;
}
// Generating an embedding with Voyage AI (better for retrieval)
async function getVoyageEmbedding(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'voyage-3',
      input: text,
    }),
  });
  const data = await response.json();
  return data.data[0].embedding;
}
// Cosine similarity calculation
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
  // Returns 1.0 for identical, 0.0 for unrelated, -1.0 for opposite
}

The choice of embedding model matters significantly. Models trained on different data have different notions of "similarity." A model trained on code treats variable naming conventions as meaningful. A model trained on biomedical literature treats clinical terminology precisely. Use a model appropriate for your domain.
How a vector database organizes its index determines query speed, memory usage, and accuracy. Understanding the main approaches helps you make informed configuration choices.
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph. Upper layers are coarse, with long-range connections across the vector space. Lower layers are dense, with connections to nearby neighbors.
Search starts at the top, finds the approximate region of interest, then drills down through finer layers to pinpoint the nearest neighbors. Extremely fast. High recall (finds the true nearest neighbors most of the time). High memory usage because the graph structure must be in RAM.
HNSW is the right default for most applications where query speed is critical and you can afford the memory.
IVF (Inverted File Index) clusters vectors into groups using k-means. A query first identifies the most relevant clusters, then searches only within those clusters.
Less memory than HNSW. Slightly lower recall for the same query speed. Requires a training step on your data before the index is useful. Better for very large collections where memory is the constraint.
PQ (Product Quantization) compresses vectors by splitting them into subvectors and quantizing each against a small codebook. Massive memory reduction at the cost of some recall accuracy. Often used as a compression layer on top of IVF (IVF-PQ) for very large-scale deployments.
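The memory savings are easy to estimate. A back-of-envelope sketch with illustrative numbers (1M vectors, 1536 dimensions, 96 subquantizers are assumptions for the arithmetic, not recommendations):

```typescript
// Back-of-envelope memory math for product quantization.
const numVectors = 1_000_000;
const dims = 1536;

// Raw float32 storage: 4 bytes per dimension
const rawBytes = numVectors * dims * 4; // ~6.1 GB

// PQ with 96 subquantizers, 256 centroids each: 1 byte per subvector code
const subquantizers = 96;
const pqBytes = numVectors * subquantizers; // ~96 MB, plus small codebooks

const compressionRatio = rawBytes / pqBytes; // 64x smaller
```

The 64x reduction is why billion-vector deployments reach for IVF-PQ despite the recall hit.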
| Index Type | Speed | Memory | Recall | Best For |
|---|---|---|---|---|
| HNSW | Fast | High | High | Default, latency-sensitive |
| IVF | Medium | Medium | Medium | Large collections |
| IVF-PQ | Medium | Low | Lower | Billions of vectors |
| Brute Force | Slow | Low | Perfect | Small collections, testing |
Three options dominate developer usage. Each has a distinct philosophy.
Pinecone is a fully managed vector database service. No infrastructure to operate. No index tuning. Scales automatically. Global replication available.
The trade-off is cost and vendor dependency. Pinecone is expensive relative to self-hosted alternatives. At serious scale, the monthly bill is meaningful.
Best for: teams without infrastructure expertise, applications requiring very high availability, use cases where operational simplicity is worth paying for.
Pinecone's metadata filtering is excellent. You can combine vector similarity with exact metadata filters in a single query: "find the 10 most similar documents where category='legal' and date>2024-01-01". This hybrid filtering is production-critical for most real applications.
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('my-index');

// Upsert vectors with metadata
await index.upsert([
  {
    id: 'doc-001',
    values: embedding, // number[]
    metadata: {
      title: 'Document Title',
      category: 'legal',
      date: '2025-06-15',
      source: 'internal',
    },
  },
]);

// Query with metadata filter
const results = await index.query({
  vector: queryEmbedding,
  topK: 10,
  includeMetadata: true,
  filter: {
    category: { $eq: 'legal' },
    date: { $gte: '2024-01-01' },
  },
});

Chroma is an open-source vector database designed for the developer experience. Runs locally with no setup. Excellent Python and TypeScript SDKs. Persistent storage or in-memory. Ships with its own embedding functions.
Perfect for development, prototyping, and small-scale production. The self-hosted deployment story has improved significantly, but Chroma is not designed for the same scale as Pinecone.
Best for: rapid prototyping, local development, small to medium production deployments, RAG systems where the collection fits comfortably on one machine.
import { ChromaClient } from 'chromadb';

const client = new ChromaClient(); // Connects to local Chroma instance

const collection = await client.createCollection({
  name: 'my-documents',
  metadata: { 'hnsw:space': 'cosine' }, // Distance metric
});

// Add documents (Chroma can embed them for you)
await collection.add({
  ids: ['doc-001', 'doc-002'],
  documents: ['Full text of document 1', 'Full text of document 2'],
  metadatas: [{ source: 'web' }, { source: 'internal' }],
});

// Query by natural language
const results = await collection.query({
  queryTexts: ['find me documents about contract law'],
  nResults: 5,
  where: { source: 'internal' }, // Metadata filter
});

pgvector is a PostgreSQL extension that adds a vector column type and ANN index support. If you are already running PostgreSQL, pgvector lets you store vectors alongside your relational data without adding a new system to your stack.
The query syntax is SQL. The operational model is your existing Postgres setup. Joins between vector similarity results and relational data happen natively.
Best for: applications already on PostgreSQL, use cases requiring joins between vector and relational data, teams that want to minimize infrastructure complexity.
-- Enable the extension
CREATE EXTENSION vector;

-- Create a table with a vector column
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  category TEXT,
  embedding vector(1536), -- OpenAI text-embedding-3-small dimension
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index for fast ANN search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Semantic search query
SELECT
  id,
  content,
  category,
  1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE category = 'legal'
ORDER BY embedding <=> $1::vector
LIMIT 10;

The <=> operator computes cosine distance. <-> computes Euclidean distance. <#> computes negative inner product.
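Because everything lives in Postgres, similarity results can join relational tables in the same query. A sketch, assuming a hypothetical users table and an author_id column on documents (neither is part of the schema above):

```sql
-- Hypothetical schema: documents.author_id references users(id)
SELECT
  d.id,
  d.content,
  u.name AS author_name,
  1 - (d.embedding <=> $1::vector) AS similarity
FROM documents d
JOIN users u ON u.id = d.author_id
ORDER BY d.embedding <=> $1::vector
LIMIT 10;
```

With a dedicated vector database, this join would require a second round trip and application-side stitching.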
pgvector's limitation is scale. At tens of millions of vectors, performance degrades compared to dedicated vector databases. For most applications, this is not a problem. But if you are building a service where the vector collection will grow without bound, plan the migration to a dedicated solution early.
The most common production use case is retrieval-augmented generation (RAG): retrieve documents relevant to a question, then pass them to a model as context.
// Assumes initialized clients in scope: `vectorDB` (e.g. a Pinecone index)
// and `anthropic` (Anthropic SDK), plus getEmbedding() from earlier
async function ragQuery(userQuestion: string): Promise<string> {
  // 1. Embed the query
  const queryEmbedding = await getEmbedding(userQuestion);

  // 2. Retrieve relevant documents
  const relevant = await vectorDB.query({
    vector: queryEmbedding,
    topK: 5,
    includeMetadata: true,
  });

  // 3. Build context from results
  const context = relevant.matches
    .map(match => match.metadata?.content)
    .join('\n\n---\n\n');

  // 4. Generate answer with context
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${userQuestion}\n\nAnswer based only on the context provided.`,
    }],
  });

  return response.content[0].type === 'text'
    ? response.content[0].text
    : '';
}

Vector similarity alone sometimes misses exact matches that keyword search catches. Hybrid search combines both:
async function hybridSearch(query: string, limit: number = 10) {
  // Run both searches in parallel
  const [vectorResults, keywordResults] = await Promise.all([
    vectorDB.query({ vector: await getEmbedding(query), topK: limit * 2 }),
    keywordDB.search({ query, limit: limit * 2 }),
  ]);

  // Reciprocal Rank Fusion to combine rankings
  const scores = new Map<string, number>();
  const k = 60; // RRF constant

  vectorResults.matches.forEach((result, rank) => {
    const score = scores.get(result.id) || 0;
    scores.set(result.id, score + 1 / (k + rank + 1));
  });

  keywordResults.forEach((result, rank) => {
    const score = scores.get(result.id) || 0;
    scores.set(result.id, score + 1 / (k + rank + 1));
  });

  // Sort by combined score and return top results
  return Array.from(scores.entries())
    .sort(([, a], [, b]) => b - a)
    .slice(0, limit)
    .map(([id]) => id);
}

Hybrid search consistently outperforms pure vector or pure keyword search for most real-world retrieval tasks. The implementation complexity is justified by the quality improvement.
Embedding everything without chunking. Long documents produce embeddings that average over many topics. A ten-thousand-word research paper produces one vector. That vector is vaguely about everything and precisely about nothing. Chunk documents into sections (typically 200-500 tokens with overlap) before embedding.
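A minimal chunker sketch, using words as a rough proxy for tokens (a real pipeline would measure chunk sizes with the embedding model's own tokenizer):

```typescript
// Naive sliding-window chunker: split text into overlapping chunks
// before embedding, so each vector covers one coherent span.
function chunkText(text: string, chunkSize = 300, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk. Production chunkers usually also respect section and paragraph boundaries rather than cutting mid-thought.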
Ignoring metadata filtering. Similarity alone is not enough. A query about "current tax regulations" should not retrieve a document from 2019. Build metadata filters into your query logic from the start.
Using the wrong embedding model for the domain. General-purpose embedding models work reasonably well everywhere. Domain-specific models work significantly better in their domain. For code, use a code-specific embedding model. For medical content, consider a biomedical model.
Not monitoring embedding quality over time. As your content collection evolves, what counts as "similar" may shift. Evaluate retrieval quality regularly, not just at launch.
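One lightweight way to do that evaluation is a fixed set of query/relevant-document pairs scored with recall@k. A sketch (the shape of the eval case is illustrative):

```typescript
// Recall@k over an eval set: what fraction of queries surface the
// known-relevant document within the top k retrieved results.
type EvalCase = { relevantId: string; retrievedIds: string[] };

function recallAtK(cases: EvalCase[], k: number): number {
  const hits = cases.filter(c =>
    c.retrievedIds.slice(0, k).includes(c.relevantId),
  );
  return hits.length / cases.length;
}
```

Run it on every re-embedding or model change; a drop in recall@k is the earliest visible signal that retrieval quality has drifted.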
Skipping re-ranking. ANN search returns approximate results. A re-ranking step using a more expensive model to score the top candidates against the query significantly improves precision without the latency of searching every vector.
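A minimal shape for that re-ranking step, with the expensive scorer left as a placeholder you would back with a cross-encoder or LLM call (the function names are illustrative):

```typescript
// Re-rank ANN candidates with a more expensive scorer. The ANN index
// narrows millions of vectors to a handful; the scorer only has to
// judge those few, so its latency cost stays bounded.
type Candidate = { id: string; text: string };

async function rerank(
  query: string,
  candidates: Candidate[],
  scorer: (query: string, text: string) => Promise<number>,
  topN = 5,
): Promise<Candidate[]> {
  const scored = await Promise.all(
    candidates.map(async c => ({ c, score: await scorer(query, c.text) })),
  );
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map(s => s.c);
}
```

Fetching topK of 20 to 50 from the index and re-ranking down to 5 is a common pattern; the scorer sees a candidate set small enough that per-pair model calls remain affordable.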
Start with pgvector if you are on PostgreSQL and your collection will stay under a few million vectors. Zero new infrastructure.
Start with Chroma if you are prototyping or building a small production system and want simplicity. Migrate later if needed.
Start with Pinecone if you need managed infrastructure with guaranteed availability and your budget accommodates it.
All three are production-ready. The choice is mostly about scale, cost, and operational preferences.
For the full picture on how embeddings work before you store them, read embeddings explained. For implementing production semantic search on top of your vector database, see semantic search implementation.
Q: What is a vector database?
A vector database stores and searches data as high-dimensional numerical vectors (embeddings). Instead of matching exact keywords, vector databases find semantically similar content — understanding that 'car' and 'automobile' are related. This enables AI applications like semantic search, recommendation systems, and RAG.
Q: When should you use a vector database?
Use vector databases when you need semantic search (finding related content by meaning), recommendation systems, RAG for AI applications, image or audio similarity search, or anomaly detection. Do not use them for exact lookups, transactional data, or simple CRUD operations where relational databases excel.
Q: Which vector database should developers choose?
Choose Pinecone for fully managed simplicity and production scale, Chroma for development and prototyping, Weaviate for hybrid search (combining vector and keyword), pgvector for existing PostgreSQL setups, and Qdrant for high-performance self-hosted needs.