Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Vector databases search by meaning, not keywords. Here's how embeddings work, which indexing strategy to pick, and when to use Pinecone vs Chroma vs pgvector.

Traditional databases answer the question "does this row contain this exact value?" Vector databases answer a completely different question: "what data is most similar to this query?"
That shift seems small until you realize how many problems become solvable. Search that understands synonyms. Recommendations based on conceptual similarity, not collaborative filtering. Document retrieval that finds relevant content even when none of the keywords match. Anomaly detection based on whether a data point is semantically unusual.
Vector databases are the infrastructure layer behind all of it. Here is how they actually work and how to pick the right one.
Keyword search is brittle. A user searching for "heart attack" does not find documents about "myocardial infarction" unless you explicitly build that synonym mapping. A user searching for "cheap hotels" does not find listings described as "budget-friendly accommodation" unless your search index handles those relationships.
These problems are old. The solutions have always been clumsy: synonym dictionaries maintained by hand, stemming algorithms that handle morphological variations but not semantic ones, relevance tuning that requires constant human intervention.
Embeddings changed the equation. When you convert text to a vector using a model trained on language, semantically similar content ends up close together in that high-dimensional space. "Heart attack" and "myocardial infarction" are neighbors. "Cheap hotels" and "budget accommodation" are neighbors.
Vector databases store those high-dimensional coordinates and, critically, find the nearest neighbors efficiently at scale. The entire retrieval problem becomes geometry.
Without a vector database, finding the most similar vector in a collection of a million items requires computing the distance to every single item. Brute force. Works at small scale. Collapses at production scale.
Vector databases solve this with approximate nearest neighbor (ANN) algorithms that find the closest vectors without checking every one. Faster by orders of magnitude. Slightly less precise, but the trade-off is almost always worth it.
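As a baseline, the brute-force search that ANN indexes replace can be sketched in a few lines (the names here are illustrative, not any library's API):

```typescript
// Brute-force nearest-neighbor search: measure the distance from the
// query to every stored vector, then sort. O(n * d) per query.
// Fine for thousands of items, unusable for millions.
type Entry = { id: string; vector: number[] };

function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

function bruteForceNearest(query: number[], entries: Entry[], k: number): Entry[] {
  return [...entries]
    .sort((x, y) => euclidean(query, x.vector) - euclidean(query, y.vector))
    .slice(0, k);
}
```

Every ANN algorithm below is a strategy for getting close to this result while skipping most of the distance computations.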
An embedding model converts any piece of content (text, image, audio) into a list of numbers, typically 768 to 3072 of them, depending on the model. This list is the vector.
The critical property: content that is semantically similar produces vectors that are mathematically close. "Closeness" is usually measured by cosine similarity (the angle between vectors) or Euclidean distance.
// Generating an embedding with OpenAI
import OpenAI from 'openai';

const client = new OpenAI();

async function getEmbedding(text: string): Promise<number[]> {
  const response = await client.embeddings.create({
    model: 'text-embedding-3-small', // 1536 dimensions
    input: text,
  });
  return response.data[0].embedding;
}
// Generating an embedding with Voyage AI (better for retrieval)
async function getVoyageEmbedding(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'voyage-3',
      input: text,
    }),
  });
  const data = await response.json();
  return data.data[0].embedding;
}
// Cosine similarity calculation
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
  // Returns 1.0 for identical, 0.0 for unrelated, -1.0 for opposite
}

The choice of embedding model matters significantly. Models trained on different data have different notions of "similarity." A model trained on code treats variable naming conventions as meaningful. A model trained on biomedical literature treats clinical terminology precisely. Use a model appropriate for your domain.
How a vector database organizes its index determines query speed, memory usage, and accuracy. Understanding the main approaches helps you make informed configuration choices.
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph. Upper layers are coarse, with long-range connections across the vector space. Lower layers are dense, with connections to nearby neighbors.
Search starts at the top, finds the approximate region of interest, then drills down through finer layers to pinpoint the nearest neighbors. Extremely fast. High recall (finds the true nearest neighbors most of the time). High memory usage because the graph structure must be in RAM.
HNSW is the right default for most applications where query speed is critical and you can afford the memory.
IVF (Inverted File Index) clusters vectors into groups using k-means. A query first identifies the most relevant clusters, then searches only within those clusters.
Less memory than HNSW. Slightly lower recall for the same query speed. Requires a training step on your data before the index is useful. Better for very large collections where memory is the constraint.
PQ (Product Quantization) compresses vectors by splitting them into subvectors and quantizing each against a small codebook. Massive memory reduction at the cost of some recall accuracy. Often used as a compression layer on top of IVF (IVF-PQ) for very large-scale deployments.
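The memory savings are easy to estimate. A back-of-envelope sketch with illustrative numbers (1M vectors, 1536 dimensions, 96 subquantizers are assumptions for the arithmetic, not recommendations):

```typescript
// Back-of-envelope memory math for product quantization.
const numVectors = 1_000_000;
const dims = 1536;

// Raw float32 storage: 4 bytes per dimension
const rawBytes = numVectors * dims * 4; // ~6.1 GB

// PQ with 96 subquantizers, 256 centroids each: 1 byte per subvector code
const subquantizers = 96;
const pqBytes = numVectors * subquantizers; // ~96 MB, plus small codebooks

const compressionRatio = rawBytes / pqBytes; // 64x smaller
```

The 64x reduction is why billion-vector deployments reach for IVF-PQ despite the recall hit.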
| Index Type | Speed | Memory | Recall | Best For |
|---|---|---|---|---|
| HNSW | Fast | High | High | Default, latency-sensitive |
| IVF | Medium | Medium | Medium | Large collections |
| IVF-PQ | Medium | Low | Lower | Billions of vectors |
| Brute Force | Slow | Low | Perfect | Small collections, testing |
Three options dominate developer usage. Each has a distinct philosophy.
Pinecone is a fully managed vector database service. No infrastructure to operate. No index tuning. Scales automatically. Global replication available.
The trade-off is cost and vendor dependency. Pinecone is expensive relative to self-hosted alternatives. At serious scale, the monthly bill is meaningful.
Best for: teams without infrastructure expertise, applications requiring very high availability, use cases where operational simplicity is worth paying for.
Pinecone's metadata filtering is excellent. You can combine vector similarity with exact metadata filters in a single query: "find the 10 most similar documents where category='legal' and date>2024-01-01". This hybrid filtering is production-critical for most real applications.
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('my-index');

// Upsert vectors with metadata
await index.upsert([
  {
    id: 'doc-001',
    values: embedding, // number[]
    metadata: {
      title: 'Document Title',
      category: 'legal',
      date: '2025-06-15',
      source: 'internal',
    },
  },
]);

// Query with metadata filter
const results = await index.query({
  vector: queryEmbedding,
  topK: 10,
  includeMetadata: true,
  filter: {
    category: { $eq: 'legal' },
    date: { $gte: '2024-01-01' },
  },
});

Chroma is an open-source vector database designed for the developer experience. Runs locally with no setup. Excellent Python and TypeScript SDKs. Persistent storage or in-memory. Ships with its own embedding functions.
Perfect for development, prototyping, and small-scale production. The self-hosted deployment story has improved significantly, but Chroma is not designed for the same scale as Pinecone.
Best for: rapid prototyping, local development, small to medium production deployments, RAG systems where the collection fits comfortably on one machine.
import { ChromaClient } from 'chromadb';

const client = new ChromaClient(); // Connects to local Chroma instance

const collection = await client.createCollection({
  name: 'my-documents',
  metadata: { 'hnsw:space': 'cosine' }, // Distance metric
});

// Add documents (Chroma can embed them for you)
await collection.add({
  ids: ['doc-001', 'doc-002'],
  documents: ['Full text of document 1', 'Full text of document 2'],
  metadatas: [{ source: 'web' }, { source: 'internal' }],
});

// Query by natural language
const results = await collection.query({
  queryTexts: ['find me documents about contract law'],
  nResults: 5,
  where: { source: 'internal' }, // Metadata filter
});

pgvector is a PostgreSQL extension that adds a vector column type and ANN index support. If you are already running PostgreSQL, pgvector lets you store vectors alongside your relational data without adding a new system to your stack.
The query syntax is SQL. The operational model is your existing Postgres setup. Joins between vector similarity results and relational data happen natively.
Best for: applications already on PostgreSQL, use cases requiring joins between vector and relational data, teams that want to minimize infrastructure complexity.
-- Enable the extension
CREATE EXTENSION vector;

-- Create a table with a vector column
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  category TEXT,
  embedding vector(1536), -- OpenAI text-embedding-3-small dimension
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index for fast ANN search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Semantic search query
SELECT
  id,
  content,
  category,
  1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE category = 'legal'
ORDER BY embedding <=> $1::vector
LIMIT 10;

The <=> operator computes cosine distance. <-> computes Euclidean distance. <#> computes negative inner product.
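Because everything lives in Postgres, similarity results can join relational tables in the same query. A sketch, assuming a hypothetical users table and an author_id column on documents (neither is part of the schema above):

```sql
-- Hypothetical schema: documents.author_id references users(id)
SELECT
  d.id,
  d.content,
  u.name AS author_name,
  1 - (d.embedding <=> $1::vector) AS similarity
FROM documents d
JOIN users u ON u.id = d.author_id
ORDER BY d.embedding <=> $1::vector
LIMIT 10;
```

With a dedicated vector database, this join would require a second round trip and application-side stitching.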
pgvector's limitation is scale. At tens of millions of vectors, performance degrades compared to dedicated vector databases. For most applications, this is not a problem. But if you are building a service where the vector collection will grow without bound, plan the migration to a dedicated solution early.
The most common production use case is retrieval-augmented generation (RAG): retrieve documents relevant to a question, then pass them to a model as context.
// Assumes initialized clients in scope: `vectorDB` (e.g. a Pinecone index)
// and `anthropic` (Anthropic SDK), plus getEmbedding() from earlier
async function ragQuery(userQuestion: string): Promise<string> {
  // 1. Embed the query
  const queryEmbedding = await getEmbedding(userQuestion);

  // 2. Retrieve relevant documents
  const relevant = await vectorDB.query({
    vector: queryEmbedding,
    topK: 5,
    includeMetadata: true,
  });

  // 3. Build context from results
  const context = relevant.matches
    .map(match => match.metadata?.content)
    .join('\n\n---\n\n');

  // 4. Generate answer with context
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${userQuestion}\n\nAnswer based only on the context provided.`,
    }],
  });

  return response.content[0].type === 'text'
    ? response.content[0].text
    : '';
}

Vector similarity alone sometimes misses exact matches that keyword search catches. Hybrid search combines both:
async function hybridSearch(query: string, limit: number = 10) {
  // Run both searches in parallel
  const [vectorResults, keywordResults] = await Promise.all([
    vectorDB.query({ vector: await getEmbedding(query), topK: limit * 2 }),
    keywordDB.search({ query, limit: limit * 2 }),
  ]);

  // Reciprocal Rank Fusion to combine rankings
  const scores = new Map<string, number>();
  const k = 60; // RRF constant

  vectorResults.matches.forEach((result, rank) => {
    const score = scores.get(result.id) || 0;
    scores.set(result.id, score + 1 / (k + rank + 1));
  });

  keywordResults.forEach((result, rank) => {
    const score = scores.get(result.id) || 0;
    scores.set(result.id, score + 1 / (k + rank + 1));
  });

  // Sort by combined score and return top results
  return Array.from(scores.entries())
    .sort(([, a], [, b]) => b - a)
    .slice(0, limit)
    .map(([id]) => id);
}

Hybrid search consistently outperforms pure vector or pure keyword search for most real-world retrieval tasks. The implementation complexity is justified by the quality improvement.
Embedding everything without chunking. Long documents produce embeddings that average over many topics. A ten-thousand-word research paper produces one vector. That vector is vaguely about everything and precisely about nothing. Chunk documents into sections (typically 200-500 tokens with overlap) before embedding.
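A minimal chunker sketch, using words as a rough proxy for tokens (a real pipeline would measure chunk sizes with the embedding model's own tokenizer):

```typescript
// Naive sliding-window chunker: split text into overlapping chunks
// before embedding, so each vector covers one coherent span.
function chunkText(text: string, chunkSize = 300, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk. Production chunkers usually also respect section and paragraph boundaries rather than cutting mid-thought.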
Ignoring metadata filtering. Similarity alone is not enough. A query about "current tax regulations" should not retrieve a document from 2019. Build metadata filters into your query logic from the start.
Using the wrong embedding model for the domain. General-purpose embedding models work reasonably well everywhere. Domain-specific models work significantly better in their domain. For code, use a code-specific embedding model. For medical content, consider a biomedical model.
Not monitoring embedding quality over time. As your content collection evolves, what counts as "similar" may shift. Evaluate retrieval quality regularly, not just at launch.
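One lightweight way to do that evaluation is a fixed set of query/relevant-document pairs scored with recall@k. A sketch (the shape of the eval case is illustrative):

```typescript
// Recall@k over an eval set: what fraction of queries surface the
// known-relevant document within the top k retrieved results.
type EvalCase = { relevantId: string; retrievedIds: string[] };

function recallAtK(cases: EvalCase[], k: number): number {
  const hits = cases.filter(c =>
    c.retrievedIds.slice(0, k).includes(c.relevantId),
  );
  return hits.length / cases.length;
}
```

Run it on every re-embedding or model change; a drop in recall@k is the earliest visible signal that retrieval quality has drifted.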
Skipping re-ranking. ANN search returns approximate results. A re-ranking step using a more expensive model to score the top candidates against the query significantly improves precision without the latency of searching every vector.
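A minimal shape for that re-ranking step, with the expensive scorer left as a placeholder you would back with a cross-encoder or LLM call (the function names are illustrative):

```typescript
// Re-rank ANN candidates with a more expensive scorer. The ANN index
// narrows millions of vectors to a handful; the scorer only has to
// judge those few, so its latency cost stays bounded.
type Candidate = { id: string; text: string };

async function rerank(
  query: string,
  candidates: Candidate[],
  scorer: (query: string, text: string) => Promise<number>,
  topN = 5,
): Promise<Candidate[]> {
  const scored = await Promise.all(
    candidates.map(async c => ({ c, score: await scorer(query, c.text) })),
  );
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map(s => s.c);
}
```

Fetching topK of 20 to 50 from the index and re-ranking down to 5 is a common pattern; the scorer sees a candidate set small enough that per-pair model calls remain affordable.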
Start with pgvector if you are on PostgreSQL and your collection will stay under a few million vectors. Zero new infrastructure.
Start with Chroma if you are prototyping or building a small production system and want simplicity. Migrate later if needed.
Start with Pinecone if you need managed infrastructure with guaranteed availability and your budget accommodates it.
All three are production-ready. The choice is mostly about scale, cost, and operational preferences.
For the full picture on how embeddings work before you store them, read embeddings explained. For implementing production semantic search on top of your vector database, see semantic search implementation.
Q: What is a vector database?
A vector database stores and searches data as high-dimensional numerical vectors (embeddings). Instead of matching exact keywords, vector databases find semantically similar content — understanding that 'car' and 'automobile' are related. This enables AI applications like semantic search, recommendation systems, and RAG.
Q: When should you use a vector database?
Use vector databases when you need semantic search (finding related content by meaning), recommendation systems, RAG for AI applications, image or audio similarity search, or anomaly detection. Do not use them for exact lookups, transactional data, or simple CRUD operations where relational databases excel.
Q: Which vector database should developers choose?
Choose Pinecone for fully managed simplicity and production scale, Chroma for development and prototyping, Weaviate for hybrid search (combining vector and keyword), pgvector for existing PostgreSQL setups, and Qdrant for high-performance self-hosted needs.