Written by Gareth Simono, Founder and CEO of Agentik {OS}. Full-stack developer and AI architect with years of experience shipping production applications across SaaS, mobile, and enterprise platforms. Gareth orchestrates 267 specialized AI agents to deliver production software 10x faster than traditional development teams.
Embeddings are just coordinates for meaning. Once you understand that, semantic search, recommendations, and classification all click into place.

An embedding is a list of numbers. That is it. The magic is in which numbers.
When you pass a piece of text through an embedding model, it converts that text into a list of numbers, typically somewhere between 768 and 3072 numbers, in a way that captures meaning. Text with similar meaning produces similar lists of numbers. Text with different meaning produces different lists.
Once you understand that simple idea, everything that uses embeddings falls into place: search engines that understand synonyms, recommendation systems that find conceptually related content, classification systems that need no predefined rules.
Imagine each piece of text has a location in a very high-dimensional space. Each dimension corresponds to some aspect of meaning. A document about cats is close to a document about kittens. Far from a document about interest rates.
This is not a metaphor. This is literally what embeddings are. Coordinates in a semantic space.
The practical consequence: to find "documents similar to this query," you just find the coordinates that are geometrically closest. A distance calculation. Fast. Scalable. Exact in meaning, where keyword matching is exact only in orthography.
The reason "myocardial infarction" and "heart attack" are semantically similar in an embedding space is that the model that produced those embeddings learned from billions of documents where those terms appeared in similar contexts. Context similarity in training data becomes geometric proximity in embedding space.
This is why embedding models trained on domain-specific data outperform general models for domain-specific tasks. A model trained on biomedical literature knows that "MI" is likely "myocardial infarction" in a clinical context, not "Michigan" or "mission impossible." A general model has diluted representations.
Modern embedding models are transformer networks trained with contrastive objectives.
Contrastive training works by showing the model pairs of similar texts (positives) and pairs of dissimilar texts (negatives), then training it to produce nearby embeddings for positives and distant embeddings for negatives.
The "contrastive" part: the model learns what similar and dissimilar mean from the data. You do not hand-code these relationships. The model discovers them.
For text embeddings, the positive pairs might be:
- A search query and a document that answers it
- Two paraphrases of the same sentence
- A question and its accepted answer
- A title and the body of the same article
The resulting model has learned a representation space where these relationships hold. Query texts end up near relevant document texts. Related topics end up near each other.
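To make the objective concrete, here is a toy InfoNCE-style contrastive loss computed over precomputed embedding vectors. This is an illustrative sketch, not production training code; the function name and the in-batch-negatives setup are for illustration only.

```typescript
// Illustrative InfoNCE-style contrastive loss over a batch of
// (anchor, positive) embedding pairs. In-batch negatives: for each
// anchor, every other positive in the batch acts as a negative.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

function infoNCELoss(
  anchors: number[][],
  positives: number[][],
  temperature: number = 0.07
): number {
  let loss = 0;
  for (let i = 0; i < anchors.length; i++) {
    // Similarity of anchor i to every positive in the batch
    const logits = positives.map(p => dot(anchors[i], p) / temperature);
    // Softmax cross-entropy with the matching positive as the target
    const maxLogit = Math.max(...logits);
    const exps = logits.map(l => Math.exp(l - maxLogit));
    const sumExp = exps.reduce((s, e) => s + e, 0);
    loss += -Math.log(exps[i] / sumExp);
  }
  return loss / anchors.length; // lower = positives closer than negatives
}
```

Training nudges the model's weights to drive this loss down, which is exactly what pulls positives together and pushes negatives apart in the embedding space.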
```typescript
// Simplified conceptual flow of text embedding
// (pseudocode: `tokenizer` and `model` stand in for a real embedding model)

// 1. Tokenize the input text
const tokens = tokenizer.encode("The cat sat on the mat");
// ["The", "cat", "sat", "on", "the", "mat"] -> [1023, 4523, 2341, ...]

// 2. Feed tokens through transformer layers.
// Each layer attends to all other tokens,
// building up contextual representations.

// 3. Pool the final-layer representations
// into a single fixed-size vector
// (mean pooling, CLS token, or learned pooling).

// 4. Optionally normalize to unit length
// (required for cosine similarity).

// 5. Output: one number[] representing the full text's meaning
const embedding: number[] = model.embed(tokens);
// [0.023, -0.145, 0.892, ...] // 768-3072 numbers
```

The pooling step matters more than most tutorials acknowledge. Different pooling strategies (mean pooling, max pooling, CLS token) produce embeddings with different properties. For retrieval tasks, mean pooling of the final layer generally works best.
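As a concrete sketch of the pooling step, mean pooling averages the per-token vectors position-wise and then normalizes the result (function name and shapes here are illustrative):

```typescript
// Mean pooling: average per-token vectors into one fixed-size vector,
// then normalize to unit length so dot product == cosine similarity.
function meanPool(tokenVectors: number[][]): number[] {
  const dims = tokenVectors[0].length;
  const pooled = new Array<number>(dims).fill(0);
  for (const vec of tokenVectors) {
    for (let d = 0; d < dims; d++) pooled[d] += vec[d];
  }
  for (let d = 0; d < dims; d++) pooled[d] /= tokenVectors.length;
  const norm = Math.sqrt(pooled.reduce((s, x) => s + x * x, 0));
  return pooled.map(x => x / norm);
}
```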
Model choice significantly affects retrieval quality. A few key considerations:
Higher-dimensional embeddings (1536 vs 768) generally carry more information but cost more storage and computation. The relationship is not linear. A 1536-dimensional model is not twice as good as a 768-dimensional model. The improvement depends on the task.
For most production applications, 1536 dimensions is the sweet spot. Beyond 3072, marginal gains rarely justify the overhead.
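Some newer models are trained so that their leading dimensions carry the most information, which is what lets you request fewer dimensions at embedding time (OpenAI's text-embedding-3 models expose a `dimensions` parameter for this). A local sketch of the same idea, assuming a Matryoshka-style model: truncate, then renormalize.

```typescript
// Truncate an embedding to fewer dimensions and renormalize.
// Only meaningful for models trained so that leading dimensions
// carry the most information (Matryoshka-style training); for
// other models, truncation discards information arbitrarily.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  const truncated = embedding.slice(0, dims);
  const norm = Math.sqrt(truncated.reduce((s, x) => s + x * x, 0));
  return truncated.map(x => x / norm);
}
```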
For general text: OpenAI's text-embedding-3-small and text-embedding-3-large are excellent defaults. Voyage AI's models consistently score well on retrieval benchmarks. Cohere's models are strong for multilingual applications.
For code: use a code-specific model. voyage-code-3 or GitHub's models understand programming language semantics that general models treat as opaque strings.
For biomedical: models trained on PubMed and clinical text vastly outperform general models. BioBERT, PubMedBERT, and their successors.
MTEB (Massive Text Embedding Benchmark) is the authoritative leaderboard for embedding model evaluation. Check it before choosing a model. Filter by the task type that matches your use case: retrieval, clustering, classification, semantic similarity.
| Model | Dimensions | MTEB Score | Cost | Best For |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Good | Low | General, high volume |
| text-embedding-3-large | 3072 | Excellent | Medium | General, quality-first |
| voyage-3 | 1024 | Excellent | Low | Retrieval |
| voyage-code-3 | 1024 | Excellent (code) | Low | Code search |
| Cohere embed-v4 | 1024 | Excellent | Medium | Multilingual |
The most common mistake when building embedding-based systems: embedding entire documents.
A long document produces one embedding that represents the average of all its content. That averaged representation is vague. A ten-page technical document embedded as one vector is simultaneously about everything in it and precisely about none of it.
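You can see the dilution numerically. Average two unrelated (here, orthogonal) topic vectors, and the result is only moderately similar to either one; the vectors are toy stand-ins, not real embeddings.

```typescript
const cosine = (a: number[], b: number[]): number => {
  const dot = a.reduce((s, ai, i) => s + ai * b[i], 0);
  const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (mag(a) * mag(b));
};

// Two orthogonal unit vectors stand in for two unrelated topics
const topicA = [1, 0];
const topicB = [0, 1];

// A whole-document embedding behaves like an average of its parts
const docAvg = [0.5, 0.5];

// The averaged vector is only ~0.707 similar to either topic,
// while a focused chunk embedding can match a query at ~1.0
const sim = cosine(docAvg, topicA); // ~0.707
```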
Chunk documents into sections before embedding. Each chunk should be:
- Small enough to be about one thing (roughly 256-512 tokens is a common starting point)
- Large enough to carry its own context
- Aligned with natural boundaries (paragraphs, sections, headings) where possible
```typescript
interface Chunk {
  text: string;
  startChar: number;
  endChar: number;
  metadata: Record<string, unknown>;
}

function chunkDocument(
  text: string,
  chunkSize: number = 512, // tokens
  overlap: number = 50, // tokens
  metadata: Record<string, unknown> = {}
): Chunk[] {
  const chunks: Chunk[] = [];
  const words = text.split(' ');
  const wordsPerChunk = chunkSize; // approximate: 1 token ~= 1 word
  const overlapWords = overlap;
  let start = 0;
  while (start < words.length) {
    const end = Math.min(start + wordsPerChunk, words.length);
    const chunkWords = words.slice(start, end);
    const chunkText = chunkWords.join(' ');
    // Character offsets are approximate (splitting on ' ' collapses
    // whitespace); the +1 accounts for the space before the chunk
    const charStart = start === 0 ? 0 : words.slice(0, start).join(' ').length + 1;
    const charEnd = charStart + chunkText.length;
    chunks.push({
      text: chunkText,
      startChar: charStart,
      endChar: charEnd,
      metadata: { ...metadata, chunkIndex: chunks.length },
    });
    if (end === words.length) break; // avoid emitting a duplicate tail chunk
    start += wordsPerChunk - overlapWords; // overlap for context continuity
  }
  return chunks;
}
```

The overlap between chunks ensures that context at chunk boundaries does not get lost. A sentence split between two chunks would lose its context without overlap.
Embedding documents one at a time is slow and inefficient. Batch your embedding calls.
```typescript
// Assumes `openai` (OpenAI SDK client) and `vectorDB` (your vector
// database client) are initialized elsewhere.
async function batchEmbed(
  texts: string[],
  batchSize: number = 100
): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch,
    });
    const batchEmbeddings = response.data.map(d => d.embedding);
    embeddings.push(...batchEmbeddings);
    // Rate limit protection
    if (i + batchSize < texts.length) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  }
  return embeddings;
}

// Usage: embed an entire document collection
async function indexDocumentCollection(
  documents: Array<{ id: string; content: string; metadata: Record<string, unknown> }>
): Promise<void> {
  const chunks = documents.flatMap(doc =>
    chunkDocument(doc.content, 512, 50, { ...doc.metadata, docId: doc.id })
  );
  const embeddings = await batchEmbed(chunks.map(c => c.text));
  await vectorDB.upsert(
    chunks.map((chunk, i) => ({
      id: `${chunk.metadata.docId}-chunk-${chunk.metadata.chunkIndex}`,
      values: embeddings[i],
      metadata: chunk.metadata,
    }))
  );
}
```

Three distance metrics dominate vector search:
Cosine similarity: Measures the angle between two vectors. Ranges from -1 to 1. Not affected by vector magnitude, only direction. The most common choice for text embeddings. Most embedding models are designed with cosine similarity in mind.
Euclidean distance (L2): Measures straight-line distance in vector space. Affected by magnitude. Works well for embeddings that are NOT normalized to unit length.
Dot product: Fast to compute. Equivalent to cosine similarity when vectors are normalized (which most embedding models produce). Use this if you are sure vectors are unit-length and you need maximum speed.
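A quick way to verify that equivalence: normalize both vectors to unit length, then compare their dot product with the cosine similarity of the originals. A self-contained sketch:

```typescript
const norm = (v: number[]): number =>
  Math.sqrt(v.reduce((s, x) => s + x * x, 0));
const normalize = (v: number[]): number[] => {
  const n = norm(v);
  return v.map(x => x / n);
};

// cos(a, b) = dot(a, b) / (|a| * |b|), so when |a| = |b| = 1
// the denominator vanishes and cosine reduces to a plain dot product
const a = normalize([1, 2, 3]);
const b = normalize([2, 3, 4]);
const dotAB = a.reduce((s, ai, i) => s + ai * b[i], 0);
// dotAB equals the cosine similarity of the original vectors
```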
```typescript
// When to use each:

// Cosine: always safe default for text embeddings
// Good when: vectors may have different magnitudes
const cosineSim = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const magB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (magA * magB);
};

// Dot product: fastest when vectors are normalized
// Good when: you know vectors are unit-length
const dotProduct = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Euclidean: rarely the best choice for text
// Good when: magnitude differences are meaningful (e.g., image embeddings)
const euclidean = (a: number[], b: number[]): number =>
  Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
```

Most vector databases let you configure which metric to use at index creation time. Choose cosine for text and stick with it.
Semantic search is the most common use case but not the only one.
You can classify text into categories without training a classifier, using only embeddings.
```typescript
async function zeroShotClassify(
  text: string,
  categories: string[]
): Promise<{ category: string; confidence: number }> {
  // Embed the text and all category labels
  const [textEmbedding, ...categoryEmbeddings] = await batchEmbed([
    text,
    ...categories,
  ]);
  // Find the most similar category label
  const similarities = categoryEmbeddings.map((catEmb, i) => ({
    category: categories[i],
    similarity: cosineSim(textEmbedding, catEmb),
  }));
  similarities.sort((a, b) => b.similarity - a.similarity);
  return {
    category: similarities[0].category,
    confidence: similarities[0].similarity,
  };
}
// Works without ANY training data
// Accuracy depends on how well category labels describe the categories
```

Find near-duplicate content in large collections.
```typescript
function findDuplicates(
  documents: Array<{ id: string; embedding: number[] }>,
  threshold: number = 0.95
): Array<[string, string]> {
  const duplicatePairs: Array<[string, string]> = [];
  // Brute-force O(n^2) comparison; for large collections,
  // use approximate nearest neighbor (ANN) search instead
  for (let i = 0; i < documents.length; i++) {
    for (let j = i + 1; j < documents.length; j++) {
      const similarity = cosineSim(documents[i].embedding, documents[j].embedding);
      if (similarity >= threshold) {
        duplicatePairs.push([documents[i].id, documents[j].id]);
      }
    }
  }
  return duplicatePairs;
}
```

Find content that does not belong in a collection.
```typescript
function findAnomalies(
  newDocument: number[],
  collection: number[][],
  topK: number = 10
): { isAnomaly: boolean; avgSimilarity: number } {
  // Find the K most similar items in the collection
  const similarities = collection
    .map(doc => cosineSim(newDocument, doc))
    .sort((a, b) => b - a)
    .slice(0, topK);
  const avgSimilarity =
    similarities.reduce((sum, s) => sum + s, 0) / similarities.length;
  // If even the closest matches are far away, it's likely anomalous
  return {
    isAnomaly: avgSimilarity < 0.6, // Threshold depends on your collection
    avgSimilarity,
  };
}
```

Embeddings are foundational. They are not a technique you apply to one specific problem. They are the representation layer that makes an entire class of intelligent behavior possible.
Search that understands meaning. Recommendations based on conceptual similarity. Classification without labeled training data. Duplicate detection that catches paraphrases. Anomaly detection that understands context.
All of these are the same operation at the core: convert content to coordinates, find the geometric relationships that matter for your use case.
Once you can produce good embeddings and query them efficiently, you have the infrastructure for most of what makes modern AI applications feel intelligent.
Q: What are embeddings in AI?
Embeddings are numerical representations (vectors) of text, images, or other data that capture semantic meaning. Similar concepts have similar vectors, enabling AI to understand relationships between content. The word 'king' has a similar embedding to 'queen' and 'monarch' but a different one from 'banana'.
Q: How are embeddings used in AI applications?
Embeddings power semantic search (finding content by meaning, not keywords), recommendation systems (suggesting similar items), RAG systems (retrieving relevant context for AI), clustering (grouping similar content), and anomaly detection (finding outliers). They are the foundation of most modern AI search and retrieval systems.
Q: How do you generate embeddings?
Generate embeddings using embedding models like OpenAI's text-embedding-3-small, Cohere's embed-v3, or open-source models like sentence-transformers. You pass text through the model and receive a vector of numbers (typically 256-3072 dimensions) that represents the semantic content.