A technique where a model adapts to a new task using only 2 to 20 labeled examples provided directly in the prompt, without updating model weights.
Few-shot learning is a machine learning paradigm where a model learns to perform a new task from a small number of labeled examples, typically between 2 and 20 samples. Unlike traditional supervised learning, which requires thousands or millions of training examples to achieve good performance, few-shot learning leverages prior knowledge encoded in a pretrained model to generalize rapidly from minimal data. This capability is essential in domains where labeled data is scarce, expensive to collect, or where tasks change frequently enough that retraining is impractical.
The core mechanism behind few-shot learning in large language models relies on in-context learning (ICL). During pretraining, the model is exposed to a vast distribution of text patterns, building internal representations that capture how tasks are structured across domains. When presented with a new task at inference time, the model uses those representations to infer the expected output format and logic from the examples provided in the prompt. Critically, no weight updates occur: the adaptation is purely a function of the forward pass through the model. The examples act as a runtime specification of the task rather than a training signal.
There are three standard variants defined by example count: zero-shot (no examples, only a task description), one-shot (a single example), and few-shot (two or more examples). The performance gap between these variants depends heavily on task complexity. Simple classification tasks often work well with zero-shot prompting, while structured generation tasks such as SQL synthesis or JSON extraction benefit significantly from several well-chosen examples.
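The three variants can be sketched as a single prompt builder whose shot count is a parameter. The task description and example format below are illustrative assumptions, not a fixed template:

```python
# Sketch: building zero-, one-, and few-shot prompts from the same example
# pool. TASK and EXAMPLES are illustrative placeholders.

TASK = "Classify the sentiment of each review as positive or negative."

EXAMPLES = [
    ("The product arrived on time and works perfectly.", "positive"),
    ("Terrible quality, broke after two days.", "negative"),
]

def build_prompt(query: str, n_shots: int) -> str:
    """Assemble a prompt with n_shots in-context examples (0 = zero-shot)."""
    parts = [TASK]
    for review, label in EXAMPLES[:n_shots]:
        parts.append(f"Review: {review}\nSentiment: {label}")
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

zero_shot = build_prompt("Love it!", n_shots=0)   # task description only
few_shot = build_prompt("Love it!", n_shots=2)    # two labeled examples
```

The only difference between the variants is how many labeled pairs precede the query; the task description and output slot stay identical.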
The following snippet demonstrates few-shot sentiment classification using the Anthropic SDK:
```python
import anthropic

client = anthropic.Anthropic()

prompt = """Classify the sentiment of each review as positive or negative.

Review: The product arrived on time and works perfectly.
Sentiment: positive

Review: Terrible quality, broke after two days.
Sentiment: negative

Review: Absolutely love this, best purchase I have made this year.
Sentiment:"""

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=10,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)  # Output: positive
```
Two labeled examples are embedded in the prompt. The model infers the classification pattern and applies it to the third review without any fine-tuning or weight updates. This zero-infrastructure approach is why few-shot learning is the default starting point for most NLP prototypes.
Few-shot learning and fine-tuning both adapt a pretrained foundation model to a specific task, but they differ fundamentally in mechanism, cost, and performance ceiling. Fine-tuning modifies the model's weights through additional gradient-descent training on a labeled dataset, typically requiring hundreds to thousands of examples, dedicated compute, and a separate model deployment. Few-shot learning requires none of that: the entire adaptation lives in the prompt and runs on the same deployed model endpoint.
The practical tradeoff is performance versus iteration speed. For tasks where accuracy is critical and sufficient labeled data exists, fine-tuning typically outperforms in-context learning by a meaningful margin. For rapid prototyping, low-data scenarios, or tasks that evolve week to week, few-shot learning is dramatically faster and cheaper. Many mature production systems combine both approaches: a fine-tuned base model handles the common case, while few-shot examples in the system prompt cover edge cases and format customization.
Few-shot learning appears across nearly every industry that has adopted LLMs. In customer support automation, teams embed a handful of tone and format examples to ensure the model matches brand voice without a full fine-tuning run. In medical documentation, where privacy constraints limit labeled data availability, few-shot learning enables clinical NLP on small curated example sets. In software development, code generation tools use few-shot examples to match a codebase's naming conventions and architectural patterns.
Agent systems benefit particularly from few-shot learning. When building a planning agent or code agent, developers embed representative reasoning traces directly in the system prompt. The agent learns the expected chain-of-thought format, tool-calling structure, and output schema from those examples, producing consistent behavior across diverse user inputs without a full training pipeline.
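A minimal sketch of this pattern is shown below. The tool name `search_docs` and the Thought/Action/Observation trace format are illustrative assumptions, not a fixed schema:

```python
# Sketch: a planning-agent system prompt that embeds one worked reasoning
# trace as a few-shot example. The tool name and trace format are
# hypothetical; real agents would pair this with actual tool definitions.

SYSTEM_PROMPT = """You are a research agent. Plan before acting, then call tools.

Example interaction:
User: What changed in release 2.1?
Thought: I need the release notes, so I will search the docs.
Action: search_docs(query="release 2.1 changelog")
Observation: Release 2.1 adds streaming support and fixes rate-limit retries.
Answer: Release 2.1 adds streaming support and fixes rate-limit retries.

Always follow the Thought / Action / Observation / Answer format."""

def build_messages(user_input: str) -> list[dict]:
    """Pair the few-shot system prompt with a fresh user turn; the system
    prompt would be passed separately via the API's system parameter."""
    return [{"role": "user", "content": user_input}]
```

Because the example trace lives in the system prompt, every user turn is answered against the same demonstrated format without any per-task training.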
Few-shot learning is sensitive to example selection and ordering in ways that traditional fine-tuning is not. Research has consistently shown that changing which examples are included, or even the order they appear in the prompt, can shift model accuracy by 15 to 30 percentage points on the same task. This instability makes raw few-shot prompting less reliable for high-stakes production systems without systematic evaluation and example curation.
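Measuring this sensitivity systematically is straightforward: score every ordering of the example set against a small held-out evaluation set. In the sketch below, `classify` is a stub standing in for a real model call, so the harness is self-contained; the examples and eval set are illustrative:

```python
from itertools import permutations

# Sketch: an ordering-sensitivity harness for few-shot prompts.
# `classify` is a stub; a real run would send each prompt to the model.

EXAMPLES = [
    ("Works perfectly.", "positive"),
    ("Broke after two days.", "negative"),
    ("Exceeded expectations.", "positive"),
]

EVAL_SET = [("Love it!", "positive"), ("Waste of money.", "negative")]

def build_prompt(ordering, query):
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in ordering)
    return f"{shots}\nReview: {query}\nSentiment:"

def classify(prompt: str) -> str:
    # Stub: replace with an actual model call to observe real variance.
    return "positive"

def accuracy_by_ordering():
    """Score every permutation of the example set on the eval set."""
    results = {}
    for ordering in permutations(EXAMPLES):
        correct = sum(
            classify(build_prompt(ordering, q)) == label
            for q, label in EVAL_SET
        )
        results[ordering] = correct / len(EVAL_SET)
    return results
```

With a real model behind `classify`, the spread between the best and worst orderings is the instability the research literature describes.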
Context window constraints impose a hard ceiling on how many examples can be provided, and longer examples consume budget that might otherwise hold useful retrieved context or detailed instructions. Few-shot learning also inherits the distribution of the chosen examples: if the selected examples skew toward a narrow subset of the task space, the model's outputs will reflect that skew and fail on out-of-distribution inputs.
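The budgeting concern can be sketched as a simple greedy trim: keep examples in priority order until an estimated token budget is exhausted. The 4-characters-per-token estimate below is a rough heuristic, not a real tokenizer:

```python
# Sketch: trimming a prioritized few-shot example list to a token budget.
# estimate_tokens uses a crude chars/4 heuristic; swap in a real tokenizer
# for production use.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fit_examples(examples: list[str], budget_tokens: int) -> list[str]:
    """Keep examples in priority order until the budget is exhausted."""
    kept, used = [], 0
    for ex in examples:
        cost = estimate_tokens(ex)
        if used + cost > budget_tokens:
            break
        kept.append(ex)
        used += cost
    return kept
```

Ordering the input list by example importance means the trim drops the least valuable demonstrations first, leaving room for instructions or retrieved context.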
Few-shot learning is one of the most practically important capabilities in modern foundation models because it removes the data collection and training infrastructure barrier from task adaptation. Before LLMs, every new NLP task required a dedicated labeling effort, a training pipeline, and a model deployment. With few-shot learning, a developer can adapt a foundation model to a new task in minutes by writing a handful of representative examples. This dramatically accelerates time-to-market and makes AI accessible to teams without deep machine learning infrastructure.
Understanding when few-shot learning is sufficient versus when fine-tuning is necessary is a core competency for anyone building on top of foundation models. It shapes prompt engineering strategy, informs dataset investment decisions, and determines how quickly a team can ship reliable AI-powered features. Practitioners who master example selection, ordering strategies, and evaluation of few-shot stability are significantly more effective at deploying LLM-based systems in production.