In-context learning is one of the best ways to get better and more reliable outputs from LLMs. Showing the model what you want, rather than just telling it, is often both more effective and easier to pull off than describing the task in painstaking detail.
Leveraging in-context learning is also relatively easy to do. You don’t need to modify the underlying parameters of the model, you don’t need to be a machine learning engineer, and it’s easy to test (especially if you have a nifty tool like PromptHub).
In this post we’ll dive deep into in-context learning, how it differs from few-shot prompting, and whether the models really are “learning” in context (the PhDs in the audience are either going to love or absolutely hate this). Most importantly, we’ll share a few templates and prompt chains to help you automate the process of generating high-quality in-context learning examples.
We’ll be pulling a lot of data and context from two papers:
What is In-Context Learning
In-context Learning (ICL) is a method where Large Language Models (LLMs) learn a task from a few examples embedded directly in the prompt, which serves as the model’s “context.”
Compared to other optimization methods like fine-tuning, in-context learning relies solely on the examples to guide the model’s understanding of the task and behavior. By showing the model what is expected, rather than just telling it, in-context learning allows for a more nuanced and flexible task-solving process, without needing to modify the model’s internal parameters.
It is still a good idea to include instructions and supplement them with the demonstrations.
Let’s dive into a few examples of how you can leverage in-context learning.
Zero-shot vs One-shot vs Few-shot learning
One of the most common ways to implement in-context learning is via one-shot or few-shot learning.
Zero-shot learning refers to when you don’t include any demonstrations in your prompt.
One-shot learning is when you include one demonstration in your prompt to help guide the model when producing its output.
Few-shot learning takes this further by including multiple demonstrations in the prompt to show the model different inputs and outputs. Here’s an example with multiple demonstrations:
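The sketch below uses a customer-feedback sentiment task; the feedback lines and labels are made up for illustration:

```python
# A few-shot prompt: an instruction followed by three demonstrations
# (one per label) and then the new input we want classified.
few_shot_prompt = """Classify the sentiment of the customer feedback as positive, negative, or neutral.

Feedback: "Loving the new dashboard, it saves me hours every week."
Sentiment: positive

Feedback: "The app crashes every time I try to export a report."
Sentiment: negative

Feedback: "I received this month's invoice."
Sentiment: neutral

Feedback: "Support resolved my issue within an hour, really impressed."
Sentiment:"""
```

Notice that the instruction is still there at the top; the demonstrations simply show the model the label set and the expected format.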
What’s the difference between In-Context Learning and Few-Shot prompting
Back to the question that sent me down this rabbit hole. I’ll admit, the answer isn’t super exciting or surprising.
In-context learning is the ability of large language models to adapt to the input context without changing internal parameters. Few-shot prompting is a specific technique within in-context learning, where a few examples are provided to guide the model’s behavior.
In-context learning describes the model’s overall adaptability to the input; few-shot prompting leverages this ability specifically by providing multiple task demonstrations.
How In-Context Learning works
When a user provides a prompt that includes examples or instructions, the model uses its attention mechanisms to analyze the input. It identifies patterns from the context (tone, style, structure) and applies these patterns when generating an output.
In-context learning doesn’t change anything about the model’s underlying parameters. Instead, the provided examples guide the model to adapt its responses in real-time, leveraging the specific patterns and context within the examples.
In some ways, it’s more about pattern recognition and refining the instructions than about “learning” something completely new.
For example, when a few input-output pairs are provided in the prompt, the model can recognize the desired output structure, format, and style, applying these to new inputs—essentially “learning” on the fly during inference.
Since models are great at recognizing patterns, in-context learning is a method you can leverage for almost any type of task, which is one of the reasons we like it so much.
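From the outside, all of this “learning on the fly” is just prompt assembly: the demonstrations and the new input are concatenated into a single string at inference time, and the model’s weights are never touched. Here’s a minimal sketch (the helper name and example pairs are ours, purely for illustration):

```python
def build_icl_prompt(instruction, demonstrations, new_input):
    """Assemble an in-context learning prompt.

    demonstrations is a list of (input, output) pairs; the model only ever
    sees them inside this single prompt, and no weights are updated.
    """
    lines = [instruction, ""]
    for example_input, example_output in demonstrations:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)


demos = [
    ("Order #1042 arrived two days late.", '{"topic": "shipping", "sentiment": "negative"}'),
    ("The new search filters are fantastic.", '{"topic": "product", "sentiment": "positive"}'),
]

prompt = build_icl_prompt(
    "Extract the topic and sentiment of the feedback as JSON.",
    demos,
    "Billing charged me twice this month.",
)
print(prompt)
```

Paste the resulting prompt into a model of your choice and it will typically mirror the JSON structure shown in the demonstrations, even though the specific keys (topic, sentiment) are only ever shown, not described.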
Challenges and limitations of In-Context Learning
Every prompt engineering method has flaws, here are the biggest ones for in-context learning:
- Efficiency costs: By including examples in your prompt, you will use more input tokens, which will drive up cost and latency (just a little). Latency is mostly driven by how many output tokens need to be generated, since those are produced sequentially, whereas input tokens are processed in parallel. You can read more about the world of latencies here: Comparing Latencies: Get Faster Responses From OpenAI, Azure, and Anthropic.
- Scalability: There’s a limit to how many tokens you can send in a prompt, and performance eventually plateaus as you add more demonstrations. Some research papers have found that, past a certain point, including more examples can even be detrimental. That’s why we recommend staying under 8 examples to start.
- Sensitivity to examples: The performance of in-context learning is highly dependent on the examples you provide. Factors like the quality, order, and diversity of the examples (which we’ll touch on shortly) can make a big impact.
- Ambiguity in the “learning” process: Although in-context learning provides adaptability, the underlying mechanism of how the model "learns" from the examples is still not clear.
Optimizing prompts for In-Context Learning
When it comes to actually writing prompts that leverage in-context learning, here are some research-backed best practices to follow:
- Use high-quality examples: This probably goes without saying, but the examples you choose need to be directly relevant to the task.
- Varied examples: The examples you choose should cover a wide range of aspects of the task. For example, in our feedback sentiment analyzer prompt, we included 3 examples, one for each of the options (positive, negative, neutral).
- Focus on formatting: Consistency in your examples is key. Make sure they all follow the same format to better help the model learn the pattern.
- Order matters: Some strategies suggest ordering examples from simple to complex. Others say to place the most relevant examples closest to the query, to take advantage of the model’s tendency to give more weight to what it “read” last.
- Avoid example clustering: Your examples should essentially be randomly ordered—avoid grouping similar ones together, as this could bias the model’s response.
- Example label distribution: Make sure you use a balanced distribution of examples. Going back to our feedback sentiment analyzer prompt, we wouldn’t want to overload the prompt with only positive or negative examples, because it could skew the model’s output.
- Don’t use too many examples: Too many examples can lead to diminishing returns; try to stay under 8 to start. The sketch after this list shows one way to bake a few of these guidelines into your example-selection code.
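Here’s a small helper, written for this post rather than pulled from any of the papers, that bakes three of those guidelines (balanced labels, shuffled order, a cap of 8) into the demonstration-selection step:

```python
import random
from collections import defaultdict

def sample_demonstrations(examples, max_examples=8, seed=42):
    """Pick a balanced, shuffled subset of (text, label) demonstrations.

    - Balanced: roughly the same number of examples per label.
    - No clustering: shuffled so similar labels aren't grouped together.
    - Capped: never returns more than max_examples demonstrations.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    per_label = max(1, max_examples // len(by_label))

    selected = []
    for items in by_label.values():
        rng.shuffle(items)
        selected.extend(items[:per_label])

    rng.shuffle(selected)           # avoid clustering by label
    return selected[:max_examples]  # enforce the overall cap


demos = sample_demonstrations([
    ("Loving the new dashboard.", "positive"),
    ("The app crashes on export.", "negative"),
    ("I received the invoice.", "neutral"),
    ("Support was incredibly helpful.", "positive"),
    ("Checkout keeps timing out.", "negative"),
    ("The meeting is at 3pm.", "neutral"),
])
```

From there, you can format the selected pairs into your prompt however you like (for example, with a helper like the build_icl_prompt sketch above).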
How to automatically generate In-Context Learning examples
One of the biggest downsides of in-context learning is that it requires some manual work to set up the demonstrations. Wouldn’t it be great if we could use LLMs to automate this process? Enter Auto-ICL, a framework that uses LLMs to autonomously generate examples, removing the need for us humans to do this manually.
Auto-ICL works in two steps:
- Generate contextual information: When a task or question is presented, the model generates relevant demonstrations.
- Combine context with query/task: The model then integrates these generated examples with the original query to produce the final prompt.
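In code, a simplified version of that two-step chain might look like the sketch below. This is our rough interpretation of the framework, not the exact prompts from the Auto-ICL paper, and call_llm is a placeholder for whichever client you use:

```python
def call_llm(prompt: str) -> str:
    """Placeholder: swap in your own model call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def auto_icl(question: str, num_examples: int = 3) -> str:
    # Step 1: ask the model to generate its own relevant demonstrations.
    generation_prompt = (
        f"You will be asked to answer this question:\n{question}\n\n"
        f"First, write {num_examples} similar example questions and work out "
        "their answers, each formatted as 'Q: <question>' followed by 'A: <answer>'."
    )
    demonstrations = call_llm(generation_prompt)

    # Step 2: combine the generated demonstrations with the original question.
    final_prompt = f"{demonstrations}\n\nQ: {question}\nA:"
    return call_llm(final_prompt)
```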
In their experiments, Auto-ICL outperformed a number of prompt engineering methods like few-shot prompting, few-shot chain-of-thought, Automatic Prompt Engineering (APE), and others.
As promised, here are a few templates so you can start using this framework right away.
Pro tip: Create a chain in PromptHub to run them sequentially, generating examples and then a final output in one click.
Is In-Context Learning really ‘learning’
This is the section where I may be a little out of my depth, but I’ll do my best here.
There is a debate about whether in-context learning can be considered true “learning”. Unlike traditional learning methods, which involve updating a model’s internal parameters, in-context learning relies on pattern recognition and adaptation to context without changing the model.
Critics argue that since in-context learning doesn’t involve long-term adjustments to the model, it’s not true learning but rather dynamic pattern matching based on input examples.
In-context learning believers would say that the method mimics human-like learning by adapting to new tasks on the fly, using the examples as a guide. This ability to "learn from analogy" can be seen as a form of learning, albeit temporary and context-dependent.
Ultimately, whether in-context learning counts as real learning depends on how learning is defined, which is above my pay grade. For now, I’m happy to reap the benefits.
Conclusion
In-context learning is a great way to take your prompts to the next level. It’s relatively easy to implement, doesn’t require any technical skills, and can be tested quickly—especially if you leverage a framework like Auto-ICL to help with example generation.
There are some best practices to keep in mind, particularly regarding the ordering and structure of examples. But overall, adding any amount of high-quality examples to your prompt should lead to better results.