Large Language Models (LLMs) have an extensive knowledge base, having been trained on virtually all text available on the internet. When we prompt an LLM, it draws on specific parts of that knowledge base to retrieve information. Models are sensitive: sometimes a misused word or two is all it takes to send one in the wrong direction.
One strategy to better leverage an LLM’s knowledge is to have it generate related knowledge before answering the question at hand or completing the task. This method is known as Generated Knowledge Prompting.
What is Generated Knowledge Prompting?
Generated Knowledge Prompting is a prompt engineering method that first prompts the LLM to generate useful knowledge related to the task, and then incorporates that knowledge into the prompt alongside the question or task description.
Generated Knowledge Prompting was first written about in a paper from 2022. It is particularly helpful for tasks that require a deep understanding of context, like generating code inside a codebase, but can be used across a wide range of tasks.
Let’s look at a quick example: a customer is using a chatbot to ask about rebooking a flight.
Customer question
"What are the rebooking options if my flight from New York to London is canceled?"
Prompt to generate knowledge
"Retrieve current UK travel restrictions for passengers flying from New York and check the availability of the next flights from New York to London."
Final integrated prompt
Knowledge: "The current UK travel restrictions allow only limited flights. The next available flight from New York to London is on [date]."
User Query: "What are the rebooking options for a passenger whose flight has been canceled?"
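To make that flow concrete, here's a minimal Python sketch of the two-step version. The `call_llm` helper is a hypothetical stand-in for whatever model or SDK you're using; the prompts simply mirror the example above.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to whatever model you use and return its text reply."""
    raise NotImplementedError("Wire this up to your LLM provider of choice.")

customer_question = (
    "What are the rebooking options if my flight from New York to London is canceled?"
)

# Step 1: prompt the model to generate relevant knowledge first.
knowledge_prompt = (
    "Retrieve current UK travel restrictions for passengers flying from New York "
    "and check the availability of the next flights from New York to London."
)
knowledge = call_llm(knowledge_prompt)

# Step 2: fold the generated knowledge into the final prompt alongside the user's question.
final_prompt = f"Knowledge: {knowledge}\n\nUser Query: {customer_question}"
answer = call_llm(final_prompt)
print(answer)
```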
Originally, Generated Knowledge Prompting was designed as a two-step process involving separate prompts.
However, it's possible to streamline this into a single prompt, which aligns closely with another method we really like called Analogical Prompting.
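Here's a rough sketch of that single-prompt variant, reusing the hypothetical `call_llm` helper and `customer_question` from the sketch above. The instruction wording is ours, not from the original paper.

```python
# Single-prompt variant: ask the model for relevant knowledge and an answer in one call.
single_prompt = f"""First, list any facts or background knowledge relevant to the question below.
Then, using that knowledge, answer the question.

Question: {customer_question}

Knowledge:"""

answer = call_llm(single_prompt)
print(answer)
```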
Benefits of Generated Knowledge Prompting
Generated Knowledge Prompting can be helpful in a few ways:
- Higher Accuracy: The additional context helps the model provide more precise and relevant answers
- Adaptability: Generated Knowledge Prompting enables models to adapt to new information quickly without needing extensive retraining or fine-tuning
- Depth of Understanding: With the proper guardrails in place, models can explore topics in greater depth
How the model generates knowledge
The original researchers generated knowledge by prompting the model with an instruction, a few demonstrations, and new questions with placeholders.
The demonstrations in this case were human-written, but you could use an LLM for this, as long as you verify the data.
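For illustration, a knowledge-generation prompt in that style might look like the sketch below: an instruction along these lines, a handful of demonstrations, and the new question dropped into the placeholder. The demonstration pairs are illustrative stand-ins we wrote, not examples from the paper, and `call_llm` is the same hypothetical helper as before.

```python
# Few-shot knowledge-generation prompt: instruction + demonstrations + the new question.
# The demonstration pairs below are illustrative stand-ins, not examples from the paper.
DEMONSTRATIONS = [
    {
        "question": "Can a penguin fly up to a nest at the top of a tree?",
        "knowledge": "Penguins are flightless birds, so they cannot fly up to a treetop.",
    },
    {
        "question": "Do you need a passport to drive from Texas to California?",
        "knowledge": "Texas and California are both US states, so no international border is crossed.",
    },
]

def build_knowledge_prompt(new_question: str) -> str:
    parts = ["Generate some knowledge about the concepts in the input.\n"]
    for demo in DEMONSTRATIONS:
        parts.append(f"Input: {demo['question']}\nKnowledge: {demo['knowledge']}\n")
    parts.append(f"Input: {new_question}\nKnowledge:")
    return "\n".join(parts)

knowledge_statement = call_llm(build_knowledge_prompt("Is it safe to eat raw kidney beans?"))
```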
Experiment results
While the study is slightly outdated at this point (GPT-3 was the top OpenAI model at the time), there are still some insights to take away. Before we get to charts and tables, let's set the scene.
The researchers tested Generated Knowledge Prompting against a few baselines:
- No Knowledge (Ø): The vanilla baseline
- Random Sentences (R): Involves sampling random sentences from the LLM without tailoring them to the specific question
- Context Sentences (C): Consists of sampling text continuations from the context of the question, which means generating sentences that logically follow the question's content
- Template-generated Knowledge (T): Uses manually-designed templates to extract knowledge statements from the models
- Retrieval-based Knowledge (IR): Retrieves knowledge from external sources like Wikipedia, Google, etc.
- Answers (A): Directly prompts the model to generate answers, using a single answer to assess few-shot performance or 20 answers to prompt SOTA models
Alright, now with that out of the way, let's look at some results.
Takeaways
- Zero-shot settings: Generated knowledge delivered solid improvements of 7%-10% across NumerSense, CSQA, and QASC.
- Generated knowledge > few-shot: Generated knowledge outperformed the few-shot baseline, improving performance by 14% to 20% across commonsense reasoning tasks.
- Generated knowledge > retrieval-based: Generated knowledge outperformed retrieval-based knowledge from large sources (Wikipedia, Google), improving performance by up to ~9%, showing that tailored generated knowledge is more effective than loosely-related retrieved knowledge.
- Generated knowledge < retrieval-based: On the QASC dataset, the retrieval-based method outperforms generated knowledge prompting. This is because the retrieved knowledge is sourced directly from a gold-standard knowledge base specifically designed to construct the dataset, making it highly relevant.
How much knowledge is needed?
Does quantity matter? This graph looks very similar to the performance gains relative to the number of examples included in a few-shot prompt (you can check it out in our Few Shot Prompting Guide).
As shown above, performance generally increases with the number of knowledge statements included, but most of the gain comes from including any knowledge at all.
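If you want to experiment with quantity yourself, one simple pattern is to generate several knowledge statements, answer once per statement, and aggregate. The sketch below reuses the hypothetical `call_llm` and `build_knowledge_prompt` from earlier and uses a plain majority vote for aggregation; the original paper instead selects the single highest-confidence prediction across statements.

```python
from collections import Counter

def answer_with_knowledge(question: str, num_statements: int = 5) -> str:
    """Generate several knowledge statements, answer once per statement, then aggregate.

    Note: the paper picks the highest-confidence prediction across statements;
    the majority vote here is a simpler stand-in for that aggregation step.
    """
    answers = []
    for _ in range(num_statements):
        knowledge = call_llm(build_knowledge_prompt(question))
        answers.append(call_llm(f"Knowledge: {knowledge}\n\nQuestion: {question}\nAnswer:"))
    return Counter(answers).most_common(1)[0][0]
```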
Wrapping up
Generated Knowledge Prompting, while an older method, is highly effective. It is still very useful with today's models and has even been remixed into other methods like Analogical Prompting. Generated Knowledge Prompting is extremely adaptable and can help across a wide variety of tasks. Keep this one top of mind. It's easy to implement, very flexible, and extremely effective.