One of the biggest problems with LLMs is their tendency to make up incorrect information, also known as hallucinations.
Several tactics have emerged to tackle hallucinations, like Retrieval-Augmented Generation (RAG), fine-tuning models, and prompt engineering.
Setting up RAG can be challenging, and fine-tuning models can take some time. Prompt engineering is fast, easy to test, and anyone can do it. So let’s look at three methods you can test today to reduce hallucinations and get more accurate results.
“According to…” prompting
“According to…” prompting is the easiest of the three methods we’ll review. It’s rooted in the idea of guiding the model to get information from a specific source when answering your question.
For example: “What part of the brain is responsible for long-term memory, according to Wikipedia?” This prompts the model to ground its answer in information from a trusted source.
A few more examples across different industries:
Law:"According to interpretations in the American Bar Association Journal..."
Medicine:"In the findings published by the New England Journal of Medicine..."
Entertainment:"As highlighted in the latest analyses by Variety..."
Finance:"Based on the latest financial reports by The Economist..."
Technology:"Reflecting on insights from Wired's recent coverage..."
Education:"Following the educational standards set forth by UNESCO..."
Environment:"Drawing on conclusions from the World Wildlife Fund's latest research..."
You can try this method via a template we put together in PromptHub.
Researchers found this method was able to outperform standard prompting, improving accuracy by up to 20% in some cases. Here is the full paper. We also put together a detailed run-down on this method and the experiments here.
Chain-of-Verification Prompting
The Chain-of-Verification (CoVe) prompt engineering method aims to reduce hallucinations through a verification loop. CoVe has four steps:
- Generate an initial response to the prompt.
- Based on the original prompt and initial response, prompt the model again to generate verification questions that probe the claims in that response.
- Run the verification questions through the LLM and compare the outputs to the original response.
- Generate the final answer using a prompt that includes the verification question/answer pairs as examples.
CoVe is typically implemented via a few prompts: one for the original question, one for the verification questions, and one to generate the final answer. But we made a single-shot template that performs comparably to the traditional multi-prompt setup.
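Here's a minimal sketch of that multi-prompt loop, again assuming the OpenAI Python SDK. The prompt wording and model name are illustrative, not the exact prompts from the paper:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single chat-completion call; swap the model for whichever one you use."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def chain_of_verification(question: str) -> str:
    # 1. Generate an initial (baseline) response.
    baseline = ask(question)

    # 2. Generate verification questions that probe the claims in the baseline answer.
    verification_qs = ask(
        f"Question: {question}\nDraft answer: {baseline}\n"
        "List a few short questions that would verify the factual claims in the draft answer."
    )

    # 3. Answer the verification questions independently of the draft answer.
    verification_answers = ask(verification_qs)

    # 4. Produce the final answer, revising the draft in light of the verification Q&A pairs.
    return ask(
        f"Original question: {question}\nDraft answer: {baseline}\n"
        f"Verification questions:\n{verification_qs}\n"
        f"Verification answers:\n{verification_answers}\n"
        "Using the verification results, write a corrected final answer."
    )

print(chain_of_verification("Name some politicians who were born in New York City."))
```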
The researchers tested CoVe across a wide variety of experiments and found that, in some cases, it increased performance by up to 23%. Here is the full paper. We have a full run-down on this method and the experiments here.
Step-Back Prompting
A crucial rule to remember when prompting LLMs is to give them space to 'think'. You don’t want to overly constrain the model such that it can’t explore various solutions.
Chain of thought reasoning is one way to push the model to think through the problem. A simple way to implement this type of reasoning is to add a statement like “think through this task step-by-step” at the end of your prompt.
Step-Back prompting is an even better way to tap into this type of reasoning, leading to higher accuracy and lower hallucination rates.
Step-Back Prompting pushes the model to “think” at a high-level before diving directly into the task at hand. Here’s what the template looks like.
As you can see in the template above, there are generally two steps: abstraction and reasoning.
Abstraction corresponds to high-level thinking, while reasoning handles the low-level details involved in figuring out the answer.
For example, let’s say the main question/problem we want the LLM to handle is "How can I optimize my website's loading speed?" The step-back question might be something like "What factors influence website performance?" This higher-level thinking preps and guides the model for a more holistic approach to the task.
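Here's a minimal sketch of that two-step flow, assuming the OpenAI Python SDK. The abstraction and reasoning prompts are paraphrased rather than the exact wording from the paper, and the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def step_back(question: str) -> str:
    # Step 1 (abstraction): ask a higher-level "step-back" question first.
    principles = ask(
        "Before answering the question below, step back and describe the general "
        f"principles or factors involved.\nQuestion: {question}"
    )
    # Step 2 (reasoning): answer the original question, grounded in those principles.
    return ask(
        f"Background principles:\n{principles}\n\n"
        f"Using the principles above, answer step by step: {question}"
    )

print(step_back("How can I optimize my website's loading speed?"))
```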
The researchers tested step-back prompting across a few different datasets and found it was able to outperform chain of thought prompting by up to 36% in some cases. Here is the full paper and our detailed run-down on this method and the experiments.
Wrapping up
All of these prompting methods are available for free as templates in PromptHub. Try them out and see if they can help you get more accurate outputs!