One of the biggest problems with LLMs is their tendency to make up incorrect information, also known as hallucinations.
Several tactics have emerged to tackle hallucinations, like Retrieval-Augmented Generation (RAG), fine-tuning models, and prompt engineering.
Setting up RAG can be challenging, and fine-tuning models can take some time. Prompt engineering is fast, easy to test, and anyone can do it. So let’s look at three methods you can test today to reduce hallucinations and get more accurate results.
“According to…” prompting
“According to…” prompting is the easiest of the three methods we’ll review. It’s rooted in the idea of guiding the model to get information from a specific source when answering your question.
For example: “What part of the brain is responsible for long-term memory, according to Wikipedia?” This prompts the model to ground its answer in information from a trusted source.
A few more examples across different industries:
Law:"According to interpretations in the American Bar Association Journal..."
Medicine:"In the findings published by the New England Journal of Medicine..."
Entertainment:"As highlighted in the latest analyses by Variety..."
Finance:"Based on the latest financial reports by The Economist..."
Technology:"Reflecting on insights from Wired's recent coverage..."
Education:"Following the educational standards set forth by UNESCO..."
Environment:"Drawing on conclusions from the World Wildlife Fund's latest research..."
You can try this method via a template we put together in PromptHub.
Researchers found this method was able to outperform standard prompting, improving accuracy by up to 20% in some cases. Here is the full paper. We also put together a detailed run-down on this method and the experiments here.
Chain-of-Verification Prompting
The Chain-of-Verification (CoVe) prompt engineering method aims to reduce hallucinations through a verification loop. CoVe has four steps:
- Generate an initial response to the prompt.
- Based on the original prompt and initial response, prompt the model again to generate verification questions that probe the claims in that response.
- Run the verification questions through the LLM and compare the outputs to the original response.
- Generate the final answer using a prompt that includes the verification question/answer pairs as examples.
CoVe is typically implemented via a few prompts: one for the original question, one for the verification questions, and one to generate the final answer. But we made a single-shot template that performs comparably to the traditional multi-prompt setup.
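Here's a minimal sketch of that multi-prompt loop, again assuming the OpenAI Python SDK. The prompt wording and model name are illustrative, not the exact prompts from the paper:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single chat-completion call; swap the model for whichever one you use."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def chain_of_verification(question: str) -> str:
    # 1. Generate an initial (baseline) response.
    baseline = ask(question)

    # 2. Generate verification questions that probe the claims in the baseline answer.
    verification_qs = ask(
        f"Question: {question}\nDraft answer: {baseline}\n"
        "List a few short questions that would verify the factual claims in the draft answer."
    )

    # 3. Answer the verification questions independently of the draft answer.
    verification_answers = ask(verification_qs)

    # 4. Produce the final answer, revising the draft in light of the verification Q&A pairs.
    return ask(
        f"Original question: {question}\nDraft answer: {baseline}\n"
        f"Verification questions:\n{verification_qs}\n"
        f"Verification answers:\n{verification_answers}\n"
        "Using the verification results, write a corrected final answer."
    )

print(chain_of_verification("Name some politicians who were born in New York City."))
```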
The researchers tested CoVe across a wide variety of experiments and found that, in some cases, it increased performance by up to 23%. Here is the full paper. We have a full run-down on this method and the experiments here.
Step-Back Prompting
A crucial rule to remember when prompting LLMs is to give them space to 'think'. You don’t want to overly constrain the model such that it can’t explore various solutions.
Chain of thought reasoning is one way to push the model to think through the problem. A simple way to implement this type of reasoning is to add a statement like “think through this task step-by-step” at the end of your prompt.
Step-Back prompting is an even better way to tap into this type of reasoning, leading to higher accuracy and lower hallucination rates.
Step-Back Prompting pushes the model to “think” at a high-level before diving directly into the task at hand. Here’s what the template looks like.
As you can see in the template above, there are generally two steps: abstraction and reasoning.
Abstraction corresponds to high-level thinking, while reasoning handles the low-level details involved in figuring out the answer.
For example, let’s say the main question/problem we want the LLM to handle is "How can I optimize my website's loading speed?" The step-back question might be something like "What factors influence website performance?" This higher-level thinking preps and guides the model for a more holistic approach to the task.
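Here's a minimal sketch of that two-step flow, assuming the OpenAI Python SDK. The abstraction and reasoning prompts are paraphrased rather than the exact wording from the paper, and the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def step_back(question: str) -> str:
    # Step 1 (abstraction): ask a higher-level "step-back" question first.
    principles = ask(
        "Before answering the question below, step back and describe the general "
        f"principles or factors involved.\nQuestion: {question}"
    )
    # Step 2 (reasoning): answer the original question, grounded in those principles.
    return ask(
        f"Background principles:\n{principles}\n\n"
        f"Using the principles above, answer step by step: {question}"
    )

print(step_back("How can I optimize my website's loading speed?"))
```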
The researchers tested step-back prompting across a few different datasets and found it was able to outperform chain of thought prompting by up to 36% in some cases. Here is the full paper and our detailed run-down on this method and the experiments.
Wrapping up
All of these prompting methods are available for free as templates in PromptHub. Try them out and see if they can help you get more accurate outputs!