Chain-of-thought prompting was first introduced in 2022 and is still one of the top prompt engineering methods for increasing performance through enhanced reasoning.
Like all prompt engineering methods, chain-of-thought has limitations. Most notably, it performs poorly on problems that are harder than the examples shown in the prompt.
Least-to-most prompting was designed to overcome this specific limitation of chain-of-thought prompting (see the original paper, Least-to-Most Prompting Enables Complex Reasoning in Large Language Models).
In this article, we’ll dive deep into everything related to least-to-most prompting, including a bunch of examples and templates (including a no-code prompt chain) that will make it easy to start testing it right away.
What is least-to-most prompting?
Least-to-most prompting is a prompt engineering method that increases the problem-solving capabilities of LLMs by breaking down complex problems into a series of simpler subproblems that get executed sequentially.
Least-to-most prompting has two stages:
- Decomposition stage: Break down a complex problem into a list of subproblems. The prompt used in this step has few-shot examples that show the model how to decompose problems.
- Subproblem solving stage: After the problem is broken into simpler subproblems, the model solves each subproblem sequentially (or all at once, if you'd like). This prompt has three parts: (1) few-shot examples demonstrating how subproblems are solved, (2) the previously solved subproblems and their solutions, and (3) the next question to be answered. (See the sketch just after this list for how the two stages fit together.)
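To make those two stages concrete, here's a minimal Python sketch of a least-to-most chain. The `call_llm` helper and the prompt wording are placeholders for whatever model API and phrasing you use; they're assumptions for illustration, not the paper's exact prompts:

```python
# Minimal sketch of a two-stage least-to-most chain (illustrative only).
# `call_llm` and the prompt text below are placeholders, not the paper's exact prompts.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a call to your LLM provider's API")

# Stage 1: decomposition prompt, with one few-shot example (made up for illustration).
DECOMPOSITION_PROMPT = """Break the problem into simpler subquestions, one per line.

Problem: Amy has 5 apples and buys 3 more, then gives half away. How many apples does she have left?
Subquestions:
How many apples does Amy have after buying 3 more?
How many apples does she have left after giving half away?

Problem: {problem}
Subquestions:"""


def decompose(problem: str) -> list[str]:
    """Stage 1: ask the model to break the problem into simpler subproblems."""
    response = call_llm(DECOMPOSITION_PROMPT.format(problem=problem))
    return [line.strip() for line in response.splitlines() if line.strip()]


def solve_least_to_most(problem: str) -> str:
    """Stage 2: solve each subproblem in order, carrying earlier Q&A pairs forward."""
    subproblems = decompose(problem)
    context = ""
    answer = ""
    for subproblem in subproblems + [problem]:  # the original problem is solved last
        answer = call_llm(f"{context}Q: {subproblem}\nA:")
        # Previously solved subproblems and their answers stay in the next prompt.
        context += f"Q: {subproblem}\nA: {answer}\n"
    return answer
```

Keeping the previously solved pairs in `context` is what lets each later subproblem build on the earlier answers.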
In the example above you’ll see that the first step is to decompose the question into subquestions. The prompt that is used to instruct the model to decompose the problem is not included in the image above, but it could be something like this:
Least-to-most prompt template (decomposition)
The prompt above shows the model how to decompose the question through an example.
Referring to the graphic above, after the decomposition step, the model solves the first subproblem, and then moves on to the next subproblem, while retaining the previous subproblems and their answers.
Eventually, the original problem is appended as the final subproblem ("subquestion 2" in the graphic).
Another benefit of least-to-most prompting is that it can be combined with other prompt engineering methods like chain-of-thought or self-consistency, but it doesn't have to be.
Least-to-most prompting can be executed in two stages, as shown above, or in a single prompt. We will look at an example of this shortly.
Least-to-most prompting examples
We are all about actionable information here, so let's look at some examples. We will pull from the experiments run in the original least-to-most prompting paper. We will dive deeper into the performance of these prompts later on.
Last-letter concatenation
Let's look at an example use case where we want the model to concatenate the last letter of each word in a list. For example, for the input "ice, steak, pickleball", the output would be "ekl".
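For reference, the ground-truth answer is trivial to compute in plain Python, which makes it easy to check model outputs:

```python
def last_letter_concat(words: list[str]) -> str:
    """Ground-truth answer: join the last letter of each word."""
    return "".join(word[-1] for word in words)

print(last_letter_concat(["ice", "steak", "pickleball"]))  # -> ekl
```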
Here is a standard few-shot prompt version (that performs poorly, even with new models).
Few-shot prompt template
The least-to-most prompt template used for this example will leverage few-shot prompting to show the model how to decompose similar problems.
Least-to-most prompt example
For comparison, here’s a chain-of-thought prompt template example:
Chain-of-thought prompt example
The differences are subtle but important. Chain-of-thought tries to string all the letters together, one at a time, in a single stream. In contrast, least-to-most prompting works by adding just one additional letter at a time, using the output from the previous concatenation. This helps least-to-most prompting continue to perform well, even as the number of letters to concatenate increases.
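Here's a quick illustration of that accumulation (our own construction, with ground-truth answers filled in) showing how each least-to-most step only adds one letter to the previous result:

```python
words = ["ice", "steak", "pickleball"]

# Build the subproblem sequence the way a least-to-most chain would see it:
# each step adds exactly one letter to the result of the previous step.
solved = []   # (subproblem, answer) pairs that get carried forward in the prompt
running = ""
for word in words:
    if running:
        subproblem = f'Concatenate "{running}" with the last letter of "{word}".'
    else:
        subproblem = f'What is the last letter of "{word}"?'
    running += word[-1]          # ground-truth answer, shown here for illustration
    solved.append((subproblem, running))

for question, answer in solved:
    print(f"Q: {question}\nA: {answer}\n")
```

Because each subproblem stays the same size no matter how many words are in the list, the chain degrades far less as the input grows.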
Compositional Generalization
Next up is a task where the model has to translate natural language into a sequence of actions. For instance, “run left and walk twice” would be translated to "TURN_LEFT + RUN + WALK * 2".
Here is a standard few-shot prompt version.
Few-shot prompt example
For least-to-most prompting, we will use a two-prompt setup. The first prompt will simplify the problem into a sequence of shorter commands ("walk opposite left thrice"), and the second prompt will map those commands into actual actions (("TURN_LEFT" * 2 + "WALK") * 3).
Least-to-most decomposition prompt example
Least-to-most prompt output (decomposition)
In the second prompt here, we are doing two things:
- Adding examples via few-shot prompting (as we did in the prompt above);
- Injecting the last part of the output of the reduction step ("jump left", "jump around left", "jump around left twice", "walk opposite left", "walk opposite left thrice") as the question we want to answer. By doing this, we pass along a more informative example for the model to then translate into actions.
Least-to-most prompting stage two example
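To make the hand-off between the two prompts concrete, here's a small Python sketch of assembling that stage-two prompt. The mapping examples are an illustrative subset we wrote, not the full few-shot block from the paper:

```python
# (1) Few-shot examples mapping short commands to actions (an illustrative subset,
#     not the full prompt from the paper).
MAPPING_EXAMPLES = """Q: "turn left"
A: "TURN_LEFT"

Q: "walk opposite left"
A: "TURN_LEFT" * 2 + "WALK"
"""

# (2) Output of the decomposition (reduction) step: the long command rewritten as a
#     sequence of progressively longer, simpler commands.
decomposed_commands = [
    "jump left",
    "jump around left",
    "jump around left twice",
    "walk opposite left",
    "walk opposite left thrice",
]

# (3) Assemble the stage-two prompt: mapping examples first, then the decomposed
#     command sequence injected as the question to translate into actions.
question = ", ".join(f'"{cmd}"' for cmd in decomposed_commands)
stage_two_prompt = f"{MAPPING_EXAMPLES}\nQ: {question}\nA:"

print(stage_two_prompt)  # send this to your model of choice
```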
Mathematical reasoning
For a mathematical reasoning problem, we’ll use least-to-most prompting in a single prompt.
We’ll pass a single example that shows how a question can be decomposed:
Least-to-most prompt example
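If you want something copy-pasteable, a single-prompt version could look roughly like this. The worked example inside the prompt is one we made up for illustration, so treat it as a sketch rather than the exact prompt from the paper:

```python
# Single-prompt least-to-most template: one demonstration shows the model how to
# decompose a question into subquestions and solve them in order; the real question
# is appended at the end.
LEAST_TO_MOST_MATH_PROMPT = """Q: Elsa has 5 apples. Anna has 2 more apples than Elsa. How many apples do they have together?
A: Let's break this problem into subproblems.
Subquestion 1: How many apples does Anna have? Anna has 5 + 2 = 7 apples.
Subquestion 2: How many apples do they have together? Together they have 5 + 7 = 12 apples.
The answer is 12.

Q: {question}
A: Let's break this problem into subproblems."""

# prompt = LEAST_TO_MOST_MATH_PROMPT.format(question="<your math word problem>")
```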
For comparison, here is what a chain-of-thought version of this prompt would look like:
Chain-of-thought prompt template
Least-to-most experiment results
Now we will dive deeper into the results of the various experiments that the researchers ran, starting with the last-letter concatenation task.
Last-letter Concatenation
Here’s another example of that first decomposition step that we looked at above.
Here are the results from the experiment, where L = the number of letters to concatenate.
- Standard prompting absolutely fails here (models listed below)
- Chain-of-thought prompting performs well with a low number of words to concatenate. However, as the number increases, performance drops significantly. The chain becomes too long for the model to manage effectively.
The researchers broke down performance by the number of examples. See results below.
Interestingly, chain-of-thought prompting achieves an accuracy of 37.4% with 4 independent examples and only 38.4% with 8. This points back to a graph that I often reference, which appears in our few-shot prompting guide:
Performance gains plateau quickly with few-shot prompting.
Compositional generalization: SCAN benchmark
As a reminder from the example we looked at earlier, the SCAN benchmark involves translating natural language commands into action sequences.
As we saw in the earlier example, least-to-most prompting uses a two-step prompt for this task:
- A prompt to decompose long commands into a list of short commands.
- A prompt to map natural language commands to action sequences.
The results really speak for themselves.
- The models at the time had a hard time with this task; even current models struggle with it.
- Interestingly, code-davinci-002, a model optimized for tasks involving code, outperformed text-davinci-002, regardless of the prompting method. This idea of LLMs leveraging code generation capabilities on non-coding tasks is something we talked about in our recent Program of Thoughts Prompting Guide.
Math Reasoning
Last but not least, classic math datasets.
As a refresher, below is a chain-of-thought prompt template and a least-to-most prompt template:
Chain-of-thought prompt template
Least-to-most prompt template
The main difference? Chain-of-thought solves the problem in a continuous flow, addressing the subproblems within the same response, while least-to-most prompting explicitly decomposes the problem first: it identifies the intermediate steps required to solve the main problem and then addresses each subproblem sequentially.
Let’s take a look at the experiment results:
- Least-to-most prompting consistently outperforms chain-of-thought prompting, although the degree of improvement varies across different datasets.
- Least-to-most prompting significantly outperforms chain-of-thought prompting on the DROP dataset, most likely because those math problems are easier to decompose.
The researchers also tested the two prompt engineering methods on math problems that required multiple steps. This is where least-to-most prompting really shines.
- Least-to-most prompting outperforms chain-of-thought prompting in math problems with 5+ steps by ~15%
Least-to-most prompt templates
We have a few additional templates you can try out. The first set will be tailored to a specific task, planning a vacation.
We'll follow the two-stage process of least-to-most prompting.
Least-to-most prompt template - Vacation planning
Stage 1 - breaking down the problem into subproblems
Stage 2 - pass the subproblems through from stage 1, and solve them sequentially
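If it helps to see them in text form, here's a rough sketch of what those two prompt templates could look like (our wording, offered as a starting point rather than an official template):

```python
# Stage 1: decompose the vacation request into subproblems.
VACATION_DECOMPOSITION_PROMPT = """You are a travel planner. Break the request below into a short,
ordered list of simpler subproblems (one per line) that, solved in order, would produce a complete plan.

Request: {request}
Subproblems:"""

# Stage 2: solve one subproblem at a time, keeping earlier answers in context.
VACATION_SOLVER_PROMPT = """You are a travel planner working through a plan step by step.

Request: {request}

Subproblems solved so far:
{solved_so_far}

Next subproblem: {subproblem}
Answer:"""
```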
Least-to-most prompt template - any task
Next, we'll create a more generalizable set of prompts so you can apply least-to-most prompting to any task, not just vacation planning.
We're also going to add another step to automate few-shot example generation, which really takes this chain to the next level.
Step 1 - generate dynamic few-shot examples of problems and subproblems, for any task
Step 2 - pass the few-shot examples from the previous step, and decompose the problem at hand
Step 3 - pass the subproblems from step 2, and solve them sequentially
Now let's pull it all together using a prompt chain
To reiterate, the steps in the chain are:
- Input any problem and generate few-shot examples that have problems and subproblems
- Generate subproblems for the problem, using the output from step one as few-shot examples
- Sequentially solve the subproblems generated from step 2
Now we have a least-to-most prompt template chain that can be used for any type of problem.
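Here's a compact Python sketch of that three-step chain. The `call_llm` helper and the prompt wording are placeholders (our assumptions, not PromptHub's or the paper's exact prompts):

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a call to your LLM provider's API")


def generate_few_shot_examples(problem: str) -> str:
    """Step 1: generate example problems, each already decomposed into subproblems."""
    prompt = (
        "Write two example problems that are similar in structure to the problem below, "
        "and break each one into a numbered list of subproblems.\n\n"
        f"Problem: {problem}"
    )
    return call_llm(prompt)


def decompose(problem: str, examples: str) -> list[str]:
    """Step 2: decompose the real problem, using the generated examples as few-shot guidance."""
    prompt = f"{examples}\n\nProblem: {problem}\nSubproblems (one per line):"
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]


def solve_sequentially(problem: str, subproblems: list[str]) -> str:
    """Step 3: solve each subproblem in order, carrying previous answers forward."""
    context = f"Original problem: {problem}\n"
    answer = ""
    for subproblem in subproblems + [problem]:  # finish with the original problem
        answer = call_llm(f"{context}\nNext question: {subproblem}\nAnswer:")
        context += f"Q: {subproblem}\nA: {answer}\n"
    return answer


def least_to_most(problem: str) -> str:
    examples = generate_few_shot_examples(problem)
    subproblems = decompose(problem, examples)
    return solve_sequentially(problem, subproblems)
```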
If you have any questions about how to set this up in PromptHub just let us know!
When to use least-to-most prompting
Least-to-most prompting is particularly helpful in situations where:
- The task or question is complex
- The task or question can be broken down into simpler subproblems
Here are a few quick examples:
- Chat Support Bots:
  - A customer support bot handling a complex order question that involves checking product availability on specific dates, applying discount codes, and processing returns.
  - Decomposition:
    - Check product availability for the specified dates.
    - Verify and apply the discount code.
    - Process the return request.
  - By breaking down the customer’s request into these subproblems, there is a greater chance that the bot will handle each step accurately.
- E-commerce Recommendations:
  - An LLM-based system for generating personalized product recommendations based on user preferences, browsing history, and current promotions.
  - Decomposition:
    - Analyze user preferences.
    - Review browsing history for recent interests.
    - Integrate current promotions into recommendations.
  - This setup ensures that recommendations are both relevant and up-to-date with user activity.
- Financial Planning Tools:
  - An AI financial advisor that helps users create budget plans that take into account their free cash, expenses, savings goals, and investment options.
  - Decomposition:
    - Calculate monthly income.
    - Categorize and sum monthly expenses.
    - Allocate funds to savings goals.
    - Suggest investment options based on the remaining budget.
  - By addressing each financial component separately, it is much more likely that the LLM will handle the math correctly (a minimal sketch of wiring this decomposition into a chain follows below).
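As one example of wiring a decomposition like that into code, here's a small sketch of the financial-planning subproblems being solved sequentially. The `call_llm` helper and the `user_profile` string are placeholders we made up for illustration:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a call to your LLM provider's API")

# Hypothetical user data; in practice this would come from your application.
user_profile = "take-home pay, recurring expenses, savings goals, risk tolerance, ..."

subproblems = [
    "Calculate the user's monthly income.",
    "Categorize and sum the user's monthly expenses.",
    "Allocate funds to the user's savings goals.",
    "Suggest investment options based on the remaining budget.",
]

context = f"User profile:\n{user_profile}\n"
for subproblem in subproblems:
    answer = call_llm(f"{context}\nNext step: {subproblem}\nAnswer:")
    context += f"Step: {subproblem}\nResult: {answer}\n"
```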
Limitations of least-to-most prompting
While least-to-most prompting is powerful, it has some limitations:
- Task-Specific Nature: Decomposition prompts are often specific to the task and may not generalize well across different types of problems. For example, a prompt that decomposes math problems isn’t going to help on common sense reasoning problems like “Can a fish climb a tree?”. A new prompt is needed.
- Generalization Challenges: Even within the same domain, generalizing decomposition strategies can be difficult and requires an understanding of each unique problem.
- Dependency on Accurate Subproblem Solutions: Errors in subproblems can cascade, affecting the final outcome.
- Inability to Decompose a Problem into Subproblems: This happened occasionally on the math datasets, though it should be less of an issue now that models have become much smarter. Whenever the model was able to break a math problem into simpler subproblems, it was able to solve the original problem.
Wrapping up
Least-to-most prompting is one of our favorite prompt engineering methods because of the reasoning gains it unlocks.
Breaking down complex problems into more digestible subproblems helps increase performance, and it also gives users a better understanding of how models approach problems.
One of the biggest limitations shows up when the model can't correctly decompose a problem. Luckily, as models continue to get better, this will become less of an issue!