Chain-of-thought prompting was first written about in 2022 and is still one of the top prompt engineering methods for increasing performance through enhanced reasoning.

Like all prompt engineering methods, chain-of-thought has limitations. Most notably, it performs poorly on tasks that require solving problems that are harder than the examples shown.

Least-to-most prompting was designed to overcome this specific limitation of chain-of-thought prompting.

In this article, we’ll dive deep into least-to-most prompting, with plenty of examples and templates (including a no-code prompt chain) that make it easy to start testing it right away.

What is least-to-most prompting?

Least-to-most prompting is a prompt engineering method that increases the problem-solving capabilities of LLMs by breaking down complex problems into a series of simpler subproblems that get executed sequentially.

Least-to-most prompting has two stages:

  1. Decomposition stage: Break down a complex problem into a list of subproblems. The prompt used in this step has few-shot examples that show the model how to decompose problems.
  2. Subproblem solving stage: After decomposition, the model solves each subproblem sequentially (or all at once, if you prefer). The prompt for this stage has three parts: (1) few-shot examples demonstrating how subproblems are solved, (2) the previously solved subproblems and their solutions, and (3) the next subproblem to be answered. See the code sketch below for how the two stages fit together.
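To make the two stages a bit more concrete, here's a minimal Python sketch of the flow. The `call_llm` placeholder and the prompt wording are our own assumptions, not the exact prompts from the paper; swap in whatever model client and prompt templates you actually use.

```python
# Minimal sketch of the two-stage least-to-most flow.
# The prompt wording is illustrative, not the exact prompts from the paper.

def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to whichever LLM client/API you use."""
    raise NotImplementedError

DECOMPOSITION_EXAMPLES = (
    "Q: <example complex problem>\n"
    "A: To solve this, we first need to answer:\n"
    "1. <subquestion 1>\n"
    "2. <subquestion 2>"
)

def decompose(problem: str) -> list[str]:
    """Stage 1: ask the model to break the problem into subquestions."""
    prompt = (
        f"{DECOMPOSITION_EXAMPLES}\n\n"
        f"Q: {problem}\n"
        "A: To solve this, we first need to answer:"
    )
    # Assumes the model returns one subquestion per line.
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def solve(problem: str, subquestions: list[str]) -> str:
    """Stage 2: solve each subquestion in order, then the original problem,
    carrying previously solved subquestions and answers forward."""
    context = problem
    answer = ""
    for question in subquestions + [problem]:  # the original problem comes last
        answer = call_llm(f"{context}\n\nQ: {question}\nA:")
        # Append the solved subquestion and its answer for the next step to use.
        context += f"\n\nQ: {question}\nA: {answer}"
    return answer
```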

A graphic showing the flow of prompts for least-to-most prompting
Least-to-most prompting flow

In the example above you’ll see that the first step is to decompose the question into subquestions. The prompt that is used to instruct the model to decompose the problem is not included in the image above, but it could be something like this:

Least-to-most prompt template (decomposition)

The prompt above shows the model how to decompose the question through an example.

Referring to the graphic above, after the decomposition step, the model solves the first subproblem, and then moves on to the next subproblem, while retaining the previous subproblems and their answers.

Eventually, the original problem is appended as the final subproblem ("subquestion 2" in the graphic).

Another benefit of least-to-most prompting is that it can be combined with other prompt engineering methods like chain-of-thought or self-consistency, but it doesn't have to be.

Least-to-most prompting can be executed in two stages, as shown above, or in a single prompt. We will look at an example of this shortly.

Hey everyone, how's it going? This is Dan here from PromptHub. Happy Friday! Today we're going to be looking at a pretty cool prompting method called "Least-to-Most Prompting," and we'll be looking at how you can implement it as well. Rather than just diving deep into the research, we'll also explore some implementation steps.

The paper, which will be linked below, came out in 2023. The core of what Least-to-Most Prompting does is that it basically tries to break down problems into subproblems and then solve those subproblems sequentially. It involves two steps: first, it takes a problem and decomposes it into subproblems. The prompt used to decompose the problem into subproblems usually has a few examples showing the model, "Hey, here are some similar problems, and here are the decomposition steps for those problems. Now, do this next one."

Next, it starts to sequentially solve those smaller problems. Few-shot examples are usually included here, showing the previously solved subproblems and solutions, and then the next subproblem to be answered. You can do this in a single prompt, having all the subproblems solved at once, or you can have a different request for each of the subproblems and then append them.

Here’s what the flow looks like: The first stage is the decomposition stage, where the prompt shows the problem being decomposed into subproblems. Then, it starts to sequentially solve each subquestion, appending each subquestion and its answer to the prompt for the next subquestion. Eventually, it solves the main question, such as "How many times can she slide before it closes?" In this case, there’s only one subquestion to answer.

Here’s an example of a decomposition prompt, which is the first step. It sends an example of a Q&A where it has a question and an answer that breaks down the steps to solve the problem. Then, you send the question you want to be answered. The output from here will be similar steps mimicking this pattern above, like, "To solve the problem of how many times she can slide before it closes, we need to do steps one, two, and three." Then you’ll have your subproblems and can move on to solving them.

When we think about Least-to-Most versus Chain-of-Thought, there are a lot of similarities—they both push the model to do some sort of reasoning. The difference with Least-to-Most is that it explicitly breaks it down into subproblems. The top example shows Least-to-Most with clear subproblems to solve, while the Chain-of-Thought example is more about reasoning in one stream. Least-to-Most, by intentionally breaking down into subproblems, can yield better results.

Looking at some experiment results quickly: this was on a last-letter concatenation test, where LLMs were sent a list of words and had to pull out the last letter of each and then concatenate them. We can see Least-to-Most performs similarly to Chain-of-Thought when the number of words to concatenate is smaller, but the delta between the two grows as the number of words increases. This happens because Chain-of-Thought tries to do all of this in one stream, while Least-to-Most breaks it down into subproblems.

For Chain-of-Thought, when using four examples, we see a plateau in performance, similar to what we’ve discussed in our Few-Shot Prompting Guide, which will also be linked below.

Next up was another task called the SCAN dataset, which I’ll skip over, and then a math reasoning dataset. On these three datasets, Least-to-Most usually outperforms Chain-of-Thought. The difference becomes more pronounced when more reasoning is required.

But that's the research—let's look at how we can actually use this. We’ll write three prompts: one to generate few-shot examples of related or similar problems being broken down into subproblems (this is the first step), another to have the model decompose the problem, and finally, one to solve the subproblems.

To generate few-shot examples, the prompt might look something like this: "Generate few-shot problems for the following task. The example should have a problem and decomposed subproblems, and it should follow this structure." This will give you examples related to the task, such as answering customer support tickets, with decomposed subproblems. We have a template for this, which we’ll look at in a second.

In the second step, we’ll pass the same task and say, "List the decomposed subproblems before solving the task—only send us the subproblems." We’ll provide some examples, so the LLM knows how to decompose the problems. This comes from the previous step, and we also have a template for this in PromptHub.

In the final step, now that we have the examples and subproblems, we can actually solve the task. The prompt might be, "Solve the task by addressing the subproblems listed below." Then, you pass the task and the subproblems, which are the output from the previous step. Again, we have a template for this.

Now, to chain them together, you can do this in PromptHub. Under the Templates tab, you’ll see all three steps. Add them to your library, and then under the Chains tab, you can create a chain, add some links, and go from generating few-shot examples to decomposing the task into subproblems to solving the task.

This method is great because it can be used for anything. In our example, we’re using it for a customer support ticket, but it can dynamically adjust to create few-shot examples, subproblems, and solutions for any type of task.

That's it for today. Happy prompting! Let me know if you need any help getting up and running with those templates and chains.

Least-to-most prompting examples

We are all about actionable information here, so let's look at some examples. We will pull from the experiments run in the original least-to-most prompting paper. We will dive deeper into the performance of these prompts later on.

Last-letter concatenation

Let's look at an example use case where we want the model to concatenate the last letters of a set of words. For example, given the input "ice, steak, pickleball", the output would be "ekl".

Here is a standard few-shot prompt version (that performs poorly, even with new models).

Few-shot prompt template

The least-to-most prompt template used for this example will leverage few-shot prompting to show the model how to decompose similar problems.

Least-to-most prompt example

For comparison, here’s a chain-of-thought prompt template example:

Chain-of-thought prompt example

The differences are subtle but important. Chain-of-thought tries to string all the letters together, one at a time, in a single stream. In contrast, least-to-most prompting works by adding just one additional letter at a time, using the output from the previous concatenation. This helps least-to-most prompting continue to perform well, even as the number of letters to concatenate increases.
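Here's a rough sketch of how that one-word-at-a-time behavior could be driven in code. It reuses the `call_llm` placeholder from the earlier sketch, and the prompt wording is our own rather than the paper's exact template:

```python
# Sketch: least-to-most style last-letter concatenation, extending the result
# one word at a time. Reuses the call_llm placeholder from the earlier sketch.

FEW_SHOT = (
    'Q: "think, machine". What is the concatenation of the last letters?\n'
    'A: The last letter of "think" is "k". The last letter of "machine" is "e". '
    'Concatenating "k" and "e" gives "ke". The answer is "ke".'
)

def last_letter_concat(words: list[str]) -> str:
    running = ""
    for i in range(1, len(words) + 1):
        sublist = ", ".join(f'"{w}"' for w in words[:i])
        # Each step only extends the previous answer by one more word, so the
        # model never has to track a long chain of letters in a single pass.
        context = (
            f'The concatenation for the first {i - 1} words is "{running}".\n'
            if running else ""
        )
        prompt = (
            f"{FEW_SHOT}\n\n{context}"
            f"Q: {sublist}. What is the concatenation of the last letters? "
            "Reply with only the concatenated letters.\nA:"
        )
        running = call_llm(prompt).strip().strip('"')
    return running

# e.g. last_letter_concat(["ice", "steak", "pickleball"]) should give "ekl"
```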

Compositional Generalization

Next up is a task where the model has to translate natural language into a sequence of actions. For instance, “run left and walk twice” would be translated to "TURN_LEFT + RUN + WALK * 2".

Here is a standard few-shot prompt version.

Few-shot prompt example

For least-to-most prompting, we will use a two-prompt setup. The first prompt will reduce the problem into a sequence of simpler commands (e.g., "walk opposite left thrice"), and the second prompt will map those commands to actual actions (("TURN LEFT" * 2 + "WALK") * 3).

Least-to-most decomposition prompt example

Least-to-most prompt output (decomposition)

In the second prompt here, we are doing two things:

  1. Adding examples via few-shot prompting (as we did in the prompt above);
  2. Injecting the last part of the output of the reduction step ("jump left", "jump around left", "jump around left twice", "walk opposite left", "walk opposite left thrice") as the question we want to answer. By doing this, we pass along a more informative example for the model to then translate into actions (see the sketch below).

Least-to-most prompting stage two example
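As a rough sketch (with our own variable names and wording, not the paper's exact format), here's how those three pieces might be assembled into the second prompt before each call:

```python
# Sketch: assembling the second (translation) prompt for the SCAN-style task.
# few_shot_examples: demonstrations of commands mapped to action sequences.
# solved: (command, action_sequence) pairs answered in earlier calls.
# next_command: the next expression from the decomposition output.

def build_stage_two_prompt(few_shot_examples: str,
                           solved: list[tuple[str, str]],
                           next_command: str) -> str:
    history = "\n".join(f'Q: "{cmd}"\nA: {actions}' for cmd, actions in solved)
    return (
        f"{few_shot_examples}\n\n"   # (1) few-shot demonstrations
        f"{history}\n\n"             # (2) previously solved subproblems + answers
        f'Q: "{next_command}"\nA:'   # (3) the next command to translate
    )
```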

Mathematical reasoning

For a mathematical reasoning problem, we’ll use least-to-most prompting in a single prompt.

We’ll pass a single example that shows how a question can be decomposed and solved.

Least-to-most prompt example
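If you'd like a text version to adapt, here's a rough sketch of what a single-prompt least-to-most template for a math problem could look like, written as a Python string. The question mirrors the slide example from the earlier graphic, but the exact numbers and wording below are ours, for illustration only.

```python
# Sketch of a single-prompt least-to-most template for a math word problem.
# Decomposition and subproblem solving both happen in one completion.
# The numbers and wording are illustrative, not the paper's exact example.

SINGLE_PROMPT = """Q: It takes Amy 4 minutes to climb to the top of a slide and 1 minute to slide down. The slide closes in 15 minutes. How many times can she slide before it closes?
A: To answer "How many times can she slide before it closes?", we first need to answer: "How long does each trip take?"
1. How long does each trip take? Climbing takes 4 minutes and sliding down takes 1 minute, so each trip takes 5 minutes.
2. How many times can she slide before it closes? The slide closes in 15 minutes and each trip takes 5 minutes, so she can slide 15 / 5 = 3 times.
The answer is 3.

Q: {your_question}
A:"""
```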

For comparison, here is what a chain-of-thought version of this prompt would look like:

Chain-of-thought prompt template

Least-to-most experiment results

Now we will dive deeper into the results of the various experiments that the researchers ran, starting with the last-letter concatenation task.

Last-letter Concatenation

Here’s another example of that first decomposition step that we looked at above.

Least-to-most prompt decomposition example

Here are the results from the experiment, where L = the number of letters to concatenate.

Table of results from least-to-most and chain-of-thought experiment
  • Standard prompting absolutely fails here, across all the models tested
  • Chain-of-thought prompting performs well with a low number of words to concatenate. However, as the number increases, performance drops significantly. The chain becomes too long for the model to manage effectively.

The researchers broke down performance by the number of examples. See results below.

Table of results from least-to-most and chain-of-thought experiment broken down by number of examples

Interestingly, chain-of-thought prompting achieves an accuracy of 37.4% with 4 independent examples and only 38.4% with 8. This points back to a graph I often reference, which also appears in our few-shot prompting guide.

Performance gains plateau quickly with few-shot prompting.

a graph showing performance versus number of examples in context
Source: Language Models are Few-Shot Learners

Compositional generalization: SCAN benchmark

As a reminder from the example we looked at earlier, the SCAN benchmark is the task that entails translating commands into action sequences.

Table with examples of commands and related action sequences from the SCAN dataset

As we saw in the earlier example, least-to-most prompting uses a two-step prompt for this task:

  1. A prompt to decompose long commands into a list of short commands.
  2. A prompt to map natural language commands to action sequences.

Table of results of multiple models and chain-of-thought versus least-to-most prompting on the SCAN dataset

The results really speak for themselves.

  • The models at the time had a hard time with this task, and even current models struggle with it.
  • Interestingly, code-davinci-002, a model optimized for tasks involving code, outperformed text-davinci-002, regardless of the prompting method. This idea of LLMs leveraging code generation capabilities on non-coding tasks is something we talked about in our recent Program of Thoughts Prompting Guide.

Math Reasoning

Last but not least, classic math datasets.

As a refresher, below is a chain-of-thought prompt template and a least-to-most prompt template:

Chain-of-thought prompt template

Chain-of-thought prompt template for math reasoning dataset

Least-to-most prompt template

Least-to-most prompt template for math reasoning dataset
Note that this prompt combines decomposition and subproblem solving into a single pass.

The main difference? Chain-of-thought solves the problem in a continuous flow, addressing the subproblems within the same response, while least-to-most prompting explicitly decomposes the problem into subproblems. Least-to-most prompting first identifies the intermediate steps required to solve the main problem and then addresses each subproblem sequentially.

Let’s take a look at the experiment results:

Results of chain-of-thought prompting and least-to-most prompting on 3 math based datasets

  • Least-to-most prompting consistently outperforms chain-of-thought prompting, although the degree of improvement varies across different datasets.
  • Least-to-most prompting significantly outperforms chain-of-thought prompting on the DROP dataset, most likely because those math problems are easier to decompose.

The researchers also tested the two prompt engineering methods on math problems that required multiple steps. This is where least-to-most prompting really shines.

Results of least-to-most prompting versus chain-of-thought prompting broken down by number of steps to solve the math question

  • Least-to-most prompting outperforms chain-of-thought prompting by ~15% on math problems that require 5+ steps

Least-to-most prompt templates

We have a few additional templates you can try out. The first set is tailored to a specific task: planning a vacation.

We'll follow the two-stage process of least-to-most prompting.

Least-to-most prompt template - Vacation planning

Stage 1 - breaking down the problem into subproblems

Stage 2 - pass the subproblems through from stage 1, and solve them sequentially
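If you want to adapt these outside of PromptHub, here's a rough sketch of what the two stages could look like as prompt strings. The wording below is ours, not the exact PromptHub templates.

```python
# Sketch of the two vacation-planning prompts. Stage 1 decomposes the task;
# Stage 2 receives the subproblems from Stage 1 and solves them in order.
# Wording is illustrative, not the exact PromptHub templates.

STAGE_1_DECOMPOSE = """You are planning a vacation: {trip_details}.
Before planning anything, list the subproblems that need to be solved
(e.g. pick dates, set a budget, book flights, book lodging, plan activities).
Return only the numbered list of subproblems."""

STAGE_2_SOLVE = """You are planning a vacation: {trip_details}.
Here are the subproblems to solve, in order:
{subproblems}

Solve each subproblem sequentially, using the answers to earlier subproblems
when working on later ones. Finish with a complete day-by-day plan."""
```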

Least-to-most prompt template - any task

Next, we'll create a more generalizable set of prompts so that you can apply least-to-most prompting to any task, not just vacation planning.


We're also going to add another step to automate few-shot example generation, which will really take this chain to the next level.

Step 1 - generate dynamic few-shot examples of problems and subproblems, for any task

Step 1 of least-to-most prompt template chain

Step 2 - pass the few-shot examples from the previous step, and decompose the problem at hand

Step 2 of least-to-most prompt template chain

Step 3 - pass the subproblems from step 2, and solve them sequentially

Step 3 of least-to-most prompt template chain

Now let's pull it all together using a prompt chain:

Three steps of the least-to-most prompt template chain in the PromptHub dashboard

To reiterate, the steps in the chain are as follows (see the code sketch after this list):

  1. Input any problem and generate few-shot examples that have problems and subproblems
  2. Generate subproblems for the problem, using the output from step one as few-shot examples
  3. Sequentially solve the subproblems generated from step 2
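For anyone wiring this up in code instead of (or alongside) the no-code chain, here's a minimal sketch of the three steps, reusing the `call_llm` placeholder from the earlier sketch; the prompts are illustrative, not the exact PromptHub templates.

```python
# Sketch of the three-step least-to-most chain for any task.
# Reuses the call_llm placeholder defined in the earlier sketch.

def least_to_most_chain(task: str) -> str:
    # Step 1: generate few-shot examples of related problems broken into subproblems.
    examples = call_llm(
        "Generate a few example problems related to the following task. "
        "Each example should have a problem and its decomposed subproblems.\n\n"
        f"Task: {task}"
    )

    # Step 2: decompose the actual task, using the generated examples as guidance.
    subproblems = call_llm(
        f"{examples}\n\n"
        "List the decomposed subproblems for the task below. "
        "Return only the subproblems.\n\n"
        f"Task: {task}"
    )

    # Step 3: solve the task by addressing the subproblems in order.
    return call_llm(
        "Solve the task by addressing the subproblems listed below, in order.\n\n"
        f"Task: {task}\n\nSubproblems:\n{subproblems}"
    )
```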

Now we have a least-to-most prompt template chain that can be used for any type of problem.
If you have any questions about how to set this up in PromptHub just let us know!

When to use least-to-most prompting

Least-to-most prompting is particularly helpful in situations where:

  1. The task or question is complex
  2. The task or question can be broken down into simpler subproblems

Here are a few quick examples:

  1. Chat Support Bots:
    • A customer support bot handling a complex order question that involves checking product availability on specific dates, applying discount codes, and processing returns.
    • Decomposition:
      1. Check product availability for the specified dates.
      2. Verify and apply the discount code.
      3. Process the return request.
    • By breaking down the customer’s request into these subproblems, there is a greater chance that the bot will handle each step accurately.
  2. E-commerce Recommendations:
    • An LLM-based system for generating personalized product recommendations based on user preferences, browsing history, and current promotions.
    • Decomposition:
      1. Analyze user preferences.
      2. Review browsing history for recent interests.
      3. Integrate current promotions into recommendations.
    • This setup ensures that recommendations are both relevant and up-to-date with user activity.
  3. Financial Planning Tools:
    • An AI financial advisor that helps users create budget plans that take into account their free cash, expenses, savings goals, and investment options.
    • Decomposition:
      1. Calculate monthly income.
      2. Categorize and sum monthly expenses.
      3. Allocate funds to savings goals.
      4. Suggest investment options based on remaining budget.
    • By addressing each financial component separately, it is much more likely that the LLM will handle the math correctly.

Limitations of least-to-most prompting

While least-to-most prompting is powerful, it has some limitations:

  • Task-Specific Nature: Decomposition prompts are often specific to the task and may not generalize well across different types of problems. For example, a prompt that decomposes math problems isn’t going to help on common sense reasoning problems like “Can a fish climb a tree?”. A new prompt is needed.
  • Generalization Challenges: Even within the same domain, generalizing decomposition strategies can be difficult and requires an understanding of each unique problem.
  • Dependency on Accurate Subproblem Solutions: Errors in subproblems can cascade, affecting the final outcome.
  • Inability to Decompose a Problem into Subproblems: This happened occasionally on the math datasets, though it should be less of an issue now that models have become much smarter. Whenever the model was able to break a math problem down into simpler subproblems, it was able to solve the original problem.

Wrapping up

Least-to-most prompting is one of our favorite prompt engineering methods because of its enhanced reasoning capabilities.

Breaking down complex problems into more digestible subproblems helps increase performance, and it also gives users a better understanding of how models approach problems.

One of the biggest limitations is when least-to-most prompting doesn't or can't correctly decompose a problem. Luckily, as models continue to get better, this will be less of an issue!

Dan Cleary
Founder