One of the first benefits I found when using Large Language Models (LLMs) was that they helped with the blank page problem. I could quickly go from a blog idea to a rough outline in just a few minutes.
While understanding the core principles of prompt engineering is still important, it makes little sense not to use LLMs when writing prompts.
In the same way that an LLM can help a writer overcome the blank page problem, it can also help prompt engineers establish a solid prompt structure.
This process of using LLMs to write prompts and using prompts to create other prompts is called meta prompting. Rather than crafting every detail of your prompt, you can use other prompts, systems, and LLMs to help expedite the process.
In this article, we’ll dive into how meta prompting works, breaking down some of the latest and most popular methods from recent research papers. We’ll cover plenty of examples so you can start using these techniques in your workflows and apps right away. Plus, you’ll be able to see how we’ve integrated meta prompting into PromptHub to make the prompt engineering process extremely efficient.
What is Meta Prompting?
Meta prompting is a prompt engineering method that uses large language models (LLMs) to create and refine prompts.
Unlike traditional prompt engineering, where you write a prompt from scratch and hope for the best, meta prompting guides the LLM to adapt and adjust your prompt dynamically, based on your feedback, allowing it to handle more complex tasks and evolving contexts.
Now we’ll jump into a variety of the latest meta-prompting methods, based on recent research.
Meta-Prompting
The first meta-prompting method we'll look at is simply called Meta-Prompting, developed through a collaboration from Stanford and OpenAI.
The core idea behind Meta-Prompting is to use an LLM as a conductor that can manage complex tasks by leveraging multiple independent LLMs that are experts in certain areas. This is similar to multi-persona prompting in that it is a collaboration between different LLMs with a lead conductor.
How It Works
- The central LLM receives a high-level “meta” prompt (copied below) that includes instructions to break tasks down into subtasks.
- Each subtask is assigned to an "expert" LLM with specific, detailed instructions.
- The central LLM oversees communication between the expert LLMs, synthesizes their outputs, applies its own judgment, and generates a final output (a minimal sketch of this loop follows below).
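To make the flow concrete, here's a rough, minimal sketch of the conductor loop in Python. The `call_llm` helper, the `FINAL ANSWER:` marker, and the expert-parsing logic are all simplifying assumptions for illustration; the paper's actual template and parsing are more involved.

```python
from typing import Dict, List

def call_llm(messages: List[Dict[str, str]]) -> str:
    """Placeholder: send a chat history to your model provider and return the reply."""
    raise NotImplementedError

# The conductor's instructions; in practice, paste the full Meta-Prompting template here.
META_PROMPT = "You are Meta-Expert, a conductor that breaks tasks into subtasks for expert models..."

def consult_expert(expert_name: str, instruction: str) -> str:
    # Each expert runs in a fresh context: it only sees the instruction the
    # conductor wrote for it, not the full conversation history.
    return call_llm([
        {"role": "system", "content": f"You are {expert_name}."},
        {"role": "user", "content": instruction},
    ])

def run_meta_prompting(task: str, max_rounds: int = 5) -> str:
    history = [
        {"role": "system", "content": META_PROMPT},
        {"role": "user", "content": task},
    ]
    for _ in range(max_rounds):
        reply = call_llm(history)
        history.append({"role": "assistant", "content": reply})
        if "FINAL ANSWER:" in reply:  # conductor signals it is done
            return reply.split("FINAL ANSWER:")[-1].strip()
        if reply.startswith("Expert"):  # naive parse of a call like "Expert Mathematician: <instruction>"
            expert_name, _, instruction = reply.partition(":")
            expert_output = consult_expert(expert_name.strip(), instruction.strip())
            history.append({"role": "user", "content": f"{expert_name} says: {expert_output}"})
    return history[-1]["content"]  # fall back to the conductor's last message
```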
Pros
- Meta-Prompting is task-agnostic.
- The central model’s ability to coordinate multiple experts enhances problem-solving capabilities, which means more accurate and aligned results
- Doesn’t require a bank of test data
Cons
- Increased cost and latency due to the many LLM calls and interactions between the experts
- Requires some configuration and setup to handle the inter-model interactions
- Potential context window issues handling large message histories
Meta Prompting prompt template
Here is the Meta prompting prompt template straight from the research paper:
In this example, the Meta Model ('Meta-Expert') directs the task to an Expert Mathematician and provides a clear instruction. This instruction is isolated, allowing the expert to focus on their specific task. The Meta Model then verifies the response and integrates it into the overall solution.
The template is also available in PromptHub, feel free to add it to your library!
Learning from Contrastive Prompts (LCP)
The next meta-prompting method we'll check out is from an Amazon paper called Learning from Contrastive Prompts (LCP): Automated Optimization and Adaptation.
How It Works
- Starting with an initial prompt and a small training set containing input-output pairs, multiple prompt candidates are generated.
- The LLM generates outputs for each prompt candidate, which are then evaluated to identify where these candidates are succeeding and where they are falling short.
- LCP instructs the LLM to compare good prompts against bad ones, allowing the model to identify what is working and what isn’t.
- Based on this comparison, the LLM generates a new, refined prompt and continues to iteratively improve it.
Put briefly, the LLM is tasked with contrasting good prompts against bad ones, leveraging the comparison to refine prompts.
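Here's a minimal sketch of what a single LCP iteration might look like. The `call_llm` helper is a placeholder for your model client, the contrasting instruction is a paraphrase rather than the paper's exact template, and exact-match scoring stands in for whatever evaluation you actually use.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to your model provider and return the reply."""
    raise NotImplementedError

def evaluate(prompt: str, train_set) -> float:
    """Fraction of input-output pairs the prompt answers correctly (exact match for simplicity)."""
    correct = sum(call_llm(f"{prompt}\n\nInput: {x}\nOutput:").strip() == y for x, y in train_set)
    return correct / len(train_set)

def lcp_step(candidates, train_set) -> str:
    """One LCP iteration: score candidates, contrast the best against the worst, ask for a refined prompt."""
    scored = sorted(candidates, key=lambda p: evaluate(p, train_set), reverse=True)
    good, bad = scored[:2], scored[-2:]
    contrast_prompt = (
        "These prompts performed well on the task:\n" + "\n---\n".join(good) +
        "\n\nThese prompts performed poorly:\n" + "\n---\n".join(bad) +
        "\n\nExplain what the good prompts do differently, then write one new, improved prompt."
    )
    return call_llm(contrast_prompt)  # refined prompt for the next iteration
```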
Pros
- LCP’s focus on contrasting both good and bad prompts can lead to a more effective optimization process and a stronger final prompt variant
- LCP addresses the risk of overfitting by using multiple incorrect samples to create summaries of failure reasons, which enables the generation of a set of diverse prompt candidates
- This diversity enables exploration of the prompt space, preventing the model from getting trapped in a local optimum.
- LCP is also task agnostic
Cons
- High costs and latencies due to the need to generate, evaluate, and contrast multiple prompts in each iteration.
- The framework requires consistent feedback and evaluation to adapt effectively
LCP prompt template
Here are the LCP prompt templates straight from the research paper:
In this example, the LLM evaluates multiple prompt candidates, learning from both their strengths and weaknesses. By contrasting the best-performing prompts with less effective ones, the LLM iteratively generates an optimized prompt, adapting to changes in the model version or task requirements.
Automatic Prompt Engineer (APE)
The next meta-prompting method comes from a paper out of the University of Waterloo called Large Language Models Are Human-Level Prompt Engineers. Their meta-prompting method, Automatic Prompt Engineer (APE), treats the prompt as a "program," optimizing it by searching over a pool of prompt candidates with the goal of maximizing a specific score function.
How It Works
- Instruction Generation: An LLM generates a set of prompt candidates based on input-output demonstrations.
- Scoring and Evaluation: Each prompt is evaluated using a scoring function to measure its effectiveness.
- Iterative Search: The process iteratively uses a Monte Carlo search method, where LLMs refine the best prompts by proposing semantically similar prompt variants.
In short: generate a batch of prompts, score and evaluate them, generate new semantically similar versions, and select the prompt version with the highest score (see the sketch below).
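Below is a rough sketch of that search loop, under a few assumptions: `llm` is a placeholder for your model client, the instruction-generation and resampling prompts are paraphrases of the paper's templates, and scoring is simplified to exact-match accuracy.

```python
import random

def llm(prompt: str) -> str:
    """Placeholder: send a prompt to your model provider and return the reply."""
    raise NotImplementedError

def score(instruction: str, demos) -> float:
    """Exact-match accuracy of an instruction on held-out input-output demos."""
    hits = sum(llm(f"Instruction: {instruction}\nInput: {x}\nOutput:").strip() == y for x, y in demos)
    return hits / len(demos)

def ape(demos, n_candidates: int = 20, n_rounds: int = 3, keep: int = 5) -> str:
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    # Instruction generation: what instruction could have produced these pairs?
    candidates = [
        llm("I gave a friend an instruction. Based on these input-output pairs, "
            f"what was the instruction?\n{demo_text}")
        for _ in range(n_candidates)
    ]
    for _ in range(n_rounds):
        # Scoring and selection
        best = sorted(candidates, key=lambda c: score(c, demos), reverse=True)[:keep]
        # Monte Carlo step: propose semantically similar variants of the best instructions
        candidates = best + [
            llm("Generate a variation of this instruction that keeps its meaning:\n"
                f"{random.choice(best)}")
            for _ in range(n_candidates - keep)
        ]
    return max(candidates, key=lambda c: score(c, demos))
```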
Pros
- In APE’s experiments, the generated prompts consistently outperformed human-engineered prompts
- APE is adaptable and can optimize prompts for different scenarios (zero-shot, chain-of-thought, etc).
- APE is task agnostic
Cons
- The iterative search process can be computationally intensive
- Requires some dev work to spin up the system (the search process in particular)
Example Implementation using APE Templates
APE guides the LLM through generating, evaluating, and refining prompts in a structured, step-by-step process. Here's a quick example flow of how it works using the templates from the research paper:
In this first step, APE generates an initial instruction that could have led to the provided input-output pairs, setting up the foundation for the prompt.
At this stage, the LLM evaluates how effectively the generated instruction from Step 1 performs when given new inputs. This helps identify how well the prompt works without any additional examples.
Using the feedback from Step 2, the LLM refines the initial instruction, creating a new version that aims to be more effective based on the evaluation.
This step helps APE further understand how the refined instruction maps to the given input-output pairs, ensuring that the prompt remains accurate and adaptable.
Finally, the LLM incorporates a Chain of Thought reasoning into the prompt, enhancing its ability to handle more complex tasks in a zero-shot setting.
Summing it up:
- The LLM generates multiple instruction candidates based on initial input-output pairs.
- These candidates are scored based on their effectiveness.
- Through an iterative process, the LLM generates improved candidates, refining the instruction to achieve optimal performance.
PromptAgent
Next up is a meta-prompting method called PromptAgent. PromptAgent views prompt generation and optimization as a planning problem and focuses on leveraging expert/SME knowledge in the prompt engineering process.
How It Works
- The process starts with an initial prompt and a target task.
- Outputs are generated and evaluated.
- PromptAgent integrates expert-level knowledge into the feedback loops.
- PromptAgent iteratively refines the prompt based on this feedback, growing the prompt space in a tree structure and prioritizing high-reward paths (a simplified sketch follows below).
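As a simplified illustration, here's a greedy, beam-style version of that search. The actual paper uses Monte Carlo Tree Search with richer expert feedback; the `llm` and `reward` helpers below are placeholders, and the feedback prompts are paraphrases.

```python
def llm(prompt: str) -> str:
    """Placeholder: send a prompt to your model provider and return the reply."""
    raise NotImplementedError

def reward(prompt: str, eval_set) -> float:
    """Task accuracy of a prompt on a small evaluation set (exact match for simplicity)."""
    return sum(llm(f"{prompt}\nInput: {x}\nOutput:").strip() == y for x, y in eval_set) / len(eval_set)

def prompt_agent(initial_prompt: str, eval_set, depth: int = 3, beam: int = 2, children: int = 3) -> str:
    frontier = [initial_prompt]
    best = (reward(initial_prompt, eval_set), initial_prompt)
    for _ in range(depth):
        expanded = []
        for prompt in frontier:
            # Collect the examples this prompt currently gets wrong
            errors = [(x, y) for x, y in eval_set if llm(f"{prompt}\nInput: {x}\nOutput:").strip() != y]
            # Expert-style feedback: explain the failures and the missing domain knowledge
            feedback = llm(
                "Act as a domain expert. Explain why this prompt failed on these examples "
                f"and what domain knowledge it is missing.\nPrompt: {prompt}\nErrors: {errors[:3]}"
            )
            # Expand the tree: several child prompts that try to address the feedback
            for _ in range(children):
                child = llm(
                    "Rewrite the prompt to address the expert feedback. Return only the new prompt.\n"
                    f"Prompt: {prompt}\nFeedback: {feedback}"
                )
                expanded.append((reward(child, eval_set), child))
        if not expanded:
            break
        expanded.sort(reverse=True)
        best = max(best, expanded[0])
        frontier = [p for _, p in expanded[:beam]]  # keep only the high-reward paths
    return best[1]
```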
Pros
- PromptAgent’s major differentiator is its focus on mimicking subject matter experts in the prompt engineering process, using expert-like feedback and insights to iteratively refine and optimize prompts.
- PromptAgent’s own prompts are engineered to draw out expert-level knowledge during refinement
- The method effectively uses self-reflection and error feedback to achieve a higher level of precision and adaptability.
- PromptAgent is task agnostic
Cons
- Implementing PromptAgent in full requires setting up a tree-like reasoning structure, which can be complicated and lead to higher costs
Example Implementation with PromptAgent
Here are the four core prompt templates directly from the PromptAgent paper:
Conversational Prompt Engineering (CPE)
I’m a huge fan of this next meta-prompting method. Conversational Prompt Engineering (CPE) is a simple chat interface that guides users in creating and refining prompts through an interactive conversation.
How It Works
- First up, the user selects which model they want to use (the “target” model) and uploads a file with some input examples. For example, if the task is summarizing articles, the file could contain several full-length articles.
- The user and CPE go back and forth via chat, with CPE asking relevant, data-driven questions about output preferences. This interaction helps shape an initial prompt.
- Based on user feedback, CPE refines the initial prompt and incorporates relevant details.
- The resulting prompt is sent to the target model to generate outputs.
- The user reviews the outputs and provides more feedback, and adjustments are made as necessary.
- Once all prompt outputs are approved, CPE generates a final few-shot prompt that includes examples approved by the user.
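Here's a bare-bones sketch of what a CPE-style loop could look like, with the chat reduced to command-line `input()` calls and an assumed `llm` placeholder standing in for both the chat model and the target model.

```python
def llm(prompt: str) -> str:
    """Placeholder: send a prompt to your model (or the target model) and return the reply."""
    raise NotImplementedError

def cpe(task_description: str, examples: list[str]) -> str:
    # 1. Data-driven questions about output preferences, based on the uploaded examples
    questions = llm(
        f"Task: {task_description}\nExample inputs:\n" + "\n---\n".join(examples[:3]) +
        "\n\nAsk me three questions about how I want the outputs to look."
    )
    answers = input(questions + "\n> ")

    # 2. Draft an initial prompt from the user's stated preferences
    prompt = llm(
        f"Write a prompt for this task, honoring these preferences.\n"
        f"Task: {task_description}\nPreferences: {answers}"
    )

    # 3. Run the prompt on each example, collect approvals or refine on feedback
    approved = []
    for example in examples:
        output = llm(f"{prompt}\n\nInput: {example}")  # would go to the target model
        verdict = input(f"Output:\n{output}\nType 'approve' or describe what to change:\n> ")
        if verdict.lower().startswith("approve"):
            approved.append((example, output))
        else:
            prompt = llm(f"Refine this prompt based on the feedback.\nPrompt: {prompt}\nFeedback: {verdict}")

    # 4. Final few-shot prompt built from the user-approved examples
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in approved)
    return f"{prompt}\n\nExamples:\n{shots}"
```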
Pros
- CPE makes it easy for users to generate personalized, high-quality prompts without needing labeled data or pre-existing prompts
- The simple setup of a chat interface makes it extremely user friendly and leads to better outputs based on feedback and quick iterations
- CPE is task agnostic
Cons
- The iterative nature of CPE might be time-consuming, as it requires multiple rounds of user feedback to refine the prompts fully.
- Potential context window/memory issues as the chat history grows
Example Implementation
The following template demonstrates how CPE operates:
DSPy
The next meta-prompting method we'll examine is a very popular framework called DSPy (pronounced "dee-ess-pie"). DSPy enables technical users to create, optimize, and manage complex pipelines of LLM calls in a structured, programmatic manner.
DSPy treats LLMs as modular components in a programming model, enabling users to combine, refine, and enhance their interactions over time.
How It Works
- Users define DSPy modules with specific tasks, like creative writing or sentiment analysis. Each module has a defined signature that outlines the input and expected output.
- DSPy constructs a sequence of LLM calls, allowing for step-by-step processing and optimization.
- DSPy iteratively refines the output by utilizing modules like ChainOfThought to guide the LLM's reasoning process. It collects data from user feedback and uses scoring mechanisms to identify and prioritize high-quality outputs.
- Teleprompter modules evaluate the LLM's performance on generated outputs, refining prompts based on scoring metrics and improving the overall quality through adaptive feedback.
LLMs serve multiple roles within DSPy, from generating initial prompts, to generating and evaluating outputs, refining prompts, and learning from user interactions. This adaptive, code-first framework allows the LLM to evolve and improve its performance over time.
Pros
- DSPy allows for creating complex, multi-step prompt workflows that adapt based on user input and feedback, making it highly versatile.
- The system learns and improves over time, becoming more efficient at generating accurate and high-quality outputs.
- The integration of modules like ChainOfThought and teleprompters enables more structured, logic-driven prompt generation and evaluation.
- DSPy’s ability to manage multiple LLM calls allows it to refine prompts through self-improving feedback loops, enhancing output quality over successive iterations.
- DSPy is task agnostic
Cons
- DSPy can be quite complex to set up and requires technical knowledge
- Managing large pipelines with iterative evaluations can become challenging
Example implementation
We’ll run through an example use case of content generation to see how DSPy works. Specifically, let's say a user wants to create a blog post about "AI Trends 2024":
Step 1: Initialization
The user initializes a DSPy module to generate blog content.
content_generator = dspy.Predict("topic -> blog_content")
Step 2: Initial Content Setup
The user provides the topic.
content_generator(topic="AI Trends 2024")
Step 3: Initial Content Generation
The LLM generates the following content:
"AI Trends 2024: In the coming year, AI is expected to advance in natural language processing, autonomous vehicles, and healthcare. There will be greater adoption of AI in finance and retail sectors..."
Step 4: User Review and Feedback
The user reviews the generated content and provides feedback.
Feedback: "Include more examples of AI applications in healthcare."
Step 5: Content Regeneration
The revised content:
"In 2024, AI will greatly impact healthcare, with AI-driven diagnostics enabling faster, more accurate medical assessments. Robotic surgery, guided by AI algorithms, will improve precision..."
Step 6: Scoring and Evaluation by Teleprompter
DSPy uses another LLM to score the content based on criteria like specificity, coherence, and completeness.
- Evaluation Score: 8/10
- Feedback: "The content is clear and covers examples well, but could benefit from a conclusion summarizing the key trends."
Final output:
"In summary, AI trends in 2024 will span across natural language processing, autonomous vehicles, and healthcare, revolutionizing how we interact with technology and improving outcomes across industries..."
End-to-End Process Recap:
- User Inputs: Initial topic, feedback for improvement.
- LLM Calls:
  - Generated initial content based on the topic.
  - Adapted prompts based on user feedback.
  - Scored and provided evaluations to optimize the prompt.
  - Finalized content with concluding elements.
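Putting the pieces together, here's a compact, end-to-end sketch of the flow above in DSPy-style code. Module and optimizer names (dspy.LM, ChainOfThought, BootstrapFewShot) vary across DSPy versions, the model name is hypothetical, and the metric is a toy stand-in, so treat this as a sketch rather than a drop-in implementation.

```python
import dspy

# Configure whichever model you have access to (the model name here is just an example).
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Module with a declared signature: topic in, blog_content out.
content_generator = dspy.ChainOfThought("topic -> blog_content")

def content_metric(example, prediction, trace=None):
    """Toy metric: reward drafts that mention the topic and are reasonably long."""
    text = prediction.blog_content
    return float(example.topic.lower() in text.lower() and len(text.split()) > 150)

# A tiny trainset of topics; outputs are bootstrapped by the optimizer.
trainset = [dspy.Example(topic="AI Trends 2024").with_inputs("topic")]

# Teleprompter/optimizer: compiles the module by searching for better prompts and demos.
optimizer = dspy.teleprompt.BootstrapFewShot(metric=content_metric)
compiled_generator = optimizer.compile(content_generator, trainset=trainset)

draft = compiled_generator(topic="AI Trends 2024")
print(draft.blog_content)
```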
DSPy is a little complicated but is also a very powerful way to leverage LLMs to generate, evaluate, and refine prompts through user interaction and automation.
TEXTGRAD: Textual Gradient-Based Optimization
The next and final meta-prompting method we'll explore is TEXTGRAD. In many ways, TEXTGRAD can be seen as a successor to DSPy, drawing inspiration from its approach while building upon it to make it even better.
The main differentiator that TEXTGRAD introduces is its emphasis on using natural language feedback as "textual gradients." This allows the model to iteratively refine prompts based on detailed, human-like suggestions, leading to more nuanced and accurate outputs over successive iterations.
How It Works
- It all starts with a base version of the prompt
- A second LLM (or a human) reviews and evaluates the output and provides detailed natural language feedback. This feedback serves as the "textual gradient," highlighting areas for improvement.
- The original prompt and feedback are then sent to another LLM to generate an improved version of the prompt. This iterative process repeats until the prompt’s output meets the desired criteria.
In TEXTGRAD, an LLM is used as both a generator and an evaluator. The feedback from one model is used to help another refine the prompt in an iterative fashion.
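Here's a generic sketch of that loop (this is not the textgrad library's actual API): one call generates, a second critiques in natural language, and a third applies the critique to the prompt. The `llm` helper and the stopping condition are assumptions for illustration.

```python
def llm(prompt: str) -> str:
    """Placeholder: send a prompt to your model provider and return the reply."""
    raise NotImplementedError

def textgrad_optimize(prompt: str, task_input: str, rounds: int = 3) -> str:
    for _ in range(rounds):
        output = llm(f"{prompt}\n\n{task_input}")
        # The "textual gradient": detailed natural language critique of the output
        gradient = llm(
            "Critique this output in detail. Point out exactly what should change and why. "
            f"If nothing should change, say 'no changes needed'.\nOutput:\n{output}"
        )
        if "no changes needed" in gradient.lower():
            break
        # Apply the gradient: rewrite the prompt so its outputs address the critique
        prompt = llm(
            "Improve the prompt below so that its outputs address this critique. "
            f"Return only the new prompt.\nPrompt:\n{prompt}\nCritique:\n{gradient}"
        )
    return prompt
```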
Pros
- TEXTGRAD uses natural language feedback to refine outputs, which makes it very flexible
- Since TEXTGRAD focuses on natural language feedback, it allows for more nuanced and specific changes compared to more rigid, numerical optimization techniques
- TEXTGRAD is task agnostic, but really shines on tasks that require detailed feedback, like creative writing
Cons
- The iterative nature of TEXTGRAD makes it time-consuming and potentially expensive
Example Implementation
To show how TEXTGRAD operates, let's run through an example where a user wants to generate a blog post about "AI Trends in 2024":
The feedback might be:
"The introduction could be more engaging by starting with a surprising statistic or question. Include more specific examples, such as AI-driven diagnostics in healthcare and autonomous driving advancements."
Content Regeneration:
Using the new optimized prompt, the LLM generates revised content that better incorporates engagement and specific examples as per the feedback.
Further Iterations:
This process repeats, with the LLM incorporating additional feedback, such as improving the conclusion or adding more depth to specific sections.
Completion:
The cycle continues until the generated blog post meets the desired quality criteria, resulting in an engaging, well-structured, and comprehensive output.
Prompt generator tools
Prompt generators are an easy way to leverage meta prompting with minimal work. Fortunately, there are several platforms that streamline the whole process.
PromptHub’s Prompt Generator
Our new prompt generator tool brings together all the insights and techniques we've shared on our blog, packaging them into a free tool that you can use to create high-quality prompts.
Here's what our prompt generator offers:
- Tailored prompts: Adjusts prompts based on the model provider you're using, because one size doesn’t fit all.
- Best practices built-in: Leverages prompt engineering best practices—just describe your task, and the tool handles the rest.
- Completely free: Yes, you heard that right—our prompt generator is free!
Anthropic's Prompt Generator
Anthropic stands out as one of the leading companies in prompt engineering, and their prompt generator is a fantastic tool, especially if you frequently work within their development console. Optimized for Anthropic-specific prompts, their generator is both fast and an excellent starting point for creating effective prompts.
OpenAI's Prompt Generator
At OpenAI's most recent Dev Day, they launched a new feature that generates system instruction prompts based on a task description. However, it's worth noting that this feature isn't available for the o1 models, as they currently don't support the system message role type.
Through some prompt injection, we believe we were able to retrieve the meta prompt powering OpenAI's new tool. You can access it via the template we made in PromptHub here.
Conclusion
Meta prompting and using prompt generators can be a great place to start when working on a prompt. They aren’t a complete replacement for understanding the underlying mechanisms of LLMs or prompt engineering best practices, but we highly recommend using them.
We touched on a variety of meta-prompting methods and prompt generators throughout this post. Implementing any of these frameworks can enhance your prompts’ performance; it’s just a matter of finding the one that makes the most sense for your needs.
We're excited to see how you leverage our prompt generator, and we look forward to continuing to develop new prompt optimization tools that will make meta prompting more accessible for everyone.