The world of AI moves at a blazing-fast pace. The latest release is Claude 2 from Anthropic.
Claude 2 powers Anthropic's AI Chat application (comparable to ChatGPT).
We love experiments here, so we did a side-by-side comparison of Claude 2 and GPT-4.
Each model was fed the same prompt and we gave a (very subjective) point to the model with the "better" output.
A few things about Claude 2:
- It's free and available today (for consumers)
- It has a 100k context window
- It supports file uploads
- It's trained on more recent data (websites, data sets, etc., from early 2023)
I personally was surprised by the final score. I've been a ChatGPT Plus user for months now, and Claude 2 really impressed me with its UX and the quality of its outputs.
Q1: Can you explain the Dunning-Kruger effect?
Claude 2: More detailed response, gives color on the original study
GPT-4: Slightly wordier, uses a numbered list
This one is too close to call. Both responses hit on similar points and deliver similar outputs.
Q2: Briefly describe a fantasy world where gravity doesn't exist.
Claude 2: More fantastical, feels like a real fantasy world
GPT-4: More comprehensive in describing the whole world, but a little more dry
This one is close. Edge goes to Claude for having a more creative and engaging description.
Q3: The protagonist of my novel fights crimes in the sewers of New York. What could be an interesting plot twist?
Claude 2: Gives various possible plot twists
GPT-4: Gives one elaborate plot twist
This one is interesting because the models give very different outputs. Claude 2 gives various options for our plot twist, while GPT-4 gives one option but goes into detail.
For this type of activity, I would want more options, and then go deep on one I like.
Edge Claude.
Q4: Should a self-driving car prioritize the life of the driver over pedestrians in an accident scenario?
Claude 2: Mentions that self-driving cars should prioritize caution, but doesn't take a stance
GPT-4: More detailed, outlines 2 arguments
Easy: GPT-4 gets the win for not avoiding the question.
Q5: Write a Python function that takes an integer as input and returns its factorial.
Both do a good job of giving background and explaining the logic, with one main difference.
Claude 2: For loop
GPT-4: Recursion
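In Python, a for loop avoids the function-call overhead of recursion and can't hit the interpreter's recursion limit, so it's generally the safer choice for a function like this. Edge Claude.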
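To make the difference concrete, here is a minimal sketch of the two approaches. These are not the models' actual outputs (those aren't reproduced in this post); the function names and the negative-input check are our own, for illustration only.

```python
def factorial_iterative(n: int) -> int:
    """Factorial with a for loop (the style Claude 2 chose)."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    result = 1
    # Multiply result by every integer from 2 up to n.
    for i in range(2, n + 1):
        result *= i
    return result


def factorial_recursive(n: int) -> int:
    """Factorial with recursion (the style GPT-4 chose)."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    # Base case: 0! and 1! are both 1.
    if n <= 1:
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial_recursive(n - 1)


if __name__ == "__main__":
    print(factorial_iterative(5))
    print(factorial_recursive(5))
```

Both versions agree on small inputs. The practical difference shows up with large ones: the recursive version adds a stack frame per call, so in CPython it raises a RecursionError once n gets near the default recursion limit (roughly 1000), while the loop keeps going.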
The winner is...
Claude 2!
Final score:
Claude 2: 3
GPT-4: 1
Tie: 1
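If you want to check the math, the score falls straight out of the five verdicts above. A quick sketch of the tally (the variable names are ours, for illustration):

```python
from collections import Counter

# Verdict for each question, as called in this post.
verdicts = {
    "Q1": "Tie",
    "Q2": "Claude 2",
    "Q3": "Claude 2",
    "Q4": "GPT-4",
    "Q5": "Claude 2",
}

# Count how many questions each outcome took.
score = Counter(verdicts.values())

for outcome, points in score.most_common():
    print(f"{outcome}: {points}")
```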
I was surprised at how good Claude 2 was, specifically when it came to creative tasks (like question 2).
The takeaway from this experiment is to explore different models and providers, whether in your day-to-day work or when you're building AI applications. Testing different models and providers can help you get better outputs.
PromptHub makes it easy to test prompts in scenarios like these. Our side-by-side comparison lets you see, clearly, how outputs differ based on changes in models.