from Hacker News

Claude vs. OpenAI GPT-4 generated content, side-by-side comparison

by adaboese on 3/4/24, 11:48 PM with 2 comments

- OpenAI GPT-4 https://gist.github.com/adaboese/12e3c3d28783bc831c202ad1e55d932b

- Claude 3 (Opus) https://gist.github.com/adaboese/d0b7397381726a7d394920e6a82ee39c

Both of these are outputs of the AIMD app. They are not made using a single prompt, but rather using RAG with over a dozen instructions. This makes it possible to test a fairly broad range of expectations, such as adherence to instructions, error rate, and speed. Since the two model APIs are mostly compatible, I decided to compare them side-by-side.
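
For reference, "mostly compatible" means the two APIs accept the same {role, content} message shape and differ mainly in endpoint, auth headers, and response nesting. A minimal sketch of calling both (the helper names, model IDs, and max_tokens value are illustrative, not the actual AIMD code):

    import os
    import requests

    def ask_openai(messages: list[dict]) -> str:
        # OpenAI chat completions endpoint.
        r = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": "gpt-4", "messages": messages},
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]

    def ask_claude(messages: list[dict]) -> str:
        # Anthropic messages endpoint; same message shape, but max_tokens
        # is required and the response nests the text differently.
        r = requests.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": os.environ["ANTHROPIC_API_KEY"],
                "anthropic-version": "2023-06-01",
            },
            json={
                "model": "claude-3-opus-20240229",
                "max_tokens": 4096,
                "messages": messages,
            },
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["content"][0]["text"]

    messages = [{"role": "user", "content": "Rewrite this section for brevity: ..."}]
    print(ask_openai(messages))
    print(ask_claude(messages))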

A few interesting observations:

- Claude followed instructions much more closely than OpenAI. The outline provided in the initial instructions is pretty close to the final article structure, despite multiple revisions.

- Claude's output scored better in its use of a broader set of data formats (tables, lists, quotes).

- Contrary to many tweets, Claude's output is not excessively verbose. Worth mentioning that part of the RAG instructions is to rewrite content for brevity.

- Claude took 5 minutes to execute 52 prompts. OpenAI took 7 minutes.
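
Back-of-the-envelope, and assuming the prompts ran sequentially: 300 s / 52 ≈ 5.8 s per prompt for Claude versus 420 s / 52 ≈ 8.1 s for GPT-4. With concurrent requests, the true per-call latencies would be higher than these wall-clock averages suggest.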

  • by adaboese on 3/4/24, 11:49 PM

    Forgot to mention, Claude appears to be a lot more rate-limited than OpenAI. I hit quite a few concurrency rate limits, but as long as you have auto-retry, it is a non-issue.
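
    The auto-retry can be as simple as the sketch below (the status codes and backoff numbers are what I'd expect for Anthropic's 429/529 responses, not a prescription; tune to taste):

        import time
        import requests

        def with_retry(request_fn, max_attempts=5, base_delay=1.0):
            # Retry on 429 (rate limited) / 529 (overloaded), backing off
            # exponentially; honors Retry-After when the server sends one
            # (assumes the header carries a seconds value).
            for attempt in range(max_attempts):
                response = request_fn()
                if response.status_code not in (429, 529):
                    response.raise_for_status()
                    return response
                delay = float(response.headers.get("retry-after", base_delay * 2 ** attempt))
                time.sleep(delay)
            raise RuntimeError("gave up after repeated rate limits")

        # usage: with_retry(lambda: requests.post(url, headers=headers, json=body, timeout=120))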
  • by adaboese on 3/4/24, 11:50 PM

    Image inputs (prompts) are generated with the respective models, but the actual images are generated using DALL-E.
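
    In other words, the pipeline looks roughly like this (reusing the ask_claude helper from the sketch above; the DALL-E 3 model ID and the prompt wording are assumptions):

        import os
        import requests

        # Step 1: the model under test writes the image prompt.
        image_prompt = ask_claude([{
            "role": "user",
            "content": "Write a DALL-E prompt for this article's hero image.",
        }])

        # Step 2: DALL-E renders it via OpenAI's image generation endpoint.
        r = requests.post(
            "https://api.openai.com/v1/images/generations",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": "dall-e-3", "prompt": image_prompt, "n": 1, "size": "1024x1024"},
            timeout=120,
        )
        r.raise_for_status()
        image_url = r.json()["data"][0]["url"]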