by adaboese on 3/4/24, 11:48 PM with 2 comments
- Claude 3 (Opus) https://gist.github.com/adaboese/d0b7397381726a7d394920e6a82ee39c
Both of these are outputs of the AIMD app. They were not made with a single prompt, but rather with RAG and over a dozen instructions. This makes it possible to test a fairly broad range of expectations, such as adherence to instructions, error rate, and speed. Since the two model APIs are mostly compatible, I decided to compare them side by side.
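To give a sense of what "mostly compatible" means here, a minimal sketch of sending the same message to both APIs; the model names, prompt, and single-message call are placeholders, not the actual AIMD pipeline:

```python
# Minimal sketch: the same user message sent to both APIs.
# Assumes OPENAI_API_KEY and ANTHROPIC_API_KEY are set in the environment;
# the prompt and model names are illustrative only.
from openai import OpenAI
import anthropic

prompt = "Rewrite the following section for brevity: ..."

openai_client = OpenAI()
openai_resp = openai_client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": prompt}],
)
print(openai_resp.choices[0].message.content)

anthropic_client = anthropic.Anthropic()
claude_resp = anthropic_client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,  # required by the Anthropic API, unlike OpenAI's
    messages=[{"role": "user", "content": prompt}],
)
print(claude_resp.content[0].text)
```

The message format is nearly identical; the main differences are the required `max_tokens` parameter and the shape of the response object.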
A few interesting observations:
- Claude followed instructions much more closely than OpenAI. The outline provided in the initial instructions is quite close to the final article structure, despite multiple revisions.
- Claude's output scored better in its use of a broader set of data formats (tables, lists, quotes).
- Contrary to many tweets, Claude's output is not excessively verbose. Worth mentioning that part of the RAG instructions is to rewrite content for brevity.
- Claude took 5 minutes to execute the 52 prompts; OpenAI took 7 minutes (a sketch of the timing setup is below).
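A rough sketch of how total wall-clock time per model could be measured. `run_prompt` is a hypothetical wrapper around either client above, and sequential execution is an assumption based on the revision-driven flow described earlier:

```python
import time
from typing import Callable, Iterable

def time_chain(run_prompt: Callable[[str], str], prompts: Iterable[str]) -> float:
    """Run prompts sequentially against one model; return wall-clock seconds.

    run_prompt is a hypothetical wrapper around one of the clients above.
    Sequential execution is assumed here because each revision step in a
    pipeline like this typically depends on the previous output.
    """
    start = time.monotonic()
    for prompt in prompts:
        run_prompt(prompt)
    return time.monotonic() - start
```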