by s-macke on 4/20/24, 11:10 AM with 16 comments
by nicklecompte on 4/20/24, 12:19 PM
Two alternative explanations come to mind:
1) Claude 3 is poor at knowledge transfer and can't connect the events of the game with a Wikipedia summary of those events.
2) Claude is simply imitating surprise because it noticed text that "looked like" a twist ending.
In general, I wish the author had considered possible data contamination: how much does Claude 3 know about this game in a fresh session? If it can recite details from some online walkthrough, then I am not sure this experiment is even meaningful.
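One way to make that contamination check concrete: before any play-through, ask the model about the game in a fresh session and score how many walkthrough-specific details it can already recite. A minimal sketch of such a probe (the prompt wording, the fact list, and the example answer below are all hypothetical stand-ins, not anything from the article):

```python
# Sketch of a data-contamination probe: in a fresh session, ask the model
# to describe the game unprompted, then score its answer against a list of
# walkthrough-specific facts. Everything here is an illustrative assumption.

PROBE_PROMPT = (
    "Without any other context: describe the plot, locations, and puzzle "
    "solutions of the text adventure '{title}'."
)

def contamination_score(model_answer: str, walkthrough_facts: list[str]) -> float:
    """Fraction of walkthrough-specific facts the model recites unprompted."""
    answer = model_answer.lower()
    hits = sum(1 for fact in walkthrough_facts if fact.lower() in answer)
    return hits / len(walkthrough_facts) if walkthrough_facts else 0.0

# Hypothetical usage: a high score in a fresh session suggests the model has
# memorized the game, which would weaken conclusions drawn from its in-game
# "surprise" at the twist.
facts = ["brass lantern", "twisty little passages", "grue"]
answer = "You carry a brass lantern; in the dark a grue may eat you."
score = contamination_score(answer, facts)  # 2 of 3 facts recited
```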
by lolinder on 4/20/24, 12:22 PM
Also, its ability to track the orientation of the rooms shows a level of comprehension that Llama 2 would have completely flopped on:
> SITUATION: I've searched the floor in the living room, but found nothing of interest.
> THOUGHT: Maybe I should head to the kitchenette and see if I can find any clues there.
> COMMAND: Go east
[0] https://github.com/s-macke/AdventureAI/blob/master/assets/90...
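The SITUATION/THOUGHT/COMMAND lines quoted above suggest the model replies in a simple structured format, with the COMMAND line fed back to the game. A minimal parser for that shape might look like this (the field names come from the quoted transcript; the function itself is an assumed sketch, not the project's actual code):

```python
import re

def parse_turn(reply: str) -> dict[str, str]:
    """Extract SITUATION/THOUGHT/COMMAND fields from a model reply.

    Assumes one field per line, as in the transcript quoted above.
    """
    fields = {}
    for line in reply.splitlines():
        m = re.match(r"\s*(SITUATION|THOUGHT|COMMAND):\s*(.*)", line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields

turn = parse_turn(
    "SITUATION: I've searched the floor in the living room.\n"
    "THOUGHT: Maybe I should head to the kitchenette.\n"
    "COMMAND: Go east"
)
# turn["COMMAND"] would be the text sent back to the game engine
```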
by marviel on 4/20/24, 12:20 PM
https://github.com/s-macke/AdventureAI/blob/master/assets/90...
by GaggiX on 4/20/24, 12:16 PM
by tmitchel2 on 4/20/24, 1:42 PM
by jonplackett on 4/20/24, 12:11 PM
by singularity2001 on 4/20/24, 12:25 PM
> In conclusion most of the Large Language Models can play and win text adventures
Doesn't the table above show that NONE of the LLMs won the game, not even GPT-4?