by s-macke on 4/20/24, 11:10 AM with 16 comments
by nicklecompte on 4/20/24, 12:19 PM
Two alternative explanations come to mind:
1) Claude 3 is poor at knowledge transfer and can't connect the events of the game with a Wikipedia summary of those events.
2) Claude is simply imitating surprise because it noticed text that "looked like" a twist ending.
In general, I wish the author had considered possible data contamination: how much does Claude 3 know about this game in a fresh session? If it can recite details from some online walkthrough, then I am not sure this experiment is even meaningful.
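One way to make that contamination check concrete: before any play-through, ask the model about the game in a fresh session and score how many walkthrough-specific details it can already recite. A minimal sketch of such a probe (the prompt wording, the fact list, and the example answer below are all hypothetical stand-ins, not anything from the article):

```python
# Sketch of a data-contamination probe: in a fresh session, ask the model
# to describe the game unprompted, then score its answer against a list of
# walkthrough-specific facts. Everything here is an illustrative assumption.

PROBE_PROMPT = (
    "Without any other context: describe the plot, locations, and puzzle "
    "solutions of the text adventure '{title}'."
)

def contamination_score(model_answer: str, walkthrough_facts: list[str]) -> float:
    """Fraction of walkthrough-specific facts the model recites unprompted."""
    answer = model_answer.lower()
    hits = sum(1 for fact in walkthrough_facts if fact.lower() in answer)
    return hits / len(walkthrough_facts) if walkthrough_facts else 0.0

# Hypothetical usage: a high score in a fresh session suggests the model has
# memorized the game, which would weaken conclusions drawn from its in-game
# "surprise" at the twist.
facts = ["brass lantern", "twisty little passages", "grue"]
answer = "You carry a brass lantern; in the dark a grue may eat you."
score = contamination_score(answer, facts)  # 2 of 3 facts recited
```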
by lolinder on 4/20/24, 12:22 PM
Also, its ability to track the orientation of the rooms shows a level of comprehension that Llama 2 would have completely flopped on:
> SITUATION: I've searched the floor in the living room, but found nothing of interest.
> THOUGHT: Maybe I should head to the kitchenette and see if I can find any clues there.
> COMMAND: Go east
[0] https://github.com/s-macke/AdventureAI/blob/master/assets/90...
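The SITUATION/THOUGHT/COMMAND lines quoted above suggest the model replies in a simple structured format, with the COMMAND line fed back to the game. A minimal parser for that shape might look like this (the field names come from the quoted transcript; the function itself is an assumed sketch, not the project's actual code):

```python
import re

def parse_turn(reply: str) -> dict[str, str]:
    """Extract SITUATION/THOUGHT/COMMAND fields from a model reply.

    Assumes one field per line, as in the transcript quoted above.
    """
    fields = {}
    for line in reply.splitlines():
        m = re.match(r"\s*(SITUATION|THOUGHT|COMMAND):\s*(.*)", line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields

turn = parse_turn(
    "SITUATION: I've searched the floor in the living room.\n"
    "THOUGHT: Maybe I should head to the kitchenette.\n"
    "COMMAND: Go east"
)
# turn["COMMAND"] would be the text sent back to the game engine
```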
by marviel on 4/20/24, 12:20 PM
https://github.com/s-macke/AdventureAI/blob/master/assets/90...
by GaggiX on 4/20/24, 12:16 PM
by tmitchel2 on 4/20/24, 1:42 PM
by jonplackett on 4/20/24, 12:11 PM
by singularity2001 on 4/20/24, 12:25 PM
> In conclusion most of the Large Language Models can play and win text adventures
Doesn't the table above show that NONE of the LLMs won the game, not even GPT-4?