by dinp on 2/17/25, 8:06 AM with 1 comments
Click on 'Play moves' to watch a replay.
I initially planned to run a chess tournament for LLMs, but they are not good at it: besides making obvious mistakes, they output incorrect moves and get stuck in loops repeating the same moves, and the smaller models frequently fail to output valid JSON. I thought reasoning models like o3-mini might do well, but they are only an incremental improvement at chess.
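For anyone building a similar harness, here is a minimal sketch of how those three failure modes could be caught. It is not the code behind the replays. The `{"move": "..."}` reply format, the function names `parse_move` and `is_looping`, and the idea of passing in a precomputed set of legal moves are all assumptions made for illustration; in practice the legal moves and position keys would come from a chess library.

```python
import json
from collections import Counter
from typing import Optional


def parse_move(reply: str, legal_moves: set[str]) -> tuple[Optional[str], str]:
    """Classify one raw model reply as ok, invalid_json, missing_move or illegal_move.

    Assumes (hypothetically) that the model was asked to answer with
    a JSON object of the form {"move": "e2e4"}.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None, "invalid_json"
    # Valid JSON can still have the wrong shape, e.g. a list or a bare string.
    if not isinstance(data, dict) or not isinstance(data.get("move"), str):
        return None, "missing_move"
    move = data["move"].strip()
    if move not in legal_moves:
        return None, "illegal_move"
    return move, "ok"


def is_looping(positions: list[str], limit: int = 3) -> bool:
    """Return True once any position key has occurred `limit` times."""
    return any(count >= limit for count in Counter(positions).values())
```

Counting how often each status comes back per model would put rough numbers on the problems described above, and a retry-on-failure policy could be layered on top of `parse_move`.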
Feedback and suggestions for other games to explore are welcome.