from Hacker News

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

by aklein on 7/11/22, 3:32 PM with 77 comments

  • by hervature on 7/11/22, 6:25 PM

    I'm still trying to grok and implement the paper, but I studied AlphaGo/AlphaZero/MuZero during my PhD. The core contribution here is approximating a Nash equilibrium in imperfect-information games using only self-play. Note that there is no MCTS in this paper. This differs from counterfactual regret methods (like the most famous poker AIs) because it does not need to enumerate all possible "information sets", a requirement that makes those methods intractable for even moderately complicated poker variants. It should also be noted (as they do in the paper) that this is more of an incremental advance than a methodological innovation on the scale of AlphaGo; it is the AlphaZero-style increment on top of NeuRD.

    As is my general critique of their previous papers, they omit many engineering details that prove to be very important. Here, they admit that fine-tuning is vitally important (one of the 3 core steps), but the details are relegated to the supplementary materials. It also opens up the question of whether this new "fine-tuned" policy still guarantees a Nash equilibrium, which it obviously does not, since some mixed strategies will have actions with sufficiently small probability pruned away. I wish researchers would be more honest: "this is a hack to get things to work on a computer because neural networks have floating-point inaccuracies". It doesn't ruin any of the theory, and no one is going to hold it against you. But it causes all sorts of confusion when trying to reimplement.
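    To make the fine-tuning concern above concrete, here is a minimal sketch of the kind of post-processing being described: pruning actions whose probability falls below a threshold and renormalizing. This is an illustrative guess at the general idea, not DeepNash's actual procedure; the function name and threshold are made up for the example. Once low-probability actions are zeroed out, the policy is no longer the same mixed strategy, which is exactly why the equilibrium guarantee is lost.

    ```python
    import numpy as np

    def finetune_policy(policy, threshold=1e-3):
        """Zero out actions whose probability is below `threshold`,
        then renormalize so the result is still a distribution.
        Pruning discards part of the mixed strategy, so any exact
        Nash-equilibrium guarantee no longer applies afterwards."""
        pruned = np.where(policy < threshold, 0.0, policy)
        total = pruned.sum()
        if total == 0.0:
            # Degenerate case: everything was pruned, fall back to
            # playing the single highest-probability action.
            pruned = np.zeros_like(policy)
            pruned[np.argmax(policy)] = 1.0
            total = 1.0
        return pruned / total

    # A mixed strategy with one negligible-probability action:
    pi = np.array([0.6, 0.3995, 0.0005])
    print(finetune_policy(pi))  # third action dropped, rest renormalized
    ```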
  • by miiiiiike on 7/11/22, 5:44 PM

    Got a copy of Stratego (one of the old-style ones with descending rank, as pleases the gods) so I could show my board game loving girlfriend one of my favorite games from when I was a kid. She hated it.
  • by thomasahle on 7/12/22, 12:03 AM

    I really like the section on initial piece deployment:

    > The Flag is almost always put on the back row, and often protected by Bombs. Occasionally, however, DeepNash will not surround the Flag with Bombs. Experts (e.g. Vincent de Boer, 3-fold World Champion) believe that it is indeed good to occasionally not protect the Flag because this unpredictability makes it harder for the opponent in the end-game. Another pattern observed is that the highest pieces, the 10 and 9, are often deployed on different sides of the board. Additionally, the Spy is quite often located not too far away from the 9 (or 8), which protects it against the opponent’s 10. DeepNash does not often deploy Bombs on the front row, which complies with the behavior seen from strong human players. The 3’s (Miner), which can defuse Bombs, are often placed on the back row, which makes sense because their importance typically increases throughout a game as more opponent Bombs and potential Flag positions get revealed. The eight 2’s (Scout) are typically deployed both in the front and more in the back, allowing to scout opponent pieces initially but also in later phases of the game.

  • by spywaregorilla on 7/11/22, 4:00 PM

    Stratego is an odd choice, I feel. Evaluating it must be really hard. A significant chunk of the game is just trying to remember which unit is which among the pieces you've seen so far, which humans generally can't do very well but machines can do easily. Beating hand-crafted bots is good.

    Can expert humans beat the hand-crafted bots? I'm guessing no. Also, what's Stratego like without the hidden units? Is that... hard?

  • by hirundo on 7/11/22, 4:29 PM

    When I was a kid I "won" a Stratego game with a non-move that my friend claimed was against the rules. So he claimed the win. Could I get an umpire's call here?

    The issue is that when considering my next move, I picked up the bomb piece, thought for a few moments, put it back down, and moved another piece. My friend then, assuming that I had just given away that it was not a bomb, attempted to capture it, and lost the attacker.

    He claimed that it was illegal to pick up that piece and put it down again, although he had no objection until he learned that I'd tricked him. We had never previously announced or enforced a touch-it-move-it rule.

    So did I win that game or did he? That's not a question machine learning could answer.

  • by evouga on 7/11/22, 5:05 PM

    I mean. Stratego is a great game; I had a lot of fun playing it at summer camps when I was a young boy. It's cool there's a good AI for it.

    But this result feels a bit anticlimactic in a world where AIs can already beat expert humans at go, six-player poker, Starcraft, ...

  • by Someone on 7/11/22, 11:12 PM

    My gut feeling is that optimum play in Stratego is not to play.

    It feels better to let your opponent try to take your piece because, if they take it, you can make sure there will be at least one neighboring piece that can strike back.

    If so, every game should end in a draw because of inactivity of both players.

    My limited experience confirms that. Playing defensively, only offering my scouts to get intel, tends to win games for me.

    But then, I’ve never found any strategy guides, and wouldn’t know how good players play.

  • by voidfunc on 7/11/22, 4:47 PM

    I haven't played Stratego since I lost my board when I was in third grade and brought it to play during recess...

    Is there a good online version these days?

  • by mensetmanusman on 7/11/22, 8:46 PM

    Would it be interesting if they posted the approximate kWh of energy required to train?
  • by bezoz on 7/12/22, 12:02 AM

    So we have gone from DQN to AlphaGo to AlphaZero to MuZero to DeepNash? Every time I thought I had figured out their naming scheme, they came out with something even more unpredictable.
  • by warrenm on 7/11/22, 3:34 PM

    I haven't played Stratego in decades!

    Loved it as a kid, though