from Hacker News

How AlphaZero Mastered Its Games

by jsomers on 12/28/18, 8:36 PM with 78 comments

  • by glinscott on 12/29/18, 8:25 AM

    James put together a really nice summary of the ideas and the projects!

    It was almost a year ago that lc0 was launched; since then, the community (led by Alexander Lyashuk, author of the current engine) has taken it to a totally different level. Follow along at http://lczero.org!

    Gcp has also done an amazing job with Leela Zero, with a very active community on the Go side. http://zero.sjeng.org

    Of course, DeepMind really did something amazing with AlphaZero. It’s hard to overstate how dominant minimax search has been in chess. For another approach (MCTS/NN) to even be competitive with 50+ years of research is amazing. And all that without any human knowledge!
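
    For context, the "minimax search" referred to here is alpha-beta search over a position evaluation. Here is a toy sketch of the bare algorithm on a Nim-style game (take 1-3 stones, last stone wins) so that it runs on its own; it's only an illustration, not anything like Stockfish's actual code, which adds a handcrafted evaluation, move ordering, transposition tables, and much more:

      # Toy alpha-beta (negamax) search on a Nim-like game: take 1-3 stones,
      # whoever takes the last stone wins. Scores are +1/-1 for the side to move.
      def alphabeta(stones, alpha=-1.0, beta=1.0):
          if stones == 0:
              return -1.0                      # side to move already lost
          best = -1.0
          for take in (1, 2, 3):
              if take > stones:
                  break
              score = -alphabeta(stones - take, -beta, -alpha)
              best = max(best, score)
              alpha = max(alpha, score)
              if alpha >= beta:                # cutoff: opponent avoids this line
                  break
          return best

      print(alphabeta(10), alphabeta(12))      # 1.0 -1.0: win / loss for the side to move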

    Still, Stockfish keeps on improving - Stockfish 10 is significantly stronger than the version AlphaZero played in the paper (no fault of DeepMind; SF just improves quickly). We need a public exhibition match to settle the score, ideally with some GM commentary :). To complete the set of links, you can watch Stockfish improve here: http://tests.stockfishchess.org.

  • by alan_wade on 12/29/18, 11:14 AM

    God what a well written article! I don't have much to say on the subject, but this was pure joy to read, it's crazy good. Clear, engaging, to the point, making a difficult subject accessible without dumbing it down, no fluff or unnecessary side stories, just awesomeness.
  • by stabbles on 12/29/18, 10:53 AM

    What's very interesting is that the Komodo developers have implemented a Monte Carlo Tree Search version of their engine without neural nets for evaluation / move selection. This brand new engine can actually compete at the top level (still much worse than Stockfish and slightly worse than Lc0) [1] [2]

    The exact implementation details are probably kept secret, but the idea is to do a few steps of minimax / alpha-beta rather than completely random play in the playout phase of MCTS.
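
    A rough sketch of that hybrid idea might look like the following; it's purely illustrative, not Komodo's code. `legal_moves`, `apply`, and `shallow_alphabeta` are hypothetical helpers standing in for a real engine's move generator and a few-ply alpha-beta search that returns a score in [-1, 1] for the side to move:

      import math, random

      # Illustrative UCT-style MCTS where the random playout is replaced by a
      # shallow alpha-beta evaluation (hypothetical helpers, not Komodo's code).
      class Node:
          def __init__(self, pos, parent=None):
              self.pos, self.parent = pos, parent
              self.children, self.visits, self.value = [], 0, 0.0

      def uct(node, c=1.4):
          if node.visits == 0:
              return float("inf")              # always try unvisited children first
          return (node.value / node.visits
                  + c * math.sqrt(math.log(node.parent.visits) / node.visits))

      def search(root_pos, iterations=10000, playout_depth=3):
          root = Node(root_pos)
          for _ in range(iterations):
              node = root
              # 1. Selection: walk down the tree picking the best UCT child.
              while node.children:
                  node = max(node.children, key=uct)
              # 2. Expansion: once a node has been visited, add its children.
              if node.visits > 0:
                  node.children = [Node(apply(node.pos, m), node)
                                   for m in legal_moves(node.pos)]
                  if node.children:
                      node = random.choice(node.children)
              # 3. Evaluation: a few plies of alpha-beta instead of a random
              #    playout, scored in [-1, 1] for the side to move at `node`.
              score = shallow_alphabeta(node.pos, playout_depth)
              # 4. Backpropagation: flip the sign at each level (zero-sum game),
              #    so every node's value is from its parent's point of view.
              value = -score
              while node is not None:
                  node.visits += 1
                  node.value += value
                  value = -value
                  node = node.parent
          return max(root.children, key=lambda n: n.visits)  # most-visited child node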

    This makes me think that the contribution of AlphaZero is not necessarily neural nets, but rather MCTS as a successful method to search the game tree efficiently.

    [1] http://tcec.chessdom.com/ [2] http://www.chessdom.com/komodo-mcts-monte-carlo-tree-search-...

  • by YeGoblynQueenne on 12/29/18, 11:19 AM

    >> In fact, less than two months later, DeepMind published a preprint of a third paper, showing that the algorithm behind AlphaGo Zero could be generalized to any two-person, zero-sum game of perfect information (that is, a game in which there are no hidden elements, such as face-down cards in poker).

    I can't find this claim in the linked paper. What I can find is a statement that AlphaZero has demonstrated that 'a general-purpose reinforcement learning algorithm can achieve, tabula rasa, superhuman performance across many challenging domains'.

    Personally, and I'm sorry to be so very negative about this, but I don't even see the "many" domains. AlphaZero plays three games that are very similar to each other. Indeed, shogi is a variant of chess. There are certainly two-person, zero-sum, perfect-information games with radically different boards and pieces from Go, chess, and shogi - say, the Royal Game of Ur [1], or Mancala [2], etc. - not to mention stochastic games of perfect information, like backgammon, or asymmetric games like the hnefatafl games [3], and so on.

    Most likely, AlphaZero can be trained to play many such games very powerfully, or at a superhuman level. The point however is that, currently, it hasn't. So no "demonstration" of general game-playing has taken place, and of course there is no such thing as some sort of theoretical analysis that would serve as proof, or indication, of such ability in any of the DeepMind papers.

    I was hoping for less rah-rah cheerleading from the New Yorker, to be honest.

    ________________

    [1] https://en.wikipedia.org/wiki/Royal_Game_of_Ur

    [2] https://en.wikipedia.org/wiki/Mancala

    [3] https://en.wikipedia.org/wiki/Tafl_games

  • by cdelsolar on 12/29/18, 4:49 PM

    Awesome article. Does anyone know how to begin applying the AlphaZero techniques to games where information is NOT perfect? I'm trying to apply it to Scrabble. There hasn't been much AI research in this game and right now the best AI just uses brute force Monte Carlo with a flawed evaluation function (which doesn't take into account the state of the board at all, just points and tiles remaining on the opponent's rack). It's still good enough to beat top human experts about half the time, but I want to make something better.
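
    For reference, a rough sketch of the kind of brute-force simulation described above: sample plausible opponent racks from the unseen tiles, play out a reply, and judge the result only by points and rack leave, ignoring the board. The helpers (`play`, `best_reply`, `leave_value`) and the numbers are hypothetical, not any existing Scrabble engine's API:

      import random

      # Rough sketch of a Monte Carlo evaluation of one candidate Scrabble move.
      # The opponent's hidden rack is sampled from the unseen tiles, and the
      # outcome is judged only by points and rack leave, not by board position.
      def simulate_move(state, move, unseen_tiles, samples=200):
          total = 0.0
          for _ in range(samples):
              opp_rack = random.sample(unseen_tiles, 7)   # guess one hidden rack
              s = play(state, move)                       # our candidate move
              s = play(s, best_reply(s, opp_rack))        # opponent's greedy reply
              total += (s.my_score - s.opp_score) + leave_value(s.my_rack)
          return total / samples                          # average point swing

      # best = max(candidate_moves, key=lambda m: simulate_move(state, m, unseen))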

    Is it impossible to apply to these types of games? Every time I read about AlphaZero the articles mention that the techniques are meant for games of perfect information.

  • by FPGAhacker on 12/29/18, 8:24 AM

    They mention a documentary on Netflix about AlphaGo. Any recommendations for or against?
  • by eismcc on 12/29/18, 2:23 PM

    If you are interested in how to build a bot, Manning is having a Go bot competition:

    https://deals.manning.com/go-comp/

    It’s been really fun to work through the books.

  • by lgeorget on 12/29/18, 4:59 PM

    The article is very well written, but that sentence felt a bit weird:

    > Before there could be acceptance, there was depression. “I want to apologize for being so powerless,” he said in a press conference.

    Lee Sedol was clearly upset, especially after the first two games, but I think that apology was more out of politeness than depression, really.

  • by deegles on 12/29/18, 1:15 PM

    Is there a way to play against an AlphaGo or equivalent but with adaptive difficulty? I know next to nothing about go and think it would be interesting to learn it just by playing vs. a neural network. Maybe over time the strategies it uses would be "transferred" over to me!
  • by pie_hacker on 12/29/18, 1:33 PM

    The match between Stockfish and AlphaZero was played with certain unjustified parameters (time control, ponder off, different hardware, no opening book or endgame tablebase for Stockfish etc.). By "unjustified," I mean that the authors of the paper did not justify their choice of parameters in the paper as being designed to implement a fair match.

    At a glance, the parameters of the match seem unfair to me -- and tilted heavily towards AlphaZero. If the code were open source, this would not matter; anyone could run a rematch. As it is, I haven't seen any convincing evidence that AlphaZero is stronger than Stockfish when Stockfish is allowed to use its full breadth of knowledge and run on equal hardware.

  • by tosser0001 on 12/29/18, 3:10 PM

    > An expert human player is an expert precisely because her mind automatically identifies ...

    The "Patronizing 'Her'"

    Almost invariably, when an author decides to use the patronizing 'her' instead of the gender-neutral 'they', the piece is written by a man.