from Hacker News

Mastering Chess and Shogi by Self-Play with General Reinforcement Learning

by dennybritz on 12/6/17, 3:17 AM with 270 comments

by gwern on 12/6/17, 4:12 AM
This is an incredible demonstration that the AG Zero expert iteration method is a general method. If you go back to the discussions of AG Zero a month ago, there was a lot of skepticism that NNs would ever challenge Stockfish et al - they are just too good, too close to perfection, and chess is not well suited for MCTS and NNs. Well, it turns out that AG Zero doesn't work as well in chess: it works better, as it only takes 4 hours of training to beat Stockfish. This is going to be an impetus for researchers to explore solving many more MDPs than just chess or Go using expert iteration... ("There is no fire alarm.")
by soveran on 12/6/17, 10:03 AM
The ten sample games:
Sample game 1 https://lichess.org/VMe0gfa2
Sample game 2 https://lichess.org/Zqwn4Gzk
Sample game 3 https://lichess.org/G2fPHci8
Sample game 4 https://lichess.org/LLt8wyYp
Sample game 5 https://lichess.org/3r6CXx3H
Sample game 6 https://lichess.org/sbdyUYS4
Sample game 7 https://lichess.org/88vsAftE
Sample game 8 https://lichess.org/1uvCwaeB
Sample game 9 https://lichess.org/743quCXj
Sample game 10 https://lichess.org/SkCjxXkb
by xianshou on 12/6/17, 4:36 AM
One impressive statistic from the paper: AlphaZero analyzes 80,000 chess positions per second, while Stockfish looks at 70,000,000. Seventy million, three orders of magnitude higher. Yet AG0 beats Stockfish half the time as White and never loses with either color.
A stunning demonstration of generality indeed.
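A quick check of the "three orders of magnitude" figure, as a small Python sketch. It uses only the two search rates quoted above; the variable names are illustrative.

```python
import math

# Search rates quoted in the comment above (positions per second).
alphazero_pps = 80_000
stockfish_pps = 70_000_000

# How many times more positions Stockfish examines per second.
ratio = stockfish_pps / alphazero_pps

# The same gap expressed in orders of magnitude (base-10 logarithm).
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.0f}x, about {orders_of_magnitude:.2f} orders of magnitude")
```

The ratio works out to 875, so "three orders of magnitude" is a slight round-up.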
by magoghm on 12/6/17, 4:09 AM
"We also analysed the relative performance of AlphaZero’s MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon." <- Amazing!
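To make "focus much more selectively on the most promising variations" concrete, here is a minimal sketch of a prior-guided move selection step in the style of the PUCT rule from the published AlphaGo Zero description. It is not DeepMind's code: the `Edge` class, the `select_move` helper, the example priors, and the `c_puct` value of 1.5 are all assumptions made for illustration, and the `max(1, ...)` guard is a small tweak so that priors still matter before any visits.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Search statistics for one candidate move (hypothetical structure)."""
    prior: float             # P(s, a): move probability from the policy network
    visits: int = 0          # N(s, a): how often search has tried this move
    value_sum: float = 0.0   # sum of evaluations backed up through this move

    @property
    def mean_value(self) -> float:
        # Q(s, a): average backed-up value; 0 for an unvisited move.
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximising Q + U.

    U grows with the network prior and shrinks as a move accumulates visits,
    so search effort concentrates on moves the network already likes.
    """
    total_visits = sum(edge.visits for edge in edges.values())
    # Assumption: use at least 1 so the priors break ties at an unvisited node.
    sqrt_total = math.sqrt(max(1, total_visits))

    def score(edge: Edge) -> float:
        exploration = c_puct * edge.prior * sqrt_total / (1 + edge.visits)
        return edge.mean_value + exploration

    return max(edges, key=lambda move: score(edges[move]))


# Made-up priors for three opening moves.
edges = {"e4": Edge(prior=0.60), "d4": Edge(prior=0.35), "a3": Edge(prior=0.05)}
first_choice = select_move(edges)

# Pretend e4 was explored ten times and evaluated poorly.
edges["e4"].visits = 10
edges["e4"].value_sum = -5.0
second_choice = select_move(edges)
```

In this toy run the search first follows the highest prior, then shifts to the next most plausible move once the first one has been visited and scored badly; the low-prior move is barely considered. That is the sense in which far fewer positions need to be evaluated than in an exhaustive alpha-beta search.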
by partycoder on 12/6/17, 3:59 AM
If you have seen the Stockfish project you will see many hardcoded weights in the configuration, found through experimentation. All these adjustments probably took years to achieve... and now Alpha Go Zero just self-learns everything and surpasses it.
Would be good to see Deepmind's solution play Arimaa and Stratego, and see what kind of strategy it comes up with. Or weird variations of Go.
Eventually this tech will make it into military strategy simulators and that's where things will get really messed up. 4-star generals will be replaced by bots.
by zwischenzug on 12/6/17, 7:04 AM
I smell a rat.
The paper says:
'AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi'
In the first game, Stockfish's 9. Qe1 is one of the strangest moves I've ever seen, which would never be considered by a human, let alone a superhuman.
11. Kh1 also makes little sense, but is not as bad. My Stockfish sees it as losing 0.2 pawns, which makes it highly suspect in such a position.
35. Nc4 is also a deeply puzzling move that my Stockfish sees as losing half a pawn immediately, and a whole pawn soon after.
50. g4 is also suspect.
52. e5 is insane.
This is bullshit.
Edit: bullshit is too much - see comments below.
Edit: Oh dear. We're doomed.
https://lichess.org/study/qiwMCyNQ
by cdelsolar on 12/6/17, 5:39 PM
I wanted to contact the authors directly but can't seem to find contact info at the moment, with a question. I hope some of you might know enough to answer it.
I'm interested in applying this method, or a similar neural-network / tabula rasa based method to the game of Scrabble. I read the original AlphaGo Zero paper and they mentioned that this method works best for games of perfect information. The standard Scrabble AI right now is quite good and can definitely beat top experts close to 50% of the time, but it uses simple Monte Carlo simulations to evaluate positions and just picks the ones that perform better. It doesn't quite account for defensive considerations or other subtleties of the game. I was wondering if anyone who had more insight into MCTS and NN would be able to talk me through how to apply this to Scrabble, or if it even makes sense. One of the issues I can see currently would be very slow convergence; as it has a luck factor, the algorithm could make occasional terrible moves and still win games, and thus be "wrongly trained".
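The "simple Monte Carlo simulations" approach and the luck problem described above can be sketched in a few lines of Python. This is a toy, not the actual Scrabble engine: `pick_move`, `toy_simulate`, the move names, the point spreads, and the noise level are all hypothetical stand-ins chosen for illustration.

```python
import random
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def pick_move(
    candidates: Sequence[str],
    simulate: Callable[[str, random.Random], float],
    rollouts: int,
    rng: random.Random,
) -> Tuple[str, Dict[str, float]]:
    """Average the simulated outcome of each candidate move and pick the best."""
    averages = {
        move: mean(simulate(move, rng) for _ in range(rollouts))
        for move in candidates
    }
    best = max(averages, key=averages.get)
    return best, averages


def toy_simulate(move: str, rng: random.Random) -> float:
    """Hypothetical stand-in for playing out a game after `move`.

    Returns a final point spread: a small true edge plus a large luck term
    that models tile draws.
    """
    true_spread = {"safe": 10.0, "risky": 12.0}
    return true_spread[move] + rng.gauss(0, 40)


rng = random.Random(0)

# With few rollouts the luck term dominates, so the choice is unreliable.
noisy_best, _ = pick_move(["safe", "risky"], toy_simulate, rollouts=10, rng=rng)

# With many rollouts the averages settle near the true spreads.
stable_best, stable_avgs = pick_move(
    ["safe", "risky"], toy_simulate, rollouts=50_000, rng=rng
)
```

The same effect is what makes self-play training slow in a game with luck: a single game result is a very noisy signal about move quality, so many more games are needed before the signal outweighs the noise.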
by ericand on 12/6/17, 4:04 AM
Two things to note:
1) AlphaZero beats AlphaGo Zero and AlphaGo Lee and starts tabula rasa
2) "Shogi is a significantly harder game, in terms of computational complexity, than chess (2, 14): it is played on a larger board, and any captured opponent piece changes sides and may subsequently be dropped anywhere on the board. The strongest shogi programs, such as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated human champions (5)"
by Scarblac on 12/6/17, 10:24 AM
As a chess player I find the win rate astonishing.
Given the drawish tendency at top level, among human players, in correspondence chess and also in the TCEC final, I thought that even absolutely perfect play wouldn't score so well against a decent Stockfish setup (which 64 cores and 1 minute per move should be).
by thom on 12/6/17, 7:11 AM
I can’t see any reference to whether Stockfish was configured with an endgame tablebase. It’d be interesting to see results then, as you’d expect AlphaZero’s superior evaluation to give it an advantage out of the opening, but later in the game Stockfish would have access to perfect evaluations. Obviously there’s nothing stopping you from plugging a tablebase into AlphaZero but that feels wrong.
by Invictus0 on 12/6/17, 3:59 AM
I'm not sure it's really fair to compare Stockfish to AlphaZero; AlphaZero used 24h of 5000 TPUs in compute time, and still needed 4 TPUs in real play, while Stockfish ran on just 64 threads and 1GB RAM. Nonetheless, still an impressive achievement.
by Aissen on 12/6/17, 9:03 AM
Serious question: how does one evaluate the reproducibility of this paper's results?
Maybe I'm missing some things but:
- Are 1st gen TPUs even accessible? You have to fill out a form to learn more about those second generation TPUs: https://cloud.google.com/tpu/
- I can't find the source code
This does not look like a scientific paper, but a (very impressive) tech demo.
by thomasahle on 12/6/17, 11:59 AM
Discussion at the Computer Chess Club (CCC) forum: http://www.talkchess.com/forum/viewtopic.php?topic_view=thre...
and
http://www.talkchess.com/forum/viewtopic.php?topic_view=thre...
by tboerstad on 12/6/17, 6:00 AM
Stockfish plays like an ambitious amateur in the first game, giving away a piece for two pawns on move 13.
Perhaps this move was justified though, as later in the same game Stockfish gets a position which is at worst drawn, likely winning. Moves later, however, around move 40, Stockfish gets its own knight trapped and the game is over.
This is not the kind of chess we normally see from Stockfish.
by naveen99 on 12/6/17, 4:04 AM
Very happy to see this result. It's like a moral victory for humans, as AlphaGo is more human-like (discounting Monte Carlo search) than Stockfish. Maybe deep learning will give us the next Euler, Newton, or Einstein.
by nl on 12/6/17, 9:08 AM
For those complaining about the TPU resources used during self training it is worth noting that Stockfish has used over 10,000 CPU hours for tuning its parameters. See https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU...
by 110011 on 12/6/17, 9:42 AM
What an amazing result! Evaluating fewer (by a factor of 1000) positions AlphaZero still beats Stockfish.
In the figure on its preferred openings I find it very interesting that it doesn't like the Ruy Lopez very much over training time (there is a small bump but that is transient). I am hardly a chess expert but I know that it was very favored at the world championships so maybe the chess world will be turned upside down by this result now?
Positing that the chess world is bigger than the Go world (in terms of interest and finances) there is probably going to be a race to replicate these results "at home" and train yourself before your competitors :)
by elcapitan on 12/6/17, 9:17 AM
What would be a good starting point to learn about the AI behind that for a "normal" programmer? There seem to be so many resources now that it's hard to choose. Combination of hands-on plus theory would be good.
by asdfologist on 12/6/17, 5:36 AM
While this sounds impressive, I'll believe it when AlphaZero wins TCEC.
by gallerdude on 12/6/17, 3:58 AM
I wonder if being an expert at one game makes it easier to be an expert at another. If so, then maybe the examples are datasets, and convergence would be able to complete new tasks after a few examples.
by luckyt on 12/6/17, 4:54 AM
It doesn't seem to like the Sicilian Defense (1.e4 c5), which is the most popular opening by human players. I wonder if this will change opening theory?
by narrator on 12/6/17, 9:21 AM
So when are they going to apply this to Atari games or, well, anything? The next step is they have one AI figure out the rules by making a GAN that imitates player behavior and the other AI be Alpha Go, which tweaks the GAN inputs to generate different moves to win. Voila... almost general-purpose AI that can learn to play any game.
by Sukotto on 12/6/17, 4:55 AM
Is this a library or something I can download and try training myself (on a small scale)?
I'm not in a position to read the paper right now, so my apologies if that's covered in there. I want to ask just in case it's not, while this is still on the front page.
by lern_too_spel on 12/6/17, 4:29 AM
What is its win percentage against itself on each side of the board in each game? Is chess a draw for its style of play? Is there a first move advantage for the other games with its play style?
by hmate9 on 12/6/17, 12:38 PM
So AlphaGo Zero used 4 TPUs while AlphaZero used 1500. It’s not immediately obvious to me why there is this massive difference. Can anyone elaborate?
by skc on 12/7/17, 1:14 PM
I'm only a fairly pedestrian chess player, but I looked at one of these games between AGZ and SF and aside from the endgame, AGZ played in a manner that almost seemed alien. It seemed to completely ignore various little rules of thumb which is to be expected in hindsight but fairly mind-blowing when you actually watch a game.
by bfirsh on 12/6/17, 10:08 AM
Here's an HTML version of the paper:
https://www.arxiv-vanity.com/papers/1712.01815/
Table 2 is broken, but the rest is much more readable if you're on a phone.
by wskish on 12/7/17, 4:48 AM
The more interesting metric going forward is performance at a given power budget (not unlike with motorsports). The TPUs are consuming sooo much power here! Most interesting real-world problems are power-limited, including in nature (e.g. metabolic limits).
by k2xl on 12/6/17, 2:19 PM
Chess.com forum thread https://www.chess.com/forum/view/general/stockfish-dethroned
by hyperpape on 12/6/17, 4:23 AM
This paper compares AlphaZero to the 20 block version of AlphaGo Zero that was trained for 3 days. Am I right in thinking that this version was significantly less strong than the 40 block version? If so, does it matter?
by TwoBit on 12/8/17, 8:35 AM
Wasn't Stockfish gimped for this competition? No openings, no endgame tables, low RAM, etc? If that's so then this AI did not in fact beat the computer chess champ.
by naveen99 on 12/6/17, 1:21 PM
Is there an SDK or compiler for using Google's TPUs beyond just using TensorFlow? Is the TPU backend of TensorFlow based on CUDA, OpenCL, plain C or something else?
by imrehg on 12/6/17, 12:56 PM
As a Shogi enthusiast (but complete beginner), I'd like to have seen more Shogi details in the article. Nevertheless there's plenty of other things to geek out on...
by auggierose on 12/6/17, 3:47 PM
Great result, but without access to source code this is not a scientific paper.
by SubiculumCode on 12/6/17, 7:45 PM
There is only one way for a human to win at chess against these computers; and it involves violence against the chess board.
by foobaw on 12/6/17, 6:57 AM
Did Magnus play against this? Is there a way we can see the game?
by plg on 12/6/17, 4:57 PM
source code?
by stretchwithme on 12/6/17, 8:16 AM
See, Mom? Self play is a good thing.
by firebones on 12/6/17, 4:15 AM
A lot of the graphs in the paper seem to level out as they hit the level of the opponent. It makes me wonder to what extent AlphaGo Zero is merely optimizing to beat flaws in existing opponents' current implementations (even if "existing opponents" == all available opponents' data and algorithms today) rather than generalizable insights into the underlying game. Because wouldn't you expect that unless we are at the theoretical limit of perfect chess that a tabula rasa approach might exceed existing best practice significantly, especially with the massive computation advantage it has?
Not that there's anything wrong with that; AlphaGo Zero supposedly optimized for the "just enough" win rather than the crushing win. It doesn't even mean Stockfish is doomed--I suspect Stockfish could beat it in a future heads up match provided that Zero didn't have time to retrain, but that a retrained Zero (having the benefit of optimizing against a new Stockfish) would be able to supersede it once again.
by ericand on 12/6/17, 3:57 AM
Certainly a significant achievement. Also, kind of interesting that the AlphaGo team spent a lot of energy to convince us Go is much harder than Chess, only to turn around and tell us that it is amazing that it can also win at Chess.