by dennybritz on 12/6/17, 3:17 AM with 270 comments
by gwern on 12/6/17, 4:12 AM
by soveran on 12/6/17, 10:03 AM
Sample game 1 https://lichess.org/VMe0gfa2
Sample game 2 https://lichess.org/Zqwn4Gzk
Sample game 3 https://lichess.org/G2fPHci8
Sample game 4 https://lichess.org/LLt8wyYp
Sample game 5 https://lichess.org/3r6CXx3H
Sample game 6 https://lichess.org/sbdyUYS4
Sample game 7 https://lichess.org/88vsAftE
Sample game 8 https://lichess.org/1uvCwaeB
Sample game 9 https://lichess.org/743quCXj
Sample game 10 https://lichess.org/SkCjxXkb
by xianshou on 12/6/17, 4:36 AM
A stunning demonstration of generality indeed.
by magoghm on 12/6/17, 4:09 AM
by partycoder on 12/6/17, 3:59 AM
It would be good to see DeepMind's solution play Arimaa and Stratego, and see what kind of strategy it comes up with. Or weird variations of Go.
Eventually this tech will make it into military strategy simulators, and that's where things will get really messed up. Four-star generals will be replaced by bots.
by zwischenzug on 12/6/17, 7:04 AM
The paper says:
'AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi'
In the first game, Stockfish's 9. Qe1 is one of the strangest moves I've ever seen, one that would never be considered by a human, let alone a superhuman.
11. Kh1 also makes little sense, but is not as bad. My Stockfish sees it as losing 0.2 pawns, which makes it highly suspect in such a position.
35. Nc4 is also a deeply puzzling move that my Stockfish sees as losing half a pawn immediately, and a whole pawn soon after.
50. g4 is also suspect.
52. e5 is insane.
This is bullshit.
Edit: bullshit is too much - see comments below.
Edit: Oh dear. We're doomed.
by cdelsolar on 12/6/17, 5:39 PM
I'm interested in applying this method, or a similar neural-network / tabula-rasa-based method, to the game of Scrabble. I read the original AlphaGo Zero paper and they mentioned that this method works best for games of perfect information. The standard Scrabble AI right now is quite good and can definitely beat top experts close to 50% of the time, but it uses simple Monte Carlo simulations to evaluate positions and just picks the ones that perform better. It doesn't quite account for defensive considerations or other subtleties of the game. I was wondering if anyone with more insight into MCTS and NNs would be able to talk me through how to apply this to Scrabble, or whether it even makes sense. One of the issues I can see currently would be very slow convergence: because the game has a luck factor, the algorithm could make occasional terrible moves and still win games, and thus be "wrongly trained".
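For readers unfamiliar with the "simple Monte Carlo" approach described above, here is a minimal sketch of the idea: play each candidate move out many times with random draws, average the results, and pick the best average. Everything in it is an assumption for illustration, not the actual Scrabble engine's code: the names `evaluate_move`, `pick_move`, and `toy_simulate` are hypothetical, and the toy playout just returns a made-up "true" value for each move plus Gaussian noise standing in for tile luck.

```python
import random
from statistics import mean

def evaluate_move(move, simulate, n_rollouts, rng):
    """Average the simulated outcome of `move` over n_rollouts random playouts."""
    return mean(simulate(move, rng) for _ in range(n_rollouts))

def pick_move(candidates, simulate, n_rollouts, rng):
    """Return the candidate whose averaged playout result is highest."""
    return max(
        candidates,
        key=lambda m: evaluate_move(m, simulate, n_rollouts, rng),
    )

# Hypothetical stand-in for a real playout: each move has a made-up
# underlying value (say, expected point spread), and a single playout
# returns that value plus a large random "luck" term.
TRUE_VALUE = {"A": 10.0, "B": 12.0, "C": 30.0}

def toy_simulate(move, rng):
    return TRUE_VALUE[move] + rng.gauss(0.0, 20.0)

rng = random.Random(0)
best = pick_move(list(TRUE_VALUE), toy_simulate, 2000, rng)
print(best)
```

With `n_rollouts=1` the luck term dominates and the pick can easily be wrong, which is a small-scale version of the "wrongly trained" worry: a single noisy game outcome says little about move quality, so many samples are needed before the signal shows through.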
by ericand on 12/6/17, 4:04 AM
1) AlphaZero beats AlphaGo Zero and AlphaGo Lee and starts tabula rasa
2) "Shogi is a significantly harder game, in terms of computational complexity, than chess (2, 14): it is played on a larger board, and any captured opponent piece changes sides and may subsequently be dropped anywhere on the board. The strongest shogi programs, such as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated human champions (5)"
by Scarblac on 12/6/17, 10:24 AM
Given the drawish tendency at top level, among human players, in correspondence chess and also in the TCEC final, I thought that even absolutely perfect play wouldn't score so well against a decent Stockfish setup (which 64 cores and 1 minute per move should be).
by thom on 12/6/17, 7:11 AM
by Invictus0 on 12/6/17, 3:59 AM
by Aissen on 12/6/17, 9:03 AM
Maybe I'm missing some things but:
- Are 1st gen TPUs even accessible? You have to fill out a form to learn more about those second-generation TPUs: https://cloud.google.com/tpu/
- I can't find the source code
This does not look like a scientific paper, but a (very impressive) tech demo.
by thomasahle on 12/6/17, 11:59 AM
and
http://www.talkchess.com/forum/viewtopic.php?topic_view=thre...
by tboerstad on 12/6/17, 6:00 AM
Perhaps this move was justified, though, as later in the same game Stockfish gets a position which is at worst drawn and likely winning. Later, however, around move 40, Stockfish gets its own knight trapped and the game is over.
This is not the kind of chess we normally see from Stockfish.
by naveen99 on 12/6/17, 4:04 AM
by nl on 12/6/17, 9:08 AM
by 110011 on 12/6/17, 9:42 AM
In the figure on its preferred openings, I find it very interesting that it doesn't like the Ruy Lopez very much over training time (there is a small bump, but that is transient). I am hardly a chess expert, but I know that it was very favored at the world championships, so maybe the chess world will be turned upside down by this result now?
Positing that the chess world is bigger than the Go world (in terms of interest and finances) there is probably going to be a race to replicate these results "at home" and train yourself before your competitors :)
by elcapitan on 12/6/17, 9:17 AM
by asdfologist on 12/6/17, 5:36 AM
by gallerdude on 12/6/17, 3:58 AM
by luckyt on 12/6/17, 4:54 AM
by narrator on 12/6/17, 9:21 AM
by Sukotto on 12/6/17, 4:55 AM
I'm not in a position to read the paper right now, so my apologies if that's covered in there. I want to ask just in case it's not, while this is still on the front page.
by lern_too_spel on 12/6/17, 4:29 AM
by hmate9 on 12/6/17, 12:38 PM
by skc on 12/7/17, 1:14 PM
by bfirsh on 12/6/17, 10:08 AM
https://www.arxiv-vanity.com/papers/1712.01815/
Table 2 is broken, but the rest is much more readable if you're on a phone.
by wskish on 12/7/17, 4:48 AM
by k2xl on 12/6/17, 2:19 PM
by hyperpape on 12/6/17, 4:23 AM
by TwoBit on 12/8/17, 8:35 AM
by naveen99 on 12/6/17, 1:21 PM
by imrehg on 12/6/17, 12:56 PM
by auggierose on 12/6/17, 3:47 PM
by SubiculumCode on 12/6/17, 7:45 PM
by foobaw on 12/6/17, 6:57 AM
by plg on 12/6/17, 4:57 PM
by stretchwithme on 12/6/17, 8:16 AM
by firebones on 12/6/17, 4:15 AM
Not that there's anything wrong with that; AlphaGo Zero supposedly optimized for the "just enough" win rather than the crushing win. It doesn't even mean Stockfish is doomed--I suspect Stockfish could beat it in a future heads up match provided that Zero didn't have time to retrain, but that a retrained Zero (having the benefit of optimizing against a new Stockfish) would be able to supersede it once again.
by ericand on 12/6/17, 3:57 AM