by stenecdote on 12/8/17, 8:06 PM with 33 comments
by jdoliner on 12/8/17, 10:04 PM
Longer answer: The concept of self-play isn't new in any sense. All chess players use this technique to some degree; none use only this technique. The advantage of self-play is that there's no risk of accidentally picking up someone else's incorrect assumption, since you're deriving everything from scratch. Some people take this to extremes: there's a math professor who doesn't read any math papers, so that he's deriving everything from first principles and not "contaminating his mind." It works quite well for him, but unfortunately I'm blanking on his name. However, commitment to this technique removes one of the major advantages that humans have, which is their ability to communicate knowledge amongst themselves in a compact, abstract way with language. Humans also have a pretty good way to mitigate the faulty-assumption risk: skepticism. We can reevaluate our assumptions and, if we deem it necessary, excise them from our mental model. AlphaZero could in theory do the same thing. The reality for AlphaZero, though, is that there's not much point: it has no use for the sum total of human knowledge on chess, since it's capable of recreating that and much more in a few hours.
If there is something to be learned from AlphaZero's training, it's that you should always be skeptical of your assumptions. That's not anything new, but it's always worth reiterating. It's pretty obviously not feasible to take this to the extremes of AlphaZero, though; humans need other humans to learn. Even the math professor who doesn't read papers needed a lot of interaction with other humans to get to the point where he could derive things from first principles.
by conistonwater on 12/8/17, 9:09 PM
by infinity0 on 12/8/17, 9:46 PM
I can imagine that a healthy dose of probability theory (and probably more advanced stuff I don't know about[1]) might improve (1), but (2) is going to keep computer scientists and philosophers and ethicists arguing for quite a long time. :)
[1] get the joke, eh? eh? eh?
by ThrustVectoring on 12/8/17, 10:14 PM
by stenecdote on 12/8/17, 9:47 PM
by Cookingboy on 12/8/17, 9:28 PM
Sure, it reached peak skill after 4 hours of learning, but how many games did it play during those 4 hours? How many moves did it memorize perfectly and analyze? Are those numbers even achievable by a human in a lifetime?
Even with AlphaZero's efficiency, it still evaluates 80,000 moves per second, which is far more than a human grandmaster evaluates in an entire game. If we cut AlphaZero's "processing power" to that of a human, could it still beat a top-level human player, let alone other AIs?
To me it seems like there is still a long way to go to improve in this space.
by forgot-my-pw on 12/8/17, 10:22 PM
For example, there's this interesting discussion: https://www.reddit.com/r/chess/comments/7ibzq4/stockfish_vs_...
Because AlphaZero did not learn from human games, it looks at the different pieces without attaching values to them like we do. It has no problem sacrificing a "higher-valued" piece for the sake of its strategy.
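For anyone unfamiliar with the piece "values" mentioned above, here is a minimal sketch of the conventional human rule of thumb (pawn 1, knight and bishop 3, rook 5, queen 9). The function names and the piece-string input format are made up for illustration, and none of this is taken from AlphaZero itself; it only shows the kind of material count that a human-style evaluation leans on.

```python
# Conventional textbook piece values, in pawns. This is a human rule of thumb,
# not anything AlphaZero uses. Kings are left out because they are never traded.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material(pieces: str) -> int:
    """Sum the conventional values of one side's pieces, e.g. "QRRPPP".

    Letters not in PIECE_VALUES (such as "K") are ignored.
    """
    return sum(PIECE_VALUES[p] for p in pieces.upper() if p in PIECE_VALUES)


def material_balance(white: str, black: str) -> int:
    """Positive means White is ahead in material, negative means Black is."""
    return material(white) - material(black)


if __name__ == "__main__":
    # White has queen, rook, pawn; Black has two rooks and a pawn.
    print(material_balance("QRP", "RRP"))
```

By this count, giving up a rook for a bishop looks like a two-point loss, which is exactly the sort of bookkeeping a program trained purely by self-play was never handed.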
by egypturnash on 12/8/17, 11:26 PM
Something like "Here's a board position. It looks utterly hopeless, but the problem says 'Black to mate in 7 moves'. How can you get there from here without relying on White making any beginner's mistakes?" is pretty much self-play.
by ararar on 12/11/17, 12:43 PM
by canadaduane on 12/9/17, 12:36 AM
by eutropia on 12/8/17, 10:49 PM