from Hacker News

Ask HN: Can we adapt AlphaZero's self-play technique for better human learning?

by stenecdote on 12/8/17, 8:06 PM with 33 comments

Since I lack the ML background to debunk this suspicion, I figured I'd let HN debunk it for me. Seeing AlphaZero's success at learning Chess, Shogi, and Go, I was immediately struck with the intuition that the fact that AlphaZero could learn so much from "self-play" should provide some insight into improving human teaching and learning strategies. With the caveat that humans lack AlphaZero's ability to separate themselves into two versions, I can imagine a teaching paradigm that emphasizes simulating competitive activities but playing as both sides. Is something like this at all related to what AlphaZero's doing and are there chess training paradigms that emphasize this type of simulation?

by jdoliner on 12/8/17, 10:04 PM
Short answer: No, there's nothing new here that can inform better human learning.
Longer answer: The concept of self-play isn't new in any sense. All chess players use this technique to some degree; none use only this technique. The advantage of self-play is that there's no risk of accidentally picking up someone else's incorrect assumption, since you're deriving everything from scratch. Some people take this to extremes: there's a math professor who doesn't read any math papers, so that he's deriving everything from first principles and not "contaminating his mind". It works quite well for him, but unfortunately I'm blanking on his name. However, commitment to this technique removes one of the major advantages that humans have, which is their ability to communicate knowledge among themselves in a compact, abstract way with language. Humans also have a pretty good way to mitigate the faulty-assumption risk: skepticism. We can reevaluate our assumptions and, if we deem it necessary, excise them from our mental model. AlphaZero could in theory do the same thing, but in reality there's not much point: it has no use for the sum total of human knowledge on chess, since it's capable of recreating that and much more in a few hours.
If there is something to be learned from AlphaZero's training, it's that you should always be skeptical of your assumptions. That's not anything new, but it's always worth reiterating. It's pretty obviously not feasible to take this to the extremes of AlphaZero, though: humans need other humans to learn. Even the math professor who doesn't read papers needed a lot of interaction with other humans to get to the point where he could derive things from first principles.
by conistonwater on 12/8/17, 9:09 PM
Don't humans already do this, in a way? Instead of playing against yourself, you take somebody stronger and play them. You only need on the order of 100 games of chess against decent opposition, with some verbal explanations, to reach amateur level. Per game, this is much more efficient than AlphaZero, which requires millions of games as well as tons of computing power. Surely the main reason AlphaZero uses that particular technique is that nobody can figure out something better? You'd really want it to copy learning techniques from humans (especially learning from many fewer examples), not the other way around.
by infinity0 on 12/8/17, 9:46 PM
AlphaZero plays games with (1) perfect information and (2) well-defined winning conditions. Neither of these holds for most human-learning scenarios.
I can imagine that a healthy dose of probability theory (and probably more advanced stuff I don't know about[1]) might improve (1), but (2) is going to keep computer scientists and philosophers and ethicists arguing for quite a long time. :)
[1] get the joke, eh? eh? eh?
by ThrustVectoring on 12/8/17, 10:14 PM
Human brain architecture already does "self-play" during REM sleep. So yeah, but the implementation details are "get more sleep" rather than some sort of novel technique.
by stenecdote on 12/8/17, 9:47 PM
It only now occurs to me that the line of thinking I follow here is subconsciously inspired by section 3 of this Marvin Minsky talk (https://web.media.mit.edu/~minsky/papers/TuringLecture/Turin...). If you're at all interested in the intersection of learning and computer science, I highly recommend taking a look.
by Cookingboy on 12/8/17, 9:28 PM
Are we sure AlphaZero has better learning efficiency than a human?
Sure, it reached peak skill after 4 hours of learning, but how many games did it play during those 4 hours? How many moves did it memorize perfectly and analyze? Are those numbers even achievable by a human in a lifetime?
Even with AlphaZero's efficiency, it still evaluates 80,000 moves per second, which is far more moves than a human grandmaster evaluates in an entire game. If we cut AlphaZero's "processing power" to that of a human, could it still beat a top-level human player, let alone other AIs?
To me it seems like there is still a long way to go to improve in this space.
by forgot-my-pw on 12/8/17, 10:22 PM
I don't think we can learn much from how an engine learns, but we certainly can learn from its results.
For example, there's this interesting discussion: https://www.reddit.com/r/chess/comments/7ibzq4/stockfish_vs_...
Because AlphaZero did not learn from human games, it looks at the different pieces without attaching values like we do. It has no problem sacrificing a higher-"valued" piece for the sake of its strategy.
by egypturnash on 12/8/17, 11:26 PM
I would submit that we already have an example of self-play being used as part of a strategy to learn chess: chess problems.
Something like "Here's a board position. It looks utterly hopeless, but the problem says 'Black to mate in 7 moves'. How can you get there from here without relying on White making any beginner's mistakes?" is pretty much self-play.
by ararar on 12/11/17, 12:43 PM
I think there is a possibility of applying machine learning to teaching humans in the sense of continuous, algorithmic tuning/personalization of lesson plans/teaching strategies to accelerate human learning ... as a teacher's aide in other words.
by canadaduane on 12/9/17, 12:36 AM
My impression of Plato is he channeled different people/characters in his writing in order to create adversarial conditions in which he could improve his rhetoric. Perhaps this is similar to AlphaZero's technique?
by eutropia on 12/8/17, 10:49 PM
Remember that AlphaZero played 44 million games of chess, whereas your average professional chess player has played somewhere on the order of 10,000-100,000. Self-play works, but rather slowly.
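To put those figures side by side, here is a quick back-of-the-envelope sketch in Python. It uses only the numbers quoted in the comment above (44 million games for AlphaZero, 10,000 to 100,000 for a professional); the constant names and the games_ratio helper are made up for illustration, and the human range is a rough estimate rather than measured data.

```python
# Back-of-the-envelope comparison of games played, using only the
# figures quoted in the comment above. All names here are illustrative.

ALPHAZERO_GAMES = 44_000_000   # self-play games of chess, as quoted
HUMAN_GAMES_LOW = 10_000       # low end of the quoted professional range
HUMAN_GAMES_HIGH = 100_000     # high end of the quoted professional range


def games_ratio(machine_games: int, human_games: int) -> float:
    """Return how many times more games the machine played than the human."""
    return machine_games / human_games


# Most generous comparison for the machine: the human who played the most games.
print(games_ratio(ALPHAZERO_GAMES, HUMAN_GAMES_HIGH))  # 440.0
# Least generous comparison: the human who played the fewest games.
print(games_ratio(ALPHAZERO_GAMES, HUMAN_GAMES_LOW))   # 4400.0
```

So on these numbers, AlphaZero played somewhere between a few hundred and a few thousand times as many games as a professional does in a career.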