from Hacker News

Concrete AI Safety Problems

by aston on 6/22/16, 12:32 AM with 94 comments

  • by lacker on 6/22/16, 1:12 AM

    This sort of "internal" approach to AI safety, where you attempt to build fundamental limits into the AI itself, seems easily thwarted by someone who intentionally builds an AI without these safety mechanisms. As long as AI is an open technology, there will always be some criminals who just want to see the world burn.

    IMO, a better approach to AI safety research is to focus on securing the first channels that a malicious AI would be likely to exploit, like spam and security holes. Can you make communications spam-resistant? Can you make an unhackable internet service?

    Those seem hard, but more plausible than the "watch out for paperclip optimizers" approach to AI safety. It just feels like inventing a way to build a nuclear weapon that can't actually explode, and then hoping that solves the problem of nuclear war.

  • by colah3 on 6/22/16, 12:54 AM

    Paper: https://arxiv.org/pdf/1606.06565v1.pdf

    Google Post: https://research.googleblog.com/2016/06/bringing-precision-t...

    It was a pleasure for us to work on this with OpenAI and others. John/Paul/Jacob are good friends, and wonderful colleagues! :)

  • by fiatmoney on 6/22/16, 4:59 AM

    These are not quite the right questions, although they kind of hint at them, and they are not fundamentally questions about AI. Example: "Can we transform an RL agent's reward function to avoid undesired effects on the environment?" Trivially, the answer is yes: put a weight on whatever effect you're trying to mitigate, to the extent you care about trading off potential benefits. They qualify this by saying essentially "... but without specifying every little thing". So what you're trying to do is build a rigorous (i.e., specified by code or data) model of what a human would think is "reasonable" behavior, while still preserving freedom for Gordian-knot-style solutions that trade off things you don't care about in unexpected ways.
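
    To make the weighting concrete, here is a minimal sketch; the penalty function, the state fields, and the 0.5 weight are all invented placeholders, because choosing them is precisely the unsolved part:

        # Wrap a base RL reward with weighted penalties for side effects:
        # R'(s, a) = R(s, a) - sum_i w_i * p_i(s, a)
        def penalized_reward(base_reward, penalties, weights):
            def reward(state, action):
                r = base_reward(state, action)
                for p, w in zip(penalties, weights):
                    r -= w * p(state, action)  # trade off each mitigated effect
                return r
            return reward

        # Hypothetical impact measure: how much of the environment got disturbed.
        def disruption(state, action):
            return state.get("objects_moved", 0)

        reward_fn = penalized_reward(lambda s, a: s.get("task_score", 0.0),
                                     penalties=[disruption],
                                     weights=[0.5])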

    The hard part is actually figuring out what you care about, particularly in the context of a truly universal optimizer that can decide to trade off anything in the pursuit of its objectives.

    This has been a core problem of philosophy for 3000 years - that is, putting some amount of rigorous codification behind human preferences. You could think of it as a branch of deontology, or maybe aesthetics. It is extremely unlikely that a group sponsored by Sam Altman, whose brilliant idea was "let's put the government in charge of it" [1], will make a breakthrough there.

    I don't actually doubt that AIs will have philosophical implications, and philosophers like Nick Land have explored some of that area. But I severely doubt the ability of AI researchers to do serious philosophy and simultaneously build an AI that reifies those concepts.

    [1] http://blog.samaltman.com/machine-intelligence-part-2

  • by arcanus on 6/22/16, 1:24 AM

    In a variety of engineering fields, including but not limited to software, we have wonderful tools to track down and eliminate 'bugs'. While high standards are often not upheld, the concepts are largely sound.

    In particular, I'm talking about verification and validation testing. I'm curious why these approaches are generally not being leveraged to ensure quality of output here.

    I suspect this is because of the persistent belief that AI will annihilate humanity with one mishap, but I'm suggesting that we approach this much more like traditional engineering problems, such as building a bridge or flying a plane, whereby rigorous standards are continually applied to ensure the system behaves as designed.

    The resulting system will look much more like continuous integration with robust regression testing and high line coverage than the sexy research ideas presented here, but I can't help but think it will be more robust. These systems are too complicated to treat as anything but a black box, at least from a quality assurance standpoint.
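
    A minimal sketch of what such a black-box regression gate might look like; the golden-case file format and the model.predict interface are assumptions for illustration, not any particular framework:

        import json

        def load_golden_cases(path="golden_cases.json"):
            # Curated (input, expected) pairs that must never regress.
            with open(path) as f:
                return json.load(f)

        def check_no_regressions(model, cases):
            failures = []
            for case in cases:
                got = model.predict(case["input"])
                if got != case["expected"]:
                    failures.append((case["input"], case["expected"], got))
            # Fail the CI build if any golden case regressed.
            assert not failures, f"{len(failures)} golden case(s) regressed: {failures[:3]}"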

  • by Animats on 6/22/16, 4:12 AM

    From the article: "Safe exploration. Can reinforcement learning (RL) agents learn about their environment without executing catastrophic actions? For example, can an RL agent learn to navigate an environment without ever falling off a ledge?"

    Yes. That's why I was critical of an academic AI effort that attempts automated driving by training a supervised learning system on observations of human drivers. That's going to work OK for a while and then do something really stupid, because it has no model of catastrophic actions.
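
    A toy sketch of the point: constrain exploration with a hand-written model of catastrophic actions, so the agent never has to learn about the ledge by falling off it. The predicate and state fields are invented for illustration:

        import random

        def is_catastrophic(state, action):
            # Hypothetical hard-coded safety model: never step off the ledge.
            return state["distance_to_ledge"] <= 1 and action == "forward"

        def safe_epsilon_greedy(state, q_for_state, actions, epsilon=0.1):
            # Explore and exploit only within the safe action set.
            allowed = [a for a in actions if not is_catastrophic(state, a)]
            if random.random() < epsilon:
                return random.choice(allowed)
            return max(allowed, key=lambda a: q_for_state.get(a, 0.0))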

  • by chrisfosterelli on 6/22/16, 12:51 AM

    Direct link to the paper: https://arxiv.org/pdf/1606.06565v1.pdf

  • by pizza on 6/22/16, 1:11 AM

    Related to the wireheading problem [0], [1]

    [0] http://www.wireheading.com/ - David Pearce's ideas are... interesting... to say the least ;)

    [1] https://wiki.lesswrong.com/wiki/Wireheading

  • by DrNuke on 6/22/16, 11:02 AM

    I may be stupid, and indeed I am, but it is insanely straightforward today (not tomorrow) to put a gun on a drone and tell it to image-recognize some targeted 1.3-2.3 m tall biped with an oval head and shoot him/her down.

  • by Mendenhall on 6/22/16, 1:33 AM

    I always get the feeling AI is going to be like nuclear capability: a great reason to create it, but once it's made, everyone wants to get rid of it.

  • by glaberficken on 6/22/16, 11:42 AM

    How would we program a self-driving car that is faced with something like the "trolley problem" [1]? I.e., the car is faced with two probable collisions, of which it can only avoid one, or must choose between running over a pedestrian and crashing into a tree.

    I assume this is probably already worked into the current prototypes. Does anyone have references to discussions about this in current-gen self-driving car prototypes? (One common framing is sketched below.)

    [1] https://en.wikipedia.org/wiki/Trolley_problem
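
    For what it's worth, one common framing (no claim about any specific vendor) reduces this to cost minimization over candidate trajectories; a toy sketch, with entirely made-up weights:

        # Score each candidate trajectory by its predicted outcomes and pick
        # the least bad one. The "trolley problem" answer is whatever ordering
        # these (invented) weights encode.
        COSTS = {"pedestrian_collision": 1e9, "vehicle_collision": 1e6,
                 "fixed_object_collision": 1e5, "hard_braking": 10.0}

        def trajectory_cost(events):
            return sum(COSTS.get(e, 0.0) for e in events)

        def pick_trajectory(candidates):
            return min(candidates, key=lambda name: trajectory_cost(candidates[name]))

        choice = pick_trajectory({
            "swerve_to_tree": ["fixed_object_collision"],
            "continue": ["pedestrian_collision"],
            "brake_hard": ["hard_braking", "vehicle_collision"],
        })  # -> "swerve_to_tree" under these weights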

  • by fitzwatermellow on 6/22/16, 1:03 PM

    > Can we transform an RL agent's reward function to avoid undesired effects on the environment?

    To me this is the toughest nut in the lot. Training a Pac-Man agent to avoid ghosts and eat pellets, in a world of infinite hazards and cautions! Any strategies?
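
    One naive strategy, sketched below: instead of enumerating every hazard and caution by hand, penalize impact measured as deviation from a "do nothing" baseline. The world_model, distance, and weight here are all hypothetical placeholders:

        def shaped_reward(base_reward, world_model, distance, weight=0.1):
            # world_model(state, action) predicts the next state;
            # distance measures how far apart two states are.
            def reward(state, action):
                actual = world_model(state, action)
                baseline = world_model(state, "noop")  # world if the agent idles
                return base_reward(state, action) - weight * distance(actual, baseline)
            return reward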

  • by w_t_payne on 6/22/16, 11:07 AM

    We have well-established techniques for developing systems that are safe and exhibit high levels of integrity. We just need to make the tools that support these techniques freely available.

  • by kordless on 6/22/16, 5:38 AM

    > Avoiding negative side effects.

    Oh brother. Avoiding negative side effects is a wasteful proposition. Learning from those side effects, however, is priceless.

  • by mountaineer22 on 6/22/16, 2:31 AM

    Any recommendations for relevant AI-related sci-fi?

  • by JoeAltmaier on 6/22/16, 1:27 PM

    It's common today to make a robot that kills anybody who comes within a foot or two of it, without any image recognition at all, far more damaging than a gun, and with 100M of them already deployed. This conversation is silly and pointless until we clean up the insane number of land mines deployed around our planet.

  • by daveguy on 6/22/16, 2:27 AM

    And whatever you do, don't let Randall Munroe teach it:

    http://xkcd.com/1696/ (the current xkcd)

  • by yarou on 6/22/16, 1:36 AM

    Seems like hyperparameter optimization to me. These techniques will be useful in general when selecting your model.
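
    A minimal sketch of that framing: random search over a made-up config space, keeping the best validation score. Here train_and_score stands in for your actual training loop:

        import random

        SPACE = {"lr": [1e-4, 1e-3, 1e-2],
                 "layers": [2, 3, 4],
                 "penalty_weight": [0.0, 0.1, 1.0]}

        def random_search(train_and_score, n_trials=20, seed=0):
            rng = random.Random(seed)
            best_config, best_score = None, float("-inf")
            for _ in range(n_trials):
                config = {k: rng.choice(v) for k, v in SPACE.items()}
                score = train_and_score(config)  # train, then score on validation
                if score > best_score:
                    best_config, best_score = config, score
            return best_config, best_score
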
  • by logicallee on 6/22/16, 3:14 PM

    It seems the authors have retracted their concerns. The site is down now, but I got this screenshot: http://imgur.com/eL7GFOr