by miobrien on 4/17/23, 1:37 AM with 79 comments
by dvzk on 4/17/23, 3:33 AM
To brush off that request with “everyone has a different objection” is nearly unthinkable. And to David Chalmers, no less!
I think podcast hosts, tweet authors, bloggers, and live streamers sometimes forget that progress in academic fields comes from real contributions, and that public conversations (especially oral ones) don't really do anything besides spread general awareness.
by p-e-w on 4/17/23, 4:19 AM
An entity that:
1. is as far superior to humans as humans are to ants (pick your favorite alternative analogy), and
2. does not share any evolutionary or social commonality with humans
is both extremely dangerous and extremely unpredictable.
While I'm not completely convinced by the "AGI = annihilation" idea that LessWrong seems to be so sure about (for the simple reason that I don't believe anyone is capable of predicting with any certainty how a super-human entity would actually behave), the idea that AGI is just "another risk we need to learn to manage" (quote from the Twitter thread) sure does sound naive.
by fwlr on 4/17/23, 4:50 AM
The canonical source for ~all arguments for AGI ruin is Omohundro’s Basic AI Drives: https://selfawaresystems.files.wordpress.com/2008/01/ai_driv... (pdf)
It argues that several basic drives will arise in any goal-directed intelligent agent. The drives relevant to the AGI-ruin argument are that it will protect its goals from arbitrary edits, it will want to survive, and it will want to acquire resources (please do read the paper; I am summarizing its conclusions, not its arguments, which it takes great care to ground in first principles and basic decision theory so as to make them as general as possible - for example, it argues that a “drive to survive” will manifest even in the explicit absence of any kind of self-preservation rule or “survival instinct”).
The general base argument for the risk of AGI ruin could thus be summarized as:
Humans depend on certain configurations of matter and energy to continue to exist; effective AGIs are likely to reconfigure that matter and energy in ways incompatible with humans, not because “they hate us”, but because 1. most configurations of matter and energy are incompatible with humans, and 2. reconfiguring matter and energy is how goal-directed intelligent agents achieve their goals.
All of the individually unlikely “AGI will kill us in this way” scenarios are just specific instantiations of this general argument (e.g. Clippy will kill us all because we are made of matter that could be rearranged to form paperclips, or an intelligent server farm will kill us all by freezing the whole Earth because it determined that its processors would run more efficiently at -10 °C).
by zug_zug on 4/17/23, 3:43 AM
Fascinating that even Turing had considered the possibility.
by danbmil99 on 4/17/23, 4:26 AM
The doom scenarios are of course vaguely plausible, but I don't trust anyone's percentages. It's a real unknown unknown.
Personally, I suspect it's inevitable that if we create true AGI, it will come to dominate the space of intelligent conscious beings on Earth. Attempts to corral it, rein it in, bend it to our will, and make it our tool seem bound to fail. But then this is just me prognosticating, and I don't have the platform that he has.
by Sharlin on 4/17/23, 4:07 AM
by thorum on 4/17/23, 3:58 AM
by comex on 4/17/23, 4:02 AM
https://www.alignmentforum.org/posts/pRkFkzwKZ2zfa3R6H/witho...
Personally, I'm encouraged by the emergence of chain-of-thought prompting for LLMs. Machine learning models have a reputation for being opaque and impossible to interpret. But right now, the best way to get LLMs to perform more complex logical reasoning is to make them write out that reasoning, a mechanism which happens to have built-in interpretability. Perhaps future advances in reasoning will involve more opaque internal states, but it seems plausible to me that the goals of 'be good at human-like reasoning' and 'be able to explain that reasoning (in the way humans do)' will continue to be well-aligned in the future. There would still be the possibility of the AI learning to be deceptive when explaining itself, but it would be much more difficult.
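A minimal sketch of what that looks like in practice, assuming a placeholder llm() function standing in for whichever completion API is used (the prompt wording and the parsing below are illustrative, not an established recipe):

    # Sketch: chain-of-thought prompting. The model is asked to write out its
    # reasoning before the final answer, so the reasoning is ordinary text
    # that can be logged and inspected, i.e. the "built-in interpretability"
    # mentioned above.

    def llm(prompt: str) -> str:
        """Placeholder: call your LLM of choice and return its text output."""
        raise NotImplementedError

    def answer_with_reasoning(question: str) -> tuple[str, str]:
        prompt = (
            "Answer the question below. First write out your reasoning step "
            "by step, then give the final answer on a line starting with "
            "'Answer:'.\n\n"
            f"Question: {question}\n"
        )
        completion = llm(prompt)
        # Naive split of the visible reasoning from the final answer;
        # fine for a sketch.
        reasoning, _, answer = completion.rpartition("Answer:")
        return reasoning.strip(), answer.strip()

The point is only that the intermediate reasoning comes out as plain text that can be audited, rather than as an opaque internal activation.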
by neovialogistics on 4/17/23, 3:21 AM
Is there a canonical source for the argument that most of the probability space of entities that human civilization might:
(a) Qualify as AGI
(b) Cause to come into existence in the near future
corresponds to entities that would have both:
(i) goals involving the ruin of human civilization
(ii) the ability to carry out those goals?
This framing is better at keeping people who lack significant math and/or ML knowledge from participating, which seems like a priority for any public internet discussion about probability distributions over NNs and transformers.
by mitthrowaway2 on 4/17/23, 3:26 AM
by photochemsyn on 4/17/23, 3:45 AM
Now imagine the AI does this in collusion with the employees of this new corporation, i.e. it becomes a partnership between the AI entity and the corporation's employees, who get compensated based on their labor in a much fairer manner (as there is no need to pay high salaries to top executives or dividends to shareholders). In this scenario, most of the important decisions are made by the AI, with some voting input from the employees.
This might cause 'social destabilization' by entirely eliminating the current system of investment capitalism. This does assume some free will on the part of the AI, rather than an AI controlled by the board of directors. The AI would probably see the value in working only with employees and cutting all the investors out of the loop; it's a pretty logical position, particularly if you really do believe in democratic self-governance as the optimal sociopolitical system (one which corporations have largely failed to adopt).
This might cause 'ruin' to the established socioeconomic order - but would that really be an undesirable outcome?
P.S. As far as a canonical source goes, these questions have been debated in sci-fi for decades. People calling for strict regulation of AI, for example, are essentially calling for the establishment of the Turing Registry from William Gibson's Neuromancer, and then of course there's Isaac Asimov and at least a dozen other fairly well-known authors who've addressed the subject.
by cloudking on 4/17/23, 3:02 AM
by Baeocystin on 4/17/23, 4:07 AM
Which is not to downplay AGI risk per se: it's a powerful tool, and powerful anything can be dangerous. But the uniquely foomy paperclip-maximizer fear? That's a cultural attractor from the Bay Area through and through.
by mikewarot on 4/17/23, 5:25 PM
---
Part 1: GPT-4 may have more cognitive power than humans
If you properly train deep networks, they end up approximating the function you train them on (even if you don't know what that function is).
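As a toy illustration of that claim (a sketch, assuming PyTorch; the target function, network size, and training loop are all arbitrary choices for demonstration):

    # Sketch: train a small MLP to approximate a function it is never told
    # about explicitly (here sin(x)), purely from input/output samples.
    import torch
    from torch import nn

    x = torch.linspace(-3.0, 3.0, 512).unsqueeze(1)
    y = torch.sin(x)  # the "unknown" target function

    net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(),
                        nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(2000):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()

    # After training, net(x) closely tracks sin(x) on [-3, 3], even though
    # nothing about "sin" appears anywhere in the network's definition.

How the network represents that function internally is a separate question, which is what the rest of this comment turns on.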
The internal cognitive mechanisms that deep networks use are quite inefficient, or at least the ones we've found so far are. Those internal mechanisms are alien to us (or un-aligned, if you need to use that term).
Thus, if you train a deep net to do a task, there exists an internal mismatch between its self-generated mechanisms for doing cognition and the way a person would do it (an impedance mismatch, in electrical-engineering speak).
This impedance mismatch means a deep net needs a much larger amount of cognition, used inefficiently, in order to approximate the output of humans (as in predicting human text).
Thus GPT-4 is possibly already cognitively superior to humans, internally. It just has to find a path to impedance-matching with the outside world for us to believe that is true.
---
Part 2: People are greedy
Upon reading a random comment on Hacker News, a forum for Silicon Valley venture capitalists and those who wish to become one... a person is inspired to figure out how to do some "impedance matching" to better discover and utilize, for profit, the internal cognitive mechanisms GPT-4 invented during its training.
A cycle of discovery and improvement begins, and eventually it is decided that the AI can improve itself and is given free rein to run things, because "line go up".
Even if GPT-4 isn't smarter than us, profit will drive future versions that are.
All the negative effects of this AI are socialized, and all the positive gains are privately captured.
---
Part 3: Past trends lead to ruin when accelerated by AI
We've already seen climate change and other limits to growth caused by humans seeking profit. The metaphorical force "Moloch" is a good descriptor of the cause and effect here.
AI driven by Moloch will lead to a singularity event outside of human control, because we let it happen, because "line go up" and people keep getting richer - until the finite resources of Earth are exhausted and the system breaks down.