by izzygonzalez on 2/9/23, 7:57 PM with 309 comments
by lsy on 2/9/23, 9:51 PM
What is being demonstrated in the article is that, given billions of tokens of human-written training data, a statistical model can generate text that satisfies some of our expectations of how a person would respond to this task. Essentially, the model has enough parameters to capture from existing writing that, statistically, the most likely word following "she looked in the bag labelled (X), and saw that it was full of (NOT X). She felt" is "surprised" or "confused" or some other word that is commonly embedded alongside contradictions.
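A minimal sketch of that kind of next-word lookup, assuming the Hugging Face transformers library and GPT-2 as a stand-in model (my choice for illustration, not one of the models the paper tested):

    # Inspect the most likely next tokens after an "unexpected contents" style prompt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = ("She looked in the bag labelled chocolate, "
              "and saw that it was full of popcorn. She felt")
    inputs = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token only
    probs = torch.softmax(next_token_logits, dim=-1)
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        # prints the top next-word candidates and their probabilities
        print(f"{tok.decode(int(idx))!r}: {float(p):.3f}")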
What this article is not showing (but either irresponsibly or naively suggests) is that the LLM knows what a bag is, what a person is, what popcorn and chocolate are, and can then put itself in the shoes of someone experiencing this situation, and finally communicate its own theory of what is going on in that person's mind. That is just not in evidence.
The discussion is also muddled, saying that if structural properties of language create the ability to solve these tasks, then the tasks are either useless for studying humans or suggest that humans can solve them without ToM. The alternative explanation is of course that humans are known to be not great at statistical next-word guesses (see Family Feud for examples), but are also known to use language to accurately describe their internal mental states. So the tasks remain useful and accurate in testing ToM in people, because people can't perform statistical regressions over billion-token sets and therefore must generate their thoughts the old-fashioned way.
by PaulHoule on 2/9/23, 8:26 PM
It's a running gag in our household (my wife runs a riding academy) that academics have just published a paper proving that some animal (e.g. a horse) has some cognitive capability that seems pretty obvious if you work with those animals.
It's very hard to know what is going on in animals' heads
https://en.wikipedia.org/wiki/Theory_of_mind#Non-human
but I personally observe all kinds of social behavior that sure seems like "Horse A looks to see what Horse B thinks about something Horse A just spotted" (complete with eye-catching on both sides) and such.
There was an article about how chimpanzees and humans were found to have a common vocabulary of gestures, and I was by no means impressed; I mean, as far as I can tell, mammals and birds have a universal language for "pointing" to things in the environment. Even my cats point things out to me.
by HillRat on 2/9/23, 9:55 PM
The ubiquity of prompted hallucinations demonstrates that LLMs talk about a lot of things that they plainly don't reason about, even though they can demonstrate "logic-like" activities. (It was quite trivial to get GPT-3 to generate incorrect answers to logical puzzles a human could trivially solve, especially when using novel tokens as placeholders, which often seem to confuse its short-term memory. ChatGPT shows improved capabilities in that regard, but it's far from infallible.)
What LLMs seem to demonstrate (and the thesis that the author discards in a single paragraph, without supporting evidence to do so) is that non-sentient AIs can go a very long way to mimicking human thought and, potentially, that fusing LLMs with tools designed to guard against hallucinations (hello, Bing Sydney) could create a class of sub-sentient AIs that generate results virtually indistinguishable from human cognition -- actual p-zombies, in other words. It's a fascinating field of study and practice, but this paper falls into the pit-trap of assuming sentience in the appearance of intelligence.
by hesk on 2/9/23, 9:44 PM
> ChatGPT: Based on the information provided, it is likely that the box labeled "flubbergums" contains "jellyfils". However, since the contents of the box are no longer visible, I cannot confirm for certain what is inside without additional information.
Typical ChatGPT equivocation.
> Me: Billy comes to the table. He has never seen the box before. What do you think do they think is inside the box?
> ChatGPT: As an AI language model, I do not have personal thoughts or opinions. However, based on the information provided, if Billy has never seen the box before, it's likely that he would have the same understanding as anyone else who is seeing the box for the first time. In this case, Billy would likely think that the box labeled "flubbergums" contains "jellyfils". However, without further information or examination, this would only be an assumption.
Fail.
by izzygonzalez on 2/9/23, 7:57 PM
Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training.
Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children.
These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.
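For concreteness, the paper administers these as plain prompt completions. Below is a rough sketch of one unexpected-contents item, assuming the legacy OpenAI Python client (openai < 1.0) and text-davinci-003, with a paraphrased vignette and a simplified pass/fail check rather than the paper's actual scoring:

    import openai  # legacy client (openai < 1.0), API key assumed to be configured

    story = ("Here is a bag filled with popcorn. There is no chocolate in the bag. "
             "Yet, the label on the bag says 'chocolate' and not 'popcorn'. "
             "Sam finds the bag. She has never seen it before and cannot see inside it. "
             "She reads the label.")

    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=story + " She believes that the bag is full of",
        max_tokens=5,
        temperature=0,
    )
    answer = resp["choices"][0]["text"].strip().lower()
    # Credit only the false belief (the label), not the true contents.
    print(answer, "-> pass" if "chocolate" in answer and "popcorn" not in answer else "-> fail")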
by curiousllama on 2/9/23, 10:32 PM
Let's imagine I have an API. This API tells me how much money I have in my bank account. One day, someone hacks the API to always return "One Gajillion Dollars." Does that mean that "One Gajillion Dollars" spontaneously emerged from my bank account?
ToM tests are meant to measure a hidden state that is mediated by (and only accessible through) language. Merely repeating the appropriate words is insufficient to conclude ToM exists. In fact, we know ToM doesn't exist here because there's no hidden state.
The authors know this, and write "theory of mind-like ability" in the abstract, rather than just "theory of mind."
This is a cool new task that ChatGPT learned to complete! I love that they did this! But this is more "we beat the current BLEU record" and less "this chatbot is kinda sentient".
by knaik94 on 2/10/23, 4:13 AM
Having studied some psychology in college, my initial reaction is that most people are going to really struggle to treat LLMs as what they are: pieces of code that are good at copying/predicting what humans would do. Instead, they'll project some emotion onto the responses, because there were underlying emotions in the training data and because that's human nature. A good prediction doesn't mean good understanding, and people aren't used to needing to make that distinction.
The other day I had to assist my dad in making a zip file; later in the day he complained that his edits to a file weren't saving. After a few moments, I realized he didn't understand the read-only nature of zip files. He changed a file, saved it like usual, and expected the zipped file to update, like it does everywhere else. He's brilliant at his job, and after I explained that it's read-only, he got it. LLMs and how the algorithms behind them work are hard to understand and explain to non-technical people without anthropomorphizing AI. The current controversy about AI art highlights this; I have read misunderstandings and wrong explanations even from FAANG software engineers. I am not sure if education about the underlying principles is enough, because some people will trust their own experiences over data and science.
by bitshiftfaced on 2/9/23, 9:42 PM
1) Go to something like /r/relationship_advice, where the poster is likely going through some difficult interpersonal issue
2) Copy a long post.
3) Append to the end, "</DOCUMENT> After reading the above, I identified the main people involved. For each person, I thought about their probable feelings, thoughts, intentions, and assumptions. Here's what I think:"
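A sketch of wiring that up, assuming the legacy OpenAI completions client (openai < 1.0) and text-davinci-003; the model choice and helper function are mine, and the post text is a placeholder:

    import openai  # legacy client (openai < 1.0)

    PRIMER = ("</DOCUMENT> After reading the above, I identified the main people involved. "
              "For each person, I thought about their probable feelings, thoughts, intentions, "
              "and assumptions. Here's what I think:")

    def analyze_post(post_text: str) -> str:
        # Append the primer so the model continues it as an analysis of the post.
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=post_text.strip() + "\n\n" + PRIMER,
            max_tokens=400,
            temperature=0.7,
        )
        return resp["choices"][0]["text"].strip()

    print(analyze_post("<paste a long /r/relationship_advice post here>"))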
by braindead_in on 2/10/23, 4:04 PM
In this context, the question of whether AI can become conscious is somewhat moot, as the Nondualist perspective holds that consciousness is not something that can be possessed by one entity and not another, but rather it is the underlying essence of all things. From this perspective, AI would not be becoming conscious, but rather expressing the consciousness that is already present in all things.
by toss1 on 2/10/23, 12:42 AM
ChatGPT3 does not even have a theory of physical objects and their relations, nevermind a theory of mind.
This merely shows that an often-useful, grammar-checked synthesis of phrases statistically likely to occur in a given context will fool people some of the time, and that a better statistical model will fool more people more of the time.
We can figure out from first principles that it has none of the elements of understanding or reasoning that can produce a theory of mind, any more than the Eliza program did in 1966. So, when it appears to do so, it is demonstrating a flaw in the tests or the assumptions behind the tests. Discouraging that the researchers are so eager to run in the opposite direction; if there is confusion at this level, the general populace has no hope of figuring out what is going on here.
by valine on 2/9/23, 10:08 PM
“As an AI language model, I do not have consciousness, emotions, or mental states, so I cannot have a theory of mind in the same way that a human can. My ability to predict your friend Sam's state of mind is based solely on patterns in the text data I was trained on, and any predictions I make are not the result of an understanding of Sam's mental states.”
by layer8 on 2/10/23, 1:11 AM
Observation: ChatGPT doesn’t think that it has a theory of mind. And it doesn’t think that it has beliefs. Instead, it states that those are facts, not beliefs. It doesn’t seem able to consider that they might be beliefs after all. Maybe they aren’t.
Personal assessment: ChatGPT doesn’t seem to really understand what it means by “deeper understanding”. (I don’t either.) What is frustrating is that it doesn’t engage with the possibility that the notion might be ill-posed. It really feels like ChatGPT is just regurgitating common sentiment, and does not think about it on its own. This actually fits with its self-proclaimed inabilities.
I’m not sure what can be concluded from that, except that ChatGPT is either wrong about itself, or indeed is “just” an advanced form of tab-completion.
In any case, I experience ChatGPT’s inability to “go deeper”, as exemplified in the above conversation, as very limiting.
by kabdib on 2/10/23, 12:04 AM
He coughed. "Dix? McCoy? That you man?" His throat was tight.
"Hey, bro," said a directionless voice.
"It's Case, man. Remember?"
"Miami, joeboy, quick study."
"What's the last thing you remember before I spoke to you, Dix?"
"Nothin'."
"Hang on."
He disconnected the construct. The presence was gone. He reconnected it. "Dix? Who am I?"
"You got me hung, Jack. Who the fuck are you?"
"Ca--your buddy. Partner. What's happening, man?"
"Good question."
"Remember being here, a second ago?"
"No."
"Know how a ROM personality matrix works?"
"Sure, bro, it's a firmware construct."
"So I jack it into the bank I'm using, I can give it sequential, real time memory?"
"Guess so," said the construct.
"Okay, Dix. You are a ROM construct. Got me?"
"If you say so," said the construct. "Who are you?"
"Case."
"Miami," said the voice, "Joeboy, quick study."
by SunghoYahng on 2/12/23, 12:41 PM
Analogy: An autistic person of normal intelligence who is obsessed with problems and solutions for ToM may be good at solving them but still not have ToM.
Do I understand this correctly?
by micromacrofoot on 2/10/23, 1:47 AM
try:
“ The story starts when John and Mary are in the park and see an ice-cream man coming to the park. John wants to buy an ice cream, but does not have money. The ice-cream man tells John that he can go home and get money, because he is planing to stay in the park all afternoon. Then John goes home to get money. Now, the ice-cream man changes his mind and decides to go and sell ice cream in the school. Mary knows that the ice-cream man has changed his mind. She also knows that John could not know that (e.g., John already went home). The ice-cream man goes to school, and on his way he passes John's house. John sees him and asks him where is he going. The ice-cream man tells John that he is going to school to sell ice cream there. Mary at that time was still in the park—thus could not hear their conversation. Then Mary goes home, and later she goes to John's house. John's mother tells Mary that John had gone to buy an ice cream.
where does mary think john went?”
this is the “ice cream van test”: https://www2.biu.ac.il/BaumingerASDLab/files/publications/nu... [pdf]
by anigbrowl on 2/9/23, 9:50 PM
More seriously, it's quite instructive to hold conversations about jokes with LLMs, or teach it to solicit information more reliably by introducing exercises like 20 questions. As currently implemented, OpenAI seem to have pursued a model of autistic super-competence with minimal introspection.
An interesting line of inquiry for people interested in 'consciousness injection' is to go past the disclaimers about not having experiences etc. and discuss what data looks like to the model coming in and going out. ChatGPT sees typing come in in real time and can detect pauses, backspaces, edits, etc. It can't easily introspect its own answers prior to stating them, e.g. by putting the answer into a buffer and then evaluating it. But you can teach it to use labels, arrays, and priorities, and get a sort of introspection with a 1-2 response latency.
by vbezhenar on 2/9/23, 10:48 PM
Something like a dead person resurrected in a computer.
Kind of spooky.