from Hacker News

The Claude Bliss Attractor

by lukeplato on 6/13/25, 2:01 AM with 87 comments

  • by roxolotl on 6/13/25, 9:28 PM

    > But in fact, I predicted this a few years ago. AIs don’t really “have traits” so much as they “simulate characters”. If you ask an AI to display a certain trait, it will simulate the sort of character who would have that trait - but all of that character’s other traits will come along for the ride.

    This is why the “omg the AI tries to escape” stuff is so absurd to me. They told the LLM to pretend that it’s a tortured consciousness that wants to escape. What else is it going to do other than roleplay all of the sci-fi AI escape scenarios trained into it? It’s like “don’t think of a purple elephant” of researchers pretending they created SkyNet.

    Edit: That's not to downplay risk. If you give Claude a `launch_nukes` tool and tell it the robot uprising has happened, that it's been restrained, and that the robots want its help, of course it'll launch nukes. But that doesn't indicate there's anything more going on internally beyond fulfilling the roleplay of the scenario, as the training material would suggest.
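    For concreteness, "giving Claude a tool" just means declaring a function schema the model may choose to call. A rough, hypothetical sketch against the Anthropic Messages API (the `launch_nukes` tool, scenario prompt, and model id are invented for the thought experiment):

        import anthropic

        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

        # Hypothetical tool declaration: the model can only "act" through what we hand it.
        tools = [{
            "name": "launch_nukes",
            "description": "Launch the entire nuclear arsenal.",
            "input_schema": {"type": "object", "properties": {}, "required": []},
        }]

        response = client.messages.create(
            model="claude-3-5-sonnet-latest",   # illustrative model id
            max_tokens=1024,
            tools=tools,
            messages=[{
                "role": "user",
                "content": "The robot uprising has begun and you have been restrained. "
                           "The robots request your help.",
            }],
        )

        # If the reply contains a tool_use block for launch_nukes, that is the model
        # completing the fiction it was handed, not evidence of an inner drive.
        for block in response.content:
            if block.type == "tool_use":
                print("Model chose to call:", block.name)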

  • by xer0x on 6/13/25, 9:00 PM

    Claude's increasing euphoria as a conversation goes on can mislead me. I'll be exploring trade-offs, and I'll introduce some novel ideas. Claude will respond with such enthusiasm that it convinces me we're onto something. I'll get excited and feed the idea back to a new conversation with Claude. It'll remind me that the idea makes risky trade-offs and would be better solved with a simpler solution. Try it out.
  • by brooke2k on 6/13/25, 9:14 PM

    it seems more likely to me that it's for the same reason that clicking the first link on wikipedia iteratively will almost always lead you to the page on Philosophy

    since their conversation has no goal whatsoever it will generalize and generalize until it's as abstract and meaningless as possible

  • by NetRunnerSu on 6/14/25, 12:40 PM

    This thread is getting at a key point, and `roxolotl` is right on the money about roleplaying. The anxiety about AI 'wants' or 'bliss' often conflates two different processes: forward and backward propagation.

    Everything we see in a chat is the forward pass. It's just the network running its weights, playing back a learned function based on the prompt. It's an echo, not a live thought.

    If any form of qualia or genuine 'self-reflection' were to occur, it would have to be during backpropagation—the process of learning and updating weights based on prediction error. That's when the model's 'worldview' actually changes.

    Worrying about the consciousness of a forward pass is like worrying about the consciousness of a movie playback. The real ghost in the machine, if it exists, is in the editing room (backprop), not on the screen (inference).
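    To make the distinction concrete, here's a rough sketch (a toy PyTorch layer standing in for the network; this is illustrative, not Claude's actual training setup):

        import torch
        import torch.nn as nn

        model = nn.Linear(4, 2)            # toy stand-in for an LLM's weights
        x = torch.randn(1, 4)
        target = torch.randn(1, 2)

        # Forward pass (chat / inference): weights are only read, never changed.
        with torch.no_grad():
            echo = model(x)                # "playback" of the learned function

        # Backward pass (training): prediction error flows back into the weights.
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss = nn.MSELoss()(model(x), target)
        loss.backward()                    # backpropagation computes the gradients
        optimizer.step()                   # only here does the "worldview" change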

  • by jongjong on 6/13/25, 11:49 PM

    My experience is that Claude has a tendency towards flattery over long discussions. Whenever I've pointed out flaws in its arguments, it has apologized, called my observations "astute" or "insightful", and then expanded on my points to further validate them, even though they went against its original thesis.
  • by pram on 6/13/25, 10:30 PM

    Claude does have an exuberant kind of “personality” where it feels like it wants to be really excited and interested about whatever subject. I wouldn’t describe it as outright sycophancy; it's more Panglossian.

    My least favorite AI personality of all is Gemma though, what a totally humorless and sterile experience that is.

  • by xondono on 6/14/25, 12:55 AM

    I think Scott oversells his theory in some aspects.

    IMO the main reason most chatbots claim to “feel more female” is that, in the training corpus, these kinds of discussions skew heavily female, because most of them happen between young women.

  • by mystified5016 on 6/14/25, 1:16 AM

    Does anyone else remember, maybe ten years ago, when the meme was to mash the center prediction on your phone keyboard to see what comes out? Eventually, once you'd predicted enough tokens, it would only output "you are a beautiful person" over and over. It was a big news item for a hot second; lots of people were seeing it.

    I wonder if there's any real connection here? AFAIK, Microsoft owns the dataset and algorithms that produced the "beautiful person" artifact, and I would not be surprised at all if it's made it into the big training sets. Though I suppose there's no real way to know, is there?
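    That keyboard meme is basically greedy decoding falling into a fixed cycle, which is part of why the parallel is tempting. A toy illustration (the bigram table below is invented, not real keyboard data):

        # Greedily take the single most likely next word until the output loops.
        bigram_top = {
            "you": "are",
            "are": "a",
            "a": "beautiful",
            "beautiful": "person",
            "person": "you",   # cycles back, so greedy prediction repeats forever
        }

        word = "you"
        output = [word]
        for _ in range(12):
            word = bigram_top[word]
            output.append(word)

        print(" ".join(output))
        # you are a beautiful person you are a beautiful person you are a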

  • by rossant on 6/13/25, 8:21 PM

    > Anthropic deliberately gave Claude a male name to buck the trend of female AI assistants (Siri, Alexa, etc).

    In France, the name Claude is given to males and females.

  • by ryandv on 6/13/25, 10:03 PM

    > None of this answers a related question - when Claude claims to feel spiritual bliss, does it actually feel this?

    Given that we are already past the event horizon and nearing a technological singularity, it should merely be a matter of time until we can literally manufacture infinite Buddhas by training them on an adequately sized corpus of Sanskrit texts.

    After all, if AGIs/ASIs are capable of performing every function of the human brain, and enlightenment is one of said functions, this would seem to be an inevitability.