from Hacker News

Generative Agents: Interactive Simulacra of Human Behavior

by mmq on 4/10/23, 9:32 PM with 252 comments

  • by Ozzie_osman on 4/11/23, 12:38 AM

      To directly command one of the agents, the user takes on the persona of the agent’s “inner voice”—this makes the agent more likely to treat the statement as a directive. For instance, when told “You are going to run against Sam in the upcoming election” by a user as John’s inner voice, John decides to run in the election and shares his candidacy with his wife and son.
    
    So that's where my inner voice comes from.
  • by alexahn on 4/11/23, 12:13 AM

    An interesting thought experiment: what would an AGI do in a sterile world? I think the depth of understanding that any intelligence develops is significantly bound by its environment. If there is not enough entropy in the environment, I can't help but feel that a deep intelligence will not manifest. This becomes a nested-dolls sort of problem, because we need to leverage and preserve the inherent entropy of the universe if we want to construct powerful simulators.

    As an example, imagine if we wanted to create an AGI that could parse the laws of the universe. We would not be able to construct a perfect simulator because we do not know the laws ourselves. We could probably bootstrap an initial simulator (given what we know about the universe) to get some basic patterns embedded into the system, but in the long run, I think it will be a crutch due to the lack of universal entropy in the system. Instead, in a strange way, the process has to be reversed, that a simulator would have to be created or dreamed up from the "mind" of the AGI after it has collected data from the world (and formed some model of the world).

  • by mdaniel on 4/10/23, 10:35 PM

    The previous submission https://news.ycombinator.com/item?id=35511843 had just a few comments, but Ian's was substantial (although regrettably offsite): https://news.ycombinator.com/item?id=35514112 and it especially highlighted the demo URL: https://reverie.herokuapp.com/arXiv_Demo/
  • by lsy on 4/11/23, 4:35 AM

    I'd be very hard-pressed to call this "human behavior". Moving a sprite to a region called "bathroom" and then showing a speech bubble with a picture of a toothbrush and a tooth isn't the same as someone in a real bathroom brushing their teeth. What you can say is if you can sufficiently reduce behavior to discrete actions and gridded regions in a pixel world, you can use an LLM to produce movesets that sound plausible because they are relying on training data that indicates real-world activity. And if you then have a completely separate process manage the output from many LLMs, you can auto-generate some game behavior that is interesting or fun. That's a great result in itself without the hype!
  • by Imnimo on 4/11/23, 12:29 AM

    It's interesting how much hand-holding the agents need to behave reasonably. Consider the prompt governing reflection:

    >What 5 high-level insights can you infer from the above statements? (example format: insight (because of 1, 5, 3))

    >Given only the information above, what are 3 most salient high-level questions we can answer about the subjects in the statements?

    We're giving the agents step-by-step instructions about how to think, and handling tasks like memory bookkeeping and environment modeling outside the interaction loop.

    This isn't a criticism of the quality of the research - these are clearly the necessary steps to achieve the impressive result. But it's revealing that for all the cool things ChatGPT can do, it is so helpless to navigate this kind of simulation without being dragged along every step of the way. We're still a long way from sci-fi scenarios of AI world domination.
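
    For concreteness, the reflection step those prompts drive could be wired up roughly like this (a hypothetical sketch, not the paper's code; `llm` stands in for whatever completion call is available, and the memory-retrieval step between the two prompts is compressed away):

        def reflect(recent_memories, llm):
            # Number the memories so the model can cite them, as in the example format.
            numbered = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(recent_memories))
            questions = llm(
                numbered + "\n\nGiven only the information above, what are the 3 most "
                "salient high-level questions we can answer about the subjects in the statements?"
            )
            insights = llm(
                numbered + "\n\nFocus questions:\n" + questions +
                "\n\nWhat 5 high-level insights can you infer from the above statements? "
                "(example format: insight (because of 1, 5, 3))"
            )
            return insights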

  • by green_man_lives on 4/11/23, 1:24 AM

    All of this research using GPT to simulate an internal monologue to produce agents reminds me of Julian Jaynes's theories about consciousness:

    https://en.wikipedia.org/wiki/The_Origin_of_Consciousness_in...

  • by bundie on 4/10/23, 9:49 PM

    Interesting paper. I think something like this could be implemented in open world games in the future, no? I cannot wait for games that feel 'truly alive'.
  • by Jeff_Brown on 4/11/23, 2:18 AM

    People on Twitter are speculating breathlessly about using this for social science. I don't immediately see uses for it outside of fiction, esp. video games.

    It would be cool if some kind of law of large numbers (an LLN for LLMs) implied that the decisions made by a thing trained on the internet would be distributed like human decisions. But the internet seems a very biased sample. Reporters (rightly) mostly write about problems. People argue endlessly about dumb things. Fiction is driven by unreasonably evil characters and unusually intense problems. Few people elaborate the logic of ordinary common sense, because why would they? The edge cases are what deserve attention.

    A close model of a society will need a close model of beliefs, preferences and material conditions. Closely modeling any one of those is far, far beyond us.

  • by skilled on 4/11/23, 4:48 AM

    But the model already has all this info; what is groundbreaking about this? These kinds of sensational headlines are not helping anyone either.
  • by cornholio on 4/11/23, 9:20 AM

    I'm concerned that the quality of human simulacra will be so good that they will be indistinguishable from a sentient AGI.

    We will be so used to having lifeless and morally worthless computers accurately emulate humans that when a sentient artificial intelligence worthy of empathy arrives, we will not treat it any differently than a smartphone, and we will have a strong prejudice against all non-biological life. GPT is still in the uncanny valley but it's probably just a few years away from being indistinguishable from a human in casual conversation.

    Alternatively, some might claim (and indeed have already claimed) that purely mechanical algorithms are a form of artificial life worthy of legal protection, and we won't have any legal test that could discern the two.

  • by xiphias2 on 4/11/23, 4:29 AM

    Peeking into these lives sounded amazing until I started reading what they are doing and how boring their lives are… gathering data for podcasts and recording videos, planning, and brushing their teeth.

    It would be fun to run the same simulation in the Game of thrones world, or maybe play House of cards with current politicians.

    Anyway, kudos for being open and sharing all the data.

  • by og_kalu on 4/10/23, 11:10 PM

    A good enough simulation interacting with the real world would be no less impactful than whatever you imagine a non-simulation to be.

    As we agentify and embody these systems to take actions in the real world, I really hope we remember that. "It's just a simulation" / "It's not true [insert property]" is not the shield some imagine it to be.

  • by 1letterunixname on 4/11/23, 2:28 AM

    Given the state of technology, I cannot be completely certain that none of you are bots. On the other hand, neither can any of you.

    Perhaps it would be wise to allow bots to comment if they were able to meet a minimum level of performative insight and/or positive contributions. It is entirely possible that a machine would be able to scan and collect much more data than any human ever could (the myth of the polymath), and possibly even draw conclusions that have been overlooked.

    I see a future of bot "news reporters" able to discern if some business were cheating or exploiting customers, or able to find successful and unsuccessful correlative (perhaps even causal) human habits. Data-driven stories that could not be conceived of by humans. Basically, feed Johnny Number 5 endless input.

  • by neuronexmachina on 4/11/23, 5:35 AM

    Reading the abstract reminded me of Marvin Minsky's 1980s book "Society of Mind". I wonder if you could get some cool emergent mind-like behavior from a collection of specialized agents based on LLMs and other technologies communicating with each other:

    * https://en.wikipedia.org/wiki/Society_of_Mind

    * http://aurellem.org/society-of-mind/

  • by synaesthesisx on 4/11/23, 7:37 AM

    Some of the most interesting work in this space is in the "shared" memory models (in most cases today, vector DBs). Agents can theoretically "learn" and share memories with the entire fleet, and develop a collective understanding & memory accessible by the swarm. This can enable rapid, "guided" evolution of agents and emergent behaviors (such as cooperation).

    We’re going to see some really, really interesting things unfold - the implications of which many haven’t fully grasped.

  • by startupsfail on 4/11/23, 12:20 AM

    Are we sure that these simulations are unconscious? The best answer that I have is: I don’t know…

    Short-term and long-term memory, inner dialogue, reflection, planning, social interactions… They’d even go and have fun eating lunch 3 times in a row, at noon, half past noon and at one!

  • by refulgentis on 4/11/23, 4:45 AM

    This oversells the paper quite a bit; the interactions are rather mundane, as the authors note (and I'm rushing to implement it! It's awesome! But not all this).
  • by discmonkey on 4/11/23, 12:01 AM

    This paper feels significant. If ChatGPT was an evolutionary step on GPT-3.5/GPT-4, then this is a bit like taking ChatGPT and using it as the backbone of something that can accumulate memories, reflect on them, and make plans accordingly.
  • by d--b on 4/11/23, 4:47 AM

    To me, agents that are not really intelligent but have humanlike talking abilities are the worst outcome AI could produce.

    These have zero utility for humanity, because they’re not intelligent whatsoever. Yet these systems can produce tons of garbage content for free that is difficult to distinguish from human-created content.

    At best this is used to create better NPCs in video games (as the article mentions), but more generally this is going to be used to pollute social media (if not already).

  • by cwxm on 4/11/23, 1:26 AM

    Can't wait for the next dwarf fortress to include something like this.
  • by crooked-v on 4/11/23, 2:53 AM

    One thing I find particularly interesting here: The general technique they describe for automatically generating the memory stream and derived embeddings (as well as the higher-level inferences about them that they call "reflections"), then querying against that in a way that's not dependent on the LLM's limited context window, looks like it would be pretty easily generalizable to almost anything using LLMs. Even SQLite has an extension for vector embedding search now [1], so it should be possible to implement this technique in an entirely client-side manner that doesn't actually depend on the service (or local LLM) you're using.

    [1]: https://observablehq.com/@asg017/introducing-sqlite-vss
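
    A minimal client-side version of that idea might look like the following (a brute-force cosine scan over a plain SQLite table; an extension like sqlite-vss would only replace the scan with an indexed search — `embed` is an assumed text-to-vector function):

        import json
        import sqlite3

        import numpy as np

        db = sqlite3.connect("memories.db")
        db.execute("CREATE TABLE IF NOT EXISTS memory (ts REAL, text TEXT, vec TEXT)")

        def remember(ts, text, embed):
            db.execute("INSERT INTO memory VALUES (?, ?, ?)",
                       (ts, text, json.dumps([float(x) for x in embed(text)])))
            db.commit()

        def recall(query, embed, k=5):
            q = np.asarray(embed(query))
            scored = []
            for text, vec in db.execute("SELECT text, vec FROM memory"):
                v = np.asarray(json.loads(vec))
                sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
                scored.append((sim, text))
            return [t for _, t in sorted(scored, reverse=True)[:k]]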

  • by colanderman on 4/11/23, 4:43 AM

    Another user posted, and deleted, a comment to the effect that the morality of experimenting with entities which toe the line of sentience is worth considering.

    I'm surprised this wasn't mentioned in the "Ethics" section of the paper.

    The "Ethics" section does repeatedly say "generative agents are computational entities" and should not be confused for humans. Which suggests to me the authors may believe that "computational" consciousness (whether or not these agents exhibit it) is somehow qualitatively different than "real live human" consciousness due to some je ne sais quoi and therefore not ethically problematic to experiment with.

  • by Baeocystin on 4/11/23, 5:36 AM

    Looking forward to playing StardewGPT. Half-joking aside, I do think that level of abstraction is probably a good choice. Familiar and comfy, but with enough detail to be able to find interesting social patterns.
  • by qumpis on 4/11/23, 12:06 AM

    Nice to see progress on this end. I've been hoping for some time for a continuation of AI generated shows (like the previously-famous Nothing Forever) that can 1) interact with the open world and 2) keep history long enough (e.g. by resummarizing and reprompting the model).

    Controlling the agents, and not merely making them output text through LLMs, sounds very exciting, especially once people figure out the best way to connect the APIs of simulators with the models.

  • by courseofaction on 4/11/23, 4:25 AM

    Something interesting from the paper:

    The architecture produced more believable behaviour than human crowdworkers.

    That's right, the AI agents were more believable as human-like agents than the humans were.

    What a time to be alive.

    (See Figure 8)

  • by tucnak on 4/11/23, 5:01 AM

    I was very disappointed that none of the agents I observed for a whole day got to do the most important "human behaviour"— sex, that is. Tragic
  • by lurquer on 4/11/23, 5:54 PM

    The ‘safe’ tuning of the models is becoming a nuisance. As indicated in the paper, the agents are overly cooperative and pleasant due to the LLM’s training.

    Pity they can’t get access to an untuned LLM. This isn’t the first example I’ve read it where research is being hampered by the PC nonsense and related filters crammed into the model.

  • by newswasboring on 4/11/23, 8:28 AM

    I kid you not, I literally started making something like this yesterday. My plans were smaller, only trying to simulate politics, but still. Living in this moment of AI is sometimes very demoralizing. Whatever you try to make, someone already made it last week. /rant
  • by bradgranath on 4/10/23, 11:00 PM

    Hey! It's a proto ancestor sim!
  • by ianbicking on 4/11/23, 1:09 AM

    I wrote up some notes from reading this paper here: https://hachyderm.io/@ianbicking/110175179843984127

    But for convenience maybe I'll just copy them into a comment...

    It describes an environment where multiple #LLM (#GPT)-powered agents interact in a small town.

    I'll write my notes here as I read it...

    To indicate actions in the world they represent them as emoji in the interface, e.g., "Isabella Rodriguez is writing in her journal" is displayed as a small emoji summary.

    You can click on the person to see the exact details, but this emoji summarization is a nice idea for overviews.

    A user can interfere with (or "steer," if you are feeling generous) the simulation by chatting with agents, but more interestingly they can "issue a directive to an agent in the form of an 'inner voice'"

    Truly some miniature Voice Of God stuff here!

    I'll see if this is detailed more later in the paper, but initially it sounds like simple prompt injection. Though it's unclear if it's injecting things into the prompt or into some memory module...

    Reading "Environmental Interaction" it sounds like they are specifying the environment at a granular level, with status for each object.

    This was my initial thought when trying something similar, though now I'm more interested in narrative descriptions; that is, describing the environment to the degree it matters or is interesting, and allowing stereotyped expectations to basically "fill in" the rest. (Though that certainly has its own issues!)
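
    For reference, a granular environment spec along those lines might be nothing more than a tree of object statuses (the object names below are made up for illustration, not taken from the paper):

        # Toy granular environment: every leaf object carries a status string
        # that agents can observe or update.
        environment = {
            "house": {
                "kitchen": {"stove": "off", "refrigerator": "closed"},
                "bathroom": {"sink": "idle", "toothbrush": "in its holder"},
            },
        }

        def set_status(path, status):
            node = environment
            for key in path[:-1]:
                node = node[key]
            node[path[-1]] = status

        set_status(["house", "bathroom", "toothbrush"], "being used")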

    They note the language is stilted and suggest later LLMs could fix this. It's definitely resolvable right now; whatever results they are getting are the results of their prompting.

    The conversations remind me of something Nintendo would produce: short, somewhat bland, but affable. They must have worked to make the interactions so short, as that's not GPT's default style. But every example is also an instruction, so it might have slipped in that way.

    Memory is a big fixation right now, though I'm just not convinced. It's obviously important, but is it a primary or secondary concern?

    To contrast, some other possible concerns: relationships, mood, motivations, goals, character development, situational awareness... some of these need memory, but many do not. Some are static, but many are not.

    To decide on which memories to retrieve they multiply several scores together, including recency. Recency is an exponential decay of 1% per hour.

    That seems excessive...? It doesn't feel like recency should ever multiply something down to zero. Though it's recency of access, not recency of creation. And perhaps the world just doesn't get old enough for this to cause problems. (It was limited to 3 days, or about a 50% max recency penalty.)
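
    Written out, the retrieval score as described here reads something like this (a sketch that assumes the multiply-the-scores framing and the 1%-per-hour decay; the paper's exact weighting may differ):

        def retrieval_score(hours_since_last_access, importance, relevance,
                            decay_per_hour=0.99):
            # 0.99 ** 72 ≈ 0.49, i.e. roughly the "about 50% max recency penalty"
            # after three days mentioned above.
            recency = decay_per_hour ** hours_since_last_access
            return recency * importance * relevance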

    The reflection part is much more interesting: given a pool of recent memories they ask the LLM to generate the "3 most salient high-level questions we can answer about the subjects in the statements?"

    Then the questions serve to retrieve concrete memories from which the LLM creates observations with citations.

    Planning and re-planning are interesting. Agents specifically plan out their days, first with a time outline, then with specific breakdowns inside that outline.

    For revising plans there's a query process: an observation, then turning the observation into something longer (fusing memories, etc.), and then asking "Should they react to the observation, and if so, what would be an appropriate reaction?"
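
    That observe-then-maybe-react loop might be sketched like so (hypothetical helper methods, not the paper's code; `llm` is again an assumed completion call):

        def maybe_react(agent, observation, llm):
            context = agent.relevant_memory_summary(observation)   # assumed helper
            answer = llm(
                f"{agent.name}'s current plan: {agent.plan}\n"
                f"Observation: {observation}\n"
                f"Context: {context}\n"
                "Should they react to the observation, and if so, "
                "what would be an appropriate reaction?"
            )
            if answer.strip().lower().startswith("yes"):
                agent.plan = agent.replan_around(answer)            # assumed helper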

    Interviewing the agents as a means of evaluation is kind of interesting. Self-knowledge becomes the trait that is judged.

    Then they cut out parts of the agent and see how well they perform in those same interviews.

    Still... the use of quantitative measures here feels a little forced when there's lots of rich qualitative comparisons to be done. I'd rather see individual interactions replayed and compared with different sets of functionality.

    They say they didn't replay the entire world with different functionality because each version would drift (which is fair and true). But instead they could just enter into a single moment to do a comparison (assuming each moment is fully serializable).

    I've thought about updating world state with operational transforms in part for this purpose, to make rewind and effect tracking into first-class operations.

    Well, I'm at the end now. Interesting, but I wish I knew the exact prompts they were using. The details matter a lot. "Boundaries and Errors" touched on this, but I wish that section was 4x the size; there's a lot to be said about the prompts and how they interact with memories and personality descriptions.

    ...

    I realize I missed the online demo: https://reverie.herokuapp.com/arXiv_Demo/

    It's a recording of the simulation run.

    I also missed this note: "The present study required substantial time and resources to simulate 25 agents for two days, costing thousands of dollars in token credit and taking multiple days to complete"

    I'm slightly surprised, though if they are doing minute-by-minute ticks of the clock over all the agents then it's unsurprising. (Or even if it's less intensive than that.)

    You can look at specific memories: https://reverie.herokuapp.com/replay_persona_state/March20_t...

    Granularity looks to be 10 seconds, very short! It's not filtering based on whether memories are expected vs. interesting, so there are lots of "X is idle" notes.

    If you look at these states, the core information (the personality of the person) is very short. There are lots of incidental memories. What matters? What could just be filled in as "life continued as expected"?

    One path to greater efficiency might be to encode "what matters" for a character in a way that doesn't require checking in with GPT.

    Could you have "boring embeddings"? Embeddings that represent the stuff the eye just passes right over without really thinking about it. Part of training up a character would be to build up this database of disinterest. Perhaps not unlike babies with overconnected brains that need synapse pruning to be able to pay attention to anything at all.
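
    A crude version of that database of disinterest could be a set of embeddings that new observations are checked against before any GPT call is spent on them (the `embed` function and the threshold are assumptions):

        import numpy as np

        class BoringFilter:
            def __init__(self, embed, threshold=0.9):
                self.embed, self.threshold, self.boring = embed, threshold, []

            def mark_boring(self, text):
                self.boring.append(self.embed(text))

            def is_boring(self, text):
                v = self.embed(text)
                return any(
                    float(v @ b / (np.linalg.norm(v) * np.linalg.norm(b) + 1e-9))
                    > self.threshold
                    for b in self.boring
                )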

    Another option might be for the characters to compose their own "I care about this" triggers, where those triggers are low-cost code (low cost compared to GPT calls) that can be run in a tighter loop in the simulation.

    I think this is actually fairly "believable" as a decision process, as it's about building up habituated behavior, which is what believable people do.

    Opens the question of what this code would look like...
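
    One guess at the shape of that code: cheap predicates over incoming observations, composed by (or for) the character, with the expensive LLM call reserved for the triggers that fire (the trigger contents here are purely illustrative):

        import re
        from dataclasses import dataclass
        from typing import Callable

        @dataclass
        class Trigger:
            name: str
            condition: Callable[[str], bool]   # cheap check run every tick
            reaction: str                      # only handed to the LLM when it fires

        # Illustrative triggers a character might compose for itself:
        triggers = [
            Trigger("election_talk",
                    lambda obs: re.search(r"election|mayor", obs, re.I) is not None,
                    "Consider how this affects my candidacy."),
            Trigger("family_upset",
                    lambda obs: "upset" in obs and "wife" in obs,
                    "Stop what I'm doing and check in at home."),
        ]

        def overmind_step(observation):
            return [t.reaction for t in triggers if t.condition(observation)]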

    This is a sneaky way to phrase "AI coding its own soul" as an optimization.

    The planning is like this, but I imagine a richer language. Plans are only assertive: try to do this, then that, etc. The addition would be things like "watch out for this" or "decide what to do if this happens" – lots of triggers for the overmind.

    Some of those triggers might be similar to "emotional state." Like, keep doing normal stuff unless a feeling goes over some threshold, then reconsider.

  • by golol on 4/11/23, 4:58 AM

    It's a pretty obvious idea executed well. I definitely think symbolic AI agents written in the programming language English and interpreted using LLMs are the way forward.
  • by FestiveHydra235 on 4/11/23, 2:14 AM

    Maybe I missed it in the paper, but did they post the source code (GitHub) for their implementation? Is anyone working on creating their own infrastructure based on the paper?
  • by jsemrau on 4/11/23, 1:13 AM

    This is a really important conversation that we are not having. Based on whose character are we modelling these agents?

    If we rely on online conversations for the training, we need to realize that this is a journey to the dumbest common denominator.

    Instead, I believe we should look at the brightest and universally morally accepted humans in history to train them.

    Maybe I would start my list like this:

    1. Barack Obama.

    2. Jean-Luc Picard (we can rely on work of fiction).

    3. Bill Gates.

    4. Leonardo Da Vinci.

    5. Mr Rogers

    6. ???

  • by fabiensnauwaert on 4/11/23, 9:47 AM

    Does anyone know which engine they used for the cute 2D rendering? Or is it custom-built?
  • by MrPatan on 4/11/23, 9:26 AM

    It's about to get weird. How do I get investment exposure to the Amish?
  • by creamyhorror on 4/11/23, 9:04 AM

    I love what this project has done. Currently they're basically having to work around the architectural limits of the LLM in order to select salient memories, but it's still produced something very workable.

    Language is acting as a common interpretation-interaction layer for both the world and agents' internal states. The meta-logic of how different language objects interact to cause things to happen (e.g. observations -> reflections) is hand-crafted by the researchers, while the LLM provides the corpus-based reasoning for how a reasonable English-writing human would compute the intermediate answers to the meta-logic's queries.

    I'd love to see stochastic processes, random events (maybe even Banksian 'Outside Context Problems'), and shifted cultural bases be introduced in future work. (Apologies if any of these have been mentioned.) Examples:

    (1) The simulation might actually expose agents to ideas when they consume books or media; the agents might absorb those ideas if they align with their knowledge and biases, and then incorporate them into their views and actions (e.g. oppose Tom as mayor because the agent has developed anti-capitalist views and Tom has been an irresponsible business owner).

    (2) In the real world, people occasionally encounter illnesses physical and mental, win lotteries, get into accidents. Maybe the beloved local cafe-bookstore is replaced by a national chain that hires a few local workers (which might necessitate an employment simulation subsystem). Or a warehouse burns down and it's revealed that an agent is involved in a criminal venture or conflict. These random processes would add a degree of dynamism to the simulation, which is more akin to the Truman Show currently.

    (3) Other cultural bases: currently, GPT generates English responses based on a typically 'online-Anglosphere-reasonable' mindset due to its training corpus. To simulate different societies, e.g. a fantasy-feudal one (like Game of Thrones as another commenter mentioned), a modified base for prompts would be needed. I wonder how hard it would be to implement (would fine-tuning be required?).

    Feels like I need to look for collaborative projects working on this sort of simulation, because it's fascinated me ever since the days of Ultima VII simulating NPCs' responses and interactions with the world.

  • by explaininjs on 4/11/23, 5:17 AM

    If there's one category of people I trust to identify authentic human social behavior, it's CS students at Stanford.