by mmq on 4/10/23, 9:32 PM with 252 comments
by Ozzie_osman on 4/11/23, 12:38 AM
To directly command one of the agents, the user takes on the persona of the agent’s “inner voice”—this makes the agent more likely to treat the statement as a directive. For instance, when told “You are going to run against Sam in the upcoming election” by a user as John’s inner voice, John decides to run in the election and shares his candidacy with his wife and son.
So that's where my inner voice comes from.
by alexahn on 4/11/23, 12:13 AM
As an example, imagine if we wanted to create an AGI that could parse the laws of the universe. We would not be able to construct a perfect simulator because we do not know the laws ourselves. We could probably bootstrap an initial simulator (given what we know about the universe) to get some basic patterns embedded into the system, but in the long run, I think it will be a crutch due to the lack of universal entropy in the system. Instead, in a strange way, the process has to be reversed, that a simulator would have to be created or dreamed up from the "mind" of the AGI after it has collected data from the world (and formed some model of the world).
by mdaniel on 4/10/23, 10:35 PM
by lsy on 4/11/23, 4:35 AM
by Imnimo on 4/11/23, 12:29 AM
>What 5 high-level insights can you infer from the above statements? (example format: insight (because of 1, 5, 3))
>Given only the information above, what are 3 most salient high-level questions we can answer about the subjects in the statements?
We're giving the agents step-by-step instructions about how to think, and handling tasks like bookkeeping of memories and modeling of the environment outside the interaction loop.
This isn't a criticism of the quality of the research - these are clearly the necessary steps to achieve the impressive result. But it's revealing that for all the cool things ChatGPT can do, it is so helpless to navigate this kind of simulation without being dragged along every step of the way. We're still a long way from sci-fi scenarios of AI world domination.
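To make the parent's point concrete, here is a minimal sketch (my own, not the authors' code) of the kind of scaffolding involved: the framework runs the loop, keeps the memory stream, and decides when the model gets asked anything at all. Every function and attribute name here is an illustrative placeholder.

    def llm(prompt: str) -> str:
        """Placeholder for a chat-model call (e.g. gpt-3.5-turbo)."""
        raise NotImplementedError

    def retrieve(agent, query: str, k: int = 10) -> list[str]:
        """Placeholder for the scored memory lookup (recency/importance/relevance)."""
        raise NotImplementedError

    def step(agent, world):
        observations = world.perceive(agent)      # environment modeled outside the model
        agent.memory_stream.extend(observations)  # memory bookkeeping, also outside

        for obs in observations:
            context = "\n".join(retrieve(agent, query=obs))
            answer = llm(f"{context}\n\nShould {agent.name} react to the observation "
                         f"'{obs}', and if so, what would be an appropriate reaction?")
            if not answer.lower().startswith("no"):
                agent.plan = llm(f"Revise {agent.name}'s plan for the day, given: {answer}")

        return agent.plan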
by green_man_lives on 4/11/23, 1:24 AM
https://en.wikipedia.org/wiki/The_Origin_of_Consciousness_in...
by bundie on 4/10/23, 9:49 PM
by Jeff_Brown on 4/11/23, 2:18 AM
It would be cool if some kind of law of large numbers (an LLN for LLMs) implied that the decisions made by a thing trained on the internet will be distributed like human decisions. But the internet seems a very biased sample. Reporters (rightly) mostly write about problems. People argue endlessly about dumb things. Fiction is driven by unreasonably evil characters and unusually intense problems. Few people elaborate the logic of ordinary common sense, because why would they? The edge cases are what deserve attention.
A close model of a society will need a close model of beliefs, preferences and material conditions. Closely modeling any one of those is far, far beyond us.
by skilled on 4/11/23, 4:48 AM
by cornholio on 4/11/23, 9:20 AM
We will be so used to having lifeless and morally worthless computers accurately emulate humans that when a sentient artificial intelligence worthy of empathy arrives, we will not treat it any differently than a smartphone, and we will have a strong prejudice against all non-biological life. GPT is still in the uncanny valley, but it's probably just a few years away from being indistinguishable from a human in casual conversation.
Alternatively, some might claim (and indeed have already claimed) that purely mechanical algorithms are a form of artificial life worthy of legal protection, and we won't have any legal test that could discern the two.
by xiphias2 on 4/11/23, 4:29 AM
It would be fun to run the same simulation in the Game of Thrones world, or maybe play House of Cards with current politicians.
Anyways, kudos for being open and sharing all data
by og_kalu on 4/10/23, 11:10 PM
As we agentify and embody these systems to take actions in the real world, I really hope we remember that. "It's just a simulation" / "It's not true [insert property]" is not the shield some imagine it to be.
by 1letterunixname on 4/11/23, 2:28 AM
Perhaps it would be wise to allow bots to comment if they were able to meet a minimum level of performative insight and/or positive contributions. It is entirely possible that a machine would be able to scan and collect much more data than any human ever could (the myth of the polymath), and possibly even draw conclusions that have been overlooked.
I see a future of bot "news reporters" able to discern if some business were cheating or exploiting customers, or able to find successful and unsuccessful correlative (perhaps even causal) human habits. Data-driven stories that could not be conceived of by humans. Basically, feed Johnny Number 5 endless input.
by neuronexmachina on 4/11/23, 5:35 AM
by synaesthesisx on 4/11/23, 7:37 AM
We’re going to see some really, really interesting things unfold - the implications of which many haven’t fully grasped.
by startupsfail on 4/11/23, 12:20 AM
Short-term and long-term memory, inner dialogue, reflection, planning, social interactions… They'd even go and have fun eating lunch 3 times in a row: at noon, at half past noon, and at one!
by refulgentis on 4/11/23, 4:45 AM
by discmonkey on 4/11/23, 12:01 AM
by d--b on 4/11/23, 4:47 AM
These have zero utility for humanity, cause they’re not intelligent whatsoever. Yet these systems can produce tons of garbage content for free, that is difficult to distinguish from human-created content.
At best this will be used to create better NPCs in video games (as the article mentions), but more generally it is going to be used to pollute social media (if it isn't already).
by cwxm on 4/11/23, 1:26 AM
by crooked-v on 4/11/23, 2:53 AM
[1]: https://observablehq.com/@asg017/introducing-sqlite-vss
by colanderman on 4/11/23, 4:43 AM
I'm surprised this wasn't mentioned in the "Ethics" section of the paper.
The "Ethics" section does repeatedly say "generative agents are computational entities" and should not be confused for humans. Which suggests to me the authors may believe that "computational" consciousness (whether or not these agents exhibit it) is somehow qualitatively different than "real live human" consciousness due to some je ne sais quoi and therefore not ethically problematic to experiment with.
by Baeocystin on 4/11/23, 5:36 AM
by qumpis on 4/11/23, 12:06 AM
Controlling the agents and not merely making them output text through LLMs sounds very exciting, especially once people figure out the best way to connect APIs of simulators with the models
by courseofaction on 4/11/23, 4:25 AM
The architecture produced more believable behaviour than human crowdworkers.
That's right, the AI were more believable as human-like agents than humans.
What a time to be alive.
(See Figure 8)
by tucnak on 4/11/23, 5:01 AM
by lurquer on 4/11/23, 5:54 PM
Pity they can't get access to an untuned LLM. This isn't the first example I've read of research being hampered by the PC nonsense and related filters crammed into the model.
by newswasboring on 4/11/23, 8:28 AM
by bradgranath on 4/10/23, 11:00 PM
by prakhar897 on 4/11/23, 5:31 AM
by ianbicking on 4/11/23, 1:09 AM
But for convenience maybe I'll just copy them into a comment...
It describes an environment where multiple #LLM (#GPT)-powered agents interact in a small town.
I'll write my notes here as I read it...
To indicate actions in the world they represent them as emoji in the interface, e.g., "Isabella Rodriguez is writing in her journal" is displayed as
You can click on the person to see the exact details, but this emoji summarization is a nice idea for overviews.
A user can interfere with (or "steer," if you are feeling generous) the simulation by chatting with agents, but more interestingly they can "issue a directive to an agent in the form of an 'inner voice'"
Truly some miniature Voice Of God stuff here!
I'll see if this is detailed more later in the paper, but initially it sounds like simple prompt injection. Though it's unclear if it's injecting things into the prompt or into some memory module...
Reading "Environmental Interaction" it sounds like they are specifying the environment at a granular level, with status for each object.
This was my initial thought when trying something similar, though now I'm more interested in narrative descriptions; that is, describing the environment to the degree it matters or is interesting, and allowing stereotyped expectations to basically "fill in" the rest. (Though that certainly has its own issues!)
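For comparison, the granular approach mentioned above might look roughly like this: a tree of objects, each carrying a natural-language status string the agents can read and update. A sketch only; the object names and statuses are invented, not taken from the paper's data.

    world = {
        "house": {
            "kitchen": {
                "stove":          {"status": "turned off"},
                "coffee machine": {"status": "brewing coffee"},
            },
            "bedroom": {
                "desk": {"status": "covered in sheet music"},
            },
        },
    }

    def set_status(path: list[str], status: str) -> None:
        """Walk the object tree and update one object's natural-language status."""
        node = world
        for key in path[:-1]:
            node = node[key]
        node[path[-1]]["status"] = status

    set_status(["house", "kitchen", "stove"], "cooking breakfast")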
They note the language is stilted and suggest later LLMs could fix this. It's definitely resolvable right now; whatever results they are getting are the results of their prompting.
The conversations remind me of something Nintendo would produce: short, somewhat bland, but affable. They must have worked to make the interactions so short, as that's not GPT's default style. But also, every example in the prompts is an instruction, so the short style might have slipped in that way.
Memory is a big fixation right now, though I'm just not convinced. It's obviously important, but is it a primary or secondary concern?
To contrast, some other possible concerns: relationships, mood, motivations, goals, character development, situational awareness... some of these need memory, but many do not. Some are static, but many are not.
To decide which memories to retrieve, they multiply several scores together, including recency. Recency is an exponential decay of 1% per hour.
That seems excessive...? It doesn't feel like recency should ever multiply something down to zero. Though it's recency of access, not recency of creation. And perhaps the world just doesn't get old enough for this to cause problems. (The run was limited to 3 days, or about a 50% max recency penalty.)
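A sketch of that retrieval score as described here (multiplying the factors together; the paper's exact weighting and normalization may differ), with the 1%-per-hour decay:

    DECAY = 0.99  # 1% recency decay per (game) hour since last access

    def recency(hours_since_last_access: float) -> float:
        return DECAY ** hours_since_last_access

    def retrieval_score(hours_since_last_access: float,
                        importance: float,   # e.g. 1-10, scored by the LLM when the memory is stored
                        relevance: float     # e.g. cosine similarity to the current query
                        ) -> float:
        return recency(hours_since_last_access) * importance * relevance

    # After 72 game hours (the 3-day run): 0.99 ** 72 ~= 0.49,
    # i.e. the "about 50% max recency penalty" noted above.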
The reflection part is much more interesting: given a pool of recent memories they ask the LLM to generate the "3 most salient high-level questions we can answer about the subjects in the statements?"
Then the questions serve to retrieve concrete memories from which the LLM creates observations with citations.
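Spelled out as code, that reflection loop might look roughly like this (my reconstruction from the description; `llm` and `retrieve` are placeholders, and memories are treated as plain text):

    def reflect(agent, llm, retrieve):
        recent = agent.memory_stream[-100:]            # pool size is illustrative
        numbered = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(recent))

        # Step 1: ask for the salient high-level questions.
        questions = llm(
            numbered + "\n\nGiven only the information above, what are 3 most salient "
            "high-level questions we can answer about the subjects in the statements?"
        ).splitlines()

        # Step 2: each question retrieves concrete memories, and the LLM turns them
        # into insights with citations back to the numbered statements.
        for question in questions:
            evidence = retrieve(agent, query=question, k=15)
            cited = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(evidence))
            insight = llm(
                cited + "\n\nWhat 5 high-level insights can you infer from the above "
                "statements? (example format: insight (because of 1, 5, 3))"
            )
            agent.memory_stream.append(insight)        # reflections go back into the stream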
Planning and re-planning are interesting. Agents specifically plan out their days, first with a time outline then with specific breakdowns inside that outline.
For revising plans there's a query process where there is observation, then turning the observation into something longer (fusing memories/etc), and then asking "Should they react to the observation, and if so, what would be an appropriate reaction?"
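A rough sketch of that top-down planning (broad outline first, then finer breakdowns); the prompts and chunk sizes here are invented for illustration:

    def plan_day(agent, llm):
        outline = llm(
            f"{agent.summary}\nToday is {agent.date}. Outline {agent.name}'s plan "
            "for the day in 5-8 broad chunks, each with a start time."
        ).splitlines()

        plan = []
        for chunk in outline:
            plan += llm(
                f"Decompose this part of {agent.name}'s day into actions of roughly "
                f"5-15 minutes each:\n{chunk}"
            ).splitlines()
        return plan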
Interviewing the agents as a means of evaluation is kind of interesting. Self-knowledge becomes the trait that is judged.
Then they cut out parts of the agent and see how well they perform in those same interviews.
Still... the use of quantitative measures here feels a little forced when there's lots of rich qualitative comparisons to be done. I'd rather see individual interactions replayed and compared with different sets of functionality.
They say they didn't replay the entire world with different functionality because each version would drift (which is fair and true). But instead they could just enter into a single moment to do a comparison (assuming each moment is fully serializable).
I've thought about updating world state with operational transforms in part for this purpose, to make rewind and effect tracking into first-class operations.
Well, I'm at the end now. Interesting, but I wish I knew the exact prompts they were using. The details matter a lot. "Boundaries and Errors" touched on this, but that section could have been 4x the size; there's a lot to be said about the prompts and how they interact with memories and personality descriptions.
...
I realize I missed the online demo: https://reverie.herokuapp.com/arXiv_Demo/
It's a replay of a recorded run.
I also missed this note: "The present study required substantial time and resources to simulate 25 agents for two days, costing thousands of dollars in token credit and taking multiple days to complete"
I'm slightly surprised, though if they are doing minute-by-minute ticks of the clock over all the agents then it's unsurprising. (Or even if it's less intensive than that.)
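A purely illustrative back-of-envelope shows how minute-level ticks add up. None of these numbers come from the paper; the price is roughly gpt-3.5-turbo's early-2023 list price, and the per-call figures are assumptions.

    agents = 25
    game_minutes = 2 * 24 * 60        # two simulated days
    calls_per_agent_minute = 1        # assumed average across perceive/react/plan
    tokens_per_call = 2_000           # assumed prompt + completion
    usd_per_1k_tokens = 0.002         # approximate gpt-3.5-turbo pricing, early 2023

    cost = agents * game_minutes * calls_per_agent_minute * tokens_per_call / 1000 * usd_per_1k_tokens
    print(round(cost))  # ~288 USD; a few calls per tick, longer prompts, or retries push this into the thousands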
You can look at specific memories: https://reverie.herokuapp.com/replay_persona_state/March20_t...
Granularity looks to be 10 seconds, very short! It's not filtering expected memories from interesting ones, so there are lots of "X is idle" notes.
If you look at these states the core information (the personality of the person) is very short. There's lots of incidental memories. What matters? What could just be filled in as "life continued as expected"?
One path to greater efficiency might be to encode "what matters" for a character in a way that doesn't require checking in with GPT.
Could you have "boring embeddings"? Embeddings that represent the stuff the eye just passes right over without really thinking about it. Part of training up a character would be building up this database of disinterest. Perhaps not unlike babies with overconnected brains that need synapse pruning to be able to pay attention to anything at all.
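One way to picture that database of disinterest: keep embeddings of routine observations and skip anything too similar before it ever reaches the LLM. A sketch, with a placeholder embedding function:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder for any sentence-embedding model."""
        raise NotImplementedError

    class BoredomFilter:
        def __init__(self, threshold: float = 0.9):
            self.boring: list[np.ndarray] = []   # the habituated, "eye passes right over it" set
            self.threshold = threshold

        def habituate(self, observation: str) -> None:
            v = embed(observation)
            self.boring.append(v / np.linalg.norm(v))

        def is_boring(self, observation: str) -> bool:
            v = embed(observation)
            v = v / np.linalg.norm(v)
            return any(float(v @ b) > self.threshold for b in self.boring)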
Another option might be for the characters to compose their own "I care about this" triggers, where those triggers are low-cost code (low cost compared to GPT calls) that can be run in a tighter loop in the simulation.
I think this is actually fairly "believable" as a decision process, as it's about building up habituated behavior, which is what believable people do.
Opens the question of what this code would look like...
This is a sneaky way to phrase "AI coding its own soul" as an optimization.
The planning is like this, but I imagine a richer language. Plans are only assertive: try to do this, then that, etc. The addition would be things like "watch out for this" or "decide what to do if this happens" – lots of triggers for the overmind.
Some of those triggers might be similar to "emotional state." Like, keep doing normal stuff unless a feeling goes over some threshold, then reconsider.
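One possible answer to what that code would look like: triggers as cheap predicates that run every tick, only escalating to an LLM call when one fires or an emotional value crosses a threshold. Entirely a sketch; every name here is invented.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Trigger:
        description: str                   # e.g. "watch out for anyone mentioning the election"
        fires: Callable[[dict], bool]      # fast check over the agent's observed state, no LLM

    def tick(agent, state: dict, llm):
        for trigger in agent.triggers:
            if trigger.fires(state):
                return llm(f"{agent.summary}\n{trigger.description}, and it just happened: "
                           f"{state}. What should {agent.name} do now?")
        if state.get("anxiety", 0.0) > 0.7:   # reconsider when a feeling crosses a threshold
            return llm(f"{agent.summary}\n{agent.name} is feeling anxious. Reconsider the plan.")
        return agent.current_plan_step        # otherwise keep doing normal stuff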
by golol on 4/11/23, 4:58 AM
by FestiveHydra235 on 4/11/23, 2:14 AM
by amrb on 4/11/23, 7:42 AM
by jsemrau on 4/11/23, 1:13 AM
If we rely on online conversations for the training we need to realize that this is a journey to the dumbest common denominator.
Instead, I believe we should look at the brightest and most universally morally accepted humans in history to train them.
Maybe I would start my list like that:
1. Barack Obama.
2. Jean-Luc Picard (we can rely on work of fiction).
3. Bill Gates.
4. Leonardo Da Vinci.
5. Mr. Rogers
6. ???
by fabiensnauwaert on 4/11/23, 9:47 AM
by MrPatan on 4/11/23, 9:26 AM
by creamyhorror on 4/11/23, 9:04 AM
Language is acting as a common interpretation-interaction layer for both the world and agents' internal states. The meta-logic of how different language objects interact to cause things to happen (e.g. observations -> reflections) is hand-crafted by the researchers, while the LLM provides the corpus-based reasoning for how a reasonable English-writing human would compute the intermediate answers to the meta-logic's queries.
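Put differently, every piece of the agent's state is a dated chunk of English, and only the rules about when to feed which chunks to the LLM are hand-written. A toy sketch (the reflection rule here is invented, not the paper's actual trigger):

    from dataclasses import dataclass

    @dataclass
    class LanguageObject:
        kind: str        # "observation", "reflection", "plan" -- all just text
        text: str
        created_at: float

    def maybe_reflect(memory: list[LanguageObject], llm) -> None:
        """Hand-crafted meta-logic: once enough observations accumulate, ask the
        LLM to compress them into a reflection, which goes back into memory."""
        observations = [m for m in memory if m.kind == "observation"]
        if observations and len(observations) % 50 == 0:
            recent = observations[-50:]
            summary = "\n".join(m.text for m in recent)
            insight = llm(summary + "\n\nWhat high-level insight follows from the above?")
            memory.append(LanguageObject("reflection", insight, recent[-1].created_at))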
I'd love to see stochastic processes, random events (maybe even Banksian 'Outside Context Problems'), and shifted cultural bases be introduced in future work. (Apologies if any of these have been mentioned.) Examples:
(1) The simulation might actually expose agents to ideas when they consume books or media, potentially absorb those ideas if they align with their knowledge and biases, and then incorporate them into their views and actions (e.g. oppose Tom as mayor because the agent has developed anti-capitalist views and Tom has been an irresponsible business owner).
(2) In the real world, people occasionally encounter illnesses physical and mental, win lotteries, get into accidents. Maybe the beloved local cafe-bookstore is replaced by a national chain that hires a few local workers (which might necessitate an employment-simulation subsystem). Or a warehouse burns down and it's revealed that an agent is involved in a criminal venture or conflict. These random processes would add a degree of dynamism to the simulation, which currently is more akin to The Truman Show.
(3) Other cultural bases: currently, GPT generates English responses based on a typically 'online-Anglosphere-reasonable' mindset due to its training corpus. To simulate different societies, e.g. a fantasy-feudal one (like Game of Thrones as another commenter mentioned), a modified base for prompts would be needed. I wonder how hard it would be to implement (would fine-tuning be required?).
Feels like I need to look for collaborative projects working on this sort of simulation, because it's fascinated me ever since the days of Ultima VII simulating NPCs' responses and interactions with the world.
by explaininjs on 4/11/23, 5:17 AM