by yehosef on 3/10/24, 1:55 PM with 62 comments
by layer8 on 3/10/24, 3:13 PM
That didn’t go down so well in the past.
by hiAndrewQuinn on 3/10/24, 2:37 PM
by JimDabell on 3/10/24, 2:45 PM
Edit: anotheryou found the thread here:
https://twitter.com/yoheinakajima/status/1762718034761072653
by Terretta on 3/10/24, 3:55 PM
For text, "finish your thought and answer" has been implemented for a while, in LLMs in IDEs that offer completions for # code comments, for example.
One of the faster implementations is in the new Zed editor. Open the Assistant pane with your OpenAI GPT-4 key, and once you're into the conversation, it will offer auto-completions of your own prompt to it, before you submit.
Often these autocompletes finish the question and then contain the answer, like an impatient listener mentally finishing your sentence so they can say what they think. This is without having submitted the question to the chat interface.
Note that as Zed has implemented this, the realtime "finish your thought for you" mode uses a dumber, faster model, but as your context builds, it gets the interruption right more often.
You can also start your next prompt while it's unspooling the last one.
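(This is not Zed's actual implementation, but the general "finish your thought, maybe with the answer" pattern can be sketched with a single completion call on the unfinished prompt; the model name and system prompt below are placeholders, with a small fast model standing in for the low-latency suggester.)

    # Sketch of the "finish your thought (and maybe answer)" pattern described
    # above, NOT Zed's actual implementation. Model name and prompt are assumptions.
    from openai import OpenAI

    client = OpenAI()

    def suggest_completion(unfinished_prompt: str) -> str:
        out = client.chat.completions.create(
            model="gpt-3.5-turbo",            # the "dumber, faster" model
            messages=[
                {"role": "system",
                 "content": "Continue the user's unfinished message. If the "
                            "question is already clear, you may answer it too."},
                {"role": "user", "content": unfinished_prompt},
            ],
            max_tokens=60,
        )
        return out.choices[0].message.content

    # e.g. suggest_completion("What's the time complexity of quicksort in the")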
by GistNoesis on 3/10/24, 3:29 PM
It's quite standard nowadays to add an extra special token and then fine-tune an LLM to learn how to use it appropriately, by providing a small dataset (1k to 50k examples) of conversations with interruptions (for example: "user: Xylophone went to the stadium with <interruptToken> Let me stop you right now, are you really referring to Xylophone? </interruptToken> ok thanks for correcting me, it's not Xylophone it's Xander, damn autocorrect!").
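A rough sketch of that setup, assuming a Hugging Face causal LM (gpt2 as a stand-in); the token names and example text just mirror the illustration above:

    # Sketch: register interrupt markers as special tokens and fine-tune on
    # a small dataset of interrupted conversations. Names are illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Register the interruption markers as special tokens so BPE never splits
    # them, then resize the embedding matrix to match the new vocabulary size.
    tokenizer.add_special_tokens(
        {"additional_special_tokens": ["<interruptToken>", "</interruptToken>"]}
    )
    model.resize_token_embeddings(len(tokenizer))

    # One example out of the small (1k-50k example) interruption dataset.
    example = (
        "user: Xylophone went to the stadium with "
        "<interruptToken> Let me stop you right now, are you really referring "
        "to Xylophone? </interruptToken> "
        "ok thanks for correcting me, it's not Xylophone it's Xander!"
    )
    ids = tokenizer(example, return_tensors="pt").input_ids
    # Standard causal-LM fine-tuning step: transformers shifts the labels
    # internally, so passing labels=input_ids gives the next-token loss.
    loss = model(ids, labels=ids).loss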
llama.cpp has the opposite: an interactive mode where you, as a human, can interrupt the conversation the LLM is currently generating. But if you interrupt it badly it can send the conversation off the rails.
One problem that results from using tokens is that the user usually isn't inputting tokens but characters, so you must only process once the characters have stabilized into tokens (for example at word boundaries, if your tokeniser has a preprocessing step that splits on spaces before doing the byte-pair encoding). (If you want to process each character on the fly it gets really tricky, because even if at inference time you can rewrite the last token in your KV cache, you must somehow create a fine-tuning dataset that properly teaches the model how to interject based on these partial tokens.)
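A minimal sketch of the "wait until the characters have stabilized into tokens" idea, using a trailing word boundary as the (purely illustrative) stability check:

    # Sketch: only re-run the model once the user's partial input has stabilized
    # into whole tokens, approximated here by a word boundary (trailing space).
    # The heuristic is illustrative; a real tokenizer may need a subtler check.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    def stable_prefix(text: str) -> str:
        """Return the part of the input whose tokenization can no longer change."""
        if text.endswith(" "):
            return text                              # the last word is complete
        cut = text.rfind(" ")
        return text[: cut + 1] if cut != -1 else ""  # drop the unfinished word

    buffer = "user: Xylophone went to the stad"      # keystrokes so far
    prefix = stable_prefix(buffer)                   # "user: Xylophone went to the "
    if prefix:
        ids = tokenizer(prefix).input_ids            # safe to feed / cache these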
by compressedgas on 3/10/24, 2:43 PM
by a2128 on 3/10/24, 3:29 PM
My implementation wasn't really interrupting, it was only figuring out when to respond vs when to let someone else in the group respond, but you could use the same idea to figure out when to interrupt.
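A minimal version of that respond-or-stay-quiet gate might look like the sketch below; the model, prompt, and YES/NO protocol are assumptions, not details from the implementation described above:

    # Sketch of a respond-vs-wait gate for a group chat bot. Everything here
    # (model name, prompt, threshold) is an assumption for illustration.
    from openai import OpenAI

    client = OpenAI()

    def should_respond(history: list[str]) -> bool:
        judgement = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "You decide whether the bot should reply to the last "
                            "message in a group chat. Answer YES or NO only."},
                {"role": "user", "content": "\n".join(history)},
            ],
            max_tokens=1,
        )
        return judgement.choices[0].message.content.strip().upper().startswith("Y")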
by sk11001 on 3/10/24, 2:26 PM
by anotheryou on 3/10/24, 3:18 PM
by proc0 on 3/10/24, 3:16 PM
by sandspar on 3/10/24, 4:18 PM
Also I could see something like this working on cash ATMs, coupled with eye tracking: "That guy behind you is watching you type your PIN: would you like to stop typing it before you complete it?"
Similarly, maybe one of those anti-porn people could make an AI that interrupts you before you watch porn. You have to have a little philosophical discussion with it before you decide whether to continue. It could also work on fridges. FridgeBot: "Are you sure you'd like to eat that cheesecake?" Maybe we could add it to guns too, why not.
by littlestymaar on 3/10/24, 3:15 PM
Commercial AI will also never be able to pass the Turing test, because it will never tell you to shut the fuck up or ragequit like a human would when you're being obnoxious enough. It's not a technical limitation; it just aligns very poorly with the interests of the overlord.
Or maybe Mistral will do it, because having no particular consideration for customers is something we French people know how to do very well.
by deadbabe on 3/10/24, 3:02 PM
It seems for people to perceive it as true AI they must send off some prompt, watch it think deep while a loader spins, and then read a response.
by intellectronica on 3/10/24, 3:02 PM
by colanderman on 3/10/24, 2:45 PM
2. Constantly predict a few tokens ahead.
3. When the predicted text includes the computer's prompt, respond with that, without waiting for the user to push enter.
Probably also
4. Stop engineering the initial instructions for such obsequious behavior.
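A minimal sketch of steps 2 and 3, assuming a Hugging Face causal LM and a chat format with an explicit assistant turn marker; the marker string, model, and lookahead length are all made up:

    # Sketch of steps 2-3: keep speculatively extending the user's unfinished
    # message and jump in once the model predicts the assistant's turn marker.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ASSISTANT_MARKER = "\nassistant:"          # hypothetical turn delimiter
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def maybe_interrupt(partial_user_text: str, lookahead: int = 8) -> str | None:
        prompt = "user: " + partial_user_text
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model.generate(ids, max_new_tokens=lookahead, do_sample=False)
        continuation = tokenizer.decode(out[0, ids.shape[1]:])
        if ASSISTANT_MARKER in continuation:
            # The model thinks the user's turn is over: answer without
            # waiting for the Enter key.
            return continuation.split(ASSISTANT_MARKER, 1)[1]
        return None                            # keep listening to keystrokes

    # Called on every keystroke (or every few), e.g.:
    # reply = maybe_interrupt("What's the capital of Fr")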
by catchnear4321 on 3/10/24, 2:42 PM
to be useful, it would need something to interrupt, and instruction on what warrants an interruption.
by nicklecompte on 3/10/24, 3:53 PM
But LLMs don't have any agenda whatsoever - they are not capable of having goals or motivations. So why are they interrupting? Are they reading your mind and understanding your goals before you even finish typing them? It's hard to see an LLM having a coherent way to interrupt based purely on a probabilistic view of language.
It would be very annoying if a human constantly interrupted you because they were "aligned with your agenda" and thought they were being helpful. LLMs would probably be much worse, even if they were able to reliably infer what you wanted. For an LLM to be useful, you kind of have to coax it along and filter out a lot of empty verbiage - it seems downright counterproductive to have that verbiage blasted at you by a chatbot that interrupts your typing.
I could see LLMs interrupting if you are typing something clearly false or against TOS. But that would require an LLM which reliably understands when things are clearly false or against TOS, and hence requires a solution to jailbreaking... so in 2024 I think it would just be an incredibly annoying chatbot. In general I think any interruption behavior would be artificially programmed to make the LLM seem "realistic," and it won't work.
by mlsu on 3/10/24, 2:59 PM
The way a human interjects is that you have a parallel thought chain going, along with the conversation, as it's happening in real time. In this parallel chain, you are planning ahead. What point am I going to make once we are past this point of conversation? What is the implication of what is being discussed here? (You also are thinking about what the other person is thinking; you are developing a mental model of their thought process).
An LLM does not have any of this architecturally; it just has the text itself. Any planning that people claim to do with Llama et al. is really just "pseudo" planning, not the fundamental planning we're talking about here. I suspect it will be a while yet before we have "natural" interjection from LLMs.
When it does come, however, it will be extremely exciting. Because it will mean that we have cracked planning and made the AI far more agentic than it is now. I would love to be proven wrong.
by lulznews on 3/11/24, 3:15 AM
by cqqxo4zV46cp on 3/10/24, 2:32 PM