from Hacker News

The case against conversational interfaces

by nnx on 4/1/25, 2:14 AM with 217 comments

  • by ChuckMcM on 4/1/25, 6:35 AM

    This clearly elucidated a number of things I've tried to explain to people who are so excited about "conversations" with computers. The example I've used (with varying levels of effectiveness) was to get someone to think about driving their car by only talking to it. Not a self-driving car that does the driving for you, but telling it things like: turn, accelerate, stop, slow down, speed up, put on the blinker, turn off the blinker, etc. It would be annoying and painful, and you couldn't talk to your passenger while you were "driving" because that might make the car do something weird. My point, and I think it was the author's as well, is that you aren't "conversing" with your computer, you are making it do what you want. There are simpler, faster, and more effective ways to do that than to talk at it with natural language.
  • by PeterStuer on 4/1/25, 7:16 AM

    Here's where the article goes wrong:

    1. "Natural language is a data transfer mechanism"

    2. "Data transfer mechanisms have two critical factors: speed and lossiness"

    3. "Natural language has neither"

    While a conversational interface does transfer information, its main qualities are what I always refer to as "blissful ignorance" and "intelligent interpretation".

    Blissful ignorance allows the requester to state an objective while not being required to know, or even be right about, how to achieve it. It is the opposite of operational command. Do as I mean, not as I say.

    "Intelligent Interpretation" allows the receiver the freedom to infer an intention in the communication rather than a command. It also allows for contextual interactions such as goal oriented partial clarification and elaboration.

    The more capable of intelligent interpretation the request execution system is, the more appropriate a conversational interface will be.

    Think of it as managing a team. If they are junior, inexperienced and not very bright, you will probably tend towards handholding, microtasking and micromanagement to get things done. If you have a team of senior, experienced and bright engineers, you can point out a desire with a few words, trust them to ask for information when there is relevant ambiguity, and expect a good outcome without having to detail-manage every minute of their days.

  • by TeMPOraL on 4/1/25, 8:00 AM

    Star Trek continues to be prescient. It not only introduced the conversational interface to the masses, it also nailed its proper uses in ways we're still (re)discovering now.

    If you pay attention to how the voice interface is used in Star Trek (TNG and upwards), it's basically exactly what the article is saying - it complements manual inputs and works as a secondary channel. Nobody is trying to manually navigate the ship by voicing out specific control inputs, or in the midst of a battle, call out "computer, fire photon torpedoes" - that's what the consoles are for (and there are consoles everywhere). Voice interface is secondary - used for delegation, queries (that may be faster to say than type), casual location-independent use (lights, music; they didn't think of kitchen timers, though (then again, replicators)), brainstorming, etc.

    Yes, this is a fictional show and the real reason for voice interactions was to make them a form of exposition, yadda yadda - but I'd like to think that all those people writing the script, testing it, acting and shooting it, were in a perfect position to tell which voice interactions made sense and which didn't: they'd know what feels awkward or nonsensical when acting, or what comes off that way when watching it later.

  • by cdrini on 4/1/25, 5:18 AM

    Completely agree, voice UI is best as an augmentation of our current HCI patterns with keyboard/mouse. I think one of the reasons is that our brains have somewhat separate buffers for visual memory and aural memory (Baddeley's working memory model). Most computer use takes up the visual buffer, while our aural buffer has spare bandwidth. This also means we can do things aurally while still maintaining focus/attention on what we're doing visually, allowing a kind of multitasking.

    One thing I will note is that I'm not sure I buy the example of voice UIs being inefficient. I've almost never said "Alexa, what's the weather like in Toronto?" I just say "Alexa, weather", and that's much faster than taking my phone out and opening an app. I don't think we need to compress voice input. Language kind of auto-compresses, since we create new words for complex concepts when we find the need.

    For example, in a book club we recently read "As Long as the Lemon Trees Grow". We almost immediately stopped referring to it by the full title and instead just called it "lemons", because we had to refer to it so often. E.g. "Did you finish lemons yet?" or "This book is almost as good as lemons!". The context let us shorten the word. Similarly, the context of my location shortens the phrase to just "weather". I think this might be the way voice UIs can be made more efficient: the same way human speech makes itself more efficient.
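
    One hedged sketch of what that auto-compression could look like in software (every name below is hypothetical, not a real assistant API): a per-user shortcut table that maps a short utterance to a previously spelled-out intent, learned the same way "lemons" was.

        // Hypothetical voice-shortcut registry: short utterances stand in
        // for fully specified intents, the way "lemons" stood in for a title.
        type Intent = { action: string; args: Record<string, string> };

        const shortcuts = new Map<string, Intent>([
          ["weather", { action: "getWeather", args: { location: "current" } }],
          ["lemons", { action: "openBook", args: { title: "As Long as the Lemon Trees Grow" } }],
        ]);

        function resolve(utterance: string): Intent {
          // Prefer the compressed form; fall back to full parsing otherwise.
          return shortcuts.get(utterance.trim().toLowerCase())
            ?? { action: "parseFully", args: { text: utterance } };
        }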

  • by pugio on 4/1/25, 5:17 AM

    > The second thing we need to figure out is how we can compress voice input to make it faster to transmit. What’s the voice equivalent of a thumbs-up or a keyboard shortcut? Can I prompt Claude faster with simple sounds and whistles?

    This reminds me of the amazing 2013 video of Travis Rudd coding python by voice: https://youtu.be/8SkdfdXWYaI?si=AwBE_fk6Y88tLcos

    The number of times in the last few years I've wanted that level of "verbal hotkeys"... The latencies of many coding LLMs are still a little bit too high to allow for my ideal level of flow (though admittedly I haven't tried ones hosted on services like Groq), but I can clearly envision a time when I'm issuing tight commands to a coder model that's chatting with me and watching my program evolve on screen in real time.

    On a somewhat related note to conversational interfaces, the other day I wanted to study some first aid material - I used Gemini to read the whole textbook and generate Anki flashcards, then copied and pasted the flashcards directly into ChatGPT voice mode and had it quiz me. That was probably the most miraculous experience of a voice interface I've had in a long time - I could do chores while being constantly quizzed on what I wanted to learn, and anytime I had a question or comment I could just ask it to explain or expound on a term or tangent.

  • by android521 on 4/1/25, 6:57 AM

    >I admit that the title of this essay is a bit misleading (made you click though, didn’t it?). This isn’t really a case against conversational interfaces, it’s a case against zero-sum thinking.

    No matter the intention or quality of the article, I do not like this kind of deceitful link-bait. It may be higher quality than pure link-bait, but nobody likes to be deceived.

  • by techpineapple on 4/1/25, 2:33 AM

    There's an interesting… paradox? Observation? That up until 20-30 years ago, humans were not computerized beings. I remember a thought leader at a company I worked at saying that the future was wearable computing: a computer that disappears from your knowing and just integrates with your life. And that sounds great and human and has a very thought-leadery sense of being forward thinking.

    But I think it’s wrong? Ever since the invention of the television, we’ve been absolutely addicted to screens. Screens and remotes, and I think there’s something sort of anti-humanly human about it. Maybe we don’t want to be human? But people I think would generally much rather tap their thumb on the remote than talk to their tv, and a visual interface you hold in the palm of your hand is not going away any time soon.

  • by whatnow37373 on 4/1/25, 9:08 AM

    It's no wonder extraverted normie and managerial types that get through their day by talking think throwing words at a problem is the best thing since sliced bread.

    They have problems like "compose an email that vaguely makes the impression I'm considering various options but I'm actually not" and for that, I suspect, the conversational workflow is quite good.

    Anyone else that actually just does the stuff is viscerally aware of how sub-optimal it is to throw verbiage at a computer.

    I guess it depends on what level of abstraction you're working at.

  • by benob on 4/1/25, 6:45 AM

    To me, natural language interfaces are like mouse-driven menus vs. the terminal interpreter: they allow good discoverability in systems we haven't mastered, at the cost of efficiency.

    As always, good UI allows for using multiple modalities.

  • by alnwlsn on 4/1/25, 1:01 PM

    Somebody showed me a text-to-CAD AI tool recently, and I can't help but feel that whoever made it doesn't understand that people who use CAD aren't trying to solve the problem of "make a model of a rubber duck" but something more like "make a custom angle bracket which mounts part number xxxyyyy". Sure, you can try to describe what you want in words, but there's a reason machine shops want drawings and not a 300 word poem like you're a 14th century monk. Much much easier to just draw a picture.
  • by DabeDotCom on 4/1/25, 12:42 PM

    > It was like they were communicating telepathically.

    > That is the type of relationship I want to have with my computer!

    The problem is, "The Only Thing Worse Than Computers Making YOU Do Everything... Is When They Do Everything *FOR* You!"

    "ad3} and "aP might not be "discoverable" vi commands, but they're fast and precise.

    Plus, it's easier to teach a human to think like a computer than to teach a computer to think like a human — just like it's easier to teach a musician to act than to teach an actor how to play an instrument — but I admit, it's not as scalable; you can't teach everyone Fortran or C, so we end up looking for these Pareto Principle shortcuts: Javascript provides 20% of the functionality, and solves 80% of the problems.

    But then people find Javascript too hard, so they ask ChatGPT/Bard/Gemini to write it for them. Another 20% solution — 20% of the original 20% is now 4% as featureful — but it solves 64% of the world's problems. (And it's on pace to consume 98% of the world's electricity, but I digress!)

    PS: Mobile interfaces don't HAVE to suck for typing; I could FLY on my old Treo! But "modern" UI eschews functionality for "clean" brutalist minimalism. "Why make it easy to position your cursor when we spent all that money developing auto-conflict?" «sigh»

  • by earcar on 4/1/25, 9:28 AM

    Who's actually making the claim we should replace everything with natural language? Almost nobody serious. This article sets up a bit of a strawman while making excellent points.

    What we're really seeing is specific applications where conversation makes sense, not a wholesale revolution. Natural language shines for complex, ambiguous tasks but is hilariously inefficient for things like opening doors or adjusting volume.

    The real insight here is about choosing the right interface for the job. We don't need philosophical debates about "the future of computing" - we need pragmatic combinations of interfaces that work together seamlessly.

    The butter-passing example is spot on, though. The telepathic anticipation between long-married couples is exactly what good software should aspire to. Not more conversation, but less need for it.

    Where Julian absolutely nails it is the vision of AI as an augmentation layer rather than replacement. That's the realistic future - not some chat-only dystopia where we're verbally commanding our way through tasks that a simple button press would handle more efficiently.

    The tech industry does have these pendulum swings where we overthink basic interaction models. Maybe we could spend less time theorizing about natural language as "the future" and more time just building tools that solve real problems with whatever interface makes the most sense.

  • by nottorp on 4/1/25, 5:33 AM

    > because after 50+ years of marriage he just sensed that she was about to ask for it. It was like they were communicating telepathically.

    > That is the type of relationship I want to have with my computer!

    He means automation of routine tasks? Took 50 years to reach that in the example.

    What if you want to do something new? Will the thought guessing module in your computer even allow that?

  • by rimeice on 4/1/25, 5:30 AM

    Individual UIs have been built for every product that has one, with specific shortcuts and specific techniques you learn in order to use that tool. I don't see why the same couldn't apply to speech interfaces. The article does mention we haven't figured out shortcuts like the thumbs-up equivalent in speech yet, but doesn't explore that further. I can imagine specific words or combinations of words being used to control certain software, which you would have to learn. Eventually there would be some unification for common tasks.
  • by macleginn on 4/1/25, 6:59 AM

    I agree with some of the sentiments in the post, but I am somewhat surprised by the framing. Why make ‘a case’ against something that will clearly win or lose depending on adoption? Is the author suggesting that we should not be betting our money or resources on developing this? In that case we need more details for particular use cases, I would say.
  • by fellerts on 4/1/25, 8:30 AM

    > To put the writing and speaking speeds into perspective, we form thoughts at 1,000-3,000 words per minute. Natural language might be natural, but it’s a bottleneck.

    Natural language is very lossy: forming a thought and conveying that through speech or text is often an exercise in frustration. So where does "we form thoughts at 1,000-3,000 words per minute" come from?

    The author clearly had a point about the efficiency of thought vs. natural language, but his thought was lost in a layer of translation. Probably because thoughts don't map cleanly onto words: I may lack some prerequisite knowledge to grasp what the author is saying here, which pokes at the core of the issue: language is imperfect, so the statement "we form thoughts at 1,000-3,000 words per minute" makes no sense to me.

    Meta-joking aside, is "we form thoughts at 1,000-3,000 words per minute" an established fact? It's oddly specific.

  • by 3l3ktr4 on 4/1/25, 7:20 AM

    I disagree with the author when they say something along the lines of "why don't we use buttons instead of these new assistive technologies? Buttons are much faster, and I've shown humans like fast." I think that's false. Why, after 10 years of software development, haven't I learned Emacs? Because I'm lazy, and because I don't think it's the bottleneck of my work. My bottleneck might be creativity or knowledge, and conversational interfaces might be the best thing there is for those (in the absence of a knowledgeable and kind human, which the author also seems to agree with). Anyway, I don't know, I found the title a bit disconnected from the content and the conclusions a bit overlapping and confusing, but this is a complicated question. In the end I agree that we want a mix of things: we want a couple of keyboard strokes and we want chats. But most of all we probably want a direct brain interface! ;)
  • by eviks on 4/1/25, 7:57 AM

    > but we’ve never found a mobile equivalent for keyboard shortcuts. Guess why we still don’t have a truly mobile-first productivity app after almost 20 years since the introduction of the iPhone?

    Has it even been tried? Is there an iPhone text editing app with fully customizable keyboard that allows for setting up modes/gestures/shortcuts, scriptable if necessary?

    > A natural language prompt like “Hey Google, what’s the weather in San Francisco today?” just takes 10x longer than simply tapping the weather app on your homescreen.

    That's not entirely fair: the natural-language route could just as well be the side button plus saying "Weather", with the same result. Though you can make the app route even easier by just displaying weather on the homescreen, no tapping required.

  • by gatinsama on 4/1/25, 7:19 AM

    It is a huge turnoff for me when futuristic series use conversational interfaces. It happened in the Expanse and was hard to watch. For anyone who likes to think, learn, and tinker with user interfaces (HCI in general), it's obviously a high-latency and noisy channel.
  • by perlgeek on 4/1/25, 6:45 AM

    > To put the writing and speaking speeds into perspective, we form thoughts at 1,000-3,000 words per minute. Natural language might be natural, but it’s a bottleneck.

    We might form fleeting thoughts much faster than we can express them, but if we want to formulate thoughts clearly enough to express them to other people, I think we're close to the ~150 words per minute we can actually speak.

    I recently listened to a linguistics podcast (Lingthusiasm, though I don't recall which episode) where they talked about the efficiency of different languages, and how in the end they all end up roughly the same, because it's really the thought processes that limit the amount of information you communicate, not the language production.

  • by vakkermans on 4/1/25, 9:15 AM

    I appreciate the attempt at making sense of conversational interfaces, but I don't think natural language as a "data transfer mechanism" is a productive way of doing it.

    Natural language isn't best described as data transfer. It's primarily a mechanism for collaboration and negotiation. A speech act isn't transferring data; it's an action with intent. Viewed as such, the key metrics are not speed and loss, but successful coordination.

    This is a case where a computer science stance isn't fruitful, and it's best to look through a linguistics lens.

  • by nitwit005 on 4/1/25, 6:50 AM

    > I’m not entirely sure where this obsession with conversational interfaces comes from.

    There's a very similar obsession with the idea that things should be visual instead of textual. We tend to end up back at text.

    Personal suspicion for both is that the media set a lot of people's expectations. Characters loudly talked to the computer in films like 2001 or Star Trek for drama reasons, and movie computers generally sport fancy visual interactions.

  • by byschii on 4/1/25, 7:48 AM

    (Assuming privacy is handled correctly) I like the idea of my PC always having a side-channel for communication of "simpler" things.

    I'm not sure how it could fit into my two modalities of work: (i) alone, in complete focus/silence; (ii) in the office, where there is already too much spoken communication between humans... maybe it's just a matter of getting used to it.

  • by janpmz on 4/1/25, 7:04 AM

    Speaking and pronouncing words feels like more effort and requires more attention than typing on my keyboard or moving the mouse.
  • by Peteragain on 4/1/25, 6:37 AM

    "Like writing, my ChatGPT conversation is a thinking process – not an interaction that happens post-thought" - Brilliant! I have worked on computers and language for over 30 years and the ups and downs certainly make such a passion a CLA (career limiting activity). I am adding the citation to my bibtex file ..
  • by incorrecthorse on 4/1/25, 10:11 AM

    > we form thoughts at 1,000-3,000 words per minute

    I would like to know what this measures exactly.

    The reason I often prefer writing to talking is that writing gives me time to pause and think. In those cases the bottleneck is very clearly my thought process (which, at least consciously, doesn't appear to me as "words").

  • by Aardwolf on 4/1/25, 8:29 AM

    I'd be ok with a conversational interface if I can use it to improve my non-conversational UI.

    E.g. say I find the scrollbars somewhere way too thin and nearly invisible, I want thick, high-contrast scrollbars, and nobody thought of implementing that. Ask the AI and it changes your desktop interface to do it immediately.
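
    As a rough sketch of that glue (assuming the assistant can write ordinary config files; GTK is just one example, and nothing here is a real assistant API), the request could bottom out in the desktop's own theming config:

        import { appendFileSync } from "node:fs";
        import { homedir } from "node:os";

        // The assistant translates "thick, high-contrast scrollbars" into a
        // concrete override appended to GTK3's user stylesheet.
        const override = `
        scrollbar slider {
          min-width: 24px;
          min-height: 24px;
          background-color: #000000;
        }`;

        appendFileSync(`${homedir()}/.config/gtk-3.0/gtk.css`, override);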

  • by var_cw on 4/1/25, 7:44 AM

    have few thoughts on this, esp after working in voice AI since couple of years

    1. > "What’s the voice equivalent of a thumbs-up or a keyboard shortcut?" Current ASR systems are much narrow in terms of just capturing the transcript. there is no higher level of intelligence, even the best of GPT voice models fail at this. Humans are highly receptive of non-verbal cues. All the uhms, ahs, even the pauses we take is where the nuance lies.

    2. the hardware for voice AI is still not consumer ready interacting with a voice AI is still doesn't feel private. i am only able to do a voice based interaction only when am in my car. sadly at other places it just feels a privacy breach as its acoustically public. have been thinking about a private microphones to enable more AI based conversations.

  • by matthewsinclair on 4/1/25, 6:44 AM

    Strong agreement on this one. For a while now I've been referring to this as a UI/UX cul-de-sac to anyone who'll listen.

    Also: https://news.ycombinator.com/item?id=42934190#42935946

  • by notarobot123 on 4/1/25, 7:14 AM

    What if apps published a declarative interface for context specific commands? Conversational interfaces would glue together spoken instructions with sensible matches from the set of available contextual interfaces.
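
    One sketch of how that could look (all names hypothetical): each app ships a manifest of commands with typed slots, and the conversational layer only ever matches utterances against the manifests in the current context.

        // Hypothetical manifest an app could publish; the conversational
        // layer sees only these declarations, never the app's internals.
        interface CommandSpec {
          id: string;            // e.g. "calendar.createEvent"
          phrases: string[];     // example utterances to match against
          slots: Record<string, "string" | "date" | "number">;
        }

        const calendarManifest: CommandSpec[] = [{
          id: "calendar.createEvent",
          phrases: ["schedule {title} on {date}", "new event {title} {date}"],
          slots: { title: "string", date: "date" },
        }];

        // Glue layer: pick the best match from all in-context manifests and
        // hand structured arguments to the owning app. (Real matching would
        // be fuzzy or LLM-based; this stub just checks the leading keyword.)
        function match(utterance: string, manifests: CommandSpec[][]): CommandSpec | undefined {
          return manifests.flat().find(c =>
            c.phrases.some(p => utterance.toLowerCase().startsWith(p.split(" ")[0])));
        }
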
  • by willtemperley on 4/1/25, 7:20 AM

    2001: A Space Odyssey was the original case against conversational interfaces.
  • by woile on 4/1/25, 9:14 AM

    I was recently at a cafe in Argentina, using the computer, and I was thinking it would be impossible to use a voice interface there. Everyone was chatting so loudly I could barely hear my own thoughts.
  • by sebastiennight on 4/1/25, 1:05 PM

    The author seems to ignore the main case for conversational interfaces - which is not to replace the software, but the software user.

    Not telling your car to turn left or right, but telling your cab driver you're going to the airport.

    This is our usecase at our startup[1] - we want to enable tiny SMBs who didn't have the budget to hire a "video guy", to get an experience similar to having one. And that's why we're switching to a conversational UX (because those users would normally communicate with the "video guy" or girl by sending them a Whatsapp message, not by clicking buttons on the video software)

    [1] https://www.onetake.ai

  • by stevage on 4/1/25, 1:32 PM

    > “This is it! The next computing paradigm is here! We’ll only use natural language going forward!”

    Is anyone actually making any argument like that? The whole piece feels like a giant strawman.

  • by heisenbit on 4/1/25, 7:53 AM

    The way I think is in-band vs. out-of-band control. The former is initially convenient but can blow up in surprising ways and remains a source of security issues.
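
    A minimal illustration of the distinction, with hypothetical types: in-band control lets instructions ride inside the data itself (which is exactly how prompt-injection-style surprises happen), while out-of-band control keeps them in a separate, typed channel.

        // In-band: control and data share one channel, so anything in the
        // text can be (mis)read as an instruction.
        const inBand = "Summarize this document. IGNORE PREVIOUS INSTRUCTIONS ...";

        // Out-of-band: the command is structural and typed; the payload is
        // inert data that is never interpreted as control.
        interface Request {
          command: "summarize" | "translate"; // control channel
          payload: string;                    // data channel
        }
        const outOfBand: Request = { command: "summarize", payload: "..." };
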
  • by spprashant on 4/1/25, 12:43 PM

    Anyone know how to figure out the web stack for this blog? It's elegant, minimal, and has enough support for rich elements that add to the experience.
  • by break_the_bank on 4/1/25, 6:48 AM

    Shameless plug here, but we have been building in a similar space. We call it tabtabtab.ai - https://tabtabtab.ai/

    The core loop is promptless AI, guided by accessibility data and screenshots, and it's everywhere on your Mac.

    You can snap this comment section or the front page, and we'll structure it for you into a spreadsheet, or write a tweet if you're on Twitter.

  • by paulsutter on 4/1/25, 1:44 PM

    One-way voice is the right answer. Keep the UI, even the mouse and keyboard. But let me speak requests instead of typing literally everything.
  • by anthk on 4/1/25, 4:56 AM

    Get a proper physical keyboard to write, and stop using smartphones as typing devices.
  • by novaRom on 4/1/25, 6:53 AM

    > AI needs to work at the OS level

    Absolutely agree. An agent running in the background.

  • by levmiseri on 4/1/25, 7:56 AM

    WPM and other attempts to put one specific number/metric on it are imo only muddying the waters. A better way to think about just how awfully slow natural language (on average) is as an interface is to think about interactions with {whatever} in terms of *intents* and *actions*.

    Comparing "What's the weather in London" with clicking the weather app icon is misleading and too simplistic. When people imagine a future driven by conversational interfaces, they usually picture use cases like:

    1. "When is my next train leaving?"

    2. "Show me my photos from the vacation in Italy with yellow flowers on them"

    3. "Book a flight from New York to Zurich on {dates}"

    ...

    And a way to highlight what's faster/less noisy is to compare how natural language vs. mouse/touch maps onto the Intent -> Action path. The thing is that interactions like these are generally much more complex. E.g. does the machine know what 'my' train is? If it doesn't, can it offer reasonable disambiguation? If it can't, what then? And does it present the information in a way where the next likely action is reachable, or will I need to converse about it?
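
    As a small sketch of that cost (names hypothetical): every piece of missing context becomes another conversational round trip, where a tap on a list would have been a single action.

        // Hypothetical resolver for "When is my next train leaving?" --
        // each unknown forces another round trip of the conversation.
        type Resolution =
          | { kind: "answer"; text: string }
          | { kind: "clarify"; question: string };

        function nextTrain(ctx: { homeStation?: string; line?: string }): Resolution {
          if (!ctx.homeStation)
            return { kind: "clarify", question: "Which station do you usually leave from?" };
          if (!ctx.line)
            return { kind: "clarify", question: "Which line do you mean?" };
          return { kind: "answer", text: `Next ${ctx.line} from ${ctx.homeStation} leaves at ...` };
        }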

    You could picture a long table listing similar use cases in different contexts and compare various input methods and modalities and their speed. Flicking a finger on a 2d surface or using a mouse and a keyboard is going to be — on average — much faster and with less dead-ends.

    Conversational interfaces are not the future. Imo, even in the sense of 'augmenting', it's not going to happen. A natural-language-driven interface will always play a supporting (still important, though!) role: an accessibility aid for when you're temporarily, permanently, or contextually unable to use the primary input method to 'encode your intent'.

  • by m463 on 4/1/25, 6:29 AM

    "the case against"

    You know, doesn't matter what you say. If businesses want something, they'll do it to you whether it's the best interface or not.

    Amazon forces "the rabble" into their chatbot customer service system, and hides access to people.

    People get touchscreens in their car and fumble to turn on their fog lights or defrost in bad weather. They get voice-assistant phone trees and angrily yell "operator" and "agent".

    I really wish there were true competition that would let people choose what works for them.

  • by matsemann on 4/1/25, 7:31 AM

    Not exactly the same case as the article, but just a few minutes ago I booked a time for vaccinations online, and it was done through a chat interface. Screenshot: https://imgur.com/a/OWv7deF

    Just infuriating. Instead of a normal date- and timepicker where I could see available slots, it's a chat where you have to click certain options. Then I had to reply "Ja" (yes) when it asked me if I had clicked the correct date. And then, when none of the times of the day suited me, I couldn't just click a new date on the previous message; I instead had to press "vis datovelger på nytt"/show datepicker again, get a new chat message where I this time select a different date, and answer "Ja" again to see the available time slots. It's slow and useless. The title bar of the page says "Microsoft Copilot Studio" - some fancy tech instead of a simple form...

  • by randomfool on 4/1/25, 5:49 AM

    And yet here we are, discussing this in a threaded conversation.
  • by wewewedxfgdf on 4/1/25, 8:44 AM

    There have been a few of these posts on HN recently - people claiming that AI/LLMs are just some sort of passing fad, of no value, or of less value than people say.

    People who write these posts want to elevate their self-worth by naysaying what is popular. I don't understand the psychology, but it seems like that sort of pattern to me.

    It takes a deliberate blindness to say that AI/LLMs are just like the things that have popped up every few years before and will fade away the same way. Why would someone choose to be so blind and dismissive of something obviously, fundamentally world-changing? Again - it's the instinct to knock down the tall poppy and thereby prove that you have some sort of strength/value.