by nnx on 4/1/25, 2:14 AM with 217 comments
by PeterStuer on 4/1/25, 7:16 AM
1. "Natural language is a data transfer mechanism"
2. "Data transfer mechanisms have two critical factors: speed and lossiness"
3. "Natural language has neither"
While a conversational interface does transfer information, its main qualities are what I always refer to as "blissful ignorance" and "intelligent interpretation".
Blissful ignorance allows the requester to state an objective without being required to know, or even be right about, how to achieve it. It is the opposite of operational command. Do as I mean, not as I say.
"Intelligent interpretation" allows the receiver the freedom to infer an intention in the communication rather than a command. It also allows for contextual interactions such as goal-oriented partial clarification and elaboration.
The more capable of intelligent interpretation the request execution system is, the more appropriate a conversational interface will be.
Think of it as managing a team. If they are junior, inexperienced, and not very bright, you will probably tend towards handholding, microtasking, and micromanagement to get things done. If you have a team of senior, experienced, and bright engineers, you can point out a desire with a few words, trust them to ask for information when there is relevant ambiguity, and expect a good outcome without having to detail-manage every minute of their days.
by TeMPOraL on 4/1/25, 8:00 AM
If you pay attention to how the voice interface is used in Star Trek (TNG and upwards), it's basically exactly what the article is saying - it complements manual inputs and works as a secondary channel. Nobody is trying to manually navigate the ship by voicing out specific control inputs, or in the midst of a battle, call out "computer, fire photon torpedoes" - that's what the consoles are for (and there are consoles everywhere). Voice interface is secondary - used for delegation, queries (that may be faster to say than type), casual location-independent use (lights, music; they didn't think of kitchen timers, though (then again, replicators)), brainstorming, etc.
Yes, this is a fictional show and the real reason for voice interactions was to make it a form of exposition, yadda yadda - but I'd like to think that all those people writing the script, testing it, acting and shooting it, were in perfect position to tell which voice interactions made sense and which didn't: they'd know what feels awkward or nonsensical when acting, or what comes off this way when watching it later.
by cdrini on 4/1/25, 5:18 AM
One thing I will note is that I'm not sure I buy the example for voice UIs being inefficient. I've almost never said "Alexa what's the weather like in Toronto?". I just say "Alexa, weather". And that's much faster than taking my phone out and opening an app. I don't think we need to compress voice input. Language kind of auto-compresses, since we create new words for complex concepts when we find the need.
For example, in a book club we recently read "As Long as the Lemon Trees Grow". We almost immediately stopped referring to it by the full title and instead just called it "lemons", because we had to refer to it so much. E.g. "Did you finish lemons yet?" or "This book is almost as good as lemons!". The context let us shorten the title. Similarly, the context of my location shortens the query to just "weather". I think this might be the way voice UIs can be made more efficient: in the same way human speech makes itself more efficient.
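Here's a minimal sketch of that auto-compression idea (everything in it is hypothetical, not any assistant's real API): terse utterances expand into fully specified intents by filling unstated slots from ambient context, and the system only falls back to a clarifying question when context runs out.

```python
# Hypothetical sketch of context-driven "auto-compression": short
# utterances expand into fully specified intents using ambient context.

CONTEXT = {"location": "Toronto", "day": "today"}  # e.g. from GPS and clock

# Learned shorthand: short utterance -> intent template with open slots.
SHORTHAND = {
    "weather": "get_weather(location={location}, day={day})",
    "lights": "set_lights(room={room}, state='toggle')",
}

def resolve(utterance: str, context: dict) -> str:
    """Expand shorthand into a full intent, filling slots from context."""
    template = SHORTHAND.get(utterance.strip().lower())
    if template is None:
        return f"clarify({utterance!r})"  # fall back to a follow-up question
    try:
        return template.format(**context)
    except KeyError as missing:  # slot not inferable from context
        return f"ask_for({missing})"

print(resolve("Weather", CONTEXT))  # get_weather(location=Toronto, day=today)
print(resolve("lights", CONTEXT))   # ask_for('room') - context lacks a room
```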
by pugio on 4/1/25, 5:17 AM
This reminds me of the amazing 2013 video of Travis Rudd coding python by voice: https://youtu.be/8SkdfdXWYaI?si=AwBE_fk6Y88tLcos
The number of times in the last few years I've wanted that level of "verbal hotkeys"... The latencies of many coding LLMs are still a little bit too high to allow for my ideal level of flow (though admittedly I haven't tried ones hosted on services like Groq), but I can clearly envision a time when I'm issuing tight commands to a coder model that's chatting with me and watching my program evolve on screen in real time.
On a somewhat related note to conversational interfaces, the other day I wanted to study some first aid stuff - I used Gemini to read the whole textbook and generate Anki flashcards, then copied and pasted the flashcards directly into ChatGPT voice mode and had it quiz me. That was probably the most miraculous experience of a voice interface I've had in a long time - I could do chores while being constantly quizzed on what I wanted to learn, and anytime I had a question or comment I could just ask it to explain or expound on a term or tangent.
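A text-mode sketch of that quiz loop (the original used ChatGPT voice mode; this assumes the OpenAI Python SDK, and the model name and example card are illustrative):

```python
# Minimal quiz loop over LLM-generated flashcards. Voice in/out is left
# out; swap input()/print() for speech-to-text and text-to-speech.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

flashcards = [  # (question, reference answer), e.g. generated from a textbook
    ("What does ABC stand for in first aid?", "Airway, Breathing, Circulation"),
]

for question, reference in flashcards:
    reply = input(f"Q: {question}\nYour answer: ")
    grading = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model works
        messages=[
            {"role": "system",
             "content": "You are quizzing a first-aid student. Grade the "
                        "answer against the reference and explain briefly."},
            {"role": "user",
             "content": f"Question: {question}\nReference: {reference}\n"
                        f"Student answer: {reply}"},
        ],
    )
    print(grading.choices[0].message.content)
```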
by android521 on 4/1/25, 6:57 AM
No matter the intention or quality of the article, I do not like this kind of deceitful link-bait. It may be higher quality than pure link-bait, but nobody likes to be deceived.
by techpineapple on 4/1/25, 2:33 AM
But I think it’s wrong? Ever since the invention of the television, we’ve been absolutely addicted to screens. Screens and remotes, and I think there’s something sort of anti-humanly human about it. Maybe we don’t want to be human? But people I think would generally much rather tap their thumb on the remote than talk to their tv, and a visual interface you hold in the palm of your hand is not going away any time soon.
by whatnow37373 on 4/1/25, 9:08 AM
They have problems like "compose an email that vaguely gives the impression I'm considering various options, when I'm actually not", and for that, I suspect, the conversational workflow is quite good.
Anyone who actually just does the stuff is viscerally aware of how sub-optimal it is to throw verbiage at a computer.
I guess it depends on what level of abstraction you're working at.
by benob on 4/1/25, 6:45 AM
As always, good UI allows for using multiple modalities.
by DabeDotCom on 4/1/25, 12:42 PM
The problem is, "The Only Thing Worse Than Computers Making YOU Do Everything... Is When They Do Everything *FOR* You!"
"ad3} and "aP might not be "discoverable" vi commands, but they're fast and precise.
Plus, it's easier to teach a human to think like a computer than to teach a computer to think like a human — just like it's easier to teach a musician to act than to teach an actor how to play an instrument — but I admit, it's not as scalable; you can't teach everyone Fortran or C, so we end up looking for these Pareto Principle shortcuts: Javascript provides 20% of the functionality, and solves 80% of the problems.
But then people find Javascript too hard, so they ask ChatGPT/Bard/Gemini to write it for them. Another 20% solution — of the original 20% is now 4% as featureful — but it solves 64% of the world's problems. (And it's on pace to consume 98% of the world's electricity, but I digress!)
PS: Mobile interfaces don't HAVE to suck for typing; I could FLY on my old Treo! But "modern" UI eschews functionality for "clean" brutalist minimalism. "Why make it easy to position your cursor when we spent all that money developing auto-conflict?" «sigh»
by earcar on 4/1/25, 9:28 AM
What we're really seeing is specific applications where conversation makes sense, not a wholesale revolution. Natural language shines for complex, ambiguous tasks but is hilariously inefficient for things like opening doors or adjusting volume.
The real insight here is about choosing the right interface for the job. We don't need philosophical debates about "the future of computing" - we need pragmatic combinations of interfaces that work together seamlessly.
The butter-passing example is spot on, though. The telepathic anticipation between long-married couples is exactly what good software should aspire to. Not more conversation, but less need for it.
Where Julian absolutely nails it is the vision of AI as an augmentation layer rather than replacement. That's the realistic future - not some chat-only dystopia where we're verbally commanding our way through tasks that a simple button press would handle more efficiently.
The tech industry does have these pendulum swings where we overthink basic interaction models. Maybe we could spend less time theorizing about natural language as "the future" and more time just building tools that solve real problems with whatever interface makes the most sense.
by nottorp on 4/1/25, 5:33 AM
> That is the type of relationship I want to have with my computer!
Does he mean automation of routine tasks? It took 50 years to reach that in the example.
What if you want to do something new? Will the thought-guessing module in your computer even allow that?
by fellerts on 4/1/25, 8:30 AM
Natural language is very lossy: forming a thought and conveying that through speech or text is often an exercise in frustration. So where does "we form thoughts at 1,000-3,000 words per minute" come from?
The author clearly had a point about the efficiency of thought vs. natural language, but his thought was lost in a layer of translation. Probably because thoughts don't map cleanly onto words: I may lack some prerequisite knowledge to grasp what the author is saying here, which pokes at the core of the issue: language is imperfect, so the statement "we form thoughts at 1,000-3,000 words per minute" makes no sense to me.
Meta-joking aside, is "we form thoughts at 1,000-3,000 words per minute" an established fact? It's oddly specific.
by eviks on 4/1/25, 7:57 AM
Has it even been tried? Is there an iPhone text editing app with fully customizable keyboard that allows for setting up modes/gestures/shortcuts, scriptable if necessary?
> A natural language prompt like “Hey Google, what’s the weather in San Francisco today?” just takes 10x longer than simply tapping the weather app on your homescreen.
That's not entirely fair: the natural-language version could just as well be side button + saying "Weather", with the same result. Though you can make the app even more available by simply displaying weather results on the homescreen, without any tapping at all.
by perlgeek on 4/1/25, 6:45 AM
We might form fleeting thoughts much faster than we can express them, but if we want to formulate thoughts clearly enough to express them to other people, I think we're close to the ~150 words per minute we can actually speak.
I recently listened to a Linguistics podcast (lingthusiasm, though I don't recall which episode) where they talked about the efficiency of different languages, and that in the end they all end up roughly the same, because it's really the thought processes that limit the amount of information you communicate, not the language production.
by vakkermans on 4/1/25, 9:15 AM
Natural language isn't best described as data transfer. It's primarily a mechanism for collaboration and negotiation. A speech act isn't transferring data, it's an action with intent. Viewed as such the key metrics are not speed and loss, but successful coordination.
This is a case where a computer science stance isn't fruitful, and it's best to look through a linguistics lens.
by nitwit005 on 4/1/25, 6:50 AM
There's a very similar obsession with the idea that things should be visual instead of textual. We tend to end up back at text.
Personal suspicion for both is that the media set a lot of people's expectations. Characters loudly talked to the computer in films like 2001 or Star Trek for drama reasons, and all the movie computers generally have fancy visual interactions.
by byschii on 4/1/25, 7:48 AM
I'm not sure how it could fit into my 2 modalities of work: (i) alone in complete focus/silence, or (ii) in the office, where there is already too much spoken communication between humans... maybe it's just a matter of getting used to it.
by incorrecthorse on 4/1/25, 10:11 AM
I would like to know what this measures exactly.
The reason I often prefer writing to talking is that writing gives me time to pause and think. In those cases the bottleneck is very clearly my thought process (which, at least consciously, doesn't appear to me as "words").
by Aardwolf on 4/1/25, 8:29 AM
E.g. say I find the scrollbars somewhere way too thin and invisible, and I want thick, high-contrast scrollbars, but nobody thought of implementing that? Ask the AI and it changes your desktop interface to do it immediately.
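A sketch of the kind of edit such an agent could apply, assuming a Linux desktop where GTK 3 reads per-user CSS overrides (the selectors are standard GTK CSS; the agent part is hypothetical):

```python
# Append a thick, high-contrast scrollbar override to the user's GTK CSS.
from pathlib import Path

GTK_CSS = Path.home() / ".config" / "gtk-3.0" / "gtk.css"

THICK_SCROLLBARS = """
/* Thick, high-contrast scrollbars (agent-generated override). */
scrollbar slider { min-width: 24px; min-height: 24px; background-color: #ffffff; }
scrollbar trough { background-color: #000000; }
"""

GTK_CSS.parent.mkdir(parents=True, exist_ok=True)
with GTK_CSS.open("a") as f:  # append, so existing overrides survive
    f.write(THICK_SCROLLBARS)
print(f"Wrote scrollbar override to {GTK_CSS}; restart apps to apply.")
```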
by var_cw on 4/1/25, 7:44 AM
1. > "What’s the voice equivalent of a thumbs-up or a keyboard shortcut?" Current ASR systems are much narrow in terms of just capturing the transcript. there is no higher level of intelligence, even the best of GPT voice models fail at this. Humans are highly receptive of non-verbal cues. All the uhms, ahs, even the pauses we take is where the nuance lies.
2. the hardware for voice AI is still not consumer ready interacting with a voice AI is still doesn't feel private. i am only able to do a voice based interaction only when am in my car. sadly at other places it just feels a privacy breach as its acoustically public. have been thinking about a private microphones to enable more AI based conversations.
by matthewsinclair on 4/1/25, 6:44 AM
Also: https://news.ycombinator.com/item?id=42934190#42935946
by sebastiennight on 4/1/25, 1:05 PM
Not telling your car to turn left or right, but telling your cab driver you're going to the airport.
This is our use case at our startup[1] - we want to enable tiny SMBs who don't have the budget to hire a "video guy" to get an experience similar to having one. And that's why we're switching to a conversational UX (because those users would normally communicate with the "video guy" or girl by sending them a WhatsApp message, not by clicking buttons in video software).
by stevage on 4/1/25, 1:32 PM
Is anyone actually making any argument like that? The whole piece feels like a giant strawman.
by break_the_bank on 4/1/25, 6:48 AM
The core loop is promptless AI, guided by accessibility data and screenshots, and it's everywhere on your Mac.
You can snap this comment section or the front page, and we'll structure it for you if you're in a spreadsheet, or write a tweet if you're on Twitter.
by novaRom on 4/1/25, 6:53 AM
Absolutely agree. An agent running in the background.
by levmiseri on 4/1/25, 7:56 AM
Comparing "What's the weather in London" with clicking the weather app icon is misleading and too simplistic. When people imagine a future driven by conversational interfaces, they usually picture use cases like:
1. "When is my next train leaving?"
2. "Show me my photos from the vacation in Italy with yellow flowers on them"
3. "Book a flight from New York to Zurich on {dates}"
...
And a way to highlight what's faster/less noisy is to compare how natural language vs. mouse/touch maps intent onto action. The thing is that interactions like these are generally so much more complex. E.g. does the machine know what 'my' train is? If it doesn't, can it offer reasonable disambiguation? If it can't, what then? And does it present the information in a way where the next likely action is reachable, or will I need to converse about it? (See the sketch below.)
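A minimal sketch of the 'my next train' case (all names and fields are invented): the happy path is one line, and everything else is a conversational round trip that a glanceable timetable UI wouldn't need.

```python
# Resolving "when is my next train leaving?" - every fallback below costs
# the user an extra conversational turn.
from datetime import datetime

def next_train(user: dict, trains: list[dict]) -> str:
    """Resolve 'my next train' for a user, or fall back to disambiguation."""
    candidates = [t for t in trains
                  if t["route"] in user["usual_routes"]
                  and t["departs"] > datetime.now()]
    if len(candidates) == 1:
        return f"Your train leaves at {candidates[0]['departs']:%H:%M}."
    if not candidates:
        # Dead end: no context to infer the intent, so ask and wait.
        return "Which route do you mean?"
    # Ambiguous: read the options out - a list the user could have
    # scanned faster in a plain timetable view.
    options = ", ".join(f"{t['route']} at {t['departs']:%H:%M}"
                        for t in candidates)
    return f"Did you mean: {options}?"
```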
You could picture a long table listing similar use cases in different contexts and comparing various input methods and modalities and their speed. Flicking a finger on a 2D surface or using a mouse and keyboard is going to be — on average — much faster, with fewer dead ends.
Conversational interfaces are not the future. Imo even in the sense of 'augmenting', it's not going to happen. A natural-language-driven interface will always play a supporting (still important, though!) role - an accessibility aid for when you are e.g. temporarily, permanently, or contextually unable to use the primary input method to 'encode your intent'.
by m463 on 4/1/25, 6:29 AM
You know, doesn't matter what you say. If businesses want something, they'll do it to you whether it's the best interface or not.
Amazon forces "the rabble" into their chatbot customer service system, and hides access to people.
People get touchscreens in their cars and fumble to turn on their fog lights or defrost in bad weather. They get voice-assistant phone trees and angrily yell "operator" and "agent".
I really wish there were true competition that would let people choose what works for them.
by matsemann on 4/1/25, 7:31 AM
Just infuriating. Instead of a normal date and time picker where I could see available slots, it's a chat where you have to click certain options. Then I had to reply "Ja" (yes) when it asked me if I had clicked the correct date. And when none of the times that day suited me, I couldn't just click a new date on the previous message; I instead had to press "vis datovelger på nytt" (show the date picker again), get a new chat message where I this time select a different date, and answer "Ja" again to see the available time slots. It's slow and useless. The title bar of the page says "Microsoft Copilot Studio" - some fancy tech instead of a simple form.
by randomfool on 4/1/25, 5:49 AM
by wewewedxfgdf on 4/1/25, 8:44 AM
People who write these posts want to elevate their self-worth by naysaying what is popular. I don't understand the psychology, but it seems like that sort of pattern to me.
It takes a deliberate blindness to say that AI/LLMs are just another of those things that pop up every few years and will fade away like the rest. Why would someone choose to be so blind and dismissive of something so obviously, fundamentally world-changing? Again - it's the instinct to knock down the tall poppy and thereby prove that you have some sort of strength/value.