from Hacker News

Chat is a bad UI pattern for development tools

by cryptophreak on 2/4/25, 4:06 PM with 417 comments

  • by wiremine on 2/4/25, 6:13 PM

    I'm going to take a contrarian view and say it's actually a good UI, but it's all about how you approach it.

    I just finished a small project where I used o3-mini and o3-mini-high to generate most of the code. I averaged around 200 lines of code an hour, including the business logic and unit tests. Total was around 2200 lines. So, not a big project, but not a throwaway script. The code was perfectly fine for what we needed. This is the third time I've done this, and each time I get faster and better at it.

    1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.

    2. Generating unit tests is critical. After I like the gist of some code, I ask for some smoke tests (see the sketch after this list). Again, peer review the code and adjust as needed.

    3. Be liberal with starting a new chat: the models can get easily confused with longer context windows. If you start to see things go sideways, start over.

    4. Give it code examples. Don't prompt with English only.
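
    To make point 2 concrete, the smoke tests I ask for look roughly like this (toy function and names are mine, pytest assumed; a sketch, not actual model output):

      import pytest

      def to_cents(amount: str) -> int:
          """Toy stand-in for the kind of business logic the model generates."""
          return round(float(amount) * 100)

      def test_to_cents_round_trip():
          # Smoke test: the happy path works at all.
          assert to_cents("1.50") == 150

      def test_to_cents_rejects_garbage():
          # Smoke test: bad input fails loudly rather than silently.
          with pytest.raises(ValueError):
              to_cents("not a number")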

    FWIW, o3-mini was the best model I've seen so far; Sonnet 3.5 New is a close second.

  • by taeric on 2/4/25, 4:47 PM

    I'm growing to the idea that chat is a bad UI pattern, period. It is a great record of correspondence, I think. But it is a terrible UI for doing anything.

    In large part, I assert this is because the best way to do something is to do that thing. There can be correspondence around the thing, but the artifacts that you are building are separate things.

    You could probably take this further and say that narrative is a terrible way to build things. It can be a great way to communicate them, but being a separate entity, it is not necessarily good at making any artifacts.

  • by themanmaran on 2/4/25, 5:19 PM

    I'm surprised that the article (and comments) haven't mentioned Cursor.

    Agreed that copy pasting context in and out of ChatGPT isn't the fastest workflow. But Cursor has been a major speed up in the way I write code. And it's primarily through a chat interface, but with a few QOL hacks that make it way faster:

    1. Output gets applied to your file in a git-diff style. So you can approve/deny changes.

    2. It (kinda) has context of your codebase so you don't have to specify as much. Though it works best when you explicitly tag files ("Use the utils from @src/utils/currency.ts")

    3. Directly inserting terminal logs or type errors into the chat interface is incredibly convenient. Just hover over the error and click "add to chat".

  • by croes on 2/4/25, 4:22 PM

    Natural language isn't made to be precise; that's why we use a subset of it in programming languages.

    So you either need lots of extra text to remove the ambiguity of natural language when you use AI, or you need a special precise subset to communicate with AI, and that's just programming with extra steps.

  • by matthewsinclair on 2/4/25, 5:48 PM

    Yep. 100% agree. The whole “chat as UX” metaphor is a cul-de-sac that I’m sure we’ll back out of sooner or later.

    I think about this like SQL in the late 80s. At the time, SQL was the “next big thing” that was going to mean we didn’t need programmers, and that management could “write code”. It didn’t quite work out that way, of course, as we all know.

    I see chat-based interfaces to LLMs going exactly the same way. The LLM will move down the stack (rather than up) and much more appropriate task-based UX/UI will be put on top of the LLM, coordinated thru a UX/UI layer that is much more sympathetic to the way users actually want to interact with a machine.

    In the same way that no end-users ever touch SQL these days (mostly), we won’t expose the chat-based UX of an LLM to users either.

    There will be a place for an ad-hoc natural language interface to a machine, but I suspect it’ll be the exception rather than the rule.

    I really don’t think there are too many end users who want to be forced to seduce a mercurial LLM using natural language to do their day-to-day tech tasks.

  • by spolsky on 2/4/25, 5:34 PM

    I don't think Daniel's point is that Chat is generically a clunky UI and therefore Cursor cannot possibly exist. I think he's saying that to fully specify what a given computer program should do, you have to provide all kinds of details, and human language is too compressed and too sloppy to always include those details. For example, you might say "make a logon screen" but there are an infinite number of ways this could be done and until you answer a lot of questions you may not get what you want.

    If you asked me two or three years ago I would have strongly agreed with this theory. I used to point out that every line of code was a decision made by a programmer and that programming languages were just better ways to convey all those decisions than human language because they eliminated ambiguity and were much terser.

    I changed my mind when I saw how LLMs work. They tend to fill in the ambiguity with good defaults that are somewhere between "how everybody does it" and "how a reasonably bright junior programmer would do it".

    So you say "give me a logon screen" and you get something pretty normal with Username and Password and a decent UI and some decent color choices and it works fine.

    If you wanted to provide more details, you could tell it to use the background color #f9f9f9. But part of what surprised me, and caused me to change my mind on this matter, was that you could also leave that out and you wouldn't get an error; you wouldn't get white text on a white background; you would get a decent color that might be #f9f9f9 or might be #a1a1a1. You saved a lot of time by not thinking about that level of detail, and you still got a good result.

  • by jakelazaroff on 2/4/25, 4:23 PM

    I agree with the premise but not with the conclusion. When you're building visual things, you communicate visually: rough sketches, whiteboard diagrams, mockups, notes scrawled in the margins.

    Something like tldraw's "make real" [1] is a much better bet, imo (not that it's mutually exclusive). Draw a rough mockup of what you want, let AI fill in the details, then draw and write on it to communicate your changes.

    We think multi-modally; why should we limit the creative process to just text?

    [1] https://tldraw.substack.com/p/make-real-the-story-so-far

  • by Edmond on 2/4/25, 4:22 PM

    This is about relying on requirements-type documents to drive AI-based software development. I believe this will ultimately be integrated into all AI dev tools, if it isn't already. It is really just additional context.

    Here is an example of our approach:

    https://blog.codesolvent.com/2024/11/building-youtube-video-...

    We are also using the requirements to build a checklist: the AI generates the checklist from the requirements document, and the checklist then serves as context for further instructions.

    Here's a demo:

    https://youtu.be/NjYbhZjj7o8?si=XPhivIZz3fgKFK8B

  • by ajmurmann on 2/4/25, 5:16 PM

    I agree with this and disagree at the same time. It depends what the goal is. If the goal is to have AI write the entire codebase for you, yes chat and human language is quite bad. That's part of the reason formal languages exist. But then only experts can use it. Requirement docs are a decent middle ground. However, I'm not sure it's a good goal for AI to generate the code base.

    The mode that I've found most fruitful when using Cursor is treating it almost exactly as I would a pair programming partner. When I start on a new piece of functionality, I describe the problem, give it my thoughts on a potential solution, and invite feedback. Sometimes my solution is the best. Sometimes the LLM has a better idea, and frequently we take a modified version of what one of us suggested. Just as you would with a human partner. The result of the discussion is better than what either of us would have done alone.

    I'll also do classic ping-pong-style TDD with it once we've agreed on an approach. I'll write a test; the LLM makes it pass and writes the next test, which I'll make pass, and so on.

    As with a real pair, it's important to notice when they are struggling and help them or take over. You can only do this if you stay fully engaged and understand every line, just like when pairing. I've found LLMs frequently get stuck in a loop where something doesn't work and they keep applying the same changes they've tried before, and it never works. Understand what they are trying to do and help them out. Don't be a shitty pair for your LLM!
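
    As a sketch, one round of that ping-pong might look like this (slugify is a stand-in example, not from a real session):

      def slugify(title: str) -> str:
          # LLM's turn: the simplest implementation that passes my test below.
          return title.strip().lower().replace(" ", "-")

      def test_slugify_basic():
          # My turn: pin down the behavior we agreed on in the discussion.
          assert slugify("  Hello World ") == "hello-world"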

  • by mlsu on 2/4/25, 6:17 PM

    I can't wait for someone to invent a new language, maybe a subset of English, that is structured enough to half-well describe computer programs. Then train a model with RLHF to generate source code based on prompts in this new language.

    It will slowly grow in complexity, strictness, and features, until it becomes a brand-new programming language, just with a language model and a SaaS sitting in the middle of it.

    A startup will come and disrupt the whole thing by simply writing code in a regular programming language.

  • by sho_hn on 2/4/25, 4:39 PM

    I'd say this criticism is well-addressed in aider. Steering the LLM via code comments is the first UX I've seen that works.

    https://aider.chat/docs/usage/watch.html

    How jarring it is & how much it takes you out of your own flow state is very much dependent on the model output quality and latency still, but at times it works rather nicely.
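
    For anyone who hasn't tried it, the steering looks roughly like this (marker syntax from memory of the aider docs, so double-check there):

      import math

      def total(prices):
          # switch this to math.fsum and skip None entries AI!
          return sum(prices)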

  • by fny on 2/4/25, 5:44 PM

    Narrative text is a worse UI pattern. It's impractical to read. Also how exactly do you merge narrative changes if you need to write several transformations as updates? Are you expected to update the original text? How does this affect diffs in version control?

    I think it's more ideal to have the LLM map text to some declarative pseudocode that's easy to read which is then translated to code.

    The example given by Daniel might map to something like this:

      define sign-in-screen:
        panel background "#f9f9f9":
          input email required: true, validate-on-blur: true
          input password required: true
          button "Sign in" gradient: ("#EEE" "#DDD")
          connect-to-database
    
    Then you'd use chat to make updates. For example, "make the gradient red" or "add a name field." Come to think of it, I don't see why chat is a bad interface at all with this setup.

  • by karmakaze on 2/4/25, 5:00 PM

    What makes it bad currently is the slow output.

    The example shows "Sign-in screen" with 4 (possibly more) instructions. This could equivalently have been entered one at a time into 'chat'. If the response for each was graphic and instantaneous, chat would be no worse than non-chat.

    What makes non-chat better is that the user puts more thought into what they write. I do agree for producing code Claude with up-front instructions beats ChatGPT handily.

    If, OTOH, AIs actually got as good as or better than humans, chat would be fine. It would be like a discussion in Slack or PR review comments.

  • by quantadev on 2/4/25, 5:14 PM

    Just two tips/thoughts:

    1) The first thing to improve chats as a genre of interface is that they should all always be a tree/hierarchy (just like Hacker News is), so that you can go back to ANY precise prior point in a discussion/chat and branch off in a different direction. The only context the AI sees during the conversation is the "Current Node" (your last post) and all "Parent Nodes" going back to the beginning, so at any time it's not even aware of all the prior "bad branches" you decided to abandon.

    2) My second tip for designs of Coding Agents is to do what mine does. I invented a 'block_begin/block_end' syntax, which looks like this and can be in any source file:

      // block_begin MyAddNumbers
      var sum = add(a, b)
      return sum
      // block_end

    With this syntax you can use English to explain and reason about extremely specific parts of your code without expecting the LLM to "just understand". You can also direct the LLM to only edit/update specific "Named Blocks", as I call them.

    So a trivial example of a prompt expression related to the above might be "Always put number adding stuff in the MyAddNumbers Block".

    To explain entire architectural aspects to the LLM, these code block names are extremely useful.
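
    A sketch of the parsing side (my agent's actual code differs, but the mechanism is just this):

      import re

      BLOCK_RE = re.compile(
          r"//\s*block_begin\s+(\w+)(.*?)//\s*block_end",
          re.DOTALL,
      )

      def extract_named_blocks(source: str) -> dict[str, str]:
          """Map block names to their code so prompts can target them by name."""
          return {name: body.strip() for name, body in BLOCK_RE.findall(source)}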

  • by tiborsaas on 2/4/25, 5:19 PM

    > This is the core problem. You can’t build real software without being precise about what you want.

    I've tested a few integrated AI dev tools and they work like a charm. I don't type all my instructions at once. I do it the same way as I do it with code. Iteratively:

    1) Create a layout

    2) Fill left side

    3) Fill right side

    4) Connect components

    5) Populate with dummy data

    > The first company to get this will own the next phase of AI development tools.

    There are more than 25 companies working on this problem; they are already in production and some are really good.

  • by deeviant on 2/4/25, 4:47 PM

    You may have challenges using chat for development (specifically, I mean text prompting, not necessarily using a LangChain session with an LLM, although that is my most common mode), but I do not. I have found chat to be, by far, the most productive interface with LLMs for coding.

    Everything else is just putting layers that are not nearly as capable as an LLM between me and the raw power of the LLM.

    The core realization I made to truly unlock LLM code assistance as a 10x+ productivity gain is that I am not writing code anymore; I am writing requirements. It means being less an engineer and more a manager, or perhaps an architect. It's not your job to write tax code anymore; it's your job to describe what the tax code needs to accomplish and how its success can be defined and validated.

    Also, it's never even close to true that nobody uses LLMs for production software, here's a write-up by Google talking about using LLMs to drastically accelerate the migration of complex enterprise production systems: https://arxiv.org/pdf/2501.06972

  • by xena on 2/4/25, 5:09 PM

    My nuclear fire hot take is that the chat pattern is actively hampering AI tools, because we have to square-peg-round-hole things into the chat UI (because that's what people expect), or, as developers, square-peg-round-hole our code into the chat API patterns.

    Last night I wrote an implementation of an AI paper and it was so much easier to just discard the automatic chat formatting and do it "by hand": https://github.com/Xe/structured-reasoning/blob/main/index.j...

    I wonder if foundation models are an untapped goldmine in terms of the things they can do, but we can't surface them to developers because everyone's stuck in the chat pattern.
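
    As a hedged Python sketch of what "by hand" means here (the repo above is JS, and the endpoint shape assumes an OpenAI-style completions server):

      import requests

      # Hand-rolled prompt: we control every token; no chat template is injected.
      prompt = (
          "<reasoning>\n"
          "The user asked for a limerick about type checkers.\n"
          "</reasoning>\n"
          "<answer>\n"
      )

      resp = requests.post(
          "http://localhost:8080/v1/completions",  # any raw-completion endpoint
          json={"prompt": prompt, "max_tokens": 200, "stop": ["</answer>"]},
          timeout=60,
      )
      print(resp.json()["choices"][0]["text"])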

  • by spandrew on 2/4/25, 11:05 PM

    AI "Agents" that can do tasks outside of the confines of just a chat window are probably the next stage of utility.

    The company I work for integrated AI into some of our native content authoring front-end components and people loved it. Our system took a lot of annotating to be able to accurately translate the natural language to the patterns of our system but users so far have found it WAYYY more useful than chat bc it's deeply integrated into the tasks they do anyway.

    Figma had a similar success at last year's CONFIG when they revealed AI was renaming default layer names (Layer 1, 2, etc)... something nobody wanted to do by hand anyway. I dare say nobody gave a flying f about their "template" AI generation, whereas layer renaming got audible cheers. Workflow integration is how you show people AI isn't just replacing their job like some bad sci-fi script.

    Workflow integration is going to be big. I think chat will have its place tho; just kind of as an aside in many cases.

  • by ingigauti on 2/4/25, 8:45 PM

    I took the position early on of not much liking AI coding, especially when it was just starting. People were writing long descriptions to generate an app; I quickly noticed that doesn't work, because it's all in the details.

    Having AI generate code for my project didn't feel good either. I didn't really understand what it was doing, so I would have to read the code to understand it; and then, what's the point? I might as well write it myself.

    I then started playing, and out came a new type of programming language called plang (as in pseudo language). It allows you to write the details without all the boilerplate code.

    I think I've stumbled onto something, and it's just starting to get noticed :) https://www.infoworld.com/article/3635189/11-cutting-edge-pr...

  • by benatkin on 2/4/25, 4:29 PM

    A chat room is an activity stream, and so is the commit log of a version control system. A lot of the bad UI comes from waiting, with a minimum response time that was too high, and, for some users, from having to communicate by typing; many will prefer chatting by voice. When responses are faster it will be easier to hide the history pane, ask to be reminded of anything in it when needed, and work in the artifact pane. However, not all responses from an LLM need to be fast; it is a huge advancement that LLMs will think for minutes at a time. I agree about the usefulness of prose as an artifact while coding. Markdown can be edited in IDEs using LLMs and then referenced in prompts.

  • by bangaladore on 2/4/25, 7:16 PM

    I'll preface this by saying I also dislike using chat as a pattern for AI tools. However, in theory, the idea has merit. Just as having 100% of the specifications and design guidance for a product is valuable before development, complete requirements would seem ideal. In reality, though, many requirements and specifications are living documents. Should we expect to rebuild the entire application every time a document changes? For example, if I decide to reduce a header's height, there's a significant chance the application could end up looking or feeling entirely different.

    In a real-world scenario, we begin with detailed specifications and requirements, develop a product, and then iterate on it. Chat-based interactions might be better suited to this iterative phase. Although I'm not particularly fond of the approach, it does resemble receiving a coworker's feedback, making a small, targeted change, and then getting feedback again.

    Even if the system were designed to focus solely on the differences in the requirements—thus making the build process more iterative—we still encounter an issue: it tends to devolve into a chat format. You might have a set of well-crafted requirements, only for the final instruction to be, "The header should be 2px smaller."

    Nonetheless, using AI in an iterative process (focusing on requirement diffs, for example) is an intriguing concept that I believe warrants further exploration.

  • by yapyap on 2/4/25, 7:59 PM

    > AI was supposed to change everything. Finally, plain English could be a programming language—one everyone already knows. No syntax. No rules. Just say what you want

    That's the thing about language: you CAN'T program in human language, for this exact reason. Programming languages are mechanical but precise; human languages flow better but leave wiggle room. Computers can't do jack shit with wiggle room; they're not humans. That'll remain true until there's an AI people like enough to let it have its own flair on things.

  • by r0ckarong on 2/4/25, 4:31 PM

    I don't want to become a lawyer to talk to my compiler; thank you.

  • by PaulHoule on 2/4/25, 8:24 PM

    It makes me think of the promises and perils of Jupyter notebooks.

    So far as this article is concerned (not the many commenters who are talking past it), "chat" is like interacting with a shell or a REPL. How different, really, is the discussion that Winograd has with SHRDLU

    https://en.wikipedia.org/wiki/SHRDLU

    from the conversation that you have with a database through the SQL monitor?

    There's a lot to say for trying to turn that kind of conversation into a more durable artifact. I'd argue that when writing unit tests in Java I'm doing exploratory work like I'd do in a Python REPL, except my results aren't scrolling away; they're built into something I can check into version control.

    On the other hand, workspace-oriented programming environments are notorious for turning into a sloppy mess. For instance, people really can't make up their minds whether they want to store the results of their computations (God help you if you have more than one person working on it, never mind if you want to use version control; yet isn't that a nice way to publish a data analysis?) or whether they want a program that multiple people can work on and that produces reproducible results, etc.

    See also the struggles of "Literate Programming"

    Not to say there isn't an answer to all this but boy is it a fraught area.

  • by cheapsteak on 2/4/25, 5:44 PM

    I'm predicting that Test-Driven Development may be having a comeback

    English behaviour descriptions -> generated tests

    Use both behaviour descriptions and feedback from test results to iterate on app development

  • by Bjorkbat on 2/4/25, 8:12 PM

    This kind of reminds me of back when the hype cycle was focused on Messenger apps and the idea of most online behavior being replaced with a chatbot. God I hated the smug certainty of (some, definitely not all!) UX designers at the time proclaiming that chat was the ultimate interface.

    It's absolutely insane that so many doors were unlocked by being able to interact with a computer graphically, and yet these people have visions of the future stuck in the 60s.

  • by furyofantares on 2/4/25, 4:41 PM

    In Cursor I keep a specification document in .cursorrules, and I have instructions that Cursor should update the document whenever I add new specifications in chat.
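
    Roughly like this (wording invented for illustration; the real file is longer):

      # .cursorrules (hypothetical excerpt)
      Treat the specification below as the source of truth. Whenever I state
      a new requirement in chat, append it to the "Specifications" section of
      this file before changing any code.

      ## Specifications
      - Currency amounts are integers in cents, never floats.
      - Every new module gets smoke tests.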

  • by bcherry on 2/4/25, 6:02 PM

    Chat is a great UX _around_ development tools. Imagine having a pair programmer and never being allowed to speak to them. You could only communicate by taking over the keyboard and editing the code. You'd never get anything done.

    Chat is an awesome powerup for any serious tool you already have, so long as the entity on the other side of the chat has the agency to actually manipulate the tool alongside you as well.

  • by reverendsteveii on 2/4/25, 4:55 PM

    This puts me in mind of something I read years ago (and am having trouble finding) that had the same premise but went about proving it a different way. The idea was that natural language programming will always mean dealing with a certain background level of ambiguity, and the article cited contracts and contract law as proof. A contract is an agreement that defines a system of states and a response for each state. The vast and difficult-to-navigate body of contract law is proof that, even when people are purposefully being as unambiguous as possible, with two entities that fully grasp the intricacies of the language being used, there is still so much ambiguity that an entire separate group of people (the civil court system) exists just to mediate and interpret it. You might point to bad-faith actors, but a contract where every possible state and the appropriate response were defined without ambiguity would be proof against both misinterpretation and bad faith.

  • by anarticle on 2/4/25, 5:10 PM

    I agree, and I think this means there is a lot of space for trying new things. Cursor was a small glimpse of trying to fix the split between pure GitHub Copilot line revision (which interrupts my thoughts too much) and calling in for help via a chat window that you're copying and pasting from.

    I think this post shows there could be a couple levels of indirection, some kind of combination of the "overarching design doc" that is injected into every prompt, and a more tactical level syntax/code/process that we have with something like a chat window that is code aware. I've definitely done some crazy stuff by just asking something really stupid like "Is there any way to speed this up?" and Claude giving me some esoteric pandas optimization that gave me a 100x speedup.

    I think overall the tools have crazy variance in quality of output, but with some "multifacet prompting", i.e. code styling, design docs, architecture docs, constraints, etc., you might end up with something that is much more useful.

  • by hoppp on 2/4/25, 5:17 PM

    Every time there is a chat interface for something I try to use it, then after 1-2 prompts I give up.

    So I completely agree with this. Chat is not a good UI.

  • by Vox_Leone on 2/4/25, 5:18 PM

    I call it 'structured prompting' [think pseudo-code]. It strikes a nice balance between human-readable logic and structured programming, allowing the LLM to focus on generating accurate code based on clear steps. It’s especially useful when you want to specify the what (the logic) without worrying too much about the how (syntax and language-specific details). If you can create an effective system that supports this kind of input, it would likely be a big step forward in making code generation more intuitive and efficient. Good old UML could also be used.

    Example of a Structured Pseudo-Code Prompt:

    Let’s say you want to generate code for a function that handles object detection:

      '''
      Function: object_detection
      Input: image
      Output: list of detected objects

      Steps:
        1. Initialize model (load pretrained object detection model)
        2. Preprocess the image (resize, normalize, etc.)
        3. Run the image through the model
        4. Extract bounding boxes and confidence scores from the model's output
        5. Return objects with confidence greater than 0.5 as a list of tuples
           (object_name, bounding_box)

      Language: Python
      '''
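
    For what it's worth, the skeleton such a prompt should elicit looks something like this (stubbed so it runs without a real model; a sketch, not actual detector code):

      def preprocess(image):
          """Stand-in for resize/normalize; a real version would use a vision library."""
          return image

      def object_detection(image, model, threshold=0.5):
          """Mirrors the structured prompt: keep (name, box) pairs above threshold."""
          detections = model(preprocess(image))  # yields (name, box, score) triples
          return [(name, box) for name, box, score in detections if score > threshold]

      # Fake model so the sketch runs end to end:
      fake_model = lambda img: [("cat", (0, 0, 10, 10), 0.9), ("dog", (5, 5, 8, 8), 0.3)]
      print(object_detection("photo.jpg", fake_model))  # [('cat', (0, 0, 10, 10))]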

  • by xenodium on 2/4/25, 8:11 PM

    I've been experimenting with "paged chats" UX and find myself using it fairly fluidly https://xenodium.com/llm-iterate-and-insert

    Been experimenting with the same approach but for "paged shells" (sorry for the term override) and this seems to be a best of both worlds kinda thing for shells. https://xenodium.com/an-experimental-e-shell-pager That is, the shell is editable when you need it to be (during submission), and automatically read-only after submission. This has the benefit of providing single-character shortcuts to navigate content. n/p (next/previous) or tab/backtab.

    The navigation is particularly handy in LLM chats, so you can quickly jump to code snippets and either copy or direct output elsewhere.

  • by muzani on 2/4/25, 11:36 PM

    The first wave was not chat, it was completion. Instead of saying "suggest some names for an ice cream shop", the first wave was "Here are some great names for ice cream shops: 1. Nice Cream 2." Chat was a lot more intuitive and low effort than this.

    Chat is also iterative. You can go back there and fix things that were misinterpreted. If the misinterpretation happens often, you can add on another instruction on top of that. I strongly disagree that they'd be fixed documents. Documents are a way to talk to yourself and get your rules right before you commit to them. But it costs almost nothing to do this with AI vs setting up brainstorming sessions with another human.

    However, the reasoning models (o1, r1 and such) are good at iterating with themselves, and work better when you give them documents and have them figure out the best way to implement something.

  • by vismit2000 on 2/5/25, 5:47 AM

    Reminds me of Karpathy's long-pinned tweet: "The hottest new programming language is English." https://x.com/karpathy/status/1617979122625712128

  • by fhd2 on 2/4/25, 5:53 PM

    I've mainly used gptel in Emacs (primarily with Claude), and I kind of already use the chat buffer like a document. You can freely edit the history, and I make very generous use of that, to steer where the model is going.

    It has features to add context from your current project pretty easily, but personally I prefer to constantly edit the chat buffer to put in just the relevant stuff. If I add too much, Claude seems to get confused and chases down irrelevant stuff.

    Fully controlling the context like that seems pretty powerful compared to other approaches I've tried. I also fully control what goes into the project - for the most part I don't copy paste anything, but rather type a version of the suggestion out quickly.

    If you're fast at typing and use an editor with powerful text wrangling capabilities, this is feasible. And to me, it seems relatively optimal.

  • by weitendorf on 2/4/25, 6:54 PM

    This is exactly why we're developing our AI developer workflow product "Brilliant" to steer users away from conversations altogether.

    Many developers don't realize this but as you go back and forth with models, you are actively polluting their context with junk and irrelevant old data that distracts and confuses it from what you're actually trying to do right now. When using sleeker products like Cursor, it's easy to forget just how much junk context the model is constantly getting fed (from implicit RAG/context gathering and hidden intermediate steps). In my experience LLM performance falls off a cliff somewhere around 4 decent-sized messages, even without including superfluous context.

    We're further separating the concept of "workflow" from "conversation" and prompts, basically actively and aggressively pruning context and conversation history as our agents do their thing (and only including context that is defined explicitly and transparently), and it's allowing us to tackle much more complex tasks than most other AI developer tools. And we are a lot happier working with models - when things don't work we're not forced to grovel for a followup fix, we simply launch a new action to make the targeted change we want with a couple clicks.

    It is, in a weird way, kind of degrading to have to politely ask a model to change a color after it messed up, and it's also just not an efficient way to work with LLMs; people default to that style because it's how you'd interact with a human you are delegating tasks to. Developers still need to truly internalize the facts that LLMs are purely completion machines, that your conversation history lives entirely client side outside of active inference, and that you can literally set your conversation input to be whatever you want (even if the model never said that). Once you realize that, you're on the path towards using LLMs as "what words do I need to put in to get it to do what I want" rather than working "with" them.
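
    In sketch form (not our product's code), the point about client-side history is that nothing forces you to resend old turns; each action can get a fresh, explicit context:

      def build_context(task: str, files: dict[str, str]) -> list[dict]:
          """Build a fresh context per action instead of dragging the chat along."""
          messages = [{"role": "system", "content": "You are a code-editing agent."}]
          for path, text in files.items():  # only files chosen explicitly
              messages.append({"role": "user", "content": f"// {path}\n{text}"})
          messages.append({"role": "user", "content": task})
          return messages  # note: no prior conversation turns included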

  • by jimlikeslimes on 2/4/25, 5:12 PM

    Has anyone invited an LLM inside their lisp process that can be accessed from the repl? Being able to empower an LLM to be able to affect the running lisp image (compile functions etc), and having changes reflected back to the source on disk would be interesting.

  • by fullstackchris on 2/5/25, 11:37 AM

    I agree with a lot of what is discussed in this post. Chat is indeed lossy and at best a Q&A-style system. Yes, of course it can be tweaked and hacked, but it's ultimately the wrong UI if you want to leverage the most of AI for code.

    However, even with a "docs as spec" pattern, how can you control the actual quality of the code written? How maintainable will it be? If the spec changes (read: it _will_ change constantly), is it easy enough to refactor? What about tests? I also shrink in fear at the complexity of docs that could _exactly_ capture the code... "well, we almost always do it this way, but this one time we do it this way..."

  • by lcfcjs6 on 2/4/25, 4:29 PM

    Seems like this is a common complaint from folks trying to write code purely with ChatGPT / DeepSeek by communicating in complete sentences. You can only get so far using these tools before you need a proper understanding of what's happening with the code.

  • by sprucevoid on 2/4/25, 7:27 PM

    I find web chat interfaces very useful for programming, but it also feels like early days. Speedups will smooth out a lot of pain points, but other UI changes, some even quite small, could enhance use a lot. A few of various sizes, top of mind, with regard to the Claude web chat UI specifically:

    - intellisense in the input box, based on words in this or all previous chats and a user-customizable word list

    - user customizable buttons and keyboard shortcuts for common quick replies, like "explain more".

    - when claude replies with a numbered list of alternatives let me ctrl+click a number to fork the chat with continued focus on that alternative in a new tab.

    - a custom right click menu with action for selection (or if no selection claude can guess the context e.g. the clicked paragraph) such as "new chat with selection", "explain" and some user customizable quick replies

    - make the default download filenames follow a predictable pattern; claude currently varies it too much, e.g. "cloud-script.py" jumps to "cloud-script-errorcheck.py". I've tried prompting a format but claude seems to forget it.

    - the stop button should always instantly stop claude in its tracks. Currently it sometimes takes time to get claude to stop thinking.

    - when a claude reply first generates code in the right sidebar followed by detailed explanation text in the chat, let some keyboard shortcut instantly stop the explanation in its tracks. Let the same shortcut preempt that explanation while the sidebar code is still generating.

    - chat history search is very basic. Add advanced search features, like filtering by date of first/last message and an OR search operator

    - batch jobs and tagging for chat history. E.g. batch apply a prompt to generate a summary in each selected chat and then add the tag "summary" to them. Let us then browse by tag(s).

    - tools to delete parts of a chat history thread, that in hindsight were detours

    - more generally, maybe a "chat history chat" to have Claude apply changes to the chat histories

  • by azhenley on 2/4/25, 4:46 PM

    See my article from January 2023, "Natural language is the lazy user interface".

    https://austinhenley.com/blog/naturallanguageui.html

  • by andix on 2/4/25, 7:43 PM

    My approach for AI generated code with more complexity was always this:

    1. Ask AI to generate a spec of what we're planning to do.

    2. Refine it until it kind of resembles what I want to do.

    3. Ask AI to implement some aspects from the spec.

  • by michaelfeathers on 2/4/25, 9:56 PM

    Chat in English? Sure. But there is a better way. Make it a game to see how little you can specify to get what you want.

    I used this single line to generate a 5 line Java unit test a while back.

    test: grip o -> assert state.grip o

    LLMs have wide "understanding" of various syntaxes and associated semantics. Most LLMs have instruct tuning that helps. Simplifications that are close to code work.

    Re precision, yes, we need precision but if you work in small steps, the precision comes in the review.

    Make your own private pidgin language in conversation.

  • by yawnxyz on 2/4/25, 5:35 PM

    It's interesting we view Email and Chat so differently. Some companies run on chat (e.g. Slack), while most companies run on email.

    Emails are so similar to Chat, except we're used to writing in long-form, and we're not expecting sub-minute replies.

    Maybe emails are going to be the new chat?

    I've been experimenting with "email-like" interfaces (that encourage you to write more / specify more), take longer to get back to you, and go out to LLMs. I think this works well for tools like Deep Research where you expect them to take minutes to hours.

  • by nimski on 2/4/25, 5:35 PM

    This has been the thesis behind our product since the beginning (~3 years), before a lot of the current hype took hold. I'm excited to see it gain more recognition.

    Chat is single threaded and ephemeral. Documents are versioned, multi-threaded, and a source of truth. Although chat is not appropriate as the source of truth, it's very effective for single-threaded discussions about documents. This is how people use requirements documents today. Each comment on a doc is a localized chat. It's an excellent interface when targeted.

  • by foz on 2/4/25, 5:30 PM

    After using Cursor and Copilot for some time, I long for a tool that works like a "real" collaborator. We share a spec and make comments, resolve them. We file issues and pull requests and approve them. We use tests and specs to lock down our decisions. We keep a backlog up to date, maintain our user docs, discuss what assumptions we have to validate still, and write down our decisions.

    Like with any coworker - when ideas get real, get out of chat and start using our tools and process to get stuff done.

  • by a3w on 2/4/25, 4:24 PM

    For me: Chat is like writing comments, but not at the right place in the source code.

    Perhaps I should comment all todos and then write "finish todos" as the always-same text prompt.

  • by chaisan on 2/5/25, 1:25 AM

    The workflow of starting with and iterating on a PRD is already taking off: https://x.com/daniel_mac8/status/1871372159748079893 https://skylarbpayne.com/posts/cursed-cursor

  • by firefoxd on 2/4/25, 6:44 PM

    That's also why AGI as defined today is an interface problem. Imagine we've actually achieved it, and the interface is a chat prompt. It would be really hard to differentiate it from the current tools we have.

    For writing, the canvas interface is much more effective because you rely less on copy and paste. For code, even with the ctrl+i method, it works but it's a pain to have to load all other files as reference every single time.

  • by nayuki on 2/4/25, 6:05 PM

    This current post is a good rebuttal to the killed post this morning: https://news.ycombinator.com/item?id=42933031 "Programmers are modern-day computers", https://jtlicardo.com/writing/modern-day-computers

  • by josefrichter on 2/4/25, 4:39 PM

    I think everyone is aware that chat is not the ideal UI pattern for this. It's just the way current AI models work and generate content - that's why they have this "typewriter" mode, which naturally leads to a chat interface.

    It's not really a conscious choice, but rather a side effect. And we already see the trend is away from that, with tools like chatGPT Canvas, editors like Windsurf, etc.

  • by jfkrrorj on 2/4/25, 4:45 PM

    No, it is pretty much dialog, I would compare it to pair programming.

    AI is on many levels more capable than a human programmer; on some it is not. It is not supersmart. It cannot hold an entire program in its head; you have to feed it small, relevant sections of the program.

    > That’s why we use documents—they let us organize complexity, reference specific points, and track changes systematically.

    Extra steps. Something like waterfall...

  • by tgraf_80 on 2/4/25, 5:42 PM

    Truly speaking, you can use AI at a slightly higher level of abstraction and ambiguity, but not much higher. For instance, if you need an iteration over an array and you want to do a very specific aggregation, you can instruct AI to write the loop, but you yourself need to understand exactly what it's doing and have a very clear idea of how the snippet fits into the larger picture.

  • by kmarc on 2/4/25, 5:32 PM

    Look, deleting the inside of () parens in a function call makes total sense by instructing your editor to "delete inside parentheses", or in vim:

        di(
    
    Yet, millions of programmers use their mouse to first SELECT something visually and THEN delete whatever was selected. Shrug.

    I won't be surprised if chat-based programming becomes the next way of doing stuff.

  • by grumbel on 2/4/25, 6:23 PM

    Plenty of software has been developed on the command line; chat is just a more powerful and flexible version of that. The missing part with current AI systems is a persistent workspace/filesystem that allows you to store things you want to keep, discard things you want to get rid of, and highlight things you want to focus on.

  • by darepublic on 2/4/25, 6:36 PM

    Need interactive chat for coding. You say something high-level, and the model prompts you for low-level decisions, etc. Every once in a while the code bot can send a screenshot or some test results so we stay grounded in where we are in the process. This could enable coding while I'm driving or sitting stoned on the couch.

  • by daxfohl on 2/4/25, 10:15 PM

    I agree with everything except there being a business opportunity here. Whatever ends up being the fix, all the big players will incorporate it into their own IDEs within a couple of months, unless the fix is so different from an IDE that incorporating it into one doesn't make sense.

  • by cratermoon on 2/5/25, 3:22 PM

    Is chat a good UI for anything besides, well, chatting between people? Joseph Weizenbaum chose a chat interface for ELIZA because he expressly wanted to ape Rogerian therapy. Are we doomed to imagine AI is "chatting with a computer" because no one can imagine anything else?

  • by orand on 2/4/25, 8:43 PM

    Chat as a bad UI pattern for development tools is like saying language is a bad UI pattern for thought.

  • by randomNumber7 on 2/4/25, 9:14 PM

    For me the LLM does 3 things. Since it is trained on pattern matching, it performs well on these. The tree-like ChatGPT interface (where you can change earlier questions) is perfect for this, imo.

    - Speed up literature research

    - replace reading library documentation

    - generate copy pasta code that has been written often before

  • by ansonhw on 2/4/25, 7:31 PM

    I agree actually that chat is overrated overall as UX. It works really well for chatgpt but creates the wrong ux expectations for users where more precision or constraint is needed. Also not good for large processing. It was magic though for chatgpt.

  • by ypyrko on 2/4/25, 5:36 PM

    100% agree. I had the same issue when it comes to text editing, so I created this tool: https://www.potext.com I love having full control over AI suggestions.

  • by m3kw9 on 2/4/25, 4:22 PM

    There is a black-box effect between when you press enter and when it starts updating code in multiple places. Like, wtf just happened? I have to find these changes and figure out which code broke dependencies. It should be more stepwise visually.

  • by whatsakandr on 2/4/25, 5:01 PM

    The nice thing about chat is it's open ended, the terrible thing is that holy crap I have to write a paragraph describing exactly what I want when I should just be able to hit a button or navigate through a couple menus.

  • by suralind on 2/4/25, 11:06 PM

    I love how Zed integrates the chat into the IDE. You can essentially edit any part of the history and remove or redo the prompt. I just started to use this feature a couple of days ago and I couldn't be happier.

  • by jes5199 on 2/5/25, 12:12 AM

    I'm looking forward to a Cursor-like experience but using voice - so we can discuss code changes verbally, as if we were pair programming. honestly I'm not sure why we don't have that yet.

  • by RyanAdamas on 2/4/25, 7:13 PM

    The chat interface modality is a fleeting one in the grand scheme. Billion-token context windows with recursive AI production based on development documentation and graphics are likely the next iteration.

  • by nbzso on 2/4/25, 7:04 PM

    Is there statistical data on adoption of AI chatbots in the industry? I see a lot of personal demos and small projects, but nobody is talking about serious integration into production and useful patterns.

  • by 6h6j65j76k on 2/4/25, 5:44 PM

    "Current AI tools pretend writing software is like having a conversation. "

    But that is true, isn't it? Devs spend more time in meetings than writing code, having conversations about the code they are going to write.

  • by sramam on 2/4/25, 9:21 PM

    Interesting take. I completely agree that a product requirements document is a good mental model for system description. However, aren't bug reports + PRs approximating a chat interface?

  • by anoncow on 2/4/25, 5:34 PM

    There should be a smart way of merging all the chat messages into a streamlined story of the development on the fly. Perhaps something an AI could do. We could call it contextAI.

  • by remoquete on 2/4/25, 5:45 PM

    I'm intrigued by the conclusion. Docs-as-code, this time turning actual documentation and requirements into code? So, specifications? Back to OpenAPI?

    Back to... programming languages? :)

  • by icapybara on 2/4/25, 6:36 PM

    This feels like those arguments that text is the worst way to code and we actually need a no-code solution instead.

    Theoretically maybe, but chat windows are getting the job done right now.

  • by arnaudsm on 2/4/25, 6:33 PM

    Considering how good real-time voice chat with LLMs is now (gpt4o and Gemini 2.0), I haven't seen anyone try to integrate it into programming tools.

    It could be quite fun!

  • by empath75 on 2/4/25, 4:20 PM

    I think he's right that there's a place for a more structured AI programming UI, but chat and autocomplete are also good for a lot of use cases.

  • by mhh__ on 2/5/25, 2:52 AM

    I had a daydream about programming with an LLM as something more like driving a car than typing, e.g. constant steering, changing gears and so on

  • by notatoad on 2/5/25, 4:59 AM

    I like chat. All the more dedicated AI development tools try to force the author's (or somebody's) specific workflow, and fall into an uncanny-valley sort of situation for my workflow, where they highlight all the bits of my workflow that don't match the tool's desired workflow.

    Chat is so drastically far away from my workflow that it doesn't feel like my workflow is wrong.

  • by Apocryphon on 2/4/25, 5:10 PM

    > When your intent is in a document instead of scattered across a chat log, English becomes a real programming language

    So, something like Gherkin?

  • by shireboy on 2/4/25, 5:46 PM

    Yeah, I've landed on similar, although I wouldn't say it's bad for all dev scenarios. For small tweaks, or cases where I want a junior dev to do something I say explicitly ("add a bootstrap style input field for every property on #somemodel") chat works fine.

    For higher-level AI assist, I do agree chat is not what makes sense. What I think would be cool is to work in markdown files, refining each feature in precise plain English. The AI then generates code from the .md files plus existing context. Then you have well-written documentation and consistent code. You can do this to a degree today by referencing an md file in chat, or by using some of the newer tools, but I haven't seen exactly what I want yet. (I guess I should build it?)
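
    Something like one markdown file per feature (hypothetical example):

      # features/login.md
      The login screen has email and password fields. Email is validated on
      blur. The "Sign in" button stays disabled until both fields are valid,
      and a successful login redirects to /dashboard.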

  • by kristofferR on 2/4/25, 8:33 PM

    "You don’t program by chatting. You program by writing documents.", or long chat prompts as they are also called.

  • by gunalx on 2/4/25, 9:57 PM

    This just seems more cumbersome than writing the software to begin with.

    It's a problem of programming languages and definitions.

  • by kerblang on 2/4/25, 4:29 PM

    English is a terrible programming language.

  • by aantix on 2/4/25, 7:33 PM

    The only LLM agent I've seen that asked any sort of clarifying questions about design was Devin.

  • by indymike on 2/4/25, 4:40 PM

    How else do you interact with a chat-based AI? It may not be ideal, but it is an improvement.

  • by jillesvangurp on 2/5/25, 7:51 AM

    Not just for development tools. It's a bad UI for a lot of use cases. It's merely what made sense for UX-challenged researchers when they had to come up with a UI for their LLMs in a hurry. Discord was there and reasonably easy to integrate, and many tools started out as just that. Fast forward, and most tools are kind of standalone versions of the same thing.

    The challenge is that I haven't seen anything better really.

    Lately the innovation comes mainly from deeper integration with tools. Standalone AI editors are mainly popular with people who use relatively simple editors (like VS Code). VS Code has a few party tricks, but for me, swapping out IntelliJ for something else on a typical Kotlin project is a complete non-starter. Not going to happen. I'd gain AI, but I'd lose everything else that I use all the time. That would be a real productivity killer. I want to keep all the smart tooling I already have and have used for years.

    There are a few extensions for IntelliJ, but they are pretty much all variations of a sidebar with a chat, plus autocomplete. That autocomplete competes with normal autocomplete, which I use all the time. And the clippy-style "it looks like you are writing a letter" completions just aren't that useful to me at all. They are just noise, they break my flow, and they drown out the completions I use and need all the time. And sidebars just take up space, and copying code from there back to your editor is a bit awkward as UX.

    Lately I've been using ChatGPT. It started out pretty dumb, but these days I can option+shift+1 in a chat and have it look over my shoulder at my current editor. "How do I do that?" translates into a full context with my current editing window, cursor & selected text, etc. Before, I was copy-pasting everything and the kitchen sink to ChatGPT; now it just tells me what I need to do. The next step up from this is that it starts driving the tools itself. They already have a beta for this. This deeper integration is what is needed.

    A big challenge is that most of these tools are driven to minimize cost and context size. Tokens cost money. So ChatGPT only looks at my active editor and not at the 15 other files I have open. It could, but it doesn't. It's also unaware of my project structure, or the fact that most of my projects are Kotlin multiplatform and can't use JVM dependencies. So, in that sense, every chat is still a bit Groundhog Day. Its promise to "remember" stuff when you ask it to is super flaky; it forgets most things it's supposed to remember pretty quickly.

    These are solvable problems of course. But it's useful to me for debugging, analyzing, completing functions, etc.

  • by stevage on 2/4/25, 9:59 PM

    Boy does this feel like the author has never actually used any AI tools for writing code.

  • by proc0 on 2/4/25, 5:14 PM

    This is lowkey cope. AI should be like talking to another human, at least that is the promise. Instead we're getting glorified autocomplete with padded language to sound like a human.

    In its current form LLMs are pretty much at their limit, barring optimization and chaining them together for more productivity once we have better hardware. Still, it will just be useful for repetitive low level tasks and mediocre art. We need more breakthroughs beyond transformers to approach something that creates like humans instead of using statistical inference.

  • by jpcom on 2/5/25, 12:30 AM

    So you're saying a spec [specification] is the solution to building programs.

  • by karaterobot on 2/4/25, 5:07 PM

    I don't know about this. He admits you can write prototype code with chat-based LLMs, but then says this doesn't matter, because you can't write extremely complex applications with them.

    First of all, most people can't write extremely complex applications, period. Most programmers included. If your baseline for real programming is something of complexity equivalent to the U.S. tax code, you're clearly such a great programmer that you're an outlier, and should recognize that.

    Second of all, I think it's a straw man argument to say that you can either write prototype-level code with a chat UI, or complex code with documents. You can use both. I think the proposition being put forward is that more people can write complex code by supplementing their document-based thinking with chat-based thinking. Or, that people can write slightly better-than-prototype level code with the help of a chat assistant. In other words, that it's better to have access to AI to help you code small sections of a larger application that you are still responsible for.

    I'd be more interested in reading a good argument against the value of using chat-based AI as another tool in your belt, rather than a straight-up replacement for traditional coding. If you could make that argument, then you could say chat is a bad UI pattern for dev tools.

  • by synergy20 on 2/4/25, 5:59 PM

    What about organizing chats into documents by chatting: keep track of the chats and build up a design doc.

    Or the other way around: give AI a design doc and generate what you want. This is still chatting, just more official and lengthy.

  • by nektro on 2/5/25, 5:50 AM

    got so close to the point and then flew right past it. AI is never going to be great for this because the limit of "prompt engineering" just loops right back to normal programming

  • by debacle on 2/4/25, 9:09 PM

    I quite like it. Meta AI has become a good programming companion.

  • by kordlessagain on 2/4/25, 6:39 PM

    A terminal prompt has worked great for me for years…

  • by Havoc on 2/4/25, 6:13 PM

    Chat seems flawed but I don’t see how a document is better.

    I don’t buy that a document could capture what is needed here. Imagine describing navigating through multiple levels of menus in document form. That sounds straight up painful even for trivial apps. And for a full blown app…nope

    There is a whole new paradigm missing there imo

  • by williamcotton on 2/4/25, 9:04 PM

    It’s like writing laws.

    Vague and prone to endless argument?

  • by fragmede on 2/4/25, 5:58 PM

    > People call them “great for prototyping,” which means “don’t use this for anything real.”

    Eh, that's just copium, because we all have a vested monetary interest in them not being useful for "anything real", whatever that means. If it turns out they're useful for "real things", the entire industry gets turned on its head. (Hint: they're useful for "real" things.) Putting the entire codebase into the context window doesn't currently work, though. Aider works past this by passing the directory tree and filenames as context, so the LLM can guess that /cloud/scope/cluster.go is where the cluster scope code lives and ask for that specific file to be added to the context; then you can ask it to add, say, logging code to that file.

  • by tommiegannert on 2/4/25, 6:04 PM

    I'm in the business of data collection, to some extent: building a support system for residential solar panel installations. There's a bunch of data needed for simulations, purchase estimations, legal and tax reasons. Not insane amounts, but enough that filling out a form feels tedious. LLMs are great in that they can be given a task to gather a number of pieces, and can explain to the user what "kWh" means, at many level of technical depth.

    We play around with LLMs to build a chat experience. My first attempt made Claude spew out five questions at a time, which didn't solve the "guiding" problem. So I started asking it to limit the number of unanswered questions. It worked, but felt really clunky and "cheap."

    I drew two conclusions: We need UI builders for this to feel nice, and professionals will want to use forms.

    First, LLMs would be great at driving step-by-step guides, but it must be given building blocks to generate a UI. When asking about location, show a map. When deciding to ask about TIN or roof size, if the user is technically inclined, perhaps start with asking about the roof. When asking about the roof size, let the user draw the shape and assign lengths. Or display aerial photos. The result on screen shouldn't be a log of me-you text messages, but a live-updated summary of where we are, and what's remaining.
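
    Concretely, the building blocks might be as simple as the model returning a component id plus a question, with the client doing the rendering (all names invented):

      def render_step(model_reply: dict) -> str:
          """Render the component the model picked instead of raw chat text."""
          blocks = {
              "map_picker": "ask for the installation address on a map",
              "roof_sketcher": "let the user draw the roof outline",
              "number_field": "ask for one numeric value, e.g. yearly kWh",
          }
          block = model_reply["block"]
          assert block in blocks, f"model chose unknown block {block!r}"
          return f"[{block}] {model_reply['prompt']}"

      print(render_step({"block": "number_field", "prompt": "Yearly usage in kWh?"}))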

    Second, professionals have an incentive to build mental models for navigating complex data structures. People who have no reason to invest time in the data model (e.g. a consumer buying a single solar panel installation in their lifetime) will benefit from rich LLM-driven UIs. Chat UIs might create room for a new type of computer user who doesn't use visual clues to build this mental model, but everyone else will want to stay on graphics. If you're an executive wondering how many sick days there were last month, that's a situation where a BI LLM RAG would be great. But if you're not sure what your question is, because you're hired to make up your own questions, then pointing, clicking and massaging might make more sense.

  • by randomcatuser on 2/4/25, 6:09 PM

    chat=repl?

    doc=programming in a DSL? / (what was that one language which was functional & represented in circles in a canvas?)

  • by newsyco21 on 2/4/25, 4:35 PM

    generated ai is cancer

  • by SuperHeavy256 on 2/4/25, 8:16 PM

    so, a really long text? that's your big revelation?

  • by waylonchang on 2/5/25, 1:28 AM

    all this just to say "prompt neatly" :)

  • by h1fra on 2/4/25, 4:42 PM

    Chat is a bad UI.

  • by dehugger on 2/4/25, 7:49 PM

    LLM-generated code seems to depend wildly on whether the project is about something a bunch of people have already put out on GitHub or not.

    Writing a CRUD web API? Great! Writing business logic for a niche edge case in a highly specialized domain? Good luck.

  • by sebastianconcpt on 2/4/25, 6:24 PM

    Yep.

  • by thomastjeffery on 2/4/25, 5:38 PM

    Chat is a bad interface for tools in general, but this problem goes deeper than that.

    What's a good interface?

    There are a few things we try to balance to make a good UI/UX:

    - Latency: How long it takes to do a single task

    - Decision-tree pathing: How many tasks to meet a goal

    - Flexibility/Configurability: How much of a task can be encapsulated by the user's predefined knowledge of the system

    - Discoverability: What tasks are available, and where

    The perfect NLP chat could accomplish some of these:

    - Flexibility/Configurability: Define/infer words and phrases that the user can use as shortcuts

    - Decision-tree pathing: Define concepts that shortcut an otherwise verbose interaction

    - Latency: Context-aware text-completions so the user doesn't need to type as much

    - Discoverability: Well-formed introductions and clarifying questions to introduce useful interaction

    This can only get us so far. What better latency can be accomplished than a button or a keyboard shortcut? What better discoverability than a menu?

    The most exciting prospect left is flexibility. Traditional software is inflexible. It can only perform the interaction it was already designed with. Every design decision becomes a wall of assumption. These walls are the fundamental architecture of software. Without them, we would have nothing. With them, we have a structure that guides us along whatever assumptions were already made.

    If we want to change something about our software's UI, then we must change the software itself, and that means writing. If NLP was a truly solved problem, then software compatibility and flexibility would be trivialized. We could redesign the entire UI by simply describing the changes we want.

    LLMs are not even close. Sure, you can get one to generate some code, but only if the code you want generated is close enough to the text it was already trained on. LLMs construct continuations of tokens: no more, no less. There is no logic. There is no consideration about what is right or wrong: only what is likely to come next.

    Like you said,

    > You can’t build real software without being precise about what you want.

    This is the ultimate limitation of UI. If only we could be ambiguous instead! LLMs let us do that, but they keep that ambiguity permanent. There is no real way to tie an LLM back down to reality. No logic. No axioms. No rules. So we must either be precise or ambiguous. The latter option is an exciting development, and certainly offers its own unique advantages, but it isn't a complete solution.

    ---

    I've been thinking through another approach to the ambiguity problem that I think could really give us the expressive power of natural language, while preserving the logical structure we use to write software (and more). It wouldn't solve the problem entirely, but it could potentially move it out of the way.

  • by talles on 2/4/25, 9:11 PM

    Imagine if instead of English, someone invented some sort of computer language that is precise and produces the same result every time you execute it.

  • by mehakkar3006 on 2/5/25, 11:05 PM

    hi