from Hacker News

Ask HN: What can LLMs not do?

by ynniv on 12/27/24, 4:04 PM with 60 comments

Hey guys. I'm getting a bit grey in the beard, but not quite there yet, and I'm really interested in what's going on with AI. Twelve years ago I was reading Kurzweil and criticisms that people don't understand exponential growth, so I plotted some data and made a bet with my friends that we would have autonomous-only highway lanes in 2024. For a decade I watched what felt like glacial growth, but in the end I think that I was more right than wrong.

Now I'm watching LLMs empirically, and they too seem to be moving faster than I expected. I used to have a solid list of things that separated people from AI, but it's mostly crossed off at this point. I'd like some fresh lines in the sand, and since HN is both technical and cynical, this seems like the perfect place to ask.

I do have a couple of caveats: I'm not interested in intentional or form factor limitations. You can easily add stateful memory, or a robotic body. I'm also trying to avoid subjective measures, so even if it isn't "good", an LLM can already write a book. Finally, I'd prefer to focus on things that are testable. It's tempting to say they won't be conscious or have a soul, but I don't see how I would ever be able to cross those off.

That said, let's see what you've got!

  • by idontwantthis on 12/27/24, 4:24 PM

    LLMs don't have metacognition. The biggest problem I've encountered is that they clearly don't understand what they are saying. This means that when they are wrong, or shouldn't provide an answer because they don't actually know, they're unlikely to give a good answer. You can easily trick them, too, because all you're doing is priming a word generator: tell them false things are true and they stop telling the truth. There is no way for them to "realize" something is wrong and correct themselves.
  • by valbaca on 12/27/24, 6:28 PM

    They cannot provide information that is 100.0% accurate to the source in domains where there is NO acceptable amount of hallucination: legal, compliance, etc.

    They’re so so bad at saying “I don’t know” or “no results”. They’re still sycophants.

  • by mikewarot on 12/27/24, 5:38 PM

    I've seen the appearance of PCs, and the impact of 4 decades of Moore's law on price/performance. With O1 and now O3, the main limits of what you can do with an LLM seem to center on just how much compute you can give it.

    I hope to offer a few more decades of performance boost to the show, with Isolinear Memory[1], to help dissolve those limitations. ;-)

    [1] https://bitgrid.blogspot.com/2024/09/bitgrid-and-isolinear-m...

  • by fzzzy on 12/27/24, 5:40 PM

    Currently, they can't say "I don't know." I think it may be possible to fix this.
  • by jitl on 12/27/24, 5:24 PM

    It’s great at doing stuff that is routine, well-understood, done and documented and discussed many times before. “Boilerplate”. But also summarization, brainstorming, assisting a human unfamiliar with the body of knowledge.

    It is bad at doing novel or subtle things. For example, when I ask for help with Postgres queries, LLMs frequently give me made-up syntax or combine features from several different databases (Oracle, MS-SQL, Postgres) into a single answer.

  • by SirMaster on 12/27/24, 4:21 PM

    Apparently they can’t stop hallucinating.
  • by admissionsguy on 12/27/24, 6:32 PM

    Consistently counting the occurrences of the letter R in the word strawberry (or strawberrystrawberry); a quick check is sketched below.

    Solving freshman physics and maths problems that are variations of common problems (since they will solve the common problem and not the variation). My experience with this makes me completely confident that the claims about OpenAI models solving whatever maths problems they're said to be solving are bogus. "Testing on training is all you need", basically.
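
    The strawberry count itself is trivial to verify in code. Here's a minimal Python check (assuming plain substring counting is what's meant):

        word = "strawberry"
        print(word.count("r"))        # 3
        print((word * 2).count("r"))  # 6, for "strawberrystrawberry"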

  • by andersco on 12/27/24, 4:28 PM

    They seem incapable of saying “I don’t know.”
  • by jonifico on 12/27/24, 4:35 PM

    They can't draw vector illustrations.
  • by not_your_vase on 12/27/24, 5:16 PM

    They are still statistical machines, without any real ability for comprehension or interpretation. They try to give an output that best matches the question/task statistically, regardless of whether it is correct.

    It gets very obvious (and annoying) when you ask non-trivial technical questions.

  • by jqpabc123 on 12/27/24, 7:34 PM

    LLMs can't provide fast, reliable answers at low cost.

    Instead, they offer possibilities of questionable reliability at high cost.

    In other words, they turn the traditional model of computing upside down.

  • by rsynnott on 12/28/24, 8:13 AM

    > I'm also trying to avoid subjective measures, so even if it isn't "good", an LLM can already write a book.

    I mean, if ‘good’ is not a criterion, then so can a cat.

    “What can they do _usefully_” is perhaps a more interesting question, and it’s a far shorter list. The only thing I’ve seen them do to a useful standard that I fully buy is writing estate agent advertising copy, and that’s only because, in that industry, it is customary to use painfully awful prose and accuracy has never been considered important.

  • by scjody on 12/27/24, 4:40 PM

    Spatial reasoning.

    (But we're working on it.)

  • by al2o3cr on 12/27/24, 6:16 PM

        made a bet with my friends that we would have autonomous-only
        highway lanes in 2024
    
    Which... checks notes... we don't have.

        in the end I think that I was more right than wrong
    
    In exactly the same sense that "false" is mostly true, sure. /s

    More-concretely, here's a simple question put to Gemini ( https://g.co/gemini/share/2d68036d0ff7 ):

        I have a two-gallon bucket and a five-gallon bucket. 
        How can I measure out 7 gallons of water?
    
    And Gemini's scintillating reply:

        You can't measure out exactly 7 gallons using only a 2-gallon bucket
        and a 5-gallon bucket. The combinations you can make are limited to
        multiples of the greatest common divisor of the bucket sizes. In this
        case, the greatest common divisor of 2 and 5 is 1. This means you can
        measure any whole number of gallons. Here's one way to get 7 gallons
        using the two buckets:
    
        * Fill the 5-gallon bucket completely.
        * Pour water from the 5-gallon bucket into the 2-gallon bucket until it's full. 
          You now have 3 gallons left in the 5-gallon bucket.
        * Empty the 2-gallon bucket.
        * Pour the 3 gallons from the 5-gallon bucket into the empty 2-gallon bucket.
        * Fill the 5-gallon bucket completely again.
        * Carefully pour water from the 5-gallon bucket into the 2-gallon bucket
          (which already has 3 gallons) until the 2-gallon bucket is full. This will
          use 1 gallon from the 5-gallon bucket.
        * You now have 4 gallons in the 5 gallon bucket. Empty the two gallon bucket.
        * Pour the four gallons into the two gallon bucket twice. This will give you
          eight gallons.
        * Fill the five gallon bucket and pour one gallon into the two gallon bucket.
          This will leave you with four gallons in the five gallon bucket. Empty the
          two gallon bucket.
        * Pour the four gallons into the two gallon bucket twice. This will give you
          eight gallons.
    
        You can't get exactly 7 gallons with just these two buckets.
        You can get 1, 2, 3, 4, 5, 6, 8, 9, etc., but not 7 directly.
    
    Reminding Gemini that pouring 3 gallons into a 2-gallon bucket is impossible produces even more nonsense:

        You are absolutely right! My apologies. You can't pour 3 gallons into a 2-gallon
        bucket. That was a significant error in my reasoning.
    
        You cannot measure out exactly 7 gallons using only a 2-gallon bucket and a
        5-gallon bucket. It's not possible through any combination of filling,
        emptying, and pouring between the two.
    
        Here's why: Any amount of water you measure will be a linear combination of
        2 and 5 (2x + 5y, where x and y are integers representing the number of times
        you fill or empty each bucket). The possible amounts are:
    
        2
        4
        5
        6
        7 (2+5)
        8
        9
        10 and so on.
        However, you can't get exactly 1 or 3 by doing this.
    
        My apologies for the initial incorrect response. Thank you for pointing out my mistake!
    
    This second explanation is fractally wrong:

    * 1 and 3 are both achievable by the "partially pour from the 5-gallon to the 2-gallon" approach

    * this explanation for why 7 is impossible identifies 7 as possible and even decomposes it

    Prompting a THIRD time, calling out those impossibilities, produces another round of "put 3 gallons into the 2-gallon bucket", works out how to reach 1 gallon and then, very inefficiently, 7, and finally produces the "fill both buckets" answer.

    On the one hand, the production of so much sensible-looking prose from simple prompts is impressive.

    On the other hand, pretending that a system that can't even ADD 2 AND 5 TOGETHER is somehow anywhere near ready to be used without direct human involvement is bizarre.
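
    For completeness, the set of amounts actually reachable with a 2-gallon and a 5-gallon bucket is easy to enumerate by brute force. Here's a minimal Python sketch (a breadth-first search over (small, large) bucket states, assuming the usual fill/empty/pour moves); it confirms every whole-gallon total from 0 through 7 is reachable, and that 7 is just "fill both buckets":

        from collections import deque

        CAP = (2, 5)  # bucket capacities in gallons

        def reachable_totals():
            """Every total volume reachable via fill/empty/pour moves."""
            seen = {(0, 0)}
            queue = deque([(0, 0)])
            totals = set()
            while queue:
                a, b = queue.popleft()
                totals.add(a + b)
                moves = [
                    (CAP[0], b), (a, CAP[1]),  # fill either bucket
                    (0, b), (a, 0),            # empty either bucket
                ]
                pour = min(a, CAP[1] - b)      # pour small -> large
                moves.append((a - pour, b + pour))
                pour = min(b, CAP[0] - a)      # pour large -> small
                moves.append((a + pour, b - pour))
                for state in moves:
                    if state not in seen:
                        seen.add(state)
                        queue.append(state)
            return sorted(totals)

        print(reachable_totals())  # [0, 1, 2, 3, 4, 5, 6, 7]

    So the second reply has the number theory backwards: since gcd(2, 5) = 1, every whole-gallon amount up to the combined 7-gallon capacity is reachable, including the 1 and 3 it rules out.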

  • by timelesstard on 12/27/24, 5:28 PM

    > I'm getting a bit grey in the beard, but not quite there yet

    fuck does this even mean

  • by chistev on 12/27/24, 4:21 PM

    It can't have NSFW conversations.