from Hacker News

How LLMs Work, Explained Without Math

by kdamica on 5/6/24, 5:40 AM with 91 comments

  • by mjburgess on 5/6/24, 10:14 AM

    If commenters wish to know what is not "guessing the next word", let me outline it.

    Compare, "I like what you were wearing", "Pass me the salt", and "Have you been to London recently?" as generated by an LLM and as spoken by a person.

    What is the reason each piece of text (in a WhatsApp chat, say) is provided?

    When the LLM generates each word, it does so because that word is, on average, the most common continuation in the corpus of text on which it was trained: "wearing" follows "I like what you were" because most of the people having these conversations, as captured in the training data, were talking about clothes.
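
    (A toy sketch of that claim, under loud assumptions: a hypothetical three-line corpus and a bare frequency table standing in for the trained network, which in reality computes the distribution rather than looking it up:)

        from collections import Counter, defaultdict

        # Hypothetical toy corpus standing in for the training data.
        corpus = [
            "i like what you were wearing",
            "i like what you were saying",
            "i like what you were wearing",
        ]

        # Tally which word follows each prefix seen in the corpus.
        follows = defaultdict(Counter)
        for line in corpus:
            words = line.split()
            for i in range(len(words) - 1):
                follows[tuple(words[:i + 1])][words[i + 1]] += 1

        # The "most common" continuation of the prefix wins.
        prefix = tuple("i like what you were".split())
        print(follows[prefix].most_common(1)[0][0])  # -> wearing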

    When a person types those words on a keyboard, the following are the causes: the speaker's mental states of recollection, preference, and taste; the speaker's affective/attachment states with respect to their friend; the speaker's habituation into social cues; the speaker's imagining, through recall, of what their friend was wearing; the speaker's ability to abstract from their memories into identifying clothing; and so on.

    Indeed, the cause of a person speaking is so vastly different from generating a word based on historical frequency that to suppose the two are related seems incomprehensible.

    The only reason the illusion of similarity works is that the training data is a text-based observation of the causal process in people: the distribution of the training data is produced by people talking (and so on). Insofar as you cannot just replay variations on these prior conversations, the LLM will fail and expose itself as actually insensitive to any of these things.

    I'd encourage credulous fans of AI not to dehumanize themselves and others by the supposition that they speak because they are selecting an optimal word from a dictionary based on all prior conversations they were a part of. You aren't doing that.

  • by astrange on 5/6/24, 9:32 AM

    > The assumption that most people make is that these models can answer questions or chat with you, but in reality all they can do is take some text you provide as input and guess what the next word (or more accurately, the next token) is going to be.

    These two things cannot be compared or contrasted. It's very common to see people write something like "LLMs don't actually do <thing they obviously actually do>, they just do <dismissive description of the same thing>."

    Typically, as here, the dismissive description just ignores the problem of why the model manages to write complete, novel sentences when it's only "guessing" subword tokens, why those sentences appear to be related to the question you asked, and why they come in the form of an answer to your question rather than another question (which is what base models would produce).
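
    (A rough sketch of the loop that framing elides; toy_model here is a hypothetical stand-in for the trained network. The model really does nothing but score next tokens, yet whole sentences fall out of feeding each guess back in:)

        def generate(tokens, model, max_new=20):
            # Repeatedly "guess" one next token and feed it back in;
            # the sentence is a product of this feedback loop.
            tokens = list(tokens)
            for _ in range(max_new):
                probs = model(tokens)            # P(next token | context so far)
                nxt = max(probs, key=probs.get)  # greedy decoding
                if nxt == "<eos>":
                    break
                tokens.append(nxt)
            return tokens

        def toy_model(tokens):
            # Hypothetical stand-in: a real LLM computes a distribution
            # over its whole vocabulary with a neural network.
            table = {
                ("have", "you"): {"been": 0.9, "seen": 0.1},
                ("you", "been"): {"to": 0.8, "there": 0.2},
                ("been", "to"): {"london": 0.7, "paris": 0.3},
                ("to", "london"): {"recently?": 0.6, "<eos>": 0.4},
                ("london", "recently?"): {"<eos>": 1.0},
            }
            return table.get(tuple(tokens[-2:]), {"<eos>": 1.0})

        print(generate(["have", "you"], toy_model))
        # -> ['have', 'you', 'been', 'to', 'london', 'recently?']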

  • by mft_ on 5/6/24, 10:27 AM

    How does this concept explain (for example) an LLM’s ability to provide a précis of an article? Or to compare two blocks of text and highlight the differences? Or to take an existing block of code and find and correct an error?

  • by thefz on 5/6/24, 8:44 AM

    > On the other side, given the propensity of LLMs to hallucinate, I wouldn't trust any workflow in which the LLM produces output that goes straight to end users without verification by a human.

    Yep. Nice article, though!

  • by pietmichal on 5/6/24, 11:21 AM

    This was such a nice primer that it inspired me to give Karpathy's series another try. Loved the explanation!

  • by l5870uoo9y on 5/6/24, 10:29 AM

    Are there any open source implementations of neural network "functions"? And of the layering, transformers, and attention mechanisms?
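
    (For reference, the core attention computation is small enough to sketch directly; a minimal single-head, unbatched version in NumPy, with illustrative shapes only. Karpathy's nanoGPT is one well-known open-source implementation of the full stack.)

        import numpy as np

        def attention(Q, K, V):
            # Scaled dot-product attention: each query row mixes the
            # value rows, weighted by its similarity to each key row.
            scores = Q @ K.T / np.sqrt(Q.shape[-1])
            scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
            return weights @ V

        rng = np.random.default_rng(0)
        Q, K, V = rng.normal(size=(3, 4, 8))  # 4 tokens, model width 8
        print(attention(Q, K, V).shape)       # (4, 8)
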
  • by z7 on 5/6/24, 11:34 AM

    "I'll begin by clearing a big misunderstanding people have regarding how Large Language Models work. The assumption that most people make is that these models can answer questions or chat with you, but in reality all they can do is take some text you provide as input and guess what the next word (or more accurately, the next token) is going to be."

    What separates this from the following:

    "I'll begin by clearing a big misunderstanding people have regarding how the human brain works. The assumption that most people make is that the brain can think, reason, and understand language, but in reality all it can do is process electrical and chemical signals."

  • by alabhyajindal on 5/6/24, 5:02 PM

    Great read, thanks for sharing!