from Hacker News

Classifying customer messages with LLMs vs traditional ML

by hellovai on 7/11/23, 2:51 PM with 123 comments

  • by 19h on 7/11/23, 9:00 PM

    We’re classifying gigabytes of intel (SOCMINT / HUMINT) per second and found semantic folding to be as good as or better than BERT / LLMs in classification quality versus throughput.

    How it works — imagine you have these sentences:

    “Acorn is a tree” and “acorn is an app”

    You essentially keep a record of all word-to-word relations within a sentence:

    - acorn: is, a, an, app, tree, etc.

    Now you repeat this for a few gigabytes of text. You’ll end up with a huge map of “word connections”.

    You now take the top X words that other words connect to (e.g. 16384). Then you create a vector with 16384 positions, one per word: position 1 is the most-connected word, position 2 the next, and so on. Each position holds a 1 if the word you’re encoding is connected to that word and a 0 if there is no such connection, so a word might be encoded as 1,0,1,0,1,0,0,0, …

    You’ll end up with a vector that has a lot of zeroes — you can now sparsify it (i.e. store only the positions of the ones).

    You essentially have fingerprints now — what you can do next is generate fingerprints of entire sentences, paragraphs and texts. Remove the fingerprints of the most common words like “is”, “in”, “a”, “the” etc. and you’ll have a “semantic fingerprint”. Now if you take a lot of example texts and generate fingerprints from them, you can end up with a very small set of “indices”, maybe 10 numbers, that are enough to very reliably identify texts of a specific topic.

    Sorry, couldn’t be too specific as I’m on the go - if you’re interested drop me a mail.

    We’re using this to categorize literally tens of gigabytes per second with 92% precision into more than 72 categories.
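
    A rough Python sketch of the fingerprinting idea described above (toy vocabulary size, simplified scoring; the function names and thresholds here are illustrative, not the actual system):

      from collections import Counter, defaultdict

      TOP_N = 16  # stand-in for the 16384 dimensions mentioned above

      def cooccurrences(sentences):
          """Count word-to-word connections within each sentence."""
          links = defaultdict(Counter)
          for s in sentences:
              words = s.lower().split()
              for w in words:
                  for other in words:
                      if other != w:
                          links[w][other] += 1
          return links

      def build_vocab(links, top_n=TOP_N):
          """Positions = the top_n words that other words connect to most."""
          totals = Counter()
          for neighbours in links.values():
              totals.update(neighbours)
          return [w for w, _ in totals.most_common(top_n)]

      def word_fingerprint(word, links, vocab):
          """Sparse fingerprint: indices of vocab words this word connects to."""
          return {i for i, v in enumerate(vocab) if v in links.get(word, {})}

      def text_fingerprint(text, links, vocab, stopwords=frozenset({"is", "a", "an", "the", "in"})):
          """Union of word fingerprints, skipping the most common words."""
          fp = set()
          for w in text.lower().split():
              if w not in stopwords:
                  fp |= word_fingerprint(w, links, vocab)
          return fp

      corpus = ["acorn is a tree", "acorn is an app"]
      links = cooccurrences(corpus)
      vocab = build_vocab(links)
      print(sorted(text_fingerprint("acorn is a tree", links, vocab)))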

  • by nestorD on 7/11/23, 7:01 PM

    LLMs are significantly slower than traditional ML, typically costlier and, I have been told, tend to be less accurate than a traditional model trained on a large dataset.

    But they are zero/few-shot classifiers, meaning you can get your classification running and reasonably accurate now, collect data, and switch to a fine-tuned, very efficient traditional ML model later.
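
    A minimal sketch of that bootstrapping step, assuming the 2023-era OpenAI Python SDK; the label names are made up:

      import openai  # assumes OPENAI_API_KEY is set in the environment

      LABELS = ["billing", "bug report", "feature request", "other"]  # illustrative

      def classify(message: str) -> str:
          """Zero-shot classification: no training data, just a constrained prompt."""
          prompt = (
              "Classify the customer message into exactly one of these categories: "
              + ", ".join(LABELS)
              + ". Reply with the category name only.\n\nMessage: " + message
          )
          response = openai.ChatCompletion.create(
              model="gpt-3.5-turbo",
              messages=[{"role": "user", "content": prompt}],
              temperature=0,
          )
          label = response["choices"][0]["message"]["content"].strip().lower()
          return label if label in LABELS else "other"  # fall back if the model strays

      # Logging each (message, predicted label, human correction) triple builds the
      # dataset for the cheaper fine-tuned / traditional model later.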

  • by alexmolas on 7/11/23, 6:25 PM

    Where's the comparison with traditional ML? In the article I only see the good things about using LLMs, but there's no mention of traditional ML apart from the title.

    It would be nice to see how this "complex" approach compares against a "simple" TF-IDF + RF or SVM.
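
    For reference, the "simple" side of that comparison could be something like this scikit-learn pipeline (placeholder data, illustrative only):

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      texts = ["my invoice is wrong", "the app crashes on login"]  # placeholder data
      labels = ["billing", "bug report"]

      baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
      baseline.fit(texts, labels)
      print(baseline.predict(["I was charged twice"]))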

  • by rossirpaulo on 7/11/23, 3:02 PM

    This is great! We had a similar thought and couldn't agree more with "LLMs prefer producing something rather than nothing." We have been consistently requesting responses in JSON format, which, despite its numerous advantages, sometimes forces the model to produce an output even when it shouldn't. This frequently results in hallucinations. Encouraging NULL returns, for example, is a great way to deal with that.
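
    A tiny sketch of that pattern (the task and field name are made up): tell the model explicitly that null is an acceptable value, and treat it as a first-class answer when parsing.

      import json

      def build_prompt(message: str) -> str:
          return (
              "Extract the customer's order number from the message below.\n"
              'Reply with JSON like {"order_number": "ABC123"}.\n'
              'If there is no order number, reply {"order_number": null} rather than guessing.\n\n'
              "Message: " + message
          )

      def parse(reply: str):
          return json.loads(reply).get("order_number")  # None is a legitimate answer
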
  • by crazygringo on 7/11/23, 5:03 PM

    This is really interesting.

    I'm really wondering when LLMs are going to replace humans for ~all first-pass social media and forum moderation.

    Obviously humans will always be involved in coming up with moderation policy, judging gray areas, and refining that policy... but at what point will LLMs do everything else more reliably than humans?

    6 months from now? 3 years from now?

  • by rckrd on 7/11/23, 7:33 PM

    I just released a zero-shot classification API built on LLMs: https://github.com/thiggle/api. It always returns structured JSON and only the relevant categories/classes out of the ones you provide.

    LLMs are excellent reasoning engines. But nudging them to the desired output is challenging. They might return categories outside the ones that you provided. They might return multiple categories when you only want one (or the opposite — a single category when you want multiple). Even if you steer the AI toward the correct answer, parsing the output can be difficult. Asking the LLM to output structured data works 80% of the time. But the 20% of the time where parsing the response fails takes up 99% of your time and is unacceptable for most real-world use cases.

    [0] https://twitter.com/mattrickard/status/1678603390337822722
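
    A rough sketch of the kind of post-hoc cleanup this is meant to avoid (the category names are made up; this is not the Thiggle API):

      ALLOWED = {"billing", "bug report", "feature request"}  # illustrative

      def clean_categories(raw_reply: str) -> list[str]:
          # The model might reply "Billing, Refunds" or "1. bug report", etc.
          parts = [p.strip().lower().lstrip("0123456789. -") for p in raw_reply.split(",")]
          return [p for p in parts if p in ALLOWED]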

  • by Animats on 7/12/23, 4:31 AM

    What's the application?

    If you're using this to direct messages to approximately the correct department, it doesn't have to be that complicated.

    If you're doing this to evaluate customer sentiment, you could probably just select a few hundred messages at random and read them. (There are many "big data" problems which are only big due to not sampling.)

  • by i-am-agi on 7/11/23, 5:19 PM

    Wohoo this is amazing! I have been using the Autolabel (https://news.ycombinator.com/item?id=36409201) library so far for labeling a few classification and question answering datasets and have been seeing some great performance. Would be interested in giving gloo a shot as well to see if it helps performance further. Thanks for sharing this :)
  • by r_singh on 7/11/23, 6:38 PM

    I have been using LLMs for ABSA (aspect-based sentiment analysis), text classification and even labelling clusters (something that had to be done manually earlier on) and I couldn't be happier.

    It was turning out to be expensive earlier, but heavily optimising the prompt, reduced pricing from OpenAI, and now being able to run Guanaco 13B/33B locally have made it much more affordable for millions of pieces of text.

  • by wilg on 7/11/23, 6:26 PM

    Classic HN website nitpick: the logo should link to the home page. In this case it is a link, but it just goes to the current page. However, points for being able to easily get to the main product page from the blog; usually that's buried.
  • by m3kw9 on 7/12/23, 3:37 AM

    Probably cheaper with ML, but you need training data. With transfer learning, though, you can use a publicly available pretrained model and far less data to train up a classifier; single-digit thousands of examples may be okay for 2-5 sentiments.
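
    A hedged sketch of that pattern, assuming a sentence-transformers encoder plus scikit-learn (the model name and labels are illustrative): a frozen pretrained encoder with a small classifier head trained on a few thousand labelled examples.

      from sentence_transformers import SentenceTransformer
      from sklearn.linear_model import LogisticRegression

      encoder = SentenceTransformer("all-MiniLM-L6-v2")  # publicly available pretrained model
      texts = ["love it", "this is broken", "meh"]       # placeholder for a few thousand rows
      labels = ["positive", "negative", "neutral"]

      clf = LogisticRegression(max_iter=1000)
      clf.fit(encoder.encode(texts), labels)
      print(clf.predict(encoder.encode(["works great"])))
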
  • by andrewgazelka on 7/12/23, 5:59 PM

    My understanding was that training on ChatGPT output was against OpenAI's ToS. Is this incorrect for this use case (training BERT)?
  • by YetAnotherNick on 7/12/23, 1:53 AM

    Interested in knowing how you are running a BERT model for $35/month? The cheapest GPU instance costs $200-300/month AFAIK.
  • by caycep on 7/11/23, 8:53 PM

    what's "traditional ML"?