from Hacker News

Releasing v1 of GPT-JT, a fork of GPT-J-6B fine-tuned on 3.53B tokens

by b_mc2 on 11/30/22, 2:21 AM with 17 comments

  • by ipsum2 on 11/30/22, 4:03 AM

    If anyone's wondering, this model is not a GPT-3 killer. It is mostly useful for classification rather than general text generation (see the sketch at the end of this comment). The comparison also isn't apples to apples, since the other models were not fine-tuned on the same dataset.

    Interesting that they didn't compare the model to Flan-T5 or TK-Instruct, both of which were fine-tuned on similar data and should show comparable results at the same number of parameters. See the leaderboard here: https://huggingface.co/spaces/ought/raft-leaderboard

    Nonetheless, props for open-sourcing the model and attempting to develop new techniques for decentralized training of large-scale transformers; this is no easy feat.
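
    To make the classification-style usage concrete, here is a minimal prompt sketch. It assumes the checkpoint is published on the Hugging Face Hub as togethercomputer/GPT-JT-6B-v1 (only the demo Space togethercomputer/GPT-JT is linked in the thread, so the model id is an assumption), and the prompt format is illustrative, not necessarily the one used for training or in the demo.

      # Minimal sketch: using a decoder-only LM as a zero-shot classifier by
      # prompting it to emit a short label. The Hub id below is an assumption.
      from transformers import pipeline

      generator = pipeline("text-generation",
                           model="togethercomputer/GPT-JT-6B-v1")

      prompt = (
          "Classify the sentiment of the review as positive or negative.\n\n"
          "Review: Arrived quickly and works exactly as described.\n"
          "Sentiment:"
      )

      # Cap the continuation at a few tokens so the output is just a label;
      # generated_text contains the prompt plus the predicted label.
      result = generator(prompt, max_new_tokens=3, do_sample=False)
      print(result[0]["generated_text"])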

  • by selcuka on 11/30/22, 4:33 AM

    Text summarization examples [1] are fun:

    > Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'.

    > Output: Not as Advertised

    [1] https://huggingface.co/spaces/togethercomputer/GPT-JT

  • by haolez on 11/30/22, 3:48 AM

    What does this mean? Can I download the trained model and run it on my machines? Assuming I won't need a supercomputer to run it.
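
    Since the checkpoint is open-sourced, it can be downloaded and run locally without a supercomputer. Here is a minimal sketch, assuming the weights are on the Hugging Face Hub as togethercomputer/GPT-JT-6B-v1 (an assumed id): fp16 weights for a 6B-parameter model are roughly 12 GB, so a single 24 GB consumer GPU, or a CPU with enough RAM, should be enough.

      # Minimal sketch of downloading and running the model locally.
      # The Hub id is an assumption, not confirmed by the release post.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "togethercomputer/GPT-JT-6B-v1"
      tokenizer = AutoTokenizer.from_pretrained(model_id)

      # fp16 halves memory use (~12 GB of weights at 6B parameters); fall back
      # to fp32 on CPU. device_map="auto" requires the accelerate package.
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
          device_map="auto",
      )

      prompt = "Is the following review positive or negative?\nReview: Arrived late and damaged.\nAnswer:"
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
      output = model.generate(**inputs, max_new_tokens=4, do_sample=False)
      print(tokenizer.decode(output[0], skip_special_tokens=True))
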
  • by Tepix on 11/30/22, 1:40 PM

    What's the best chatty model to run locally on an RTX 3090? This seems cool, but it's a bit hard to get it to talk.

  • by malshe on 11/30/22, 3:59 PM

    Has anyone tried running it on an M1 MBP? How is the performance?
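
    On Apple Silicon, one rough option is PyTorch's MPS backend in half precision. This is a sketch under assumptions, not a benchmark: it assumes the same Hub id as above, enough unified memory (the fp16 weights alone are about 12 GB), and that MPS operator coverage handles generation for this architecture, which may not hold on every PyTorch version.

      # Rough sketch of loading the model on an M1/M2 Mac via the MPS backend.
      # The Hub id is assumed; half-precision weights alone are ~12 GB.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      device = "mps" if torch.backends.mps.is_available() else "cpu"
      dtype = torch.float16 if device == "mps" else torch.float32

      model_id = "togethercomputer/GPT-JT-6B-v1"
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)

      prompt = "Summarize the review in a few words.\nReview: The package arrived late and damaged.\nSummary:"
      inputs = tokenizer(prompt, return_tensors="pt").to(device)
      out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
      print(tokenizer.decode(out[0], skip_special_tokens=True))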