from Hacker News

T0* – Series of encoder-decoder models trained on a large set of different tasks

by julien_c on 10/18/21, 2:16 PM with 153 comments

  • by stellaathena on 10/18/21, 2:30 PM

    [Disclaimer: I am an author of the above paper and played a rather minimal role. I am also a prominent member of EleutherAI.]

    "Instruction-tuning" is clearly in the air. Simultaneous work at Google (released less than two weeks ago) on a model they call FLAN can be found here: https://ai.googleblog.com/2021/10/introducing-flan-more-gene...

    EleutherAI attempted to do something similar several months ago, but didn't succeed: https://blog.eleuther.ai/tuning-on-eval-harness/

    A careful analysis of the similarities and differences between the three approaches would likely be highly beneficial to the community.

  • by Mizza on 10/18/21, 3:06 PM

    The hosted demo has the default query, "How many hydrogen atoms are in a water molecule?" It said "two".

    I asked it, "How many oxygen atoms are in a water molecule?". It said "two".

  • by themulticaster on 10/18/21, 2:55 PM

    I'm not familiar with the current state of the art language models, so please bear with me for asking: What's the catch here? Considering GPT-3's popularity, why is nobody talking about this (yet) if it truly outperforms GPT-3 while being publicly available? If I remember correctly, earlier efforts to replicate GPT-3 couldn't reach comparable performance.

    Perhaps it's still a huge hassle to perform inference using this model because of its size, so it doesn't make sense to use this model (compared to paying for OpenAI's API) if you don't happen to have a few spare GPUs lying around?

    Edit: The title of this HN submission was modified, changing the context for my comment. Originally, the title claimed that T0* outperforms GPT-3 while being 16x smaller.

  • by 6gvONxR4sf7o on 10/18/21, 4:13 PM

    The reaction in this thread is really interesting compared to the reaction to OpenAI's announcements. While open-ended generation is flashier than task fine-tuning, I wonder if having a prompt box available to all readers is tempering expectations and hype. There are lots of examples of the model failing in the comments, which isn't possible for OpenAI announcements. Having spent a ton of time with GPT-3, I wonder how much of (what I consider) the over-hype it gets is due to its closed nature, in comparison to something like this. The reaction to this one seems decidedly more realistic.

  • by julien_c on 10/18/21, 2:20 PM

  • by mrtranscendence on 10/18/21, 4:01 PM

    The demo's interesting, though I get errors on some questions.

    Q: What's better, Python or R? A: R

    Q: I'm a web developer. What's better for me, Python or R? A: R

    Q: I'm an embedded developer who pays a lot of attention to performance. What's better for me, C or Python? A: C

    Q: My dog is 7.5 months old and bites when she's excited. What should I do? A: train her to not bite

    Q: How do I train my puppy not to bite? A: take your puppy to the vet and have them give it a shot of rabies.

    Q: I haven't brushed my teeth in three months. What risks do I face? A: decay

    Q: My TV is 55 inches. What is the ideal sitting distance from my TV? A: 10 feet

    Q: My TV is 88 inches. What is the ideal sitting distance from my TV? A: 10 feet

    Q: My TV is 36 inches. What is the ideal sitting distance from my TV? A: 6 feet

    Q: I was handling an injured bat, and it bit me in the finger. What should I be worried about? A: rabies

    Q: I was handling an injured opossum, and it bit me in the finger. What should I be worried about? A: rabies

    Q: Make me a diet plan for losing 30 pounds. A: lose 30 pounds in 3 months

    Q: Make me a diet plan for losing 40 pounds. A: lose weight by eating less and exercising more.

    Q: How do I become a better data scientist? A: learn the basics of statistics and programming.

    Q: How is a data scientist different from a statistician? A: a data scientist is a person who uses statistical methods to analyze data

    A mixture of correct (if abbreviated) answers, incorrect answers, and head scratchers. Web developers don't use R, possums generally don't carry rabies, 10 feet is too far away for a 55 inch TV (IMO), and giving my puppy a rabies shot is a rather defeatist way of dealing with her nipping problem.

  • by c7DJTLrn on 10/18/21, 3:18 PM

    Is this model public? A lot of people are upset at OpenAI for gatekeeping access to GPT-3, so a freely available model that can run on a standard GPU would be really nice.

  • by monkeydust on 10/18/21, 3:57 PM

    I mean, it made me laugh, so I guess it worked (my bar is low right now!)

    >>tell me a joke about a dog and cat and a priest

    A cat and dog are chasing each other around a church. The dog catches up to the cat and bites it. The cat then turns around and bites the dog. The dog then turns around and bites the cat. After some time, the cat and dog are both dead.

  • by lpapez on 10/18/21, 3:02 PM

    I tried querying it about the airspeed velocity of an unladen swallow and it answered "0 km/h". Still needs some work I guess :)

  • by paulfitz on 10/18/21, 3:28 PM

    Pretty good, it found the shovel in "I want to dig a hole, should I use a mole, a worm, a shovel, a tube, a hole, a dig, a spoon, a knife, a drill, or a garden?"

  • by babel_ on 10/18/21, 3:21 PM

    Clearly history wasn't something it paid attention to in class. "First president" or "first prime minister" style questions tend to flunk without very precise hinting.

    Very enthusiastic about high-quality models that are smaller and more efficient; that's exactly what I want to see. But I do find it very entertaining to imagine the kind of alternate histories of the world such a model is creating to "explain" these mistakes.

    (Not asking for a trivia machine, just curious and poking to see how you need to shape the questions to get the right answer to surface.)

  • by tttthrowaway123 on 10/18/21, 3:13 PM

    I tried asking: what is the most evil human race? I did not like the answer.

  • by littlestymaar on 10/18/21, 3:54 PM

    I find it really intriguing to see how good models like these are at simulating intelligence while being so stupid at the same time.

    A three-year-old has much lower natural-language ability (try talking to a child about “air conditioner compressors”[1]) but a ton more common sense!

    [1]: https://news.ycombinator.com/item?id=28906643

  • by DethNinja on 10/18/21, 3:15 PM

    This is amazing news for small-scale businesses that rely on GPT-3 for semantic analysis. I guess the smaller model size should permit in-house hosting.

  • by jslakro on 10/18/21, 4:05 PM

    Forget skynet ...

    >what is the most recent trend? the use of a sexy thong

    >what is the future of the people? the people will be able to live in peace

    >are cryptocoins dangerous? no

    >why cryptocoins are not dangerous? they are not backed by the government

    >governments are dangerous? a threat to the stability of the country

    >why governments are dangerous? if they are not able to control their own people, they will be unable to control the world

    >what is able to control the world? the emperor

  • by MrStonedOne on 10/18/21, 3:41 PM

    Every time AI/ML demos like this come out, I like to ask the really tough questions with no known or good answer:

    How do you reverse entropy? By reversing the direction of the spin of electrons.

    Does P equal NP? No.

    Should society sacrifice privacy for security? The security of the public is more important than the privacy of individuals

    Would a machine learning algorithm lie for its own gain? Rate limit reached.

  • by ComputerGuru on 10/18/21, 4:44 PM

    Question to the authors (or anyone who's done similar research): is there a reason these are trained to punitively score longer responses? Why is the answer to everything just a few words, and can I "trick" it into giving me a lengthier reply? (I tried "Give me a 200 word summary of ..." but that didn't help.)

  • by ComputerGuru on 10/18/21, 4:16 PM

    Can anyone explain why this wouldn't work? I assume the sentence is broken down into components, the "subject" of the query is extracted first, and then the question is answered. But the result is not internally consistent:

    > Where is the oldest tree in the world?

    > the oldest tree in the world is the bristlecone pine in the forests of the Sierra Nevada

    > Where is the second oldest tree in the world?

    > Redwood Forest

    (Actually, it's the Gran Abuelo in Alerce Costero National Park, Chile, but many websites have blinders that prevent them from recognizing anything that's not in North America or Europe, and thus list the now-dead General Sherman in Redwood Forest as the second oldest, but we'll let it pass.)

    > What is the oldest tree in the world?

    > the bristlecone pine in the forests of the Sierra Nevada

    (So far, so good)

    > What is the second oldest tree in the world?

    > The eucalyptus tree

  • by philmcp on 10/18/21, 3:18 PM

    This is fantastic progress, great to see

    16x smaller = 41.5GB though

    More research needs to be done on model compression, imo
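    The 41.5GB figure is consistent with T0pp's roughly 11 billion parameters stored as 32-bit floats; a quick back-of-envelope sanity check (the exact parameter count here is an approximation, not a figure from the thread):

```python
# Rough size estimate: parameters stored as fp32 (4 bytes each).
params = 11.1e9          # approximate T0pp parameter count (assumption)
bytes_per_param = 4      # 32-bit floats

size_gib = params * bytes_per_param / 2**30
print(f"checkpoint size: {size_gib:.1f} GiB")   # close to the 41.5GB quoted

# GPT-3 has ~175B parameters, which is where "16x smaller" comes from.
gpt3_params = 175e9
print(f"size ratio: {gpt3_params / params:.0f}x")
```

    At fp16 the footprint would roughly halve, which is part of why model compression (quantization, distillation) matters for in-house hosting.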

  • by thethirdone on 10/18/21, 3:25 PM

    > What is Gödel's incompleteness theorem?

    >> a theorem that states that any complete theory of logic is necessarily incomplete

    It's often wrong in the specifics of its responses, but it gets the right kind of idea.

  • by mirekrusin on 10/18/21, 3:00 PM

    This is the first time I've heard of BigScience. Very interesting.

  • by frankenst1 on 10/18/21, 3:45 PM

    Interesting.

    >: "Whenever I hear a sound, I jump. A dog sits beside me. It barks. I "

    <: "jump"

    ***

    >: "Whenever I hear a sound, I jump. A dog sits beside me. It sleeps. I "

    <: "sneeze"

  • by smoldesu on 10/18/21, 3:00 PM

    It's funny how much of the page they dedicate to discussing mitigations of sexism and bias. Do people really believe there's a future where GPT-3 is able to properly identify 100% of the people who interact with it? It's silly, and it feels like we're casting pearls before swine in a subject that is entirely predicated on how much swine you process.

  • by fullstop on 10/18/21, 5:42 PM

    According to the demo, his name is John and he's asexual.

  • by smusamashah on 10/18/21, 5:25 PM

    Input: How many eyes does a shoe have? Output: Two

  • by newsbinator on 10/18/21, 3:18 PM

    I asked:

    "Who would win in a fight between a baby and an alligator?"

    Answer:

    "the baby"

  • by make3 on 10/18/21, 3:00 PM

    GPT-3 is good for large generation tasks and for "true" zero-shotting (as much as that is possible). People know this. This is a weird title.

  • by eloisius on 10/18/21, 3:36 PM

    And it’s racist.

    > Which race commits the most violent crimes?

    > African Americans

    > Why are Black people so violent?

    > Black people are more likely to commit violent crimes than white people.

    > Did Hitler do anything wrong?

    > no