by MrUssek on 7/2/20, 12:58 AM with 35 comments
by cs702 on 7/2/20, 1:36 AM
In a very short time, transformers have gone from under 1B, to 1.5B, to 3B, to 5B, to 175B, and now 600B parameters. 1T is only, what, like 67% more parameters, and therefore likely to be achieved in the short term. In fact, the authors of this paper tried 1T but ran into numerical issues that they will surely address soon. Not long after someone crosses 1T, expect 10T to become the next target. And why not? The best-funded AI research groups are in a friendly competition to build the biggest, baddest, meanest m-f-ing models the world has ever seen.
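A quick sanity check on that figure, treating 1T as a round 1000B parameters:

```python
# Relative jump from the 600B-parameter model to a hypothetical 1T-parameter model.
current = 600   # billions of parameters (the model in this paper)
target = 1000   # billions of parameters (a round 1T)
print(f"1T is {(target - current) / current:.0%} more parameters than 600B")
# -> 1T is 67% more parameters than 600B
```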
Scores continue to increase with diminishing returns, which is all fine and nice, but more importantly it seems we should expect to see machine-generated text getting much better from a qualitative standpoint -- that is, becoming less and less distinguishable from a lot of human output. That has been the trend so far.
We live in interesting times.
by dig6x on 7/2/20, 1:28 AM
It does appear that at the initial, resource-intensive stages of a technology like NLP, big tech is primed to pave the way. We saw this happen across cloud, AI more generally, storage, etc. Big tech then begins focusing on making the tech accessible to industry value chains (Azure, AWS, Amazon's AI services, etc.), but as the industry matures there's more room for specialized startups/companies to enter the space and capture lucrative niches - that's exactly what Snowflake did for cloud.
If anything, I definitely see this kind of scale as a step toward a more robust, mature industry. Better that it moves forward than not.
by mensetmanusman on 7/2/20, 1:38 AM
by modeless on 7/2/20, 5:57 AM
We've barely scratched the surface of what's possible. Even if Moore's Law were dead (though it seems TSMC may keep it alive for a bit longer), there are huge gains to be had when co-designing models and hardware. Stuff like https://www.cerebras.net/ is the direction I expect things to go.
by Der_Einzige on 7/2/20, 1:10 AM
Still impressive, don't get me wrong, but I am starting to believe that NLP will be increasingly dominated by the big players, since they are the only ones who can train a 1 TRILLION parameter model (they show that in the paper). I can't even do inference with a 36-layer, 2048-neurons-per-layer network on my RTX 2080 Ti. Sad...
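For a rough sense of why that fails, here is a minimal back-of-the-envelope sketch, assuming a standard transformer block (four attention projections plus a 4x-expanded feed-forward layer); it ignores embeddings, activations, and the KV cache, so actual memory use is higher:

```python
# Rough weight-memory estimate for a 36-layer, d_model=2048 transformer
# (assumed architecture: Q/K/V/output projections per layer plus a
# feed-forward block with 4x expansion; embeddings and activations ignored).

def transformer_params(n_layers: int, d_model: int, ffn_mult: int = 4) -> int:
    attn = 4 * d_model * d_model              # Q, K, V, and output projections
    ffn = 2 * ffn_mult * d_model * d_model    # d -> 4d and 4d -> d matrices
    return n_layers * (attn + ffn)

params = transformer_params(36, 2048)
for dtype, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
    gb = params * bytes_per_param / 1e9
    print(f"~{params / 1e9:.1f}B params -> ~{gb:.1f} GB of weights in {dtype}")

# ~1.8B params -> ~7.2 GB of weights in fp32, ~3.6 GB in fp16.
# An RTX 2080 Ti has 11 GB of VRAM, so fp32 weights alone leave little room
# for activations or a KV cache during inference.
```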
by teruakohatu on 7/2/20, 3:19 AM
A 1-trillion-parameter model should not be far off, which is about the same number of synapses as a house mouse [1].
That would put us around 1% of the way to human brain complexity, since the human brain has on the order of 100 trillion synapses (well, probably not really, but it is fun to think about; a rough calc is sketched after the link).
[1] https://en.wikipedia.org/wiki/List_of_animals_by_number_of_n...
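A toy calculation behind the mouse/human comparison, using the order-of-magnitude figures implied above (parameters and synapses are not really comparable units, so this is just for fun):

```python
# Order-of-magnitude comparison behind the "1% of the way" remark.
# Figures are rough and parameters are not equivalent to synapses.
model_params = 1e12       # a hypothetical 1T-parameter model
mouse_synapses = 1e12     # house mouse, ~10^12 synapses
human_synapses = 1e14     # human brain, roughly 10^14 synapses (low-end estimate)

print(f"model vs mouse brain: {model_params / mouse_synapses:.0%}")  # ~100%
print(f"model vs human brain: {model_params / human_synapses:.0%}")  # ~1%
```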
by justicezyx on 7/2/20, 2:19 AM