by antimatter15 on 5/5/23, 11:29 PM with 106 comments
by sphars on 5/6/23, 12:57 AM
by rawrmaan on 5/6/23, 12:01 AM
There's really only one thing I care about: How does this compare to GPT-4?
I have no use for models that aren't at that level. Even though this almost definitely isn't at that level, it's hard to tell from the data presented how close or far it actually is.
by andy_xor_andrew on 5/6/23, 12:00 AM
On one hand, the resources required to run these models continue falling dramatically, thanks to the techniques discovered by researchers: GPTQ quantizing down to 4, 3, 2, even 1 bit! model pruning! hybrid VRAM offloading! better, more efficient architectures! 1-click finetuning on consumer hardware! Of course, the free lunches won't last forever, and this will level off, but it's still incredible.
And on the other side of the coin, the power of all computing devices continues its ever-upward exponential growth.
So you have a continuous lowering of requirements, combined with a continuous increase in available power... surely these two trends will collide, and I can only imagine what this stuff will be like at that intersection.
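To make the low-bit quantization point concrete, here's a toy round-to-nearest 4-bit quantize/dequantize with per-row scales; real GPTQ adds Hessian-based error correction on top of something like this, so treat it as a simplified illustration rather than the actual algorithm.

    import numpy as np

    def quantize_4bit(w: np.ndarray):
        # per-row symmetric scale so each row maps into the int4 range [-8, 7]
        scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: np.ndarray):
        # reconstruct an approximation of the original weights
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 8).astype(np.float32)
    q, scale = quantize_4bit(w)
    print("mean abs error:", np.abs(w - dequantize(q, scale)).mean())

The storage win is the point: each weight goes from 16 or 32 bits down to 4 plus a small per-row scale, which is why 7B-parameter models suddenly fit on consumer GPUs.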
by knaik94 on 5/6/23, 12:19 AM
As the resources required to train and fine-tune these models become consumer-hardware friendly, I think we'll see a shift towards a bunch of smaller models. Open models like these also mean the results of security and capability research are publicly available. Models like this one and the Replit code model will become the new base that all open-source models build on. I am really looking forward to the gptj 4-bit, CUDA-optimized 7B models; the others I have tested run fast on a 2070 Max-Q with 16 GB RAM, where I was getting ~7 tokens/second. LoRA can work directly with 4-bit quantized models. While ggml/CPU models are very strong, I don't believe we'll move away from GPU-accelerated training and fine-tuning anytime soon.
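For the "LoRA on a quantized base model" part, here is a minimal sketch using the Hugging Face peft library; the checkpoint id, target modules, and hyperparameters are placeholders I picked for illustration, and 8-bit loading stands in for whatever quantized base you actually use.

    # minimal sketch: attach LoRA adapters to a quantized base model (transformers + peft)
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model, TaskType

    base = "openlm-research/open_llama_7b"  # placeholder checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(
        base,
        load_in_8bit=True,   # quantized base; swap in your 4-bit setup if available
        device_map="auto",
    )

    lora = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style models
        task_type=TaskType.CAUSAL_LM,
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only the small adapter matrices are trainable

The base weights stay frozen (and quantized); only the low-rank adapter matrices get trained, which is why this fits in a few GB of VRAM.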
by practice9 on 5/6/23, 1:17 AM
by ftxbro on 5/5/23, 11:52 PM
by wtarreau on 5/7/23, 2:19 PM
Let's wait for someone to port it to a cheaper and more powerful C-based engine like llama-cpp.
by nico on 5/6/23, 3:21 AM
build a model that can change the number of parameters in the vicinity of some meaning, effectively increasing the local resolution around that meaning
so parameter space becomes linked-parameter space, between models
links could be pruned based on activation frequency
another way of seeing the concept is a tree of models/llms
and one additional model/llm whose only job is to manage the tree (i.e. build it as it goes, use it for inference, prune it, etc)
Or is what I'm saying too dumb?
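A toy sketch of that tree-of-models idea, just to make it concrete; every class, name, and routing/pruning rule here is invented for illustration and is not from the post or any existing library.

    # toy "tree of models" with a manager that routes prompts and prunes unused branches
    from dataclasses import dataclass, field

    @dataclass
    class ModelNode:
        name: str                      # the "meaning" or topic this specialist covers
        generate: callable             # stand-in for a small specialist LLM
        children: list = field(default_factory=list)
        activations: int = 0           # usage counter the manager can prune on

    class Manager:
        def __init__(self, root: ModelNode):
            self.root = root

        def route(self, prompt: str) -> ModelNode:
            # walk down the tree, descending into a child whose name appears in the prompt
            node = self.root
            while True:
                match = next((c for c in node.children if c.name in prompt), None)
                if match is None:
                    node.activations += 1
                    return node
                node = match

        def prune(self, min_activations: int = 1):
            # drop branches that were (almost) never activated
            def _prune(node):
                node.children = [c for c in node.children if c.activations >= min_activations]
                for c in node.children:
                    _prune(c)
            _prune(self.root)

    root = ModelNode("general", lambda p: f"[general model] {p}")
    root.children.append(ModelNode("python", lambda p: f"[python expert] {p}"))
    mgr = Manager(root)
    print(mgr.route("write python code to sort a list").generate("..."))

In this sketch the "links" are just parent/child edges and the "local resolution" is the depth of specialists under a node; a real version would need learned routing rather than string matching.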
by ftxbro on 5/6/23, 12:40 AM
by born-jre on 5/6/23, 2:25 AM
by ibitto on 5/6/23, 8:21 AM
by mirker on 5/6/23, 1:17 AM
by acapybara on 5/5/23, 11:58 PM
The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.
As for the 7B model, it's great to see that it's already outperforming the Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kinda improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.
One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.