by pierre on 3/17/24, 7:33 PM with 419 comments
by extheat on 3/17/24, 7:43 PM
by ilaksh on 3/18/24, 4:37 AM
I think you can rent an 8 x A100 or 8 x H100 node and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster,
because I doubt it's as simple as just 'python run.py' to get it going.
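To put rough numbers on it (my own back-of-the-envelope estimate, not anything from the repo): at 314B parameters the bf16 weights alone are on the order of 630 GB, which only just fits across an 8 x 80 GB node before you account for activations and KV cache.

    # Rough memory estimate for a 314B-parameter model (assumed numbers, not measured)
    TOTAL_PARAMS = 314e9           # parameter count from the announcement
    BYTES_BF16 = 2                 # bf16 storage per parameter
    BYTES_INT8 = 1                 # int8-quantized storage per parameter
    GPU_MEM_GB = 80                # 80 GB A100/H100 variants
    GPUS_PER_NODE = 8

    print(f"bf16 weights: ~{TOTAL_PARAMS * BYTES_BF16 / 1e9:.0f} GB")   # ~628 GB
    print(f"int8 weights: ~{TOTAL_PARAMS * BYTES_INT8 / 1e9:.0f} GB")   # ~314 GB
    print(f"one node:      {GPU_MEM_GB * GPUS_PER_NODE} GB")            # 640 GB

So even before you touch cluster setup, the weights themselves push you toward a full 8 x 80 GB node in bf16, or something smaller with aggressive quantization.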
by simonw on 3/17/24, 8:26 PM
Presumably the version they've been previewing on Twitter is an instruction-tuned model which behaves quite differently from these raw weights.
by nasir on 3/18/24, 6:19 AM
by nylonstrung on 3/17/24, 7:51 PM
by pogue on 3/17/24, 7:55 PM
by joydeep314 on 3/18/24, 3:52 AM
by cl3misch on 3/18/24, 9:40 AM
by stale2002 on 3/17/24, 8:11 PM
I.e., is this comparable to any other model released, or are there significant metric differences that make it better for certain use cases?
The only thing I see, off the top of my head, is that it is a very large model, and I don't think any models of similar size have been released.
by modeless on 3/17/24, 9:33 PM
by tosh on 3/17/24, 7:43 PM
* 314B parameters (86B active at a time)
* mixture of experts 8 (2 active at a time)
* weights and architecture licensed under Apache 2.0
(edit:) announcement blog post from last year
with benchmarks compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
(edit2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and Qwen-1.5-72B in capability, but way larger than those open-weight models
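For anyone wondering what "mixture of experts 8 (2 active at a time)" means in practice, here's a toy top-2 routing sketch; the shapes, names, and router here are made up for illustration and are not taken from the Grok code:

    import numpy as np

    def top2_moe_layer(x, gate_w, experts):
        """Toy top-2 mixture-of-experts layer for a single token.

        x:        (d_model,) input vector
        gate_w:   (d_model, n_experts) router weights
        experts:  list of callables, each (d_model,) -> (d_model,)
        Only the 2 highest-scoring experts run; their outputs are mixed
        by the renormalized router probabilities.
        """
        logits = x @ gate_w                          # (n_experts,)
        top2 = np.argsort(logits)[-2:]               # indices of the 2 best experts
        probs = np.exp(logits[top2] - logits[top2].max())
        probs /= probs.sum()                         # softmax over the chosen 2
        return sum(p * experts[i](x) for p, i in zip(probs, top2))

    # 8 experts, 2 active per token: compute scales with the 2, memory with the 8
    rng = np.random.default_rng(0)
    d, n_experts = 16, 8
    experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n_experts)]
    gate_w = rng.normal(size=(d, n_experts))
    print(top2_moe_layer(rng.normal(size=d), gate_w, experts).shape)  # (16,)

That's roughly how 314B total parameters end up with only ~86B "active": each token only touches the router plus 2 of the 8 expert blocks, but all 8 still have to sit in memory.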
by shantnutiwari on 3/18/24, 5:18 PM
by moralestapia on 3/17/24, 8:09 PM
by gardenhedge on 3/17/24, 7:54 PM
What type of machine do you need to play around with this?
by simonw on 3/17/24, 8:12 PM
by hubraumhugo on 3/17/24, 7:46 PM
by littlestymaar on 3/17/24, 8:34 PM
by aussieguy1234 on 3/18/24, 4:13 AM
by ArunRaja on 3/19/24, 12:59 PM
by LZ_Khan on 3/17/24, 8:17 PM
by andre-z on 3/17/24, 9:14 PM
by sqreept on 3/18/24, 12:17 AM
by rvnx on 3/17/24, 7:51 PM
by captcanuk on 3/17/24, 9:15 PM
Or perhaps release your actual code AND the simplified implementation instead of hiding it and saying "you don't know her, she goes to a different high school"
by atleastoptimal on 3/18/24, 1:30 AM
1. For sub-SOTA LLMs, distribution/marketing is more important than having a proprietary lock on capabilities. Open sourcing is a benefit for the firm, distinct from goodwill.
2. For SOTA LLMs, keeping it closed and proprietary is the strategic play.
If Grok were SOTA, Elon never would have open sourced it. It's not even SOTA within xAI. This is a marketing play to win public sentiment against OpenAI.
by redskyluan on 3/17/24, 9:30 PM
But anyway, it's always great to see more LLM weights available.
by sashank_1509 on 3/18/24, 1:39 AM
1. An exact snapshot of the data used. Many companies don't have this; you have rough dataset versions, but remember that if even one token is different, the model produced won't be the same.
2. Data must be sent to the training algorithm in the exact same order as it was originally. So every data loader needs to run with a fixed random seed.
3. All the probabilistic parts of your model need a fixed random seed. Here I'm thinking of stuff like dropout, and for autoregressive models you might be sampling your previous output, so you have to ensure those are properly seeded. Generally you do see fixed seeds in academic papers, but it's easy to miss something, especially in distributed training jobs.
4. Here's another interesting thing: you start your training job on 1000 GPUs and then suddenly 4 GPUs fail. What do you do? There might be deterministic ways to solve this, but the standard approach is to discard all updates those GPUs were going to do and restart them from scratch. You can see why this is a problem: if you want to reproduce the training, you now need to disable those GPUs at the same point in the new training job to make it work.
I suspect there are even more things I didn't think of that will make this model unique and irreproducible by retraining for eternity, almost like a human brain.
In fact, the notion of exact reproducibility in the world of LLMs is silly; there is only approximate reproducibility (models with similar scores on benchmarks), nothing exact. That said, I can see the value of releasing source code, but I'm completely fine with Grok not releasing it. Source code can reveal tricks a company discovered to improve its model that haven't been published in papers yet. Seeing the performance of Grok, I'm pretty confident there aren't any great tricks to be found in their code, so I don't really care; I would be pretty curious about OpenAI's or Anthropic's source code, though.
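To make point 3 concrete, here's a minimal sketch of the "easy" seeding, using PyTorch as an example (the dataset name is a placeholder; this doesn't even touch the data-ordering or failed-GPU issues above):

    import random
    import numpy as np
    import torch

    def seed_everything(seed: int = 1234):
        # Seed every RNG that training code commonly touches
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)             # CPU RNG
        torch.cuda.manual_seed_all(seed)    # RNG on every GPU
        # Trade speed for determinism in cuDNN kernels
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    seed_everything(1234)

    # Shuffling gets its own seeded generator so sample order is fixed (point 2);
    # 'dataset' is a placeholder here.
    g = torch.Generator().manual_seed(1234)
    # loader = torch.utils.data.DataLoader(dataset, shuffle=True, generator=g)

And even with all of that, some GPU kernels (e.g. atomic-add based reductions) are still nondeterministic across runs and hardware, which is part of why bit-exact reproduction at this scale is effectively impossible.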
by seccode on 3/17/24, 8:30 PM
by mattxxx on 3/17/24, 8:03 PM
by mvkel on 3/18/24, 1:44 AM
What is the practical use of this repo?
by machiaweliczny on 3/17/24, 8:25 PM
by orsenthil on 3/17/24, 8:50 PM
by 2devnull on 3/17/24, 8:06 PM
That’s why they are using a torrent I suppose.
by arduanika on 3/17/24, 8:17 PM
by bbor on 3/17/24, 7:56 PM
Code wise, excited to see if this could grow into anything! I think it's pretty clear that Grok didn't have nearly enough investment to be a top model, so Elon "sacrificed" it on a whim in his schoolyard spat with OpenAI, but I'm not complaining. I've always taken Elon at his word that he truly is worried about the centralization of AI, and I don't think any of the emails released by his schoolmate Altman dissuade me of that. So I have some reasonable hope that he uses some of his immense resources to start "fighting the good fight" here with LeCun.
by greenpizza13 on 3/18/24, 4:19 PM