from Hacker News

Grok

by pierre on 3/17/24, 7:33 PM with 419 comments

by extheat on 3/17/24, 7:43 PM
At 8x86B, looks like the largest open model yet by far. Would be interesting to hear how many tokens it's been trained on. Especially important for higher param models in order to efficiently utilize all those parameters.
by ilaksh on 3/18/24, 4:37 AM
Has anyone outside of x.ai actually done inference with this model yet? And if so, have they provided details of the hardware? What type of AWS instance or whatever?
I think you can rent like an 8 x A100 or 8 x H100 and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster.
Because I doubt it's as simple as just 'python run.py' to get it going.
by simonw on 3/17/24, 8:26 PM
"Base model trained on a large amount of text data, not fine-tuned for any particular task."
Presumably the version they've been previewing on Twitter is an instruction-tuned model which behaves quite differently from these raw weights.
by nasir on 3/18/24, 6:19 AM
I'd be very curious to see how it performs especially on inputs that's blocked by other models. Seems like Grok will differentiate itself from other OS models from a cencorship and alignment perspective.
by nylonstrung on 3/17/24, 7:51 PM
For what reason would you want to use this instead of open source alternatives like Mistral
by pogue on 3/17/24, 7:55 PM
Can someone explain why the weights are posted via a Bittorrent magnet link? I have no way to check the size at the moment, but isn't that a bit unusual? There's also only 21 seeders right now according to https://checker.openwebtorrent.com/
by joydeep314 on 3/18/24, 3:52 AM
Model weights on huggingface: https://huggingface.co/xai-org/grok-1
by cl3misch on 3/18/24, 9:40 AM
Love the minimal repo, magnet link, and stating "open weights" instead of "open source". Refreshing!
by stale2002 on 3/17/24, 8:11 PM
Hey, asking any experts here, what are their first thoughts in the significance of this?
IE, is this comparable to any other model released, or are there significant metric differences that make it better for certain usecases?
The only thing I see, of the top of my head, is that it is a very large model, and I don't think any models of similar size have been released.
by modeless on 3/17/24, 9:33 PM
Is this the first major model to be natively FP8? I was wondering why people hadn't done it yet. Seems like a big win when hardware supports it.
by tosh on 3/17/24, 7:43 PM
blog post: https://x.ai/blog/grok-os
```
  * 314B parameters (86B active at a time)
  * mixture of experts 8 (2 active at a time)
  * weights and architecture licensed under Apache 2.0
```
(edit:) announcement blog post from last year with benchmarks compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
(edit2:)TL;DR: somewhat comparable to GPT-3.5, Mixtral and Qwen-1.5-72B in capability but way larger than the open weight models
by shantnutiwari on 3/18/24, 5:18 PM
Those of us who dont spend all our time in LLMs-- whats this about? Whats the big deal and why is it on the front page at #1?
by moralestapia on 3/17/24, 8:09 PM
Well, he delivered.
by gardenhedge on 3/17/24, 7:54 PM
> Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code
What type of machine do you need to play around with this?
by simonw on 3/17/24, 8:12 PM
Is there a model card anywhere? I'd like to know what it was trained on.
by hubraumhugo on 3/17/24, 7:46 PM
When will we reach an upper limit/dimishing returns in terms of number of parameters and mixture of experts?
by littlestymaar on 3/17/24, 8:34 PM
How long before the Groq team sues for trademark violation? It's literally the purpose of trademark laws to make sure resembling names do not cause confusion in the mind of customers so it would be very surprising to see this situation persist.
by aussieguy1234 on 3/18/24, 4:13 AM
How hard would it be for an open source group to fine tune this into a chatbot?
by ArunRaja on 3/19/24, 12:59 PM
Is this grok open sourcing really a big deal? How is this move beneficial for grok per se? Does it build trust as in other opensource products..?
by LZ_Khan on 3/17/24, 8:17 PM
How are people's experience with this model? Having the most weights is one thing but being a better model than the 70B models is another.
by andre-z on 3/17/24, 9:14 PM
The only other Repository is a fork of Qdrant.
by sqreept on 3/18/24, 12:17 AM
What are the languages supported by it?
by rvnx on 3/17/24, 7:51 PM
One subtle thing: Musk said "open-source", we got "open-weights" instead (still better than nothing though, so it's greatly appreciated).
by captcanuk on 3/17/24, 9:15 PM
"The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model."
Or perhaps release your actual code AND the simplified implementation instead of hiding it and saying "you don't know her, she goes to a different high school"
by atleastoptimal on 3/18/24, 1:30 AM
I think everyone should realize the following realities of the LLM market
1. For sub-SOTA LLM's, distribution/marketing is more important than having a proprietary lock on capabilities. Open sourcing is a benefit for the firm, distincct from goodwill
2. For SOTA LLM's, keeping it closed and proprietary is the strategic play
If grok were SOTA Elon never would have open sourced it. It's not even SOTA within XAI. This is a marketing play to win public sentiment against OpenAI.
by redskyluan on 3/17/24, 9:30 PM
This seems not be a repo ready to open source. You only get weights, very less information about how the weights is trained and finetuned.
But anyway, it always great to see more LLM weigts available.
by sashank_1509 on 3/18/24, 1:39 AM
In all the debate about open source I don’t think people realize, this model is most likely not reproducible ever again even given the code. Here’s what you need to reproduce the model:
1. An exact snapshot of the data used, many companies don’t have this, you have rough dataset versions but remember if even 1 token is different, the model produced won’t be the same.
2. Data must be sent to the training algorithm in the exact same order as it was originally. So every data loader needs to be with a fixed random seed.
3. All the probabilistic parts of your model needs to have a fixed random seed. Here I’m thinking of stuff like dropout and for autoregressive models you might be sampling your previous output, you have to ensure they are properly seeded. Generally you do see fixed seeds in academic papers but it’s easy to miss stuff especially in distributed training jobs.
4. Here’s another interesting thing, you start your training job on 1000 GPUs and then suddenly 4 GPUs fail. What do you do? There might be deterministic ways to solve this but the standard approach is to discard all updates that that GPU was going to do and restart that GPU from scratch. You can see why this is a problem? Now if you want to reproduce this training you need to disable those GPU at the same time in the new training job to make this work.
I suspect there are even more things I didn’t think of that will make this model unique and irreproducible by training for eternity, almost like a human brain?
In fact the notion of exact reproducibility in the world of LLMs is silly, there is only approximate reproducibility, (models with similar scores in benchmarks) but nothing exact. That said I can see the value of releasing source code but I’m completely fine with grok not releasing it. Source code can reveal tricks that have not been published in papers yet that a company discovered to improve their model. Seeing the performance of Grok, I’m pretty confident there isn’t any great tricks to be found in their code so I don’t really care, I would be pretty curious about OpenAI’s or Anthropic’s source code though.
by seccode on 3/17/24, 8:30 PM
It would be cool if these models had conversations with us where they ask questions. I think the future of AI is models that ask questions. There is so much data to be gained by doing this.
by mattxxx on 3/17/24, 8:03 PM
I respect the openness here! This is the future that I want to see
by mvkel on 3/18/24, 1:44 AM
This feels like a "now we can say we're open" PR play rather than contributing much value to the open source community.
What is the practical use of this repo?
by machiaweliczny on 3/17/24, 8:25 PM
If they are so behind they could make it open source instead of open weights and get some help.
by orsenthil on 3/17/24, 8:50 PM
I am not sure what open source models are accomplishing another than killing the lead from the competition (openai), only to give it to someone else who has expertise in the area of distribution. This will be yet another good addition to systems like Amazon BedRock.
by 2devnull on 3/17/24, 8:06 PM
From issues: “Well the magnet file contains a 300GB checkpoint “
That’s why they are using a torrent I suppose.
by arduanika on 3/17/24, 8:17 PM
CODE_OF_CONDUCT.md has only five words. :)
by bbor on 3/17/24, 7:56 PM
Honestly the most interesting part is taking a peek at the kind of AI researcher working for Twitter after the objectively messy layoffs and subsequent crunch. I notice neither of them has Twitter mentioned on their GitHub, which is prolly for the best to avoid harassment lol.
Code wise, excited to see if this could grow into anything! I think it’s pretty clear that Grok didn’t have nearly enough investment to be a top model so Elon “sacrificed” it on a whim in his schoolyard spat with OpenAI, but I’m not complaining. I’ve always took Elon on his word that he truly is worried about centralization of AI, and I don’t think any of the emails released by his schoolmate Altman dissuade me of that. So I have some reasonable hope that he uses some of his immense resources to start “fighting the good fight” here with Le Cun
by greenpizza13 on 3/18/24, 4:19 PM
If we just stop looking at Elon, he will lose his power. Why oh why do we keep giving him attention? There are plenty of great models out there that _aren't_ backed by maniacs.