from Hacker News

Understanding, using, and finetuning Gemma

by rasbt on 2/24/24, 2:18 PM with 48 comments

  • by brucethemoose2 on 2/24/24, 2:50 PM

    What are HNers looking for in this article? The architectural differences, or how to run/finetune it?
  • by lopkeny12ko on 2/24/24, 6:54 PM

    Gemma, despite being developed by a company worth billions of dollars, is a phenomenally poor model.

    I tried the open source release yesterday. I started with the input string "hello" and it responded "I am a new user to this forum and I am looking for 100000000000000..." with zeros repeating forever.

    Ok, cool I guess. Looks like I'll be sticking with GPT-4.

  • by brunooliv on 2/24/24, 4:24 PM

    Anyone who uses these models for more than 10 min will immediately realize that they're really, really bad compared to other free, OSS models. Even Phi-2 was giving me "on par" results, despite being a model in a completely different league.

    Many models are being released now, which is good for keeping OpenAI on their toes, but, truth be told, I've yet to see _any_ OSS model I can run on my machine that is as good as ChatGPT 3 (not 3.5, not 4, but the original one from when everyone went crazy).

    My hopes for consumer hardware ChatGPT-3.5 within 2024 probably lie with what Meta will keep building upon.

    Google was great, once. Now they're a mere bystander in the larger scheme of things. I think that's a good thing. Everything in the world is cyclic and ephemeral; Google enjoyed their time while it lasted, but newer and better things are coming, and will keep on coming.

    PS: Completely unrelated, but Gmail is now the only Google product I actively use. I genuinely don't remember the last time I did a Google Search... When I need to do my own digging, I use Phind these days.

    Times are changing and that's great for tech and future generations joining the field and workforce!

  • by Solvency on 2/24/24, 3:51 PM

    Can we just stop talking about Gemini/Gemma for at least two years, until it's improved? In fact, the two-year mark is a rather strategic recommendation, because I guarantee it'll become vaporware by then anyway, given Google's track record. It's outrageously poorly performing.
  • by behnamoh on 2/24/24, 2:47 PM

    Gemma (and Gemini) are heavily nerfed. Why are they in the news lately?

    Also, Gemma is a 9B+ model. I don't think it's okay that Google compared it with the Mistral and Llama 2 7B models.

    Google also took llama.cpp and used it in one of their GitHub repos without giving credit. Again, not cool.

    All this hype seems to be backed by Google to boost their models, whereas in practice the models are not that good.

    Google also made a big claim about Gemini 1.5's 1M-token context window, but at the end of their article they said they'll limit it to 128K. So all that 1M flex was for nothing?

    Not to mention their absurd approach to alignment in image generation.