from Hacker News

Fine-tune Google's Gemma 3

by tomdekan on 3/19/25, 4:34 PM with 78 comments

  • by smokel on 3/19/25, 8:51 PM

    I'm interested to know if anyone is using fine-tuning to train a model on proprietary or in-house codebases and documentation.

    RAG solutions seem to have their limitations, and fine-tuning might be a more effective approach.

    How much effort is required to turn code into something one can use for fine-tuning?
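
    One minimal way to sketch what that effort looks like: flatten a repository into prompt/response records in JSONL that a supervised fine-tuning trainer can consume. The repository path, file filter, and prompt template below are all illustrative assumptions rather than a recommended recipe.

    ```python
    import json
    from pathlib import Path

    REPO = Path("our-internal-repo")   # assumption: local checkout of the codebase
    OUT = Path("code_sft.jsonl")       # assumption: output file fed to an SFT trainer

    records = []
    for path in REPO.rglob("*.py"):    # assumption: Python files only; extend per language
        source = path.read_text(encoding="utf-8", errors="ignore")
        if not source.strip():
            continue
        # Illustrative template: ask the model to reproduce the file given its
        # path, so internal conventions and APIs end up in the training signal.
        records.append({
            "prompt": f"Write the contents of {path.relative_to(REPO)} "
                      "following our internal style and APIs.",
            "response": source,
        })

    with OUT.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    print(f"wrote {len(records)} examples to {OUT}")
    ```

    Most of the real effort tends to go into better pairings than whole files (docstring to implementation, diff to commit message, question to doc excerpt) and into deduplication and licensing checks, which this sketch skips.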

  • by zk on 3/20/25, 5:24 AM

    Is there a version of Gemma 3 that has tool calling? Google's blog claimed it supports tools, but it doesn't seem like it actually does.

  • by bryan0 on 3/19/25, 7:39 PM

    Are people fine-tuning LLMs on their local machines with a single GPU? What are people using to scale their training to multiple nodes / GPUs? I've been playing around with Hugging Face Estimators in sagemaker.huggingface, but I'm not sure whether there are better options for this.
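
    For the single-GPU case, a common pattern is 4-bit quantization plus LoRA adapters with the Hugging Face peft stack. A minimal sketch, assuming a text-only Gemma 3 checkpoint and a local train.jsonl with a "text" column (both placeholders); exact argument names shift between library versions:

    ```python
    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_id = "google/gemma-3-1b-it"  # placeholder: any text-only Gemma 3 checkpoint

    # Load the base model in 4-bit so it fits on a single consumer GPU.
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                             bnb_4bit_compute_dtype=torch.bfloat16)
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb,
                                                 device_map="auto")

    # Train only small LoRA adapters on top of the frozen quantized weights.
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                                             target_modules=["q_proj", "v_proj"]))

    ds = load_dataset("json", data_files="train.jsonl")["train"]
    ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=ds.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gemma3-lora", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8, num_train_epochs=1,
                               learning_rate=2e-4, bf16=True, logging_steps=10),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()
    ```

    For multi-GPU or multi-node scaling, a script like this can usually be launched with accelerate or torchrun rather than rewritten around SageMaker Estimators.
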
  • by rockwotj on 3/19/25, 6:44 PM

    Is anyone outside of the research labs fine-tuning models for production use cases? I have been seeing more people just using foundation models off the shelf, especially in light of the new advancements that seem to come every few months.

  • by yieldcrv on 3/19/25, 7:07 PM

    Instead of version numbers, these things should be labeled by their release date, since this kind of training starts from a dataset snapshot in time, colloquially called the knowledge-cutoff date, a term that isn't really accurate.

    We are optimizing these along several dimensions at once, with multiple branches of evolution from each model,

    so a successor version name doesn't really convey that.

  • by huqedato on 3/19/25, 11:14 PM

    Great article, but I didn't see anything about the costs.

    I'm particularly interested in this aspect because we're considering fine-tuning Gemma 3, but our budget is tight. We're looking into (real-world) cost estimates for this approach.
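
    A back-of-envelope way to frame it; every number below is an illustrative assumption, not a quoted price:

    ```python
    # Plug in your own provider's hourly rate and measured run times.
    gpu_hourly_rate = 1.50   # assumed $/hour for a single rented 24-48 GB GPU
    hours_per_run = 6        # assumed wall-clock time for one LoRA run on a small dataset
    runs = 4                 # assumed number of experiments before a usable checkpoint

    print(f"~${gpu_hourly_rate * hours_per_run * runs:.0f} total")  # ~$36 under these assumptions
    ```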

  • by siliconc0w on 3/19/25, 6:54 PM

    It likely makes sense to use more expensive frontier models as teachers or architects for smaller fine-tuned ones that generate the majority of tokens (though possibly against the ToS).
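
    One common shape of that setup is to have the frontier model generate labeled examples that become the fine-tuning set for the smaller model. A minimal sketch, assuming an OpenAI-compatible client, a placeholder teacher model name, and hypothetical task prompts:

    ```python
    import json
    from openai import OpenAI  # assumption: teacher served behind an OpenAI-compatible API

    client = OpenAI()
    TEACHER = "gpt-4o"  # placeholder teacher model name
    prompts = ["Summarize this support ticket: ...",
               "Write a SQL query for: ..."]  # hypothetical task prompts

    with open("distill.jsonl", "w", encoding="utf-8") as f:
        for p in prompts:
            resp = client.chat.completions.create(
                model=TEACHER,
                messages=[{"role": "user", "content": p}],
            )
            # The teacher's output becomes the target for the smaller student model.
            f.write(json.dumps({"prompt": p,
                                "response": resp.choices[0].message.content}) + "\n")
    ```

    As noted above, whether a provider's terms allow using outputs to train another model is a separate question worth checking before committing to this.
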
  • by admiralrohan on 3/19/25, 9:44 PM

    Has anyone used these small models in any production environment?

    If yes, what are they good and bad at?

  • by dhooper on 3/20/25, 11:03 AM

    Please try to enjoy each Gemma tuning equally, and not show preference for any over the others