by brylie on 3/12/25, 7:48 AM with 138 comments
by archerx on 3/12/25, 8:33 AM
Models that are worth writing home about are:
EXAONE-3.5-7.8B-Instruct - It was excellent at taking podcast transcriptions and generating show notes and summaries.
Rocinante-12B-v2i - Fun for stories and D&D
Qwen2.5-Coder-14B-Instruct - Good for simple coding tasks
OpenThinker-7B - Good and fast reasoning
The DeepSeek distills - Able to handle more complex tasks while still being fast
DeepHermes-3-Llama-3-8B - A really good vLLM
Medical-Llama3-v2 - Very interesting but be careful
Plus more but not Gemma.
by danielhanchen on 3/12/25, 11:36 AM
The recommended settings according to the Gemma team are:
temperature = 0.95
top_p = 0.95
top_k = 64
Also beware of double BOS tokens! You can run my uploaded GGUFs with the recommended chat template and settings via ollama run hf.co/unsloth/gemma-3-27b-it-GGUF:Q4_K_M
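For anyone wiring those settings up programmatically, here is a minimal sketch (assuming a local Ollama server on the default port 11434 and the GGUF tag above; the prompt is just a placeholder) that passes them through the options field of Ollama's /api/generate endpoint, letting the bundled chat template do the wrapping so you don't add BOS markers by hand:

    import requests

    # Sketch: one non-streaming request to a local Ollama server using the
    # sampling settings recommended by the Gemma team for Gemma 3.
    payload = {
        "model": "hf.co/unsloth/gemma-3-27b-it-GGUF:Q4_K_M",
        "prompt": "Summarize the rules of Tetris in two sentences.",  # placeholder prompt
        "stream": False,
        "options": {
            "temperature": 0.95,
            "top_p": 0.95,
            "top_k": 64,
        },
    }

    r = requests.post("http://localhost:11434/api/generate", json=payload, timeout=600)
    r.raise_for_status()
    print(r.json()["response"])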
by swores on 3/12/25, 8:32 AM
by iamgopal on 3/12/25, 10:43 AM
by antirez on 3/12/25, 9:23 AM
by smcleod on 3/12/25, 11:09 AM
The Gemma series of models has historically been pretty poor when it comes to coding and tool calling - two things that are very important to agentic systems, so it will be interesting to see how 3 does in this regard.
by mythz on 3/12/25, 8:29 AM
Finally just finished downloading (gemma3:27b). Requires the latest version of Ollama to use, but now working, getting about 21 tok/s on my local 2x A4000.
From my few test prompts it looks like a quality model; going to run more tests to compare against mistral-small:24b to see if it's going to become my new local model.
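A quick way to run that comparison is to fire the same prompts at both local model tags and read tokens/second out of Ollama's timing fields; a rough sketch (the prompt list is a placeholder, and it assumes both models are already pulled):

    import requests

    MODELS = ["gemma3:27b", "mistral-small:24b"]
    PROMPTS = ["Explain the CAP theorem in three sentences."]  # placeholder test prompts

    for model in MODELS:
        for prompt in PROMPTS:
            r = requests.post(
                "http://localhost:11434/api/chat",
                json={
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                    "stream": False,
                },
                timeout=600,
            )
            r.raise_for_status()
            data = r.json()
            # eval_count / eval_duration (nanoseconds) gives decode speed in tokens/second
            tok_s = data["eval_count"] / (data["eval_duration"] / 1e9)
            print(f"{model}: {tok_s:.1f} tok/s")
            print(data["message"]["content"][:300])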
by elif on 3/12/25, 10:08 AM
by wtcactus on 3/12/25, 8:57 AM
I would much rather have specific, tailored models for different scenarios that could be loaded into the GPU when needed. It’s a waste of parameters to have half of the VRAM loaded with parts of the model targeting image generation when all I want to do is write code.
by singularity2001 on 3/12/25, 12:39 PM
[0] https://huggingface.co/open-r1/OlympicCoder-7B?local-app=vll...
[1] https://pbs.twimg.com/media/GlyjSTtXYAAR188?format=jpg&name=...
by tarruda on 3/12/25, 11:49 AM
My prompt to Gemma 27b (q4) on open webui + ollama: "Can you create the game tetris in python?"
It immediately starts writing code. After the code is finished, I notice something very strange: it starts a paragraph like this:
" Key improvements and explanations:
Clearer Code Structure: The code is now organized into a Tetris class, making it much more maintainable and readable. This is essential for any non-trivial game.
"Followed by a bunch of fixes/improvements, as if this was not the first iteration of the script.
I also notice a very obvious error: in the `if __name__ == '__main__':` block, it tries to instantiate a `Tetris` class, even though the class it created is named `TetrisGame`.
Nevertheless, I try to run it and paste the `NameError: name 'Tetris' is not defined` error along with the stack trace specifying the line. Gemma then gives me this response:
"The error message "NameError: name 'Tetris' is not defined" means that the Python interpreter cannot find a class or function named Tetris. This usually happens when:"
Then it continues with a generic explanation of how to fix this error in arbitrary programs. It seems to have completely ignored the code it just wrote.
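For what it's worth, the fix for that specific error is just the one-line rename the model failed to suggest; a tiny illustrative sketch (the run() method is made up, standing in for whatever game loop Gemma generated):

    # The generated script defined the class under one name...
    class TetrisGame:
        def run(self):
            print("game loop goes here")  # placeholder for the real loop

    # ...but the entry point referenced another, hence
    # NameError: name 'Tetris' is not defined
    if __name__ == "__main__":
        # game = Tetris()      # what the generated code effectively did
        game = TetrisGame()    # use the name that was actually defined
        game.run()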
by sigmoid10 on 3/12/25, 8:26 AM
by leumon on 3/12/25, 9:20 AM
by aravindputrevu on 3/12/25, 10:36 AM
After reasoning models arrived, it suddenly looks like OSS models have lost their charm.
by chaosprint on 3/12/25, 11:10 AM
by wewewedxfgdf on 3/12/25, 11:06 AM
GPU makers have had years to provide the needed memory but can't/won't.
The future of local LLMs is APUs such as Apple M series and AMD Strix Halo.
Within 12 months everyone will have relegated discrete GPUs to the AI dustbin and be running 128GB to 512GB of delicious local RAM, vastly more than any discrete GPU could dream of.
by tekichan on 3/12/25, 10:32 AM
by casey2 on 3/12/25, 11:53 AM
by axiosgunnar on 3/12/25, 11:18 AM
Ollama silently (!!!) drops messages if the context window is exceeded (instead of, you know, just erroring? who in the world made this decision).
The workaround until now was to either not use Ollama or make sure to only send a single message. But now they seem to silently truncate single messages as well, instead of erroring! (This explains the sibling comment where a user could not reproduce the results locally.)
Use LM Studio, llama.cpp, openrouter or anything else, but stay away from ollama!
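For those who do stay on Ollama, the silent truncation is usually the default context length biting; a hedged workaround sketch (the num_ctx value is only an example, size it to your hardware) is to set the context window explicitly per request so long prompts aren't cut at the default:

    import requests

    long_prompt = "..."  # e.g. a full transcript that exceeds the default context

    r = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "gemma3:27b",
            "prompt": long_prompt,
            "stream": False,
            # Raise the context window explicitly; Ollama's default is far
            # smaller than what Gemma 3 supports, and overruns are truncated
            # silently rather than raising an error.
            "options": {"num_ctx": 32768},
        },
        timeout=600,
    )
    r.raise_for_status()
    print(r.json()["response"])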
by tarruda on 3/12/25, 9:31 AM