by Philpax on 5/12/25, 1:46 AM with 65 comments
by refulgentis on 5/12/25, 3:02 AM
So they didn't train a new model from scratch; they took an existing model and RL'd it a bit?
The scores are very close to QwQ-32B, and at the end:
"Overall, as QwQ-32B was already extensively trained with RL, it was difficult to obtain huge amounts of generalized improvement on benchmarks beyond our improvements on the training dataset. To see stronger improvements, it is likely that better base models such as the now available Qwen3, or higher quality datasets and RL environments are needed."
by iTokio on 5/12/25, 5:00 AM
Maybe this could be used as proof of work? To stop wasting computing resources on cryptocurrencies and get something useful as a byproduct.
by danielhanchen on 5/12/25, 4:52 AM
./llama.cpp/llama-cli -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99
Also, it's best to read https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-e... on sampling issues for QwQ-based models.
Or, TL;DR, use the settings below:
./llama.cpp/llama-cli -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99 --temp 0.6 --repeat-penalty 1.1 --dry-multiplier 0.5 --min-p 0.00 --top-k 40 --top-p 0.95 --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"
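For anyone serving the model over HTTP rather than chatting in the terminal, llama.cpp's llama-server should accept the same GGUF, and the sampling parameters above can be passed per request as JSON fields. The endpoint and field names here are my reading of llama.cpp's server docs (and I'm assuming llama-server accepts -hf the way llama-cli does), so treat this as a sketch rather than a verified recipe:

```shell
# Start the HTTP server on the same quant; -ngl 99 offloads all layers to the GPU
./llama.cpp/llama-server -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99 --port 8080 &

# Apply the recommended QwQ sampling settings on a per-request basis
curl http://localhost:8080/completion -d '{
  "prompt": "Why is the sky blue?",
  "temperature": 0.6,
  "repeat_penalty": 1.1,
  "min_p": 0.0,
  "top_k": 40,
  "top_p": 0.95,
  "n_predict": 256
}'
```

Setting the samplers at request time rather than at server startup lets different clients use different settings against one loaded model.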
by jumploops on 5/12/25, 3:04 AM
Personal story time: I met a couple of their engineers at an event a few months back. They mentioned they were building a distributed training system for LLMs.
I asked them how they were building it and they mentioned Python. I said something along the lines of “not to be the typical internet commenter guy, but why aren’t you using something like Rust for the distributed system parts?”
They mumbled something about Python as the base for all current LLMs, and then kinda just walked away…
From their article:
> "Rust-based orchestrator and discovery service coordinate permissionless workers"
Glad to see that I wasn’t entirely off-base :)