by grinich on 1/29/25, 5:04 PM with 32 comments
by lxe on 1/29/25, 5:45 PM
> DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
The amount of confusion on the internet because of this seems surprisingly high. DeepSeek R1 has 671B parameters, and it's not easy to run on local hardware.
There are some ways to run it locally, like https://unsloth.ai/blog/deepseekr1-dynamic, which should let you fit the dynamic quant into about 160 GB of VRAM, but the quality will suffer.
There's also an MLX attempt on a cluster of Mac Ultras: https://x.com/awnihannun/status/1881412271236346233
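Rough back-of-the-envelope math for the weight memory at a few quantization levels (a sketch only: the bit-widths are illustrative, dynamic quants mix bit-widths per layer, and KV cache and activations add more on top):

  # Weight-memory estimate for a 671B-parameter model at several quant levels.
  # Illustrative only; real footprints vary by quant scheme and runtime overhead.
  PARAMS = 671e9  # total parameters (the MoE activates ~37B per token)

  for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4), ("~1.6-bit dynamic", 1.6)]:
      gb = PARAMS * bits / 8 / 1e9
      print(f"{name:>17}: ~{gb:,.0f} GB of weights")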
by Flux159 on 1/29/25, 5:57 PM
There are a lot of low-quant ways to run it in less RAM, but the quality will be worse. Also, running a distill is not the same thing as running the larger model, so unless you have access to an 8x GPU server with lots of VRAM (>$50k), CPU inference is probably your best bet today.
If the new M4 Ultra Macs have 256GB of unified RAM as expected, then you may still need to connect 3 of them together via Thunderbolt 5 in order to have enough RAM to run the Q8 model. I'm assuming that setup would be faster than CPU inference on an EPYC server, but that will need to be tested empirically once the machine is released.
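A quick sanity check on the machine count (a sketch assuming Q8 is roughly one byte per weight and the rumored 256GB figure, and ignoring KV cache and OS overhead, which make it tight):

  import math

  params = 671e9        # DeepSeek-R1 total parameter count
  bytes_per_param = 1   # Q8 is roughly one byte per weight
  mac_ram_gb = 256      # rumored M4 Ultra unified memory (assumption)

  weights_gb = params * bytes_per_param / 1e9
  print(f"Q8 weights: ~{weights_gb:.0f} GB")                             # ~671 GB
  print(f"256GB machines needed: {math.ceil(weights_gb / mac_ram_gb)}")  # 3, leaving ~100 GB for KV cache and the OS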
by coder543 on 1/29/25, 5:26 PM
No… it downloads the 7B model by default. If you think that is large, then you better hold on to your seat when you try to download the 671B model.
by jascha_eng on 1/29/25, 5:48 PM
The other ones are fine-tunes of Llama 3.1/3.3 and Qwen2.5 which have been additionally trained on outputs of the big "DeepSeek V3 + R1" model.
I'm happy people are looking into self-hosting models, but if you want to get an idea of what R1 can do, this is not a good way to do so.
by paradite on 1/29/25, 5:59 PM
Here's how to run deepseek-r1:14b (DeepSeek-R1-Distill-Qwen-14B) and set it to an 8k context window:
  ollama run deepseek-r1:14b      # pull the 14B distill and open an interactive session
  /set parameter num_ctx 8192     # inside the session: raise the context window to 8192 tokens
  /save deepseek-r1:14b-8k        # save the modified parameters as a new model tag
  ollama serve                    # back in the shell: serve the models over the local API
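Once saved, the 8k-context variant is available under the new tag; here's a minimal sketch of calling it over Ollama's HTTP API (assumes the default localhost:11434 endpoint and the requests library):

  # Minimal sketch: query the saved 8k-context tag through Ollama's HTTP API.
  # Assumes `ollama serve` is running on the default port 11434.
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={
          "model": "deepseek-r1:14b-8k",  # the tag created with /save above
          "prompt": "Why is the sky blue?",
          "stream": False,                # return a single JSON object instead of a stream
      },
      timeout=600,
  )
  resp.raise_for_status()
  print(resp.json()["response"])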
by j45 on 1/29/25, 6:25 PM
If you want the option of using the full model, there are privately hosted models that can be connected in cheaply for those use cases, giving you "one place for all the models" locally.