by mrbonner on 1/27/25, 3:04 PM with 1 comments
I wanted to get back to that hobby again. This time, it's less about gaming and more about running AI tasks such as LLMs and diffusion models.
Do you have any recommendations for me to get started?
by roosgit on 1/27/25, 5:14 PM
I have an RTX 3060 (12GB) and 32GB RAM. Just ran Qwen2.5-14B-Instruct-Q4_K_M.gguf in llama.cpp with flash attention enabled and 8K context. I get 845t/s for prompt processing and 25t/s for generation.
For a while I even ran llama.cpp without a GPU (I don't recommend that for diffusion), and with the same model (Qwen2.5 14B) I would get 11t/s for prompt processing and 4t/s for generation. Acceptable for chats with short questions/instructions and answers.
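For anyone wanting to reproduce a setup like the one above, a rough sketch of the llama.cpp invocation might look like this (the model path is whatever you downloaded; `-fa` enables flash attention, `-c` sets the context size, and `-ngl` offloads layers to the GPU):

```shell
# Sketch of a llama.cpp run matching the setup described above.
# Assumes llama.cpp is built with CUDA support and the GGUF file
# has already been downloaded to the current directory.
llama-cli \
  -m Qwen2.5-14B-Instruct-Q4_K_M.gguf \
  -fa \            # enable flash attention
  -c 8192 \        # 8K context window
  -ngl 99 \        # offload as many layers as possible to the GPU
  -p "Hello, how are you?"
```

For a CPU-only run as in the second paragraph, drop the `-ngl` flag (or set it to 0); llama.cpp will then keep all layers on the CPU.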