by 4k on 2/1/25, 6:59 AM with 1 comment
Conditions:
1. You want a solution for LLM inference and LLM inference only; you don't care about any other general- or special-purpose computing.
2. The solution can use any kind of hardware you want.
3. Your only goal is to maximize (inference speed) × (model size) for 70B+ models (see the sketch below).
4. You're allowed to build this with tech most likely available by the end of 2025.
How do you do it?
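
To make condition 3 concrete, here is a minimal sketch of the figure of merit, assuming "inference speed" means decode throughput in tokens per second and "model size" means parameter count in billions. The hardware names and throughput numbers below are purely hypothetical placeholders for comparing candidate designs, not measurements or recommendations.

```python
# Sketch of the objective in condition 3: (inference speed) x (model size).
# All configurations and numbers are hypothetical, for illustration only.

def figure_of_merit(tokens_per_second: float, params_billions: float) -> float:
    """Score a candidate setup as decode throughput times parameter count."""
    return tokens_per_second * params_billions

# Hypothetical candidate setups serving a 70B+ model (illustrative numbers).
candidates = {
    "8x GPU node, tensor parallel": figure_of_merit(tokens_per_second=120.0, params_billions=70.0),
    "wafer-scale accelerator":      figure_of_merit(tokens_per_second=450.0, params_billions=70.0),
    "HBM-heavy custom ASIC, 405B":  figure_of_merit(tokens_per_second=300.0, params_billions=405.0),
}

for name, score in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:,.0f} (tok/s x B params)")
```

Under this reading of the metric, serving a larger model at moderate speed can outscore serving a 70B model very fast, which is worth keeping in mind when comparing answers.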
by sitkack on 2/1/25, 8:39 AM