by physicsgraph on 12/18/23, 2:57 AM with 24 comments
by antirez on 12/18/23, 8:18 AM
Also, there are better models than the one suggested: Mistral at 7B parameters, Yi if you want to go larger and happen to have 32GB of memory, and Mixtral MoE, which is the best but requires too much memory right now for most users.
by upon_drumhead on 12/18/23, 6:17 AM
> TinyChatEngine provides an off-line open-source large language model (LLM) that has been reduced in size.
But then they download the models from Hugging Face. I don't understand how these end up smaller. Or do they modify them locally?
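(The usual answer to this kind of question is post-download quantization: the full-precision weights are fetched and then converted to a lower-precision format on the user's machine. Here is a minimal sketch of that idea in Python; the per-tensor int8 scheme and names are illustrative assumptions, not TinyChatEngine's actual pipeline or on-disk format.)

    # Illustrative sketch of post-download weight quantization,
    # NOT TinyChatEngine's actual format or conversion code.
    import numpy as np

    def quantize_int8(weights: np.ndarray):
        """Symmetric per-tensor int8 quantization: store int8 values plus a
        single float scale, shrinking a float32 tensor to ~1/4 of its size."""
        scale = np.abs(weights).max() / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
        """Recover an approximation of the original float32 weights."""
        return q.astype(np.float32) * scale

    if __name__ == "__main__":
        w = np.random.randn(4096, 4096).astype(np.float32)  # a "downloaded" fp32 layer
        q, scale = quantize_int8(w)
        print("fp32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)  # roughly 4x smaller
        print("max abs error:", np.abs(w - dequantize_int8(q, scale)).max())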
by dkjaudyeqooe on 12/18/23, 11:04 AM
My relatively old i5-8600 CPU (6 cores at 3.10 GHz, 32 GB of memory) gives me about 150-250 ms per token on the default model, which is perfectly usable.
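(For context, a quick back-of-the-envelope conversion of those reported latencies into throughput; the numbers below are just 1000 / ms-per-token.)

    # Convert the reported per-token latency to tokens/second.
    for ms_per_token in (150, 250):
        print(f"{ms_per_token} ms/token ~ {1000 / ms_per_token:.1f} tokens/s")
    # 150 ms/token ~ 6.7 tokens/s; 250 ms/token ~ 4.0 tokens/s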