configuration:
- laptop: MacBook Pro M2 w/ 16 GB RAM
- context length: 8192 (max)
- gpu layers: 33 (max)
- cpu threads: 8
response:
- time to first token: ~2s by the end of the conversation (4892 total token count)
- speed: ~7-8 tok/s
- memory usage: 13 GB (system total)
- memory pressure: slightly over 50% (>90% when coding with containers)
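
for anyone who wants to reproduce this outside a GUI, the same knobs exist in llama-cpp-python. this is just a sketch: the model path and prompt below are placeholders, not the exact setup i ran.

    # sketch: same settings via llama-cpp-python (pip install llama-cpp-python)
    from llama_cpp import Llama

    llm = Llama(
        model_path="./model.Q4_K_M.gguf",  # placeholder gguf file
        n_ctx=8192,        # context length (max for this model)
        n_gpu_layers=33,   # offload all 33 layers to the gpu (Metal on the M2)
        n_threads=8,       # cpu threads
    )

    out = llm("Q: name a niche python library for parsing toml. A:", max_tokens=64)
    print(out["choices"][0]["text"])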
now, these results feel on par with ChatGPT 4 in terms of responsiveness. i compared a few general-knowledge questions and coding problems, including some that use niche libraries, and quality-wise it seems to hold up very well against ChatGPT 4 too.
i'd like to compare notes on your experience and hear your opinions. is it possible to run a Copilot-like assistant against a local server?
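
to make the question concrete, here's the kind of setup i'm imagining (a sketch, not something i've verified end to end): local runners such as LM Studio, Ollama, and llama.cpp's server expose an OpenAI-compatible endpoint, so any editor plugin that lets you override the API base URL could use the local model for completions. port 1234 below is LM Studio's default; the model name and prompt are placeholders.

    # sketch: openai python client pointed at a local OpenAI-compatible server
    from openai import OpenAI

    # api_key is required by the client but ignored by most local servers
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; many local servers ignore this name
        messages=[{"role": "user", "content": "write a python function that reverses a string"}],
        temperature=0.2,
    )
    print(resp.choices[0].message.content)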