from Hacker News

Show HN: Collider – the platform for local LLM debugging and inference at warp speed

by Ambix on 11/30/23, 8:32 PM with 1 comments

ChatGPT turns one today :)

What a day to launch the project I've been tinkering with for more than half a year. Welcome a new LLM platform suited both to individual research and to scaling AI services in production.

GitHub: https://github.com/gotzmann/collider

Some superpowers:

- Built with performance and scaling in mind, thanks to Golang and C++

- No more problems with Python dependencies or broken compatibility

- Most modern CPUs are supported: any Intel/AMD x64 platform, plus server and Mac ARM64

- GPUs are supported as well: Nvidia CUDA, Apple Metal, OpenCL cards

- Split really big models across multiple GPUs (warp LLaMA 70B with 2x RTX 3090)

- Decent performance on modest CPU-only machines, fast-as-hell inference on monsters with beefy GPUs

- Both regular FP16/FP32 models and their quantised versions are supported – 4-bit really rocks!

- Popular LLM architectures are already there: LLaMA, StarCoder, Baichuan, Mistral, etc.

- Special bonus: proprietary Janus Sampling for code generation and non-English languages
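The 4-bit and multi-GPU bullets above are easy to sanity-check with back-of-the-envelope memory math (my own numbers, not the project's): FP16 stores roughly 2 bytes per weight, while 4-bit quantisation stores roughly 0.5 bytes per weight, which is why a 70B model that would need well over 100 GB at FP16 fits on a pair of 24 GB RTX 3090s once quantised. A minimal Go sketch:

```go
package main

import "fmt"

func main() {
	const params = 70e9 // ~70 billion weights in LLaMA 70B

	// Approximate weight storage only; KV cache and runtime
	// overhead need extra headroom on top of these figures.
	fp16GB := params * 2.0 / 1e9 // ~2 bytes per weight at FP16
	q4GB := params * 0.5 / 1e9   // ~4 bits (0.5 bytes) per weight

	fmt.Printf("FP16 weights: ~%.0f GB\n", fp16GB) // ~140 GB
	fmt.Printf("4-bit weights: ~%.0f GB\n", q4GB)  // ~35 GB

	// 2x RTX 3090 = 48 GB of VRAM total, enough for the
	// 4-bit weights when the model is split across both cards.
	fmt.Printf("2x RTX 3090: %d GB VRAM\n", 2*24)
}
```

So the "warp LLaMA 70B with 2x RTX 3090" claim lines up: the quantised weights (~35 GB) fit in 48 GB of combined VRAM, with some room left for the KV cache.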

  • by smcleod on 12/1/23, 12:32 PM

    Tip: Winter and “Fall” don’t really mean much when you’re talking about something global like software or media milestones - I’d suggest stating either Q1/Q2 etc… or early / late.