
Show HN: Mixlayer – code and deploy LLM prompts using JavaScript

by zackangelo on 10/11/24, 7:30 PM

Hi HN,

I'm excited to introduce Mixlayer, a platform I've been working on over the past 6 months that allows you to code and deploy prompts using simple JavaScript functions.

Mixlayer recreates the developer experience of running LLMs locally without requiring you to do the local setup yourself. The idea came from tinkering with LLMs on my MacBook: I wanted to build a product that gives everyone that workflow. Mixlayer compiles your code to a WASM binary and runs it alongside a custom inference stack I wrote in Rust. Because your code runs next to the model, the two share a common context window that stays open for the duration of your program's execution. I find that many common prompting patterns become much simpler in this model than against a generic OpenAI-style inference API.
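To make that concrete, here's a minimal sketch of what a prompt function looks like in this model. The exact API surface (the ctx object and its append/gen methods) is simplified for illustration:

    // Minimal sketch; method names are simplified for illustration.
    // A prompt is an exported async function. The ctx object wraps a
    // single model context that stays open for the whole run.
    export default async function (ctx) {
      // Append tokens to the shared context window.
      ctx.append("You are a helpful assistant.\n");
      ctx.append("List three uses for a paperclip:\n");

      // Generate directly into the same context; everything produced
      // so far stays in view for later generations.
      const ideas = await ctx.gen();

      // Ordinary JavaScript can branch on the model's output mid-context.
      if (ideas.includes("bookmark")) {
        ctx.append("\nTell me more about the bookmark idea:\n");
        await ctx.gen();
      }
    }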

Some cool features:

* Tool calling: the LLM has direct access to your code; just pass objects containing functions and their descriptions (see the sketch after this list)
* Hidden tokens: mark certain tokens as "hidden" to recreate long-running reasoning and iterative refinement, like OpenAI's o1
* Output constraints: use regular expressions to constrain the generated text
* Instant deployment: we can host your prompts behind an API that we scale for you
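Here's a rough sketch of tool calling and output constraints together; the option names (tools, regex) are illustrative rather than exact:

    // Illustrative sketch -- option names are examples, not an exact API.

    // Hypothetical helper standing in for your own code.
    async function fetchWeather(city) {
      return city === "Austin" ? "sunny" : "cloudy";
    }

    export default async function (ctx) {
      ctx.append("What's the weather in Austin? Answer with one word.\n");

      const answer = await ctx.gen({
        // Tool calling: plain objects holding a function and a description;
        // the model can invoke them while the context is still open.
        tools: {
          getWeather: {
            description: "Look up the current weather for a city",
            fn: async ({ city }) => fetchWeather(city),
          },
        },
        // Output constraint: generated text must match this regex.
        regex: /sunny|cloudy|rainy|snowy/,
      });

      console.log(answer); // e.g. "sunny"
    }

And once a prompt is deployed, invoking it is just an HTTP call (the endpoint and payload shape here are hypothetical):

    // Hypothetical endpoint and payload shape, for illustration only.
    const res = await fetch("https://api.mixlayer.example/v1/prompts/weather", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.MIXLAYER_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ input: { city: "Austin" } }),
    });
    const { output } = await res.json();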

Tech details:

* Built on Hugging Face's candle crate
* Supports continuous batching (toy sketch below) and multi-GPU inference for larger models
* WASM will let me easily support more prompt languages in the future
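For anyone unfamiliar with continuous batching: finished sequences free their batch slot immediately and queued requests are spliced in at each decode step, so the GPU never idles waiting for stragglers. A toy JavaScript sketch of the scheduling loop (the real implementation is in Rust):

    // Toy sketch of continuous batching (the real stack is Rust, not JS).
    const MAX_BATCH = 4;
    const queue = [
      { id: "a", remaining: 3 },
      { id: "b", remaining: 1 },
      { id: "c", remaining: 5 },
      { id: "d", remaining: 2 },
      { id: "e", remaining: 4 },
    ];
    const active = [];

    while (queue.length > 0 || active.length > 0) {
      // Top up the batch whenever a slot is free.
      while (active.length < MAX_BATCH && queue.length > 0) {
        active.push(queue.shift());
      }
      // One "forward pass": every active sequence emits one token.
      for (const seq of active) seq.remaining -= 1;
      // Evict finished sequences right away instead of waiting on the batch.
      for (let i = active.length - 1; i >= 0; i--) {
        if (active[i].remaining === 0) {
          console.log(`sequence ${active[i].id} finished`);
          active.splice(i, 1);
        }
      }
    }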

Models:

* Free tier: Llama 3.1 8B (on NVIDIA L4s, shared resources)
* Paid tier: faster models on A100s (H100 SXMs soon)
* Llama 3.1 70B (currently gated due to resource constraints; requires 8x H100 SXMs)

Future:

* Vision models
* More elaborate decoding methods (e.g., beam search)
* Multi-model prompts (routing/spawning/forking/joining)

I’m happy to discuss any of the internal/technical details around how I built this.

Thank you for your time and feedback!