from Hacker News

Universal LLM Deployment Engine with ML Compilation

by ruihangl on 6/7/24, 7:06 PM with 7 comments

  • by zhye on 6/7/24, 10:32 PM

    Glad to see MLC is becoming more mature :) I can imagine the unified engine could help build agents on multiple devices.

    Any ideas on how those edge and cloud models could collaborate on compound tasks (e.g. compound AI systems: https://bair.berkeley.edu/blog/2024/02/18/compound-ai-system...)?

  • by ruihangl on 6/7/24, 7:06 PM

    A unified efficient open-source LLM deployment engine for both cloud server and local use cases.

    It comes with a full OpenAI-compatible API that runs directly in Python, iOS, Android, and browsers, and supports deploying the latest large language models such as Qwen2, Phi3, and more.
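A minimal sketch of what the OpenAI-compatible Python API looks like, based on the MLC LLM project's documented `MLCEngine` interface; the model URL is a hypothetical example, and exact identifiers may differ from the installed version (`pip install mlc-llm` is required to actually run it):

```python
# Sketch of MLC LLM's OpenAI-style chat API. The import is guarded so the
# example degrades gracefully when mlc-llm is not installed.
try:
    from mlc_llm import MLCEngine
    HAVE_MLC = True
except ImportError:
    HAVE_MLC = False


def run_chat(model: str = "HF://mlc-ai/Qwen2-7B-Instruct-q4f16_1-MLC") -> str:
    """Stream a chat completion through MLCEngine and return the full text.

    Returns an empty string when mlc-llm is unavailable.
    """
    if not HAVE_MLC:
        return ""
    engine = MLCEngine(model)
    text = ""
    # The request shape mirrors OpenAI's chat.completions.create.
    for response in engine.chat.completions.create(
        messages=[{"role": "user", "content": "What is MLC LLM?"}],
        model=model,
        stream=True,
    ):
        for choice in response.choices:
            text += choice.delta.content or ""
    engine.terminate()
    return text


if __name__ == "__main__":
    print(run_chat())
```

Because the request/response objects follow the OpenAI schema, code written against the OpenAI Python client should port over with little more than an import change.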

  • by yongwww on 6/7/24, 8:04 PM

    The MLCEngine presents an approach to universal LLM deployment; glad to know it works for both local devices and cloud servers with competitive performance. Looking forward to exploring it further!

  • by neetnestor on 6/7/24, 8:19 PM

    Looks cool. I'm looking forward to trying to build some interesting apps using the SDKs.

  • by CharlieRuan on 6/7/24, 7:37 PM

    From first-hand experience, the all-in-one framework really helps reduce engineering effort!

  • by cyx6 on 6/7/24, 7:43 PM

    AI ALL IN ONE! Super universal and performant!

  • by crowwork on 6/7/24, 7:18 PM

    Runs Qwen2 on iPhone at 26 tok/sec, with an OpenAI-style Swift API.