Show HN: Batteries Included AI Deployment

by Mockapapella on 6/22/24, 5:17 PM with 2 comments

Hi HN, I've spent the better part of the last year deploying AI inference systems for both personal and work projects. I've hit a ton of "gotchas" and footguns along the way that make doing it right a pain, so I built "Batteries Included AI Deployment". Not a creative name, but it does what it says on the tin.

In short, it's a template for deploying AI inference APIs with FastAPI.

In long, it uses Docker to encapsulate (almost) the entire development and deployment process. The repo includes:

1. A way to download and cache models straight from Hugging Face (sketched after this list)

2. A way to expose those cached models via a FastAPI server endpoint (also sketched below)

3. A Docker configuration that exposes a `debugpy` port so that you can debug your application within a container (sketch below)

4. A way to run tests

5. A way to debug tests (using `debugpy` as mentioned above)

6. A way to run pre-commit hooks on staged files

7. A way to manually run pre-commit hooks on all code in your repository

8. CI steps via GitHub Actions

9. Full observability with a Grafana dashboard

10. Metrics via Prometheus (sketch below)

11. Tracing via Tempo

12. Logs via Loki

13. GPU monitoring via DCGM

14. CD via GitHub Actions and a `post-receive` hook on the server

15. Alerts that email you when something goes wrong in production
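
To give a flavor of a few of the items above: the model caching in (1) boils down to roughly this (the model name and cache path here are just examples, not what the repo ships with):

    from huggingface_hub import snapshot_download

    # First call downloads from the Hub; later calls hit the local cache.
    model_path = snapshot_download(
        repo_id="openai/whisper-tiny",  # example model, swap in your own
        cache_dir="/models",            # mounted as a Docker volume
    )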
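
The endpoint in (2) is roughly this shape (route name and model path are illustrative):

    from fastapi import FastAPI
    from transformers import pipeline

    app = FastAPI()

    # Load the cached model once at startup, not per request.
    generator = pipeline("text-generation", model="/models/my-model")

    @app.post("/generate")
    def generate(prompt: str):
        return {"output": generator(prompt)[0]["generated_text"]}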
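
For (3), the dev container starts the debugger along these lines (5678 is debugpy's conventional port; the published port may differ):

    import debugpy

    # Bind to 0.0.0.0 so the host can attach through the port Docker publishes.
    debugpy.listen(("0.0.0.0", 5678))
    debugpy.wait_for_client()  # optionally block until an IDE attaches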
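
And for (10), hooking a metric into the API looks something like this (the metric name is made up, and the reversed string stands in for real inference):

    from fastapi import FastAPI
    from prometheus_client import Histogram, make_asgi_app

    app = FastAPI()

    INFERENCE_SECONDS = Histogram(
        "inference_seconds", "Time spent running model inference")

    # Expose /metrics on the same app for Prometheus to scrape.
    app.mount("/metrics", make_asgi_app())

    @app.post("/generate")
    def generate(prompt: str):
        with INFERENCE_SECONDS.time():  # records one latency sample per request
            result = prompt[::-1]       # stand-in for real model inference
        return {"output": result}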

I say "almost" because you still need a way to attach to the debugger port from outside the Docker container, and there are some one-time configurations that have to be done manually, but nothing beyond that.

I'd love to hear any feedback you might have :)

  • by JojoFatsani on 6/22/24, 5:24 PM

    The DevOps around LLM/AI is a disaster right now.