by Mockapapella on 6/22/24, 5:17 PM with 2 comments
In short, it's a template for deploying AI inference APIs with FastAPI.
In long, it uses Docker to encapsulate (almost) the entire development and deployment process. The repo includes:
1. A way to download and cache models straight from Hugging Face
2. A way to expose those cached models via a FastAPI server endpoint
3. A Docker configuration that exposes a `debugpy` port so that you can debug your application from inside a container
4. A way to run tests
5. A way to debug tests (using `debugpy` as mentioned above)
6. A way to run pre-commit hooks on staged files
7. A way to manually run pre-commit hooks on all code in your repository
8. CI steps via GitHub Actions
9. Full observability with a Grafana dashboard
10. Metrics via Prometheus
11. Tracing via Tempo
12. Logs via Loki
13. GPU monitoring via DCGM
14. CD via GitHub Actions and a `post-receive` hook on the server
15. Alerts that email you when something goes wrong in production
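For the model download-and-cache step (items 1 and 2), the template presumably leans on `huggingface_hub`/`transformers` for the real work; here's a rough stdlib-only sketch of the underlying download-once pattern, where the function name `cached_fetch` and the injectable `fetch` hook are mine, not the repo's:

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def cached_fetch(url: str, cache_dir: str, fetch=None) -> Path:
    """Download `url` once, then reuse the local copy on later calls."""
    # `fetch` is injectable so the network call can be swapped out in tests.
    fetch = fetch or (lambda u: urlopen(u).read())
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Key the cache entry on a hash of the URL so distinct models never collide.
    target = cache / hashlib.sha256(url.encode()).hexdigest()
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

A FastAPI endpoint would then load the model from `cached_fetch(...)` at startup instead of re-downloading on every boot; `huggingface_hub.snapshot_download` with a `cache_dir` gives you the same behavior for whole model repos.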
I say "almost" because you still need a way to attach to the debugger port from outside the Docker container, and there are some one-time configurations that need to be set up manually, but nothing beyond that.
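For the "attach from outside the container" part, if you use VS Code, an attach configuration in `.vscode/launch.json` along these lines works; 5678 is just `debugpy`'s conventional default port, and the `/app` remote root is an assumption about where the repo mounts its code in the container:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to container",
      "type": "debugpy",
      "request": "attach",
      "connect": { "host": "localhost", "port": 5678 },
      "pathMappings": [
        { "localRoot": "${workspaceFolder}", "remoteRoot": "/app" }
      ]
    }
  ]
}
```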
I'd love to hear any feedback you might have :)
by JojoFatsani on 6/22/24, 5:24 PM