from
Hacker News
Ask HN: How do we reduce latency for AI applications?
by
MrAR
on 11/21/24, 9:53 AM with 1 comment
I am connecting 3-4 AI models serially, so that the output of one model is fed as the input to the next. I am seeing a lot of latency, even when running on GPUs. How can I reduce it?
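In a serial chain like this, end-to-end latency is roughly the sum of the per-stage latencies, so a useful first step is to time each stage and see which one dominates. A minimal sketch of that, assuming each model can be wrapped as a plain Python callable; `run_pipeline`, `make_stage`, and the `model_a`/`model_b`/`model_c` names are hypothetical stand-ins, with `time.sleep` simulating inference:

```python
import time
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], x: Any) -> Tuple[Any, List[Tuple[str, float]]]:
    """Run stages one after another, feeding each output into the next stage.

    Returns the final output and a list of (stage_name, seconds) timings.
    """
    timings = []
    for name, fn in stages:
        start = time.perf_counter()
        x = fn(x)
        # Note: if a stage launches asynchronous GPU work, the device has to be
        # synchronized before reading the clock, or the timing will be too low.
        timings.append((name, time.perf_counter() - start))
    return x, timings


def make_stage(delay_s: float, suffix: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: sleeps, then tags the input."""
    def stage(text: str) -> str:
        time.sleep(delay_s)  # simulated inference time (assumption)
        return text + suffix
    return stage


stages = [
    ("model_a", make_stage(0.01, "-a")),
    ("model_b", make_stage(0.10, "-b")),  # deliberately the slow stage
    ("model_c", make_stage(0.01, "-c")),
]

output, timings = run_pipeline(stages, "input")
total = sum(seconds for _, seconds in timings)
slowest = max(timings, key=lambda item: item[1])[0]

for name, seconds in timings:
    print(f"{name}: {seconds * 1000:.1f} ms ({seconds / total:.0%} of total)")
print(f"total: {total * 1000:.1f} ms, slowest stage: {slowest}")
```

Whichever stage takes the largest share of the total is the one where a smaller or faster model is likely to pay off most.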
by
compressedgas
on 11/21/24, 10:19 AM
Use smaller models.