by fazlerocks on 1/10/25, 8:22 PM with 3 comments
Current stack:
- Next.js on Vercel
- Serverless functions for AI/LLM endpoints
- Pinecone for vector storage
Questions for those running AI in production:
1. What's your serverless infrastructure choice? (Vercel/Cloud Run/Lambda)
2. How are you handling state management for long-running agent tasks?
3. What's your approach to cost optimization with LLM API calls?
4. Are you self-hosting any components?
5. How are you handling vector store scaling?
Particularly interested in hearing from teams who've scaled beyond the prototype stage. Have you hit any unexpected limitations with serverless for AI workloads?
by lunarcave on 1/10/25, 8:54 PM
1. fly.io is probably the best, IMHO [1]. It strikes a nice balance between running ephemeral containers that can support long-running tasks and booting up quickly to respond to a tool call.
2. If your task is truly long-running (I'm thinking several minutes), it's probably wise to put Trigger.dev [2] or Temporal [3] under it; there's a rough Temporal sketch after this list.
3. A mix of prompt caching, context shedding, and progressive context enrichment [4]; see the trimming sketch after the list.
4. I'm building a platform that can be self-hosted to do a few of the above, so I can't speak to this impartially. But most of my customers do not self-host.
5. To start with, a simple Postgres table with pgvector is all you need (sketch after the list). But I've recently been delighted with the DX of Upstash Vector [5]. They handle the embeddings for you and give you a text-in, text-out experience. If you want more control, and savings at higher scale, I've heard good things about marqo.ai [6].
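For (2), here's roughly what I mean by putting Temporal under an agent loop, using their TypeScript SDK. Treat it as a sketch: the activity names (callModel, runTool) and the step cap are made up for illustration, not part of any real API.

```ts
// workflow.ts - runs inside Temporal's deterministic workflow sandbox
import { proxyActivities } from '@temporalio/workflow';

// Hypothetical activities; each wraps a real LLM or tool call on the worker side.
interface AgentActivities {
  callModel(prompt: string): Promise<string>;
  runTool(name: string, args: string): Promise<string>;
}

const { callModel, runTool } = proxyActivities<AgentActivities>({
  startToCloseTimeout: '5 minutes',
  retry: { maximumAttempts: 3 },
});

// The workflow can run for hours; the transcript is persisted in workflow
// history, so a worker crash or redeploy resumes instead of losing progress.
export async function agentTask(goal: string): Promise<string> {
  let transcript = goal;
  for (let step = 0; step < 10; step++) {
    const reply = await callModel(transcript);
    if (!reply.startsWith('TOOL:')) return reply; // model produced a final answer
    const result = await runTool('search', reply.slice(5));
    transcript += `\n${reply}\n${result}`;
  }
  return transcript;
}
```

The point is that each model/tool call becomes a retryable activity, so long-running agent state stops living in a serverless function's memory.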
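For (3), "context shedding" just means dropping the oldest turns once the conversation stops fitting a token budget, so you're not paying for history the model no longer needs. A minimal sketch; the 4-chars-per-token estimate and the 8k budget are assumptions, swap in a real tokenizer and your model's limits:

```ts
interface Turn { role: 'user' | 'assistant'; content: string }

// Very rough token estimate; use a real tokenizer (e.g. tiktoken) in practice.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

// Keep the system prompt plus the most recent turns that fit the budget,
// shedding the oldest history first.
function shedContext(system: string, turns: Turn[], budget = 8000): Turn[] {
  let used = estimateTokens(system);
  const kept: Turn[] = [];
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = estimateTokens(turns[i].content);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(turns[i]);
  }
  return kept;
}
```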
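And for (5), the pgvector starting point really is just one table and an index. A sketch with node-postgres; the table name, column names, and the 1536-dim size (OpenAI-style embeddings) are assumptions:

```ts
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function setup() {
  await pool.query('CREATE EXTENSION IF NOT EXISTS vector');
  await pool.query(`
    CREATE TABLE IF NOT EXISTS docs (
      id BIGSERIAL PRIMARY KEY,
      body TEXT NOT NULL,
      embedding VECTOR(1536)
    )`);
  // HNSW index keeps nearest-neighbour queries fast as the table grows.
  await pool.query(
    'CREATE INDEX IF NOT EXISTS docs_embedding_idx ON docs USING hnsw (embedding vector_cosine_ops)'
  );
}

// <=> is pgvector's cosine distance operator.
export async function search(queryEmbedding: number[], k = 5) {
  const { rows } = await pool.query(
    'SELECT id, body FROM docs ORDER BY embedding <=> $1::vector LIMIT $2',
    [JSON.stringify(queryEmbedding), k]
  );
  return rows;
}
```

That setup gets most teams surprisingly far before a managed vector store earns its keep.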
Happy to talk more about this at length. (E-mail in the profile)
[1] https://fly.io/docs/reference/architecture/
[2] https://trigger.dev
[3] https://temporal.io
[4] https://www.inferable.ai/blog/posts/llm-progressive-context-...
[5] https://upstash.com
[6] https://marqo.ai