from Hacker News

12-factor Agents: Patterns of reliable LLM applications

by dhorthy on 4/15/25, 10:38 PM with 78 comments

I've been building AI agents for a while. After trying every framework out there and talking to many founders building with AI, I've noticed something interesting: most "AI Agents" that make it to production aren't actually that agentic. The best ones are mostly just well-engineered software with LLMs sprinkled in at key points.

So I set out to document what I've learned about building production-grade AI systems: https://github.com/humanlayer/12-factor-agents. It's a set of principles for building LLM-powered software that's reliable enough to put in the hands of production customers.

In the spirit of Heroku's 12 Factor Apps (https://12factor.net/), these principles focus on the engineering practices that make LLM applications more reliable, scalable, and maintainable. Even as models get exponentially more powerful, these core techniques will remain valuable.

I've seen many SaaS builders try to pivot towards AI by building greenfield new projects on agent frameworks, only to find that they couldn't get things past the 70-80% reliability bar with out-of-the-box tools. The ones that did succeed tended to take small, modular concepts from agent building, and incorporate them into their existing product, rather than starting from scratch.

The full guide goes into detail on each principle with examples and patterns to follow. I've seen these practices work well in production systems handling real user traffic.

I'm sharing this as a starting point—the field is moving quickly so these principles will evolve. I welcome your feedback and contributions to help figure out what "production grade" means for AI systems!

by mgdev on 4/17/25, 2:10 AM
These are great. I had my own list of takeaways [0] after doing this for a couple years, though I wouldn't go so far as calling mine factors.
Like you, the biggest one I didn't include but would now is to own the lowest-level planning loop. It's fine to have some dynamic planning, but you should own an OODA loop (observe, orient, decide, act) and have heuristics for determining whether you're converging on a solution (e.g. scoring), or else for breaking out (e.g. max loops).
I would also potentially bake in a workflow engine. Then, have your model build a workflow specification that runs on that engine (where workflow steps may call back to the model) instead of trying to keep an implicit workflow valid/progressing through multiple turns in the model.
[0]: https://mg.dev/lessons-learned-building-ai-agents/
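A minimal sketch of the "own the planning loop" idea above, assuming the model call and the scoring heuristic are plain callables. The names `run_owned_loop`, `propose`, `score`, `target_score` and `max_loops` are hypothetical and do not come from the guide or the linked post; the stubs at the bottom stand in for a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


@dataclass
class LoopResult:
    answer: str
    score: float
    iterations: int
    converged: bool
    history: History = field(default_factory=list)


def run_owned_loop(
    propose: Callable[[str, History], str],
    score: Callable[[str], float],
    task: str,
    target_score: float = 0.9,
    max_loops: int = 5,
) -> LoopResult:
    """Drive the loop from ordinary code instead of letting the model decide when to stop.

    `propose` is a stand-in for the model call; `score` is the convergence heuristic.
    """
    history: History = []
    best_answer, best_score = "", float("-inf")
    for iteration in range(1, max_loops + 1):
        candidate = propose(task, history)      # act: ask the model for a candidate
        candidate_score = score(candidate)      # observe: measure the candidate
        history.append((candidate, candidate_score))  # orient: keep context for the next turn
        if candidate_score > best_score:
            best_answer, best_score = candidate, candidate_score
        if best_score >= target_score:          # decide: good enough, stop early
            return LoopResult(best_answer, best_score, iteration, True, history)
    # Break out: the loop budget is spent, return the best attempt so far.
    return LoopResult(best_answer, best_score, max_loops, False, history)


# Stub "model" and scorer, for illustration only.
def fake_propose(task: str, history: History) -> str:
    return task + "!" * (len(history) + 1)


def fake_score(candidate: str) -> float:
    return min(1.0, candidate.count("!") / 3)


converged = run_owned_loop(fake_propose, fake_score, "ship it")
gave_up = run_owned_loop(fake_propose, fake_score, "ship it", max_loops=2)
```

Because the exit conditions live in regular code, both the convergence threshold and the loop budget can be tested without calling a model.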
by hhimanshu on 4/17/25, 7:18 AM
I am wondering how libraries like DSPY [0] fit into your factor-2 [1].
As I was reading, I saw a mention of BAML:
> (the above example uses BAML to generate the prompt ...
Personally, in my experience, hand-writing prompts for extracting structured information from unstructured data has never been easy. With DSPY, my experience has been quite good so far.
Since you have used the raw prompt from BAML, what do you think of using the raw prompts from DSPY [2]?
[0] https://dspy.ai/
[1] https://github.com/humanlayer/12-factor-agents/blob/main/con...
[2] https://dspy.ai/tutorials/observability/#using-inspect_histo...
by daxfohl on 4/16/25, 7:52 PM
This old obscure blog post about framework patterns has resonated with me throughout my career and I think it applies here too. LLMs are best used as "libraries" rather than "frameworks", for all the reasons described in the article and more, especially now while everything is in such flux. "Frameworks" are sexier and easier to sell though, and lead to lock-in and add-on services, so that's what gets promoted.
https://tomasp.net/blog/2015/library-frameworks/
by pancsta on 4/16/25, 9:09 AM
Very informative wiki, thank you, I will definitely use it. So I've made my own "AI Agents framework" [0] based on the actor model, state machines and aspect-oriented programming (released just yesterday, no HN post yet) and I really like points 5 and 8:
```
    5. Unify execution state and business state
    8. Own your control flow
```
That is exactly what SecAI does, as it's a graph control flow library at its core (multigraph instead of DAG), and LLM calls are embedded into the graph's nodes. The flow is reinforced with negotiation, cancellation and stateful relations, which make it more "organic". Another thing often missed by other frameworks is dedicated devtools (dbg, repl, svg): programming for failure, inspecting every step in detail, automatic data exporters (metrics, traces, logs, sql), and dead-simple integrations (bash). I've released the first tech demo [1], which showcases all the devtools using a reference implementation of deepresearch (ported from AtomicAgents). You may especially like the Send/Stop button, which is nothing other than "Factor 6. Launch/Pause/Resume with simple APIs". Oh, and it's network transparent, so it can scale.
Feel free to reach out.
[0] https://github.com/pancsta/secai
[1] https://youtu.be/0VJzO1S-gV0
by daxfohl on 4/16/25, 7:45 PM
Another one: plan for cost at scale.
These things aren't cheap at scale, so whenever something might be handled by a deterministic component, try that first. Doing so not only saves on hallucinations and latency, but could also make a huge difference in your bottom line.
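One way to picture "try the deterministic component first" is a cheap parser with a model call only as the fallback. This is a rough sketch under assumptions: the order-ID task, the regex, and the names `extract_order_id` and `llm_fallback` are all hypothetical illustrations, and the fallback here is a stub rather than a real model client.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical format: the word "order", an optional "#", then at least four digits.
ORDER_ID_PATTERN = re.compile(r"\border\s*#?\s*(\d{4,})\b", re.IGNORECASE)


def extract_order_id_deterministic(message: str) -> Optional[str]:
    """Cheap, predictable path: no tokens spent, no hallucinations possible."""
    match = ORDER_ID_PATTERN.search(message)
    return match.group(1) if match else None


def extract_order_id(
    message: str,
    llm_fallback: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (order_id, source), calling the model only when the regex finds nothing."""
    order_id = extract_order_id_deterministic(message)
    if order_id is not None:
        return order_id, "regex"
    return llm_fallback(message), "llm"


# Stub fallback that records how often the expensive path is taken.
llm_calls: List[str] = []


def stub_llm(message: str) -> Optional[str]:
    llm_calls.append(message)
    return None


easy = extract_order_id("Where is order #12345?", stub_llm)
hard = extract_order_id("My package never arrived", stub_llm)
```

Returning the source alongside the value makes it easy to log what fraction of traffic actually reaches the model, which is the number that drives cost.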
by Manfred on 4/16/25, 7:33 PM
I believe the principles would be easier to follow if there were a consistent narrative running through the factors, by which I mean using a potentially real-world example of such a system.
by glial on 4/16/25, 10:39 PM
This is great -- and I have learned 80% the hard way. The other 20% will be valuable reading!
Personally I've had success with LangGraph + pydantic schemas. Curious to know what others have found useful.
by wfn on 4/17/25, 7:35 AM
This could not have come at a better time for me, thank you!
I've been tinkering with an idea for an audiovisual sandbox[1] (like vvvv[2] but much simpler of course, barebones).
Idea is to have a way to insert LM (or some simple locally run neural net) "nodes" which are given specific tasks and whose output is expected to be very constrained. Hence your example:
```
    "question -> answer: float"
```
is very attractive here. Of course, some questions in my case would be quite abstract, but anyway. Multistage pipelines are also very interesting.
[1]: loose set of bulletpoints brainstorming the idea if curious, not organised: https://kfs.mkj.lt/#audiovisllm (click to expand description)
[2]: https://vvvv.org/
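For the "very constrained output" node described above, here is a minimal sketch that enforces a bounded float on whatever text a model returns. It is not how DSPy implements `"question -> answer: float"`; the names `float_node`, `ask`, `low`, `high` and `max_attempts` are hypothetical, and `ask` is a stub standing in for any model call.

```python
import math
from typing import Callable


def float_node(
    ask: Callable[[str], str],
    question: str,
    low: float = 0.0,
    high: float = 1.0,
    max_attempts: int = 3,
) -> float:
    """Ask until the reply parses as a finite float inside [low, high], or give up."""
    for _ in range(max_attempts):
        raw = ask(question).strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # reply was not a bare number, try again
        if math.isfinite(value) and low <= value <= high:
            return value
    raise ValueError(f"no valid float in [{low}, {high}] after {max_attempts} attempts")


# Stub model: the first reply is chatty, the second is a clean number.
replies = iter(["about 0.7", "0.7"])
brightness = float_node(lambda q: next(replies), "How bright should the scene be?")

# Stub model that never produces a usable value.
try:
    float_node(lambda q: "nan", "How bright should the scene be?")
    failed = False
except ValueError:
    failed = True
```

Downstream nodes in a pipeline can then rely on receiving a number in range, or on an explicit failure they can handle.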
by darepublic on 4/17/25, 6:44 PM
I didn't read this extensively, but I would want to use as much deterministic code as possible and leverage the LLM as little as possible. That, to me, is a better portent of predictable results and lower operational costs, and is a signal that nobody could just quickly reproduce the same app. I would tend to roll my own tools and not use out-of-the-box buzzword glue to integrate my LLM with other systems. And if these conditions aren't met or aren't necessary, I'd figure someone else could just vibe code the same solution in no time anyway. Keep control, I say! Die on the hill of control! That's not to say I'm not impressed by LLMs; quite the opposite.
by ianbutler on 4/16/25, 8:09 PM
Let's go! Super happy to see this make its way to the HN front page.
by mettamage on 4/17/25, 11:29 AM
I've noticed some of these factors myself as well. I'd love to build more AI applications like this. Currently I'm a data analyst, and my employer doesn't fully appreciate that I can build stuff like this, as it is not a technology-oriented company.
I'd love to work on stuff like this full-time. If anyone is interested in a chat, my email is on my profile (US/EU).
by DebtDeflation on 4/16/25, 5:32 PM
> most "AI Agents" that make it to production aren't actually that agentic. The best ones are mostly just well-engineered software with LLMs sprinkled in at key points
I've been saying that forever, and I think that anyone who actually implements AI in an enterprise context has come to the same conclusion. Using the Anthropic vernacular, AI "workflows" are the solution 90% of the time and AI "agents" maybe 10%. But everyone wants the shiny new object on their CV and the LLM vendors want to bias the market in that direction because running LLMs in a loop drives token consumption through the roof.
by dphuang2 on 4/21/25, 8:10 PM
I am curious about the exceptions. Is *anybody* using an agent framework with large production usage? I suspect no, but curious to see if anybody on HN knows otherwise.
by daxfohl on 4/16/25, 10:54 PM
Also, "Don't lay off half your engineering department and try to replace with LLMs"
by nickenbank on 4/17/25, 2:45 AM
I totally agree with this. Most, if not all, frameworks for building agents are a waste of time.
by silasb on 4/16/25, 5:44 PM
This question isn't specific to 12-factor, but with any of these agents and solutions, how is LLM Ops being handled? Also, what's the testing strategy, and how do I make sure that I don't cause regressions?
by hellovai on 4/17/25, 7:12 AM
Really cool to see BAML on here :) 100% aligned on so much of what you've said here. It's really about treating LLMs as functions.
by abhishek-iiit on 4/17/25, 3:48 AM
Really curious and excited to know about the experiences you faced at Heroku that led to the formulation of these 12 principles.
by sps44 on 4/16/25, 7:59 PM
Very good and useful summary, thank you!
by AbhishekParmar on 4/17/25, 8:56 PM
would feel blessed if someone dropped something similar but for image generation agents. Been trying to build consistent image/video generation agents and god are they unreliable
by mertleee on 4/15/25, 11:15 PM
What are your favorite open source "frameworks" for agents?
by deadbabe on 4/16/25, 5:38 PM
With all this AI-agent bullshit out there these days, the most useful AI-agent I still use in daily life is the humble floor vacuum/mopping robot.
by musicale on 4/17/25, 1:23 AM
> reliable LLM applications
add that to the list of contradictory phrases (jumbo shrimp, etc.)