from Hacker News

Virtual Threads: New Foundations for High-Scale Java Applications

by axelfontaine on 9/29/22, 6:03 PM with 174 comments

  • by ccooffee on 9/29/22, 7:30 PM

    This is a great writeup, and reignites my interest in Java. (I've long considered "Java Concurrency in Practice" to be the _best_ Java book ever written.)

    I haven't been able to figure out how the "unmount" of a virtual thread works. As stated in this article:

    > Nearly all blocking points in the JDK have been adapted so that when encountering a blocking operation on a virtual thread, the virtual thread is unmounted from its carrier instead of blocking.

    How would I implement this logic in my own libraries? The underlying JEP 425[0] doesn't seem to list any explicit APIs for that, but it does give other details not in the OP writeup.

    [0] https://openjdk.org/jeps/425
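    For what it's worth, JEP 425 frames unmounting as something that happens inside the JDK's own blocking primitives rather than through a public API: library code that blocks via java.util.concurrent (for example LockSupport.park) inherits the behavior automatically. A minimal sketch (Java 21):

```java
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            // On a virtual thread, park() unmounts it from its carrier
            // instead of pinning an OS thread.
            LockSupport.park();
            System.out.println("unparked on " + Thread.currentThread());
        });
        Thread.sleep(100);       // give the virtual thread time to park
        LockSupport.unpark(vt);  // it gets remounted and continues
        vt.join();
    }
}
```

    So a library generally doesn't implement unmounting itself; it gets it by blocking through the adapted JDK primitives.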

  • by geodel on 9/29/22, 7:36 PM

    I think it is a really important development in the Java space. One reason I plan to use it soon is that it does not bring in the complex programming model of the "reactive world", and hence a dependency on tons of reactive libraries.

    I tried moving a plain old Tomcat-based service to a scalable Netty-based reactive stack, but it turned out to be too much work and an alien programming model. With Loom/virtual threads, the only thing I will be looking for is a server supporting virtual threads natively. Helidon Nima would fit the bill here, as all the other frameworks/app servers have so far just slapped virtual threads onto their thread-pool-based systems. Unsurprisingly, that is not delivering the performance expected from a virtual-thread-based system.
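    The thread-per-request style described here — one virtual thread per task instead of a bounded platform-thread pool — can be sketched with the JDK's per-task executor (Java 21; the task body is a placeholder):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerTaskDemo {
    public static void main(String[] args) {
        // One virtual thread per submitted task, instead of a fixed pool:
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                int id = i;
                exec.submit(() -> {
                    // A blocking call here unmounts the virtual thread,
                    // so 10k concurrent tasks don't need 10k OS threads.
                    Thread.sleep(10);
                    return id;
                });
            }
        } // close() waits for all submitted tasks to finish
    }
}
```

    This is the model servers like Helidon Nima build on natively, rather than routing virtual threads through an existing pool.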

  • by anonymousDan on 9/29/22, 10:39 PM

    Copying virtual stacks on a context switch sounds kind of expensive. Are any performance numbers available? Maybe for very deep stacks there are optimizations whereby you only copy in deeper frames lazily, under the assumption they won't be needed yet.

    Also, what is the story with preemption: if a virtual thread spins in an infinite loop, will it effectively hog the carrier thread, or can it be descheduled?

    Finally, I would be really interested to see the impact on debuggability. I did some related work where we were trying to get the JVM to run on top of a library operating system and a libc that contained a user-level threading library. Debugging anything concurrency-related became a complete nightmare, since all the gdb tooling only really understood the underlying carrier threads.

    Having said all that, this sounds super cool and I think is 100% the way to go for Java. Would be interesting to revisit the implementation of something like Akka in light of this.
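    On the preemption question: JEP 425 says the default scheduler does not currently time-share, so a compute-bound virtual thread holds its carrier until it blocks or yields. A small sketch of the cooperative escape hatch (Java 21; loop bound and yield interval are arbitrary):

```java
public class YieldDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            for (long i = 0; i < 1_000_000; i++) {
                if (i % 10_000 == 0) {
                    // Without a blocking call, this explicit yield is the
                    // scheduling point that lets other virtual threads run.
                    Thread.yield();
                }
            }
        });
        vt.join();
    }
}
```

    An infinite loop with no blocking call and no yield would indeed hog its carrier thread.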

  • by thom on 9/29/22, 8:16 PM

    So right now it seems like you can replace the thread pool Clojure uses for futures etc with virtual threads and go ham. You could even write an alternative go macro to replace the bits of core.async where you’re not supposed to block. Feels like Clojure could be poised to benefit the most here, and what a delight it is to have such a language on a modern runtime that still gets shiny new features!
  • by samsquire on 9/29/22, 7:27 PM

    This is good.

    I implemented a userspace 1:M:N timeslicing multiplexer of kernel threads onto lightweight threads in Java, Rust and C.

    I preempt hot for and while loops by having the kernel multiplexing thread set the loop variable to its limit.

    This means threads cannot suffer resource starvation.

    https://github.com/samsquire/preemptible-thread

    The design is simple. But having native support as in Loom is really useful.
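    The overwrite-the-loop-variable trick described above might be sketched roughly like this (all names hypothetical, and a shared AtomicLong standing in for the real mechanism):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LoopPreemption {
    static final long LIMIT = 10_000_000L;
    // The loop variable is shared so the supervisor can overwrite it.
    static final AtomicLong i = new AtomicLong(0);

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (i.get() < LIMIT) {
                i.incrementAndGet(); // hot loop body
            }
        });
        worker.start();
        Thread.sleep(5);
        // Supervisor "preempts" the hot loop: the condition now fails
        // on the next check, so the worker yields control promptly.
        i.set(LIMIT);
        worker.join();
    }
}
```

    The real design presumably resumes the loop later from where it left off; this only shows the forced-exit step.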

  • by Blackthorn on 9/29/22, 8:58 PM

    So happy this is finally coming out! After years of using the library that inspired this (fibers), I'm so stoked this is coming to the wide outside world of Java. There's just no comparison in how much more understandable and easy to program and debug this is compared to callback- and event-based programming.
  • by gigatexal on 9/29/22, 8:53 PM

    Reading through the source code examples has me rethinking my dislike for Java. It sure seems far less verbose and kinda nice actually.
  • by smasher164 on 9/29/22, 9:14 PM

    After all the hoopla surrounding concurrency models, it seems that languages are conceding that green threads are more ergonomic to work with. Go and Java have it, and now .NET is even experimenting with it.

    How long until OS vendors introduce abstractions to make this easier? Why aren't there OS-native green threads, or at the very least user-space scheduling affordances for runtimes that want to implement them without overhead in calling blocking code?

  • by rr808 on 9/30/22, 2:13 AM

    I spent the last 5 years learning reactive programming. I hate it. I'm looking forward to going back to something solid.
  • by mikece on 9/29/22, 8:11 PM

    How does this compare to Processes in Elixir/Erlang -- is Java now as lightweight and performant?
  • by jeffbee on 9/29/22, 7:49 PM

    " Operating systems typically allocate thread stacks as monolithic blocks of memory at thread creation time that cannot be resized later. This means that threads carry with them megabyte-scale chunks of memory to manage the native and Java call stacks."

    This extremely common misconception is not true of Linux or Windows. Both Windows and Linux have demand-paged thread stacks whose real size ("committed memory" in Windows) is minimal initially and grows when needed.

  • by lenkite on 9/29/22, 9:20 PM

    Really hope this makes it to Android. (probably need to wait for a decade or two though)
  • by polskibus on 9/30/22, 10:56 AM

    Is this like async/await machinery in .NET?
  • by stefs on 9/29/22, 11:37 PM

    for the record, i really don't know much about threads, so the following questions are probably kinda stupid.

    first question: so, as the article states, the ONLY performance upside of virtual threads (versus os threads) is the number of inactive threads you can have, thanks to lower per-thread memory overhead.

    for some reason i was expecting to read something about context switching cost too.

    as far as i understand, virtual thread context switches are most likely somewhere between a lot cheaper than and roughly as expensive as their carrier thread context switches, depending on how much memory has to be copied around and how the next thread to execute is found.

    the problem here is that virtual context switches may be cheaper, but have to be executed in addition to the os thread context switches, so the overall efficiency is actually lower because more work is spent scheduling (os vs. os+virtual).

    to minimize this it might be possible for privileged applications to disable os thread context switching for the carrier threads as long as there are active virtual threads. that way, the context switching and scheduling overhead is reduced from "os vs. os+virt" to "os vs. virt". i.e. as soon as there are active virtual threads the carrier thread is excluded for os scheduler until there aren't any active virtual threads anymore (or, alternatively, the virtual thread pool is empty).

    is this a thing? does this make sense? would it be worth it? do operating systems even support "manual" (i.e. by the app) thread scheduling hints? or are the carrier threads only rarely taken out of schedule because they're not really put to sleep as long as there are active virtual threads anyway, making this a non-issue?

    second question: as far as i understand blocking os threads, the scheduler stores which thread is waiting on which io resource, and the appropriate thread gets woken up once a waited-on io resource becomes available. this is not much of a problem with a few hundred or thousand os threads, but with virtual threads the io resource must now be linked by the os to the os thread running the virtual thread executor's scheduler, and then by the virtual thread scheduler to the virtual thread waiting on the resource.

    so for example, if there are 100,000 inactive virtual threads waiting for a network response and one arrives, the os scheduler has to match it to an os thread first (the one the vt scheduler runs on) and then the vt scheduler has to match it to one of the virtual threads, i.e. two lookups in hashtables with 100,000 entries each (one mapping io to os threads, the other io to virtual threads). is this how it works, or do i misunderstand? since async models have the same issue but work fine, i guess this isn't really a problem in practice. also, as far as i understand, the os thread that's woken up has to be told which resource id it's been woken up for, instead of "well, you went to sleep on a certain resource id, so it's obvious which one you've been woken up for" as in blocking io.

  • by bheadmaster on 9/30/22, 4:46 AM

    It's funny how it took Java almost a decade to finally implement goroutines.
  • by mgraczyk on 9/29/22, 8:35 PM

    The section "What about async/await?", which compares these virtual threads to async/await is very weak. After reading this article, I came away with the impression that this is a dramatically worse way to solve this problem than async/await. The only benefit I see is that this will be simpler to use for the (increasingly rare) programmers who are not used to async programming.

    The first objection in the article is that with async/await you may forget to use an async operation and could instead use a synchronous operation. This is not a real problem. Languages like JavaScript do not have any synchronous operations, so you can't use them by mistake. Languages like Python and C# solve this with simple lint rules that tell you if you make this mistake.

    The second objection is that you have to reimplement all library functions to support await. This is a bad objection, because you also have to do this for virtual threads. Based on how long it took to add virtual threads to Java versus adding async/await to other languages, it seems like virtual threads were much more complicated to implement.

    The programming model here sounds analogous to using gevent with python vs python async/await. My opinion is that the gevent approach will die out completely as async/await becomes better supported and programmers become more familiar.

    EDIT: Looking more at the "Related Work" section at the bottom, I think I understand the problem here. The "Structured Concurrency" examples are unergonomic versions of async/await. I'm not sure what I'm missing, but this seems like a strictly worse way to write structured concurrent code.

    Java example:

        Response handle() throws ExecutionException, InterruptedException {
            try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
                Future<String>  user  = scope.fork(() -> findUser());
                Future<Integer> order = scope.fork(() -> fetchOrder());
    
                scope.join();           // Join both forks
                scope.throwIfFailed();  // ... and propagate errors
    
                // Here, both forks have succeeded, so compose their results
                return new Response(user.resultNow(), order.resultNow());
            }
        }
    
    Python equivalent

        async def handle() -> Response:
          # scope is implicit, throwing on failure is implicit.
          user, order = await asyncio.gather(findUser(), fetchOrder())
    
          return Response(user, order)
    
    You could probably implement a similar abstraction in Java, but you would need to pass around and manage the scope object, which seems cumbersome.
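    A gather-style helper can in fact be written once in plain Java so callers never touch a scope object, though this sketch lacks ShutdownOnFailure's cancel-on-failure semantics (names hypothetical; Java 21):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Gather {
    // Hypothetical gather-like helper: fork two tasks, wait for both,
    // propagate the first failure via ExecutionException.
    static <A, B> Object[] gather(ExecutorService exec,
                                  Callable<A> a, Callable<B> b)
            throws ExecutionException, InterruptedException {
        Future<A> fa = exec.submit(a);
        Future<B> fb = exec.submit(b);
        return new Object[] { fa.get(), fb.get() };
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Object[] r = gather(exec, () -> "user-42", () -> 7);
            System.out.println(r[0] + " / " + r[1]);
        }
    }
}
```

    Unlike StructuredTaskScope.ShutdownOnFailure, a failure in one task here does not cancel the other — that cancellation behavior is what the scope object buys you.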
  • by jsyolo on 9/29/22, 11:20 PM

    Setting familiarity aside, why would one use Java/JVM instead of Node.js for the server of a web app? I need to call SOAP services.