from Hacker News

C++20: Building a Thread-Pool with Coroutines

by MichaEiler on 5/21/21, 1:58 PM with 62 comments

by tele_ski on 5/22/21, 10:08 PM
Really nice write up! I'm excited to see more coroutine tutorials and guides come out; I think this C++20 feature has huge potential to make C++ easier to use over the next decade. I will also say I was a bit surprised to see libcoro linked in the article! I'm glad you found it useful, but I need to give most of the credit to Lewis Baker's cppcoro as well -- I learned most of what I implemented in libcoro from his fantastic repository, then tuned the coroutine primitives to how I'd want to use the library for an HTTP web server. I just generally find there is no better way to truly learn a difficult concept than to roll your own version.
by secondcoming on 5/22/21, 9:43 PM
I'll admit I find coroutines difficult to grok. It seems to me that 'callback hell' is turning into 'coroutine hell'. The only plausible use-case I can see is enabling functionality similar to that of Python's `yield`.
Does threadpool::thread_loop() not have to check if the popped coroutine is suspended before attempting to resume it?
Are they really more efficient than normal callbacks when doing async?
by ptr on 5/23/21, 6:39 AM
Does anyone know when Coroutines are expected to show up in compilers without enabling experimental flags?
by sannysanoff on 5/22/21, 11:44 PM
I used a similar thing, built on top of the cppcoro library (a wonderful thing). My application is heavily threaded, with hundreds of thousands of short-lived micro-tasks. It's an interpreter of highly parallel expressions, and the values are large matrices containing expressions, so it's highly parallelizable.
I moved to C++ coroutines from a composable futures (CF) library that had a few thread pool implementations, if memory serves (and before CF everything was written in callback hell). Out of the box, CF had extra CPU overhead because its internal implementation was not efficient enough for my use: too many templates and too much copying when switching tasks. Also, spawned tasks had to reference shared pointers in user space (my app code), and the frequent copying of shared pointers added unneeded overhead.
I later rewrote the CF implementation completely, so before coroutines my app used the CF API extensively but with the internals reimplemented. The shared pointer copying, however, was still far from perfect.
In addition, I had some abstractions (like async/await/spawn/wait_all) on top of the CF API, so transforming the application code was not painful. I had to rewrite my synchronization primitives to use the mutexes that come with cppcoro, and change my own internal scheduler to use some other new primitives.
I was afraid that storing local variables in coroutine frames (instead of stack frames) would affect performance, but for some reason it did not.
I also expected compilation time to increase, but for some reason it mostly did not. Probably template expansion takes most of the time, so the coroutine code transformation fades in comparison.
Since then I have stopped using C++ coroutines.
I dropped them for the following reasons:
1) Unable to debug. The debugger does not have access to local variables, or I could not find how to enable it (reference point in time: around 9 months ago). Stack traces are missing too, and of course there is no help from tools. You have a core file, go figure.
2) g++ support was missing in the early days when I adopted coroutines (clang 9 had just been released), but even the clang 10 compiler produced wrong code when lambdas were used as suspendable functions. I use lambdas a lot, and as suspendable functions spread through the code base, lambdas inevitably become suspendable too. The result was an occasional SIGSEGV or wrong values. There was a workaround, moving 100% of the lambda body to a separate function and then calling it from the lambda, but it destroys all the beauty of lambdas.
I moved to the Chinese libgo library (it can be found on GitHub). I don't use the syscall interceptors it offers, just the cooperative scheduler and the synchronization primitives it provides. It's stackful cooperative multitasking, which keeps all the yummy things. And yes, it seemingly performs slightly better in my case. And yes, I had to patch it slightly.
TL;DR: dropped C++ stackless coroutines in favor of stackful coroutines (cooperative stack switching). What a relief!
by sys_64738 on 5/22/21, 11:15 PM
Go-style coroutines and native JSON support would pretty much consign Go to history, IMO.
by cletus on 5/22/21, 11:39 PM
C++20 coroutines confuse me. Like it's not clear to me what problem they solve.
For the last few years I've been doing Hack (Facebook's PHP fork) professionally and async-await as cooperative multitasking is pervasive. IMHO it's a really nice model. Generally speaking, I've come around to believing that if it ever comes down to you spawning your own thread, you're going to have a Bad Time.
Go's channels are another variant of this.
The central idea in both cases is that expressing dependencies this way is often sufficient and way easier to write than true multithreaded code.
C++20 coroutines don't seem to solve this problem as best as I can tell.
It actually seems like C++20 coroutines are closer to Python generators. Is this the case? Or is this a classic case of "a camel is a horse designed by committee", where the C++ standards committee tried to create primitives to handle these and possibly other use cases? I honestly don't know.