by MichaEiler on 5/21/21, 1:58 PM with 62 comments
by tele_ski on 5/22/21, 10:08 PM
by secondcoming on 5/22/21, 9:43 PM
Does threadpool::thread_loop() not have to check if the popped coroutine is suspended before attempting to resume it?
Are they really more efficient than normal callbacks when doing async?
by ptr on 5/23/21, 6:39 AM
by sannysanoff on 5/22/21, 11:44 PM
I moved to C++ coroutines from the composable futures (CF) library, which had a few thread pool implementations, if memory serves (and before CF everything was written in callback hell). Out of the box, CF had extra CPU overhead because its internal implementation was not efficient enough for my use: too many templates and too much copying when switching tasks. Also, spawned tasks had to reference shared pointers in user space (my app code), and the frequent shared pointer copying added unneeded overhead.
Later I rewrote the CF implementation completely, so before coroutines my app used the CF API extensively but with the internals reimplemented. The shared pointer copying, however, was still far from perfect.
In addition, I had some abstractions (like async/await/spawn/wait_all) on top of the CF API, so transforming the application code was not painful. I had to rewrite my synchronization primitives to use the mutexes that come with cppcoro, and change my own internal scheduler to use some other new primitives.
I was afraid that storing local variables in coroutine frames (instead of stack frames) would affect performance, but for some reason it did not.
I also expected compilation time to increase, but for some reason it mostly did not. Probably template expansion takes most of the time, so the coroutine code transformation fades in comparison.
Since then I have stopped using C++ coroutines.
I dropped them for the following reasons:
1) Unable to debug. The debugger does not have access to local variables, or I could not work out how to enable it. Reference time point: around 9 months ago. Also, stack traces: they are missing, and of course there is no help from tools. You have a core file, go figure.
2) g++ support was missing in the early days when I adopted coroutines (clang 9 had just been released), but even the clang 10 compiler produced wrong code when using suspended lambda functions. I use lambdas a lot, and as suspended functions spoil the code base, lambdas inevitably get spoiled too. The result was occasional SIGSEGVs or wrong values. There was a workaround: move 100% of the lambda body to a separate function and call it from the lambda, but that destroys all the beauty of lambdas.
I moved to libgo, a Chinese library (it can be found on GitHub). I don't use the syscall interceptors it offers; I just use its cooperative scheduler along with its synchronization primitives. It's stackful cooperative multitasking, which keeps all the yummy things. And yes, it seemingly performs slightly better in my case. And yes, I had to patch it slightly.
TLDR: dropped C++ stackless coroutines in favor of stackful coroutines (cooperative stack switching), what a relief!
by sys_64738 on 5/22/21, 11:15 PM
by cletus on 5/22/21, 11:39 PM
For the last few years I've been doing Hack (Facebook's PHP fork) professionally and async-await as cooperative multitasking is pervasive. IMHO it's a really nice model. Generally speaking, I've come around to believing that if it ever comes down to you spawning your own thread, you're going to have a Bad Time.
Go's channels are another variant of this.
The central idea in both cases is that expressing dependencies this way is often sufficient and way easier to write than true multithreaded code.
As best I can tell, C++20 coroutines don't seem to solve this problem.
It actually seems like C++20 coroutines are closer to Python generators. Is this the case? Or is this a classic case of "a camel is a horse designed by committee," where the C++ standards committee tried to create primitives to handle these and possibly other use cases? I honestly don't know.