from Hacker News

Cost of a thread in C++ under Linux

by eaguyhn on 3/1/20, 12:41 PM with 133 comments

  • by drmeister on 3/1/20, 2:44 PM

    Threads are very expensive if you start throwing C++ exceptions within them in parallel. The overall time to join the threads increases with each thread you add. There is a mutex in the unwinding code, and as the threads grab the mutex they invalidate each other's cache lines. I wrote a demo to illustrate the problem: https://github.com/clasp-developers/ctak

    macOS doesn't have this problem but Linux and FreeBSD do.
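    A minimal sketch of the kind of measurement described above, not the linked ctak demo itself: each thread throws and catches exceptions in a loop, and the caller times the span from first spawn to last join. The names throw_and_catch and run_throwing_threads are made up for illustration, and whether the time actually grows with the thread count depends on the platform and toolchain version.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Throw and catch `iters` exceptions on the calling thread; return how many were caught.
std::size_t throw_and_catch(std::size_t iters) {
    std::size_t caught = 0;
    for (std::size_t i = 0; i < iters; ++i) {
        try {
            throw std::runtime_error("unwind");
        } catch (const std::runtime_error&) {
            ++caught;
        }
    }
    return caught;
}

struct ThrowResult {
    std::size_t caught;  // total exceptions caught across all threads
    double seconds;      // wall-clock time from first spawn to last join
};

// Hypothetical helper: run `num_threads` throwing workers in parallel and time them.
ThrowResult run_throwing_threads(unsigned num_threads, std::size_t iters) {
    assert(num_threads > 0);
    std::vector<std::size_t> counts(num_threads, 0);
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counts, t, iters] { counts[t] = throw_and_catch(iters); });
    }
    for (auto& th : threads) th.join();
    const auto stop = std::chrono::steady_clock::now();

    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return {total, std::chrono::duration<double>(stop - start).count()};
}
```

    Calling run_throwing_threads(n, 100000) for n = 1, 2, 4, 8 and comparing the seconds field is one way to see whether join time scales with the number of throwing threads on a given machine.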

  • by boulos on 3/1/20, 5:05 PM

    I find Eli Bendersky’s writeup [1] more useful, as it gets closer to the details. For readers less familiar with the topic, it also makes clearer what the time spent depends on (how much state there is to copy). Eli’s post is actually a sub-post of his “cost of context switching” post [2], which is more often applicable (and helps answer all the questions below about thread pools).

    [1] https://eli.thegreenplace.net/2018/launching-linux-threads-a...

    [2] https://eli.thegreenplace.net/2018/measuring-context-switchi...

  • by bluetomcat on 3/1/20, 1:47 PM

    For CPU-bound tasks, it is best to pre-create a number of threads that roughly matches the number of logical execution cores. Every thread is then a worker with a main loop rather than being spawned on demand. Pin each thread's affinity to a specific core and you are as close as possible to the “perfect” arrangement, with minimized context switches and core-local cache data being there most of the time.
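    A rough sketch of the pinning part, assuming Linux with glibc (pthread_setaffinity_np and sched_getaffinity are non-portable extensions). The helper names pin_to_cpu, allowed_cpus and pinned_sum are hypothetical, and the workers here run one fixed job instead of the long-lived main loop described above.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Pin a running std::thread to a single CPU. Returns true on success.
bool pin_to_cpu(std::thread& t, int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
}

// CPUs the current process is allowed to run on (falls back to CPU 0 on error).
std::vector<int> allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        for (int c = 0; c < CPU_SETSIZE; ++c) {
            if (CPU_ISSET(c, &set)) cpus.push_back(c);
        }
    }
    if (cpus.empty()) cpus.push_back(0);
    return cpus;
}

// Sum 0..n-1 with one worker per allowed CPU, each pinned on a best-effort basis.
std::uint64_t pinned_sum(std::uint64_t n) {
    const std::vector<int> cpus = allowed_cpus();
    assert(!cpus.empty());
    const std::size_t workers = cpus.size();
    std::vector<std::uint64_t> partial(workers, 0);
    std::vector<std::thread> pool;
    pool.reserve(workers);

    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, w, workers, n] {
            std::uint64_t s = 0;  // accumulate locally, write the shared slot once
            for (std::uint64_t i = w; i < n; i += workers) s += i;
            partial[w] = s;
        });
        // Pinning happens after the thread has started, so its first
        // instructions may still run on another core. Failure is ignored.
        pin_to_cpu(pool.back(), cpus[w]);
    }
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}
```

    Querying the allowed CPU set first, instead of assuming CPUs 0..N-1, keeps the sketch usable under taskset or inside a container with a restricted CPU mask.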
  • by shin_lao on 3/1/20, 1:19 PM

    Great reminder.

    Even if you pre-create a thread (thread pool), when the task is small enough (less than 1,000 cycles), it is less expensive to do it in place (for example, with fibers), because of the cost of context switching.

  • by hrgiger on 3/1/20, 6:25 PM

    With taskset pinning, my numbers improve:

    $taskset --cpu-list 8 ./costofthread avg: 11000~

    $taskset --cpu-list 8,11 ./costofthread avg: 33000~

    $./costofthread avg: 60000~

  • by saagarjha on 3/1/20, 1:01 PM

    Is a std::thread a thin wrapper around pthreads on Linux?

  • by known on 3/1/20, 2:37 PM

    On any architecture, you may need to reduce the amount of stack space allocated for each thread to avoid running out of virtual memory.

    http://www.kegel.com/c10k.html#limits.threads
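    For illustration, a small sketch of requesting a smaller per-thread stack through the POSIX attribute API. std::thread has no stack-size parameter, so this goes through pthread_create directly. The names worker and run_with_stack_size are hypothetical, and the right size depends entirely on what the thread actually does.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical worker: writes 42 through the pointer it receives.
void* worker(void* arg) {
    *static_cast<int*>(arg) = 42;
    return nullptr;
}

// Start one thread with the requested stack size, join it, and return the
// stack size recorded in the attribute object (0 if any step failed).
std::size_t run_with_stack_size(std::size_t stack_bytes, int* out) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;

    std::size_t actual = 0;
    pthread_t tid;
    // setstacksize fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN.
    const bool ok = pthread_attr_setstacksize(&attr, stack_bytes) == 0 &&
                    pthread_attr_getstacksize(&attr, &actual) == 0 &&
                    pthread_create(&tid, &attr, worker, out) == 0;
    pthread_attr_destroy(&attr);
    if (!ok) return 0;

    pthread_join(tid, nullptr);
    return actual;
}
```

    A call such as run_with_stack_size(256 * 1024, &value) reserves far less address space per thread than a typical multi-megabyte default, at the cost of overflowing sooner if the thread recurses deeply or keeps large objects on the stack.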

  • by isatty on 3/1/20, 1:02 PM

    Why is there such a big difference in timing between Skylake and Rome? Something compiler specific? The number of steps required to create a thread should be identical.

    I’d also be interested to see the same benchmark using pthread_create directly.
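    One way such a comparison could be sketched, assuming that timing a create-plus-join round trip is an acceptable proxy (the article's own methodology may differ). The function names avg_ns_std_thread and avg_ns_pthread are made up here, and the numbers will vary with hardware, kernel and libc.

```cpp
#include <pthread.h>
#include <cassert>
#include <chrono>
#include <thread>

// Thread body that does nothing, for the pthread variant.
void* noop(void*) { return nullptr; }

// Average nanoseconds per create+join round trip using std::thread.
double avg_ns_std_thread(int rounds) {
    assert(rounds > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i) {
        std::thread t([] {});
        t.join();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / rounds;
}

// Average nanoseconds per create+join round trip using pthread_create directly.
double avg_ns_pthread(int rounds) {
    assert(rounds > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, noop, nullptr) == 0) {
            pthread_join(tid, nullptr);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / rounds;
}
```

    Running both with the same round count (say 10000) on the same pinned core gives a like-for-like comparison; any gap between the two is the overhead the C++ wrapper adds on that toolchain.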

  • by maayank on 3/1/20, 2:12 PM

    Why the relatively high cost of threads on ARM? If anything, I'd imagine it is more geared towards "massively parallel" scenarios (i.e. dozens of cores).

  • by Koshkin on 3/1/20, 2:22 PM

    Intel’s excellent TBB library is the answer to all your worries about threads in C++. (IMHO it should be made part of the standard library.)

  • by signa11 on 3/1/20, 1:39 PM

    IMHO, if the _cost_ of thread creation is the bottleneck, then more likely than not you are doing things wrong.

  • by brainscdf on 3/1/20, 1:59 PM

    My personal best practice is to always create a thread pool on program startup and distribute tasks across it. I use the same practice in all other languages too. Is this practice sound, or can it lead to problems in some corner cases?
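    As a concrete reference point, here is a minimal sketch of the pattern: a fixed set of workers created up front, pulling tasks from a shared queue. The ThreadPool class and the count_with_pool helper are illustrative names, not a production design; there is no work stealing, no futures, and no exception handling inside tasks.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        if (n == 0) n = 1;  // always keep at least one worker
        workers_.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Drains the remaining tasks, then joins every worker.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    // Worker main loop: sleep until there is a task or the pool is stopping.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Usage example: increment a shared counter `tasks` times through the pool.
int count_with_pool(int tasks) {
    assert(tasks >= 0);
    std::atomic<int> counter{0};
    {
        ThreadPool pool(4);
        for (int i = 0; i < tasks; ++i) {
            pool.submit([&counter] { ++counter; });
        }
    }  // destructor waits for all queued tasks to finish
    return counter.load();
}
```

    One corner case with a design like this: a task that blocks waiting for another task queued in the same pool can deadlock once every worker is blocked the same way, so tasks that depend on each other need either a separate pool or a non-blocking hand-off.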