from Hacker News

Magic-trace: Diagnose tricky performance issues with Intel Processor Trace

by trishume on 1/27/22, 10:22 PM with 18 comments

  • by haberman on 1/28/22, 5:44 AM

    The visualization tools presented look really nice, but they seem to present program execution as sequential and linear, a model that will likely break down at these time scales (tens of cycles).

    Modern processors look hundreds of instructions into the future and try to start executing them as soon as possible. Branches are predicted far in advance of when they can actually be evaluated. Many instructions can be executing simultaneously. A clean, tidy flame graph showing 1-3ns slices (~5 cycles) cannot help but be a vast simplification of what the CPU is really doing.

    The linked page about Processor Trace says this:

    > instruction data (control flow) is perfectly accurate but timing information is less accurate

    The article mentions using magic-trace to detect changes in inlining decisions made by the compiler. This is a case where it will shine, since PT can perfectly capture the control flow, and it doesn't necessarily rely on having perfect timestamps for everything.

  • by temikus on 1/27/22, 11:02 PM

    Oh man, this is major. I would’ve loved to have something like that 10 years ago when CPU was a bit more precious. Still very useful today, just not to the same extent.
  • by signa11 on 1/28/22, 11:16 AM

    Tracy (https://github.com/wolfpld/tracy), also mentioned in the article, is for some reason criminally underused and little known in the wider community.
  • by carlmr on 1/28/22, 9:10 AM

    The best I've found so far is gperftools (in contrast to perf, you get good results even with -O3 optimization and heavily inlined code). magic-trace seems to be much more accurate, but I'm not sure we can handle that amount of data because we usually take longer traces.
  • by gnufx on 1/28/22, 3:42 PM

    This says post-Skylake, but both my SKX workstation and my i5-6200U laptop have 1 in /sys/bus/event_source/devices/intel_pt/caps/psb_cyc, which seems to be the condition, though I haven't tried to use it.
  • by silverlake on 1/28/22, 1:50 PM

    Doesn’t VTune support Processor Trace? Some VMs support PT, and AWS has support as well.
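
The sysfs check gnufx describes above can be sketched in a few lines of Python. This is a minimal sketch, not magic-trace's own logic: the path is the one quoted in that comment, the helper name `read_pt_cap` is made up for illustration, and whether `psb_cyc` is really the deciding capability is gnufx's guess rather than something confirmed in the thread.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the thread; it exists only on Linux when the kernel
# exposes the intel_pt event source.
PSB_CYC_CAP = Path("/sys/bus/event_source/devices/intel_pt/caps/psb_cyc")


def read_pt_cap(path: Path = PSB_CYC_CAP) -> Optional[bool]:
    """Hypothetical helper: True if the file holds "1", False for any
    other content, None if the file is missing or unreadable."""
    try:
        text = path.read_text().strip()
    except OSError:
        # No such file (no intel_pt PMU, non-Linux host) or no permission.
        return None
    return text == "1"


if __name__ == "__main__":
    cap = read_pt_cap()
    if cap is None:
        print("intel_pt capability file not found")
    elif cap:
        print("psb_cyc is reported as supported")
    else:
        print("psb_cyc is reported as unsupported")
```

Returning None for a missing file keeps "no Intel PT exposed at all" distinct from "Intel PT present but without this capability", which are different situations on VMs and older CPUs.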