from Hacker News

AMD Disables Zen 4's Loop Buffer

by luyu_wu on 11/30/24, 8:47 PM with 141 comments

by shantara on 11/30/24, 11:06 PM
This is a wild guess, but could this feature be disabled in an attempt at preventing some publicly undisclosed hardware vulnerability?
by londons_explore on 11/30/24, 10:16 PM
The article seems to suggest that the loop buffer provides no performance benefit and no power benefit.
If so, it might be a classic case of "Team of engineers spent months working on new shiny feature which turned out to not actually have any benefit, but was shipped anyway, possibly so someone could save face".
I see this in software teams when someone suggests it's time to rewrite the codebase to get rid of legacy bloat and increase performance. Yet, when the project is done, there are more lines of code and performance is worse.
In both cases, the project shouldn't have shipped.
by Loic on 12/1/24, 8:53 AM
For me the most interesting paragraph in the article is:
> Perhaps the best way of looking at Zen 4's loop buffer is that it signals the company has engineering bandwidth to go try things. Maybe it didn't go anywhere this time. But letting engineers experiment with a low risk, low impact feature is a great way to build confidence. I look forward to seeing more of that confidence in the future.
by eqvinox on 11/30/24, 10:07 PM
> Strangely, the game sees a 5% performance loss with the loop buffer disabled when pinned to the non-VCache die. I have no explanation for this, […]
With more detailed power measurements, it might be possible to determine whether this is thermal/power budget related. It does sound like the feature was intended to conserve power…
by eek2121 on 11/30/24, 11:00 PM
It sounds to me like it was too small to make any real difference except in very specific scenarios and a larger one would have been too expensive to implement compared to the benefit.
That being said, some workloads will see a small regression; however, AMD has made some small performance improvements since launch.
They should have just made it a BIOS option for Zen 4. The fact they do not appear to have done so does indicate the possibility of a bug or security issue.
by rasz on 11/30/24, 10:40 PM
Anecdotally, one of the very few differences between the 1979 68000 and the 1982 68010 was the addition of "loop mode", a 6-byte loop buffer :)
by fulafel on 12/1/24, 7:54 AM
Interesting that in the Cortex-A15 this is a "key design feature". Are there any numbers about its effect on other chips?
I guess this could also be used as an optimization target, at least on devices that are longer-lived designs (e.g. consoles).
by Neywiny on 12/1/24, 9:47 PM
I have a 7950X3D. It's my upgrade from... Skylake's 6700K. I guess I'm subconsciously drawn to chips with hardware loop buffers disabled by software.
by syntaxing on 11/30/24, 9:33 PM
Interesting read. One thing I don’t understand is how much space the loop buffer takes on the die. I’m curious whether, with it removed, future chips could use the space for something more useful, like a bigger L2 cache.
by londons_explore on 11/30/24, 10:21 PM
In the "power" section, it seems the analysis doesn't divide by the number of instructions executed per second.
Energy used per instruction is almost certainly the metric that should be considered to see the benefits of this loop buffer, not energy used per second (power, watts).
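To make the distinction concrete, here is a minimal sketch of the normalization described above: dividing average power by instruction throughput to get energy per instruction. The function name and every number below are hypothetical, chosen only for illustration; they are not measurements from the article.

```python
def energy_per_instruction_nj(power_watts: float, instructions_per_second: float) -> float:
    """Average energy per retired instruction, in nanojoules.

    Watts are joules per second, so dividing by instructions per second
    leaves joules per instruction.
    """
    if instructions_per_second <= 0:
        raise ValueError("instructions_per_second must be positive")
    return power_watts / instructions_per_second * 1e9


# Hypothetical scenario (assumed values, not data from the article):
# package power is identical in both runs, but throughput is 5% lower
# with the loop buffer disabled.
enabled_nj = energy_per_instruction_nj(60.0, 12.0e9)
disabled_nj = energy_per_instruction_nj(60.0, 11.4e9)

print(f"enabled:  {enabled_nj:.2f} nJ/instruction")
print(f"disabled: {disabled_nj:.2f} nJ/instruction")
```

With these made-up inputs the power draw looks identical in watts, yet the disabled case spends more energy on each instruction, which is the effect a watts-only comparison would miss.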
by CalChris on 12/1/24, 3:07 AM
If it saved power, wouldn’t that lead to less thermal throttling and thus improved performance? That power had to matter in the first place, or the feature wouldn’t have been worth building.
by mleonhard on 12/1/24, 5:07 AM
It looks like they disabled a feature flag. I didn't expect to see such things in CPUs.
by ksec on 12/1/24, 2:32 AM
Wondering if Loop Buffer is still there with Zen 5?
(Idly waiting for x86 to try and compete with ARM on efficiency. Unfortunately I don't see Zen 6 or Panther Lake getting close.)
by Pannoniae on 11/30/24, 10:14 PM
From another article:
"Both the fetch+decode and op cache pipelines can be active at the same time, and both feed into the in-order micro-op queue. Zen 4 could use its micro-op queue as a loop buffer, but Zen 5 does not. I asked why the loop buffer was gone in Zen 5 in side conversations. They quickly pointed out that the loop buffer wasn’t deleted. Rather, Zen 5’s frontend was a new design and the loop buffer never got added back. As to why, they said the loop buffer was primarily a power optimization. It could help IPC in some cases, but the primary goal was to let Zen 4 shut off much of the frontend in small loops. Adding any feature has an engineering cost, which has to be balanced against potential benefits. Just as with having dual decode clusters service a single thread, whether the loop buffer was worth engineer time was apparently “no”."