by luyu_wu on 11/30/24, 8:47 PM with 141 comments
by shantara on 11/30/24, 11:06 PM
by londons_explore on 11/30/24, 10:16 PM
If so, it might be a classic case of "Team of engineers spent months working on new shiny feature which turned out to not actually have any benefit, but was shipped anyway, possibly so someone could save face".
I see this in software teams when someone suggests it's time to rewrite the codebase to get rid of legacy bloat and increase performance. Yet, when the project is done, there are more lines of code and performance is worse.
In both cases, the project shouldn't have shipped.
by Loic on 12/1/24, 8:53 AM
> Perhaps the best way of looking at Zen 4's loop buffer is that it signals the company has engineering bandwidth to go try things. Maybe it didn't go anywhere this time. But letting engineers experiment with a low risk, low impact feature is a great way to build confidence. I look forward to seeing more of that confidence in the future.
by eqvinox on 11/30/24, 10:07 PM
With more detailed power measurements, it might be possible to determine whether this is thermal/power budget related. It does sound like the feature was intended to conserve power…
by eek2121 on 11/30/24, 11:00 PM
That being said, some workloads will see a small regression; however, AMD has made some small performance improvements since launch.
They should have just made it a BIOS option for Zen 4. The fact they do not appear to have done so does indicate the possibility of a bug or security issue.
by rasz on 11/30/24, 10:40 PM
by fulafel on 12/1/24, 7:54 AM
I guess this could also be used as an optimization target, at least on devices with longer-lived designs (e.g. consoles).
by Neywiny on 12/1/24, 9:47 PM
by syntaxing on 11/30/24, 9:33 PM
by londons_explore on 11/30/24, 10:21 PM
Energy used per instruction is almost certainly the metric that should be considered to see the benefits of this loop buffer, not energy used per second (power, watts).
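To make the distinction concrete, here is a rough sketch with made-up numbers (not measurements from any Zen 4 part). The function name and the inputs are hypothetical; it only illustrates that two runs can draw the same watts and still differ in energy per instruction if one finishes the same work sooner.

```python
def energy_per_instruction(avg_power_watts, elapsed_seconds, instructions_retired):
    """Joules per instruction: (average power * time) / instructions retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return avg_power_watts * elapsed_seconds / instructions_retired


# Hypothetical runs of the same workload (same instruction count).
# Both draw 10 W on average, so a wattmeter alone shows no difference.
INSTRUCTIONS = 8e9
run_a = energy_per_instruction(10.0, 2.0, INSTRUCTIONS)  # takes 2.0 s
run_b = energy_per_instruction(10.0, 1.6, INSTRUCTIONS)  # takes 1.6 s

print(f"run A: {run_a * 1e9:.2f} nJ/instruction")
print(f"run B: {run_b * 1e9:.2f} nJ/instruction")
```

In practice the inputs would have to come from a package energy counter and a retired-instruction counter sampled over the same interval; the sketch assumes those are already in hand.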
by CalChris on 12/1/24, 3:07 AM
by mleonhard on 12/1/24, 5:07 AM
by ksec on 12/1/24, 2:32 AM
(Idly waiting for x86 to try and compete with ARM on efficiency. Unfortunately I don't see Zen 6 or Panther Lake getting close.)
by Pannoniae on 11/30/24, 10:14 PM
"Both the fetch+decode and op cache pipelines can be active at the same time, and both feed into the in-order micro-op queue. Zen 4 could use its micro-op queue as a loop buffer, but Zen 5 does not. I asked why the loop buffer was gone in Zen 5 in side conversations. They quickly pointed out that the loop buffer wasn’t deleted. Rather, Zen 5’s frontend was a new design and the loop buffer never got added back. As to why, they said the loop buffer was primarily a power optimization. It could help IPC in some cases, but the primary goal was to let Zen 4 shut off much of the frontend in small loops. Adding any feature has an engineering cost, which has to be balanced against potential benefits. Just as with having dual decode clusters service a single thread, whether the loop buffer was worth engineer time was apparently “no”."