from Hacker News

New x86 micro-op vulnerability breaks all known Spectre defenses

by DoomHotel on 4/30/21, 10:53 PM with 189 comments

  • by akersten on 5/1/21, 12:48 AM

    I've been saying this from the start: the well of issues is infinitely deep as soon as you decide that one tenant on the same physical hardware being able to infer something about another is a vulnerability. I assert, but cannot rigorously prove, that it is not possible to design a CPU such that execution of arbitrary instructions has no observable side effects, especially if the CPU is speculating.

    I don't know what that means for cloud hosting providers - maybe they have to buy a lot more CPUs so every client can have their own, or commission a special "shared" SKU of CPU that doesn't have any speculative execution - but I know that for me, if I have untrusted code running on my CPU, I've already lost. At that point I couldn't care less about information leakage between threads.

    We're going to wind up undoing the last 20 years of performance gains in the name of 'security', and it scares me.

  • by tester756 on 4/30/21, 11:09 PM

    >"Intel's suggested defense against Spectre, which is called LFENCE, places sensitive code in a waiting area until the security checks are executed, and only then is the sensitive code allowed to execute," Venkat said. "But it turns out the walls of this waiting area have ears, which our attack exploits. We show how an attacker can smuggle secrets through the micro-op cache by using it as a covert channel."

    >"In the case of the previous Spectre attacks, developers have come up with a relatively easy way to prevent any sort of attack without a major performance penalty" for computing, Moody said. "The difference with this attack is you take a much greater performance penalty than those previous attacks."

    >"Patches that disable the micro-op cache or halt speculative execution on legacy hardware would effectively roll back critical performance innovations in most modern Intel and AMD processors, and this just isn't feasible," Ren, the lead student author, said.

  • by londons_explore on 4/30/21, 11:10 PM

    The solution will be "do not share the micro-op cache between different address spaces".

    Which for old hardware will translate to "flush the micro-op cache every time the address space changes".

    I would guess that can be done with a microcode update and that the performance hit won't be too massive.
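
    To make that concrete, here is a rough sketch of where such a hook would sit in a context switch. Note that uop_cache_flush() is hypothetical: there is no architecturally documented way to flush the micro-op cache from software today, so it would have to be something a microcode update exposes (e.g. through an MSR write).

      /* Hypothetical flush primitive, e.g. a write to an MSR that a
         microcode update would have to expose. */
      void uop_cache_flush(void)
      {
          /* placeholder */
      }

      struct address_space { int id; };

      /* Sketch of the relevant slice of a context switch: clear the
         micro-op cache whenever the CPU moves to a different address
         space, so one tenant's cached decode traces cannot be probed
         by the next. */
      void switch_address_space(struct address_space *prev,
                                struct address_space *next)
      {
          if (prev->id != next->id)
              uop_cache_flush();
          /* ...switch page tables, restore registers, etc... */
      }

    The performance question is then how often workloads cross address spaces and how long it takes to refill the micro-op cache afterwards.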

  • by CalChris on 5/1/21, 4:29 AM

    The paper:

    I See Dead µops: Leaking Secrets via Intel/AMD Micro-Op Caches

    http://www.cs.virginia.edu/venkat/papers/isca2021a.pdf

  • by amluto on 5/1/21, 7:12 PM

    My response: https://lore.kernel.org/lkml/CALCETrXRvhqw0fibE6qom3sDJ+nOa_...

    I don’t think any new mitigations are needed.

  • by floatingatoll on 4/30/21, 11:20 PM

    I’d like to highlight this excellent post about x86 micro-ops “fusion” from three years ago, as it’s the reason I have any idea at all what micro-ops are:

    https://news.ycombinator.com/item?id=16304415

  • by baybal2 on 5/1/21, 12:53 AM

    You cannot realistically make a CPU invulnerable to performance analysis.

    And you don't need to.

    There are really very few uses for genuinely multi-system (multi-tenant) setups versus multi-process shared systems.

    Take a look at that whole "cloud" thing.

    Everyone I know who has worked in cloud hosting says that most systems are ridiculously overprovisioned, effectively nullifying any economic justification for a shared system.

  • by totallyabstract on 4/30/21, 11:18 PM

    There are separate micro-op caches per core; however, they are typically shared between hyperthreads. I wonder if this could be another good reason for cloud vendors to move away from 1 vCPU = 1 hyperthread to 1 vCPU = 1 core for x86 when sharing machines (not that there weren't enough good reasons already).

  • by iam-TJ on 4/30/21, 11:45 PM

    The U of V Engineering Faculty release is at

    https://engineering.virginia.edu/news/2021/04/defenseless

  • by dataflow on 5/1/21, 12:37 AM

    Question: How relevant are these for the average person? I know these matter for things like shared hosting, but I've yet to hear of an actual exploit in the wild that ordinary people have been attacked by, even with Spectre defenses turned off. Should normal people be worried about this?

  • by ForOldHack on 5/1/21, 8:21 PM

    This had to come. The only fix will be to add a BIOS setting for speculative access: on or off. Gamers all turn it on, on a patched machine that runs nothing but their game. Everyone else, doing things like browsing the web, turns it off. Look for an encoded binary JavaScript exploit that will own any system with speculative access enabled. It's coming too, just like this paper was eventually going to come.

  • by Causality1 on 5/1/21, 6:09 AM

    I expect this to be just like Spectre. The media seizes on it as a tool to use fear to drive engagement, vendors partially cripple their hardware to guard against it, and literally nobody ever bothers trying to actually use it against innocent people.

  • by PopePompus on 4/30/21, 11:22 PM

    I don't understand this at all; I didn't think the micro-op cache was visible to code written for the x86 ISA. Can anyone explain to an idiot (me) how something in the micro-op cache can become visible to the outside world?

  • by anthk on 5/1/21, 1:53 AM

  • by spacemanmatt on 4/30/21, 11:20 PM

    Is ARM so much better? I can migrate my AWS hosts.

  • by 1cvmask on 5/1/21, 2:49 AM

    This quote from the article explains the danger quite well:

    "Intel's suggested defense against Spectre, which is called LFENCE, places sensitive code in a waiting area until the security checks are executed, and only then is the sensitive code allowed to execute," Venkat said. "But it turns out the walls of this waiting area have ears, which our attack exploits. We show how an attacker can smuggle secrets through the micro-op cache by using it as a covert channel."

  • by SG2000 on 5/14/21, 8:05 PM

    A close reading of the paper “I see dead uOps” would seem to indicate that Intel’s static thread partitioning of their micro-op cache would confer some inherent protection against uOp cache information leakage between threads - as compared to AMD’s dynamic thread partitioning scheme which could theoretically allow threads to spy on each other using the described techniques.

    If true, wouldn’t this also imply that an Intel Skylake CPU mitigates against such attempted attacks by one user against another in a shared CPU/ISP/cloud environment, whereas an AMD CPU theoretically would not? If true, this would be a key point that the authors failed to mention in their concluding remarks.

    Anyone else read it this way? Or am I missing something?

  • by failwhaleshark on 5/1/21, 7:54 AM

    The act of loading code into memory, be it a hypervisor or a guest OS, should've been gated by sanitization and validation callbacks. Building all of these macro- and micro-op runtime defenses and mitigations into the processor and slowing down the OSes for every possible runtime edge case is a waste of speed that could be avoided by establishing trust of code pages.

    The morphing of data into code pages by JITs (e.g. JavaScript engines) should also be subject to similar restrictions.
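
    As a sketch of what "establishing trust of code pages" could look like (hypothetical helper names; a toy digest standing in for a real cryptographic hash):

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      #define PAGE_SIZE 4096

      /* Hypothetical allowlist of measurements for code pages we trust. */
      struct trusted_measurement { uint8_t digest[32]; };

      /* Toy stand-in for a real cryptographic hash (e.g. SHA-256),
         only here so the sketch is self-contained. */
      void toy_digest(const uint8_t *page, uint8_t out[32])
      {
          memset(out, 0, 32);
          for (size_t i = 0; i < PAGE_SIZE; i++)
              out[i % 32] ^= page[i];
      }

      /* The validation callback being described: run before a page is
         ever mapped executable, for hypervisor, guest OS and JIT output
         alike; refuse execute permission if the page isn't on the list. */
      bool code_page_is_trusted(const uint8_t *page,
                                const struct trusted_measurement *allowed,
                                size_t n_allowed)
      {
          uint8_t d[32];
          toy_digest(page, d);
          for (size_t i = 0; i < n_allowed; i++)
              if (memcmp(d, allowed[i].digest, 32) == 0)
                  return true;
          return false;
      }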

  • by ineedasername on 5/1/21, 2:54 PM

    > undocumented features in Intel and AMD processors

    Why is this at all a thing? Why would you ever leave something out there like that without documenting its existence?

  • by smasher164 on 5/1/21, 4:28 AM

    Maybe EPIC [1] architectures need a revival. Rely on compilers to take advantage of explicit instruction-level parallelism, and keep the CPU dumb.

    [1] https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...

  • by juancn on 5/2/21, 6:05 AM

    This may sound stupid, but the common thread in all these side-channel attacks is that high-precision timekeeping is a non-privileged operation.

    Maybe it's time to make clocks a privileged op as a mitigation. Or even to make execution timing unpredictable for untrusted code, such as JavaScript?

    If precise timekeeping is unavailable, these attacks become much harder to pull off.
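
    For reference, the measurement primitive these attacks lean on looks roughly like this (a sketch using the x86 timestamp counter; in a browser the equivalent would be performance.now() or a SharedArrayBuffer-based counter):

      #include <stdint.h>
      #include <stdio.h>
      #include <x86intrin.h>   /* __rdtscp, _mm_lfence */

      /* Time a single memory access. The hit/miss difference in a small
         shared structure is only a handful of cycles, which is why the
         attacker needs an unprivileged, high-resolution clock. */
      uint64_t time_access(const volatile uint8_t *p)
      {
          unsigned aux;
          _mm_lfence();
          uint64_t start = __rdtscp(&aux);
          (void)*p;                       /* the access being measured */
          uint64_t end = __rdtscp(&aux);
          _mm_lfence();
          return end - start;
      }

      int main(void)
      {
          static uint8_t buf[4096];
          printf("first access:  %llu cycles\n",
                 (unsigned long long)time_access(buf));  /* likely a miss */
          printf("second access: %llu cycles\n",
                 (unsigned long long)time_access(buf));  /* likely a hit */
          return 0;
      }

    Coarsening or jittering the clock (as browsers did after Spectre) doesn't close the channel, but it makes the two cases much harder to tell apart.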

  • by vmception on 5/3/21, 11:07 AM

    Out of curiosity, is Apple's M1 processor seemingly faster because it actually represents a normal CPU progression, while all the other common CPUs (x86) took retroactive performance hits from patching Spectre?

    And therefore the M1 seems that much faster than it otherwise would?

  • by Woodi on 5/1/21, 5:55 AM

    The simplest way around all of this is to go back to single-core, MULTI-SOCKET systems for "civilian" computers like x86.

  • by druud62 on 5/1/21, 10:21 AM

    The CPU needs to make the overheard signals look just like random noise. A cheap XOR-stream (compare 2FA like Google Authenticator, or the remote in your car keys) should cover that.
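
    For what it's worth, this is the construction I read that as (a toy sketch with a xorshift keystream; whether a CPU could actually apply something like this to a timing channel is another question):

      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Toy keystream generator (xorshift64). A real design would use a
         cryptographically strong generator seeded per power-up. */
      uint64_t xorshift64(uint64_t *state)
      {
          uint64_t x = *state;
          x ^= x << 13;
          x ^= x >> 7;
          x ^= x << 17;
          return *state = x;
      }

      /* XOR a buffer with the keystream so anything "overheard" looks like
         noise; XOR with the same keystream again to recover the data. */
      void xor_stream(uint8_t *buf, size_t len, uint64_t seed)
      {
          uint64_t state = seed ? seed : 1;
          uint64_t ks = 0;
          for (size_t i = 0; i < len; i++) {
              if (i % 8 == 0)
                  ks = xorshift64(&state);
              buf[i] ^= (uint8_t)(ks >> (8 * (i % 8)));
          }
      }

      int main(void)
      {
          uint8_t msg[] = "observed signal";
          xor_stream(msg, sizeof msg, 0x1234);   /* mask   */
          xor_stream(msg, sizeof msg, 0x1234);   /* unmask */
          printf("%s\n", msg);
          return 0;
      }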