by aarkay on 12/11/20, 2:25 AM with 81 comments
1/ What specific design choices make the M1 so much better than the equivalent Intel chips? It looks like there are a bunch of changes -- 5nm silicon, a single memory pool, a combination of high-efficiency and high-power cores. Can someone explain how each of these changes helps Apple achieve the gains it did? Are these breakthrough architectural changes in chip design, or have they been discussed before?
2/ How did Apple manage to create this when Intel has been making chips for decades and that is the singular focus of the company? Is it the fact that macOS could be better optimized for the M1 chips? Given the design changes, that doesn't seem like the only reason.
by kortex on 12/11/20, 2:56 AM
My 10,000-foot view understanding:
- Small feature size. M1 is a 5nm process. Intel is struggling to catch up to TSMC's 7nm process
- more efficient transistors. Both TSMC's 7nm and 5nm are FinFET processes (GAAFET doesn't arrive until later nodes), but the denser 5nm node lets you cram more active gate area into less chip footprint. This means less heat and more speed
- layout. M1 is optimized for Apple's use case. General purpose chips do more things, so they need more real estate
- specialization. M1 offloads oodles of compute to hardware accelerators. No idea how their code interfaces with it, but I know lots of the demos involve easy-to-accelerate tasks like codecs and neural networks (one possible path from application code onto that hardware is sketched just after this list)
- M1 has tons of cache, IIRC, something like 3x the typical amount
- some fancy stuff that allows them to really optimize the reorder buffer, decoder, and branch prediction, which also leverages all that cache
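On the accelerator point above: one documented way application code can reach Apple's specialized silicon is the Accelerate framework, which picks an implementation for you (NEON, reportedly the undocumented AMX blocks for some routines, etc.) rather than exposing the hardware directly. A minimal sketch, assuming macOS and clang (build with: clang vadd.c -framework Accelerate):

    /* Sum two float vectors with Apple's Accelerate framework.
     * The caller never targets an accelerator directly; Accelerate
     * routes the call to whatever vector/matrix hardware is available. */
    #include <Accelerate/Accelerate.h>
    #include <stdio.h>

    int main(void) {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];

        vDSP_vadd(a, 1, b, 1, c, 1, 8);   /* c[i] = a[i] + b[i], stride 1, 8 elements */

        for (int i = 0; i < 8; i++)
            printf("%.0f ", c[i]);        /* prints: 9 9 9 9 9 9 9 9 */
        printf("\n");
        return 0;
    }

Codecs and the Neural Engine have their own higher-level entry points (VideoToolbox, Core ML), but the pattern is the same: you call a framework and Apple decides which block of silicon does the work.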
by oneplane on 12/11/20, 3:25 AM
Intel can probably make faster chips than they currently do, but then their customers (PC manufacturers, for instance) would have to modify all their designs as well, and they don't want to -- or at least, they don't all want the same thing.
by rayiner on 12/11/20, 3:09 AM
by titzer on 12/11/20, 3:11 AM
For raw single-thread performance:
1. ARM64 is a fixed-width instruction set, so their frontend can decode more instructions in parallel.
2. They've got one honking monster of an out-of-order execution engine (~630 entries), which feeds:
3. 16 execution ports.
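A toy illustration of why that width matters (my own sketch, not from the comment): the four accumulators below form independent dependency chains, so a core with a wide decoder, a deep reorder buffer, and many execution ports can keep all four chains in flight each cycle, while a narrow in-order core would crawl through them one add at a time.

    /* Instruction-level parallelism in miniature: sum0..sum3 don't depend
     * on each other, so an out-of-order core can issue their multiplies and
     * adds to different execution ports in the same cycle. Only the final
     * combine at the end serializes. */
    #include <stdio.h>

    float dot4(const float *x, const float *y, int n) {
        float sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
        for (int i = 0; i + 3 < n; i += 4) {
            sum0 += x[i]     * y[i];        /* chain 0 */
            sum1 += x[i + 1] * y[i + 1];    /* chain 1, independent of chain 0 */
            sum2 += x[i + 2] * y[i + 2];    /* chain 2 */
            sum3 += x[i + 3] * y[i + 3];    /* chain 3 */
        }
        return sum0 + sum1 + sum2 + sum3;
    }

    int main(void) {
        float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float y[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        printf("%.0f\n", dot4(x, y, 8));    /* prints 36 */
        return 0;
    }

The same trick is why compilers unroll loops: the more independent work sitting in the instruction window, the more of those execution ports actually get used.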
by titzer on 12/11/20, 4:03 AM
But in the end all of that went bye bye when Intel lost the process edge and therefore lost the transistor count advantage. Now with the 5nm process others can field gobs of transistors and they don't have the x86 frontend millstone around their necks. So ARM64 unlocked a lot of frontend bandwidth to feed even more execution ports. And with the transistor budget so high, 8 massive cores could be put on die.
Now, people have argued for decades that the instruction density of CISC is a major advantage, because that density would make better use of I-cache and bandwidth. But it looks like decode bandwidth is the thing. That, and RISC usually requires aligned instructions, which means that branch density cannot be too high, and branch prediction data structures are simpler and more effective. (Intel still has weird slowdowns if you have too many branches in a cache line).
It seems frontend effects are real.
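A crude sketch of the decode asymmetry (grossly simplified; real x86 frontends use predecode bits, a uop cache, and speculation to claw much of this back): with variable-length instructions you only learn where instruction N+1 begins after working out the length of instruction N, so boundary-finding is itself a serial dependence chain, whereas with a fixed 4-byte encoding every boundary is known up front and eight decoders can start at once. The one-byte length prefix below is a made-up toy encoding, purely for illustration.

    /* Why variable-length decode serializes: in this toy encoding the
     * first byte of each instruction gives its total length (1-15 bytes),
     * so you can't locate the next instruction without reading this one. */
    #include <stddef.h>
    #include <stdio.h>

    static size_t insn_length(const unsigned char *p) { return p[0]; }

    int main(void) {
        unsigned char code[] = {3, 0, 0, 1, 5, 0, 0, 0, 0, 2, 0};
        size_t n = sizeof code;

        /* Variable-length: each boundary depends on decoding the previous one. */
        for (size_t off = 0; off < n; off += insn_length(&code[off]))
            printf("variable-length instruction at offset %zu\n", off);

        /* Fixed 4-byte encoding: all boundaries are known without decoding,
         * so a wide frontend can hand many instructions to its decoders in parallel. */
        for (size_t off = 0; off < n; off += 4)
            printf("fixed-width instruction at offset %zu\n", off);

        return 0;
    }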
by ibraheemdev on 12/11/20, 2:51 AM
by ksaj on 12/11/20, 2:35 AM
by acranox on 12/11/20, 2:45 AM
https://debugger.medium.com/why-is-apples-m1-chip-so-fast-3262b158cba2
https://medium.com/swlh/what-does-risc-and-cisc-mean-in-2020...
by Fazel94 on 12/11/20, 3:59 AM
The Apple M1 has something like 16 execution units it can keep fed in parallel.
Meaning, it can reorder sequential instructions that aren't dependent on each other and run them at the same time. That has nothing to do with threads; it can be and is being done within a single-threaded program.
AMD and Intel top out at around 4 instruction decoders, because their architecture is CISC and an instruction can be anywhere from 1 to 15 bytes long. The M1 is RISC and instructions are a fixed 4 bytes, so architecturally it is much easier to fetch, decode, and reorder instructions in bulk.
CISC used to win on specialized instructions, but Apple has stuffed the chip with dedicated hardware for a lot of things -- machine learning, graphics, encryption. Instead of specialized instructions, Apple ships specialized hardware, and can get by with fewer instructions.
And since they control the hardware, the software SDKs, and the OS, they can actually get away with such radical changes. Intel and others can't without a big shift across the industry.
Source: https://debugger.medium.com/why-is-apples-m1-chip-so-fast-32...
by fstopmick on 12/12/20, 10:56 PM
(replaced the subjects with auto-industry corollaries)
I think it's the power of vertical integration. When you aren't cobbling together a bunch of general-purpose bits for general-purpose consumers, you don't have to pay as many taxes. Sort of like SRP in software - a multi-purpose function is going to become super bloated and expensive to maintain, compared to several single-purpose alternatives.
https://en.wikipedia.org/wiki/Single-responsibility_principl...
Vertical integration is like taking horizontally-integrated business units and refactoring them per SOLID principles.
by throwarchitect on 12/11/20, 2:55 AM
by swang720 on 12/11/20, 3:03 AM
by Koiwai on 12/11/20, 3:54 AM
by jcfrei on 12/11/20, 3:58 AM
by chubot on 12/11/20, 3:06 AM
Intel CPUs don't know what memory they're talking to, i.e. they have to support a variety of memory. Likewise, they don't necessarily know what OS they're running, how it context switches, whether it's virtualized, etc. Sure, they have optimizations for those common cases, but the design is sort of accreted rather than derived from first principles.
To make an analogy, if you know your JS is running on v8, you can do a bunch of optimization so you don't fall off the cliff of the JIT, and get say 10x performance wins in many cases. But if you're writing JS abstractly then you may use different patterns that hit slow paths in different VMs. Performance is a leaky abstraction.
by wpg_steve on 12/11/20, 2:47 AM
by wangchucheng on 12/11/20, 4:11 AM
by 656565656565 on 12/11/20, 5:14 PM
Are we seeing a deeper bifurcation of the industry: personal vs. server?
Maybe Intel and others can happily coexist?
by pedalpete on 12/11/20, 2:53 AM
I think you need to look at this from another angle. Yes, Apple did make some excellent choices, but the market was Intel's to lose.
The difference in the chips isn't limited to the 5nm, memory pooling, etc etc. Look at the base x86 vs ARM core architecture, and that is where you'll see the problem Intel had.
I'm sure there were discussions inside Intel which went along the lines of one person arguing that they had to start developing ARM-based or other RISC-based chips, and somebody else countering "but at Intel we build for the desktop, and servers, and RISC processors are toys, they're for mobile devices and tablets. They'll never catch up with our..."
This change in architecture was a long time coming. As we all know, there is very little we do with our computers today that we can't also accomplish on a phone (or tablet). The processing requirements for the average person are not that large, and ARM chips, made by Apple, Qualcomm, Samsung, or anybody else, have improved to the point where they are up to some of the more demanding tasks -- even playing high-quality games at a good frame rate or editing video.
So, now we have to ask: what was delaying the move from x86 to ARM? Apple aren't the only ones making ARM-based computers. Microsoft has two generations of ARM-based Surface laptops out, and I think Samsung has made one too. I'm sure there are others. This is a wave that has been building for a long time.
So, now we can look at why Apple was able to be so successful in their ARM launch compared to Microsoft and the lackluster reviews of Windows based ARM devices.
From my understanding, it isn't the 5nm technology, though I am no expert in chip design. However, as you state, Apple was able to pool memory and put it right on the package, which (from what I understand) saves the overhead of shuttling data in and out, as well as allowing the CPU and GPU to share memory more efficiently.
As I understand it, Qualcomm and other chips have a much smaller internal memory footprint, expecting the memory to be external to the CPU/GPU. Perhaps because this is just the way it has always been done.
Now this is where Apple's real breakthrough comes in. First off, they have the iOS App Store and all of its apps now available on the desktop. This means all the video editing or gaming apps that were already designed for iOS can now run perfectly fine on the "new" ARM architecture. Then there is Rosetta 2. Apple understood how important running legacy software would be for a small number of their users, and I suspect they also had very good metrics on what those legacy programs were. They did an exceptional job on Rosetta (from what I understand), and should be commended on that. Though most users will likely never use Rosetta extensively, it goes a long way toward making the M1 chip an absolute no-brainer.
Compare Rosetta to Microsoft's attempt at backward compatibility, and the difference seems glaring. HOWEVER, I think again this comes down to strategy and execution. Apple knows that only a small number of their customers need a small number of apps to run in Rosetta. Microsoft, having both a larger user base AND much more bespoke software running on their platform, doesn't have this luxury.
I'm sure there are other factors, but my thinking is that it's less about direct technology and more about flawed strategy/execution from Intel and absolutely amazing execution from Apple.
I'm very torn by this all, tbh. I've been an Apple hater for a long time. Every Apple product I've bought has turned out to be crap (except my original generation 2 iPod; it was truly magical). I'm beginning to think Apple may have actually got the upper hand here.
by dboreham on 12/11/20, 3:24 AM
by parisianka on 12/11/20, 2:27 AM
by phendrenad2 on 12/11/20, 4:58 AM