by aarkay on 12/11/20, 2:25 AM with 81 comments
1/ What specific design choices make the M1 so much better than the equivalent Intel chips? It looks like there are a bunch of changes -- 5nm silicon, a single memory pool, a combination of high-efficiency and high-power cores. Can someone explain how each of these changes helps Apple achieve the gains it did? Are these breakthrough architectural changes in chip design, or have they been discussed before?
2/ How did Apple manage to create this when Intel has been making chips for decades and that is the singular focus of the company? Is it the fact that macOS could be better optimized for the M1 chips? Given the design changes, that doesn't seem like the only reason.
by kortex on 12/11/20, 2:56 AM
My 10,000-foot view understanding:
- Small feature size. M1 is a 5nm process. Intel is struggling to catch up to TSMC's 7nm process
- more efficient transistors. Both TSMC's 7nm and 5nm are FinFET processes (GAAFET doesn't arrive until later nodes), but the denser 5nm node lets you cram more active gate area into less chip footprint. This means less heat and more speed
- layout. M1 is optimized for Apple's use case. General purpose chips do more things, so they need more real estate
- specialization. M1 offloads oodles of compute to hardware accelerators. No idea how their code interfaces with it, but I know lots of the demos involve easy-to-accelerate tasks like codecs and neural networks (one possible path from application code onto that hardware is sketched just after this list)
- M1 has tons of cache, IIRC, something like 3x the typical amount
- some fancy stuff that allows them to really optimize the reorder buffer, decoder, and branch prediction, which also leverages all that cache
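On the accelerator point above: one documented way application code can reach Apple's specialized silicon is the Accelerate framework, which picks an implementation for you (NEON, reportedly the undocumented AMX blocks for some routines, etc.) rather than exposing the hardware directly. A minimal sketch, assuming macOS and clang (build with: clang vadd.c -framework Accelerate):

    /* Sum two float vectors with Apple's Accelerate framework.
     * The caller never targets an accelerator directly; Accelerate
     * routes the call to whatever vector/matrix hardware is available. */
    #include <Accelerate/Accelerate.h>
    #include <stdio.h>

    int main(void) {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];

        vDSP_vadd(a, 1, b, 1, c, 1, 8);   /* c[i] = a[i] + b[i], stride 1, 8 elements */

        for (int i = 0; i < 8; i++)
            printf("%.0f ", c[i]);        /* prints: 9 9 9 9 9 9 9 9 */
        printf("\n");
        return 0;
    }

Codecs and the Neural Engine have their own higher-level entry points (VideoToolbox, Core ML), but the pattern is the same: you call a framework and Apple decides which block of silicon does the work.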
by oneplane on 12/11/20, 3:25 AM
Intel can probably make faster chips than they currently do, but then their customers (PC manufacturers, for instance) would have to modify all their designs as well, and they don't want to -- or at least, they don't all want the same thing.
by rayiner on 12/11/20, 3:09 AM
by titzer on 12/11/20, 3:11 AM
For raw single-thread performance:
1. ARM64 is a fixed-width instruction set, so their frontend can decode more instructions in parallel.
2. They've got one honking monster of an out-of-order execution engine (~630 entries), which feeds:
3. 16 execution ports.
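A toy illustration of why that width matters (my own sketch, not from the comment): the four accumulators below form independent dependency chains, so a core with a wide decoder, a deep reorder buffer, and many execution ports can keep all four chains in flight each cycle, while a narrow in-order core would crawl through them one add at a time.

    /* Instruction-level parallelism in miniature: sum0..sum3 don't depend
     * on each other, so an out-of-order core can issue their multiplies and
     * adds to different execution ports in the same cycle. Only the final
     * combine at the end serializes. */
    #include <stdio.h>

    float dot4(const float *x, const float *y, int n) {
        float sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
        for (int i = 0; i + 3 < n; i += 4) {
            sum0 += x[i]     * y[i];        /* chain 0 */
            sum1 += x[i + 1] * y[i + 1];    /* chain 1, independent of chain 0 */
            sum2 += x[i + 2] * y[i + 2];    /* chain 2 */
            sum3 += x[i + 3] * y[i + 3];    /* chain 3 */
        }
        return sum0 + sum1 + sum2 + sum3;
    }

    int main(void) {
        float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float y[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        printf("%.0f\n", dot4(x, y, 8));    /* prints 36 */
        return 0;
    }

The same trick is why compilers unroll loops: the more independent work sitting in the instruction window, the more of those execution ports actually get used.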
by titzer on 12/11/20, 4:03 AM
But in the end all of that went bye bye when Intel lost the process edge and therefore lost the transistor count advantage. Now with the 5nm process others can field gobs of transistors and they don't have the x86 frontend millstone around their necks. So ARM64 unlocked a lot of frontend bandwidth to feed even more execution ports. And with the transistor budget so high, 8 massive cores could be put on die.
Now, people have argued for decades that the instruction density of CISC is a major advantage, because that density would make better use of I-cache and bandwidth. But it looks like decode bandwidth is the thing. That, and RISC usually requires aligned instructions, which means that branch density cannot be too high, and branch prediction data structures are simpler and more effective. (Intel still has weird slowdowns if you have too many branches in a cache line).
It seems frontend effects are real.
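A crude sketch of the decode asymmetry (grossly simplified; real x86 frontends use predecode bits, a uop cache, and speculation to claw much of this back): with variable-length instructions you only learn where instruction N+1 begins after working out the length of instruction N, so boundary-finding is itself a serial dependence chain, whereas with a fixed 4-byte encoding every boundary is known up front and eight decoders can start at once. The one-byte length prefix below is a made-up toy encoding, purely for illustration.

    /* Why variable-length decode serializes: in this toy encoding the
     * first byte of each instruction gives its total length (1-15 bytes),
     * so you can't locate the next instruction without reading this one. */
    #include <stddef.h>
    #include <stdio.h>

    static size_t insn_length(const unsigned char *p) { return p[0]; }

    int main(void) {
        unsigned char code[] = {3, 0, 0, 1, 5, 0, 0, 0, 0, 2, 0};
        size_t n = sizeof code;

        /* Variable-length: each boundary depends on decoding the previous one. */
        for (size_t off = 0; off < n; off += insn_length(&code[off]))
            printf("variable-length instruction at offset %zu\n", off);

        /* Fixed 4-byte encoding: all boundaries are known without decoding,
         * so a wide frontend can hand many instructions to its decoders in parallel. */
        for (size_t off = 0; off < n; off += 4)
            printf("fixed-width instruction at offset %zu\n", off);

        return 0;
    }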
by ibraheemdev on 12/11/20, 2:51 AM
by ksaj on 12/11/20, 2:35 AM
by acranox on 12/11/20, 2:45 AM
https://debugger.medium.com/why-is-apples-m1-chip-so-fast-3262b158cba2
https://medium.com/swlh/what-does-risc-and-cisc-mean-in-2020...
by Fazel94 on 12/11/20, 3:59 AM
The Apple M1 has something like 16 execution units it can keep fed in parallel.
Meaning, it can reorder sequential instructions that aren't dependent on each other and run them at the same time. That has nothing to do with threads; it can be and is being done within a single-threaded program.
AMD and Intel top out at around 4 instruction decoders, because their architecture is CISC and an instruction can be anywhere from 1 to 15 bytes long. The M1 is RISC and instructions are a fixed 4 bytes, so architecturally it is much easier to fetch, decode, and reorder instructions in bulk.
CISC used to win on specialized instructions, but Apple has stuffed the chip with dedicated hardware for a lot of things -- machine learning, graphics, encryption. Instead of specialized instructions, Apple ships specialized hardware, and can get by with fewer instructions.
And since they control the hardware, the software SDKs, and the OS, they can actually get away with such radical changes. Intel and others can't without a big shift across the industry.
Source: https://debugger.medium.com/why-is-apples-m1-chip-so-fast-32...
by fstopmick on 12/12/20, 10:56 PM
(replaced the subjects with auto-industry corollaries)
I think it's the power of vertical integration. When you aren't cobbling together a bunch of general-purpose bits for general-purpose consumers, you don't have to pay as many taxes. Sort of like SRP in software - a multi-purpose function is going to become super bloated and expensive to maintain, compared to several single-purpose alternatives.
https://en.wikipedia.org/wiki/Single-responsibility_principl...
Vertical integration is like taking horizontally-integrated business units and refactoring them per SOLID principles.
by throwarchitect on 12/11/20, 2:55 AM
by swang720 on 12/11/20, 3:03 AM
by Koiwai on 12/11/20, 3:54 AM
by jcfrei on 12/11/20, 3:58 AM
by chubot on 12/11/20, 3:06 AM
Intel CPUs don't know what memory they're talking to, i.e. they have to support a variety of memory. Likewise, they don't necessarily know what OS they're running, how it context switches, whether it's virtualized, etc. Sure, they have optimizations for those common cases, but the design is sort of accreted rather than derived from first principles.
To make an analogy, if you know your JS is running on v8, you can do a bunch of optimization so you don't fall off the cliff of the JIT, and get say 10x performance wins in many cases. But if you're writing JS abstractly then you may use different patterns that hit slow paths in different VMs. Performance is a leaky abstraction.
by wpg_steve on 12/11/20, 2:47 AM
by wangchucheng on 12/11/20, 4:11 AM
by 656565656565 on 12/11/20, 5:14 PM
Are we seeing a deeper bifurcation of the industry: personal vs. server?
Maybe Intel and others can happily coexist?
by pedalpete on 12/11/20, 2:53 AM
I think you need to look at this from another angle. Yes, Apple did make some excellent choices, but the market was Intel's to lose.
The difference in the chips isn't limited to the 5nm, memory pooling, etc etc. Look at the base x86 vs ARM core architecture, and that is where you'll see the problem Intel had.
I'm sure there were discussions inside Intel which went along the lines of one person arguing that they had to start developing ARM-based or other RISC-based chips, and somebody else countering "but at Intel we build for the desktop, and servers, and RISC processors are toys, they're for mobile devices and tablets. They'll never catch up with our..."
This change in architecture was a long time coming. As we all know, there is very little we do with our computers today that we can't also accomplish on a phone (or tablet). The processing requirements for the average person are not that large, and ARM chips, made by Apple, Qualcomm, Samsung, or anybody else, have improved to the point where they are up to some of the more demanding tasks -- even playing high-quality games at a good frame rate or editing video.
So, now we have to ask: what was delaying the move from x86 to ARM? Apple aren't the only ones making ARM-based computers. Microsoft has two generations of ARM-based Surface laptops out, and I think Samsung has made one too. I'm sure there are others. This is a wave that has been building for a long time.
So, now we can look at why Apple was able to be so successful in their ARM launch compared to Microsoft and the lackluster reviews of Windows based ARM devices.
From my understanding, it isn't the 5nm technology, though I am no expert in chip design. However, as you state, Apple was able to pool memory and put it right on the package, which (from what I understand) saves the overhead of shuttling data in and out, as well as allowing the CPU and GPU to share memory more efficiently.
As I understand it, Qualcomm and other chips have a much smaller internal memory footprint, expecting the memory to be external to the CPU/GPU. Perhaps because this is just the way it has always been done.
Now this is where Apple's real breakthrough comes in. First off, they have the iOS App Store and all of its apps now available on the desktop. This means all the video editing or gaming apps that were already designed for iOS can now run perfectly fine on the "new" ARM architecture. Then there is Rosetta 2. Apple understood how important running legacy software would be for a small number of their users, and I suspect they also had very good metrics on what those legacy programs were. They did an exceptional job on Rosetta (from what I understand), and should be commended on that. Though most users will likely never use Rosetta extensively, it goes a long way toward making the M1 chip an absolute no-brainer.
Compare Rosetta to Microsoft's attempt at backward compatibility, and the difference seems glaring. HOWEVER, I think again this comes down to strategy and execution. Apple knows that only a small number of their customers need a small number of apps to run in Rosetta. Microsoft, having both a larger user base AND much more bespoke software running on their platform, doesn't have this luxury.
I'm sure there are other factors, but my thinking is that it's less about direct technology and more about flawed strategy/execution from Intel and absolutely amazing execution from Apple.
I'm very torn by this all, tbh. I've been an Apple hater for a long time. Every Apple product I've bought has turned out to be crap (except my original generation 2 iPod; it was truly magical). I'm beginning to think Apple may have actually got the upper hand here.
by dboreham on 12/11/20, 3:24 AM
by parisianka on 12/11/20, 2:27 AM
by phendrenad2 on 12/11/20, 4:58 AM