by thinking_banana on 12/25/24, 4:58 PM with 82 comments
by aleinin on 12/25/24, 7:25 PM
by morphle on 12/25/24, 5:26 PM
[1] https://github.com/AsahiLinux/gpu
[2] https://github.com/dougallj/applegpu
[3] https://github.com/antgroup-skyward/ANETools/tree/main/ANEDi...
[4] https://github.com/hollance/neural-engine
You can use high-level APIs like MLX, Metal or Core ML to run other computations on the GPU and NPU.
Shadama [5] is an example of a programming language that translates matrix calculations (using OMeta) into WebGPU or WebGL API calls (I forget which). You can do exactly the same with the MLX, Metal or Core ML APIs and pay only around 3% overhead for the translation stages.
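Shadama's own pipeline is built on OMeta; as a much smaller illustration of the same idea, here is a Python sketch that turns an elementwise expression over float arrays into Metal Shading Language (MSL) compute-kernel source. The helper name `emit_msl_kernel` and the buffer layout are assumptions for illustration, not Shadama's or Apple's API. It only produces the source string; compiling it (e.g. with Metal's `makeLibrary(source:options:)`) and dispatching it is left out.

```python
import re

def emit_msl_kernel(name, inputs, expr):
    """Hypothetical helper: translate an elementwise expression over float
    arrays into MSL compute-kernel source (one GPU thread per element)."""
    if not inputs:
        raise ValueError("need at least one input array")
    # These names are used by the generated parameters, so inputs can't reuse them.
    clash = {"out", "n", "id"}.intersection(inputs)
    if clash:
        raise ValueError(f"reserved names used as inputs: {sorted(clash)}")
    # Rewrite each bare input name `x` to `x[id]`: "this thread's element of x".
    pattern = r"\b(" + "|".join(map(re.escape, inputs)) + r")\b"
    body = re.sub(pattern, r"\1[id]", expr)
    # One device buffer per input, then the output buffer and the element count.
    params = [f"    device const float* {x} [[buffer({i})]]," for i, x in enumerate(inputs)]
    k = len(inputs)
    params.append(f"    device float* out [[buffer({k})]],")
    params.append(f"    constant uint& n [[buffer({k + 1})]],")
    params.append("    uint id [[thread_position_in_grid]])")
    return "\n".join([
        "#include <metal_stdlib>",
        "using namespace metal;",
        "",
        f"kernel void {name}(",
        *params,
        "{",
        "    if (id >= n) return;  // skip padding threads past the end",
        f"    out[id] = {body};",
        "}",
    ])

print(emit_msl_kernel("axpb", ["a", "b"], "a * b + 1.0f"))
```

The word-boundary regex keeps literals such as `1.0f` intact, so only whole input names are rewritten.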
[5] https://github.com/yoshikiohshima/Shadama
I estimate it will cost around $22K at my hourly rate to completely reverse engineer the latest A16 and M4 CPU (ARMv9), GPU and NPU instruction sets. I think I am about halfway through the reverse engineering; the debugging part is the hardest problem. However, you would not be able to sell software built on it in the App Store, as Apple forbids undocumented APIs and bare-metal instructions.
by barkingcat on 12/25/24, 5:08 PM
There is Metal. You want to learn Apple M-series GPU and GPGPU development? Learn Metal!
by rgovostes on 12/25/24, 5:44 PM
<Insert your favorite LLM> helped me write some simple Metal-accelerated code by scaffolding the compute pipeline, which took most of the nuisance out of learning the API and let me focus on writing the kernel code.
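Much of that pipeline scaffolding is host-side bookkeeping, and one piece that often trips people up is sizing the dispatch, since the number of work items rarely divides evenly by the threadgroup size. Below is a Python sketch of that arithmetic for a 1-D grid, using a hypothetical `threadgroup_plan` helper. In real Swift/Objective-C host code the limit would come from the compute pipeline state's `maxTotalThreadsPerThreadgroup`, and the counts would be passed to `dispatchThreadgroups(_:threadsPerThreadgroup:)`; the value 1024 below is just an assumed example.

```python
def threadgroup_plan(n, max_threads_per_group):
    """Hypothetical helper: plan a 1-D dispatch of n work items.
    Returns (threads_per_group, group_count, padding_threads)."""
    if n <= 0 or max_threads_per_group <= 0:
        raise ValueError("n and max_threads_per_group must be positive")
    # Don't launch bigger groups than there is work.
    threads_per_group = min(max_threads_per_group, n)
    # Ceiling division so the last partial group is still launched.
    group_count = (n + threads_per_group - 1) // threads_per_group
    # Extra threads in the last group that must exit early in the kernel.
    padding = group_count * threads_per_group - n
    return threads_per_group, group_count, padding

print(threadgroup_plan(1_000_000, 1024))  # (1024, 977, 448)
```

The padding threads are why kernels usually start with a bounds check such as `if (id >= n) return;`.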
Here's the code if it's helpful at all. https://github.com/rgov/thps-crack
by billti on 12/25/24, 10:10 PM
With that base, I’ve found their docs decent enough, especially coupled with the Metal Shading Language PDF they provide (https://developer.apple.com/metal/Metal-Shading-Language-Spe...) and the quite a few code samples you can download from the docs site (e.g. https://developer.apple.com/documentation/metal/performing_c...).
I’d note that a lot of their stuff is still written in Objective-C, which I’m not that familiar with, but most of that is boilerplate, and the rest is largely C/C++-based (including the Metal Shading Language).
I just ported some CPU/SIMD number crunching (complex matrices) to Metal, and the speed up has been staggering. What used to take days now takes minutes. It is the hottest my M3 MacBook has ever been though! (See https://x.com/billticehurst/status/1871375773413876089 :-)
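The ported code isn't shown here, so this is not that port. It is a CPU-side Python sketch of the usual way a matrix product is split up for a GPU: one thread per output element, each identified by its 2-D grid position (what a `uint2` `thread_position_in_grid` gives a Metal kernel). Complex values are kept as (real, imaginary) pairs and multiplied by hand, which is roughly how you would handle them with `float2` in a kernel, assuming no built-in complex type is available. The names `cmul`, `cmatmul_thread` and `cmatmul_dispatch` are hypothetical.

```python
from itertools import product

def cmul(a, b):
    """Complex multiply on (re, im) pairs, as you would write it for float2."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def cmatmul_thread(gid, A, B, C):
    """Body of one GPU 'thread': compute C[row][col] for gid = (row, col)."""
    row, col = gid
    re, im = 0.0, 0.0
    for k in range(len(B)):
        pr, pi = cmul(A[row][k], B[k][col])
        re += pr
        im += pi
    C[row][col] = (re, im)

def cmatmul_dispatch(A, B):
    """Emulate dispatching a rows x cols grid of threads, one after another."""
    rows, inner, cols = len(A), len(B), len(B[0])
    if any(len(r) != inner for r in A):
        raise ValueError("inner dimensions do not match")
    C = [[(0.0, 0.0)] * cols for _ in range(rows)]
    for gid in product(range(rows), range(cols)):
        cmatmul_thread(gid, A, B, C)
    return C

A = [[(1, 1), (2, 0)], [(0, 0), (0, 1)]]  # [[1+i, 2], [0, i]]
B = [[(1, 0), (0, 0)], [(0, 1), (1, 0)]]  # [[1, 0], [i, 1]]
print(cmatmul_dispatch(A, B))
```

Because every thread writes a distinct output cell and only reads the inputs, the order of the emulated loop doesn't matter, which is what makes this decomposition safe to run in parallel on the GPU.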
by mkagenius on 12/25/24, 5:29 PM
by thetwentyone on 12/25/24, 6:11 PM
by dylanowen on 12/25/24, 5:21 PM
by feznyng on 12/25/24, 5:14 PM
by desideratum on 12/25/24, 7:48 PM
by rowanG077 on 12/25/24, 5:12 PM
by TriangleEdge on 12/25/24, 7:56 PM
by amelius on 12/25/24, 6:21 PM