by hexomancer on 12/17/24, 11:02 AM with 2 comments
by mikewarot on 12/17/24, 4:47 PM
If you're going for the absolute maximum performance, you convert an entire layer from multiply-accumulates and the like into a directed acyclic graph of bitwise logic operations (AND, OR, XOR, NOR, NAND, etc.), then optimize away every gate you possibly can before building it into part of the ASIC. In theory, you could get 100% utilization of the chip area and one token out per clock cycle. Your limiting factor is going to be power consumption, since on average about 50% of the gates will be toggling every clock.
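A minimal sketch of why fixing the weights lets gates be optimized out, assuming a toy 4-bit shift-and-add multiplier and plain constant folding (the `Circuit` class and helper names are hypothetical illustrations, not any real synthesis tool). When a weight bit is a known 0 or 1, the AND/XOR/OR gates that touch it fold away at build time:

```python
from itertools import count

class Circuit:
    """Gate DAG with constant folding. A wire is 0, 1, or a string name."""
    def __init__(self):
        self.gates = {}            # output wire -> (op, a, b), in build order
        self._ids = count()

    def _emit(self, op, a, b):
        out = f"g{next(self._ids)}"
        self.gates[out] = (op, a, b)
        return out

    def AND(self, a, b):
        if a == 0 or b == 0: return 0
        if a == 1: return b
        if b == 1 or a == b: return a
        return self._emit("and", a, b)

    def OR(self, a, b):
        if a == 1 or b == 1: return 1
        if a == 0: return b
        if b == 0 or a == b: return a
        return self._emit("or", a, b)

    def XOR(self, a, b):
        if a == 0: return b
        if b == 0: return a
        if a == b: return 0
        return self._emit("xor", a, b)

def add(c, xs, ys):
    """Ripple-carry adder on little-endian bit lists; the final carry is dropped."""
    carry, out = 0, []
    for a, b in zip(xs, ys):
        p = c.XOR(a, b)
        out.append(c.XOR(p, carry))
        carry = c.OR(c.AND(a, b), c.AND(p, carry))
    return out

def multiply(c, xs, ws):
    """Shift-and-add multiplier; ws may hold wire names or constant bits."""
    width = len(xs) + len(ws)      # wide enough that no sum overflows
    acc = [0] * width
    for i, w in enumerate(ws):
        partial = [0] * i + [c.AND(x, w) for x in xs]
        partial += [0] * (width - len(partial))
        acc = add(c, acc, partial)
    return acc

def evaluate(c, outs, inputs):
    """Simulate the DAG; gates were emitted in topological order."""
    val = dict(inputs)
    ops = {"and": lambda a, b: a & b, "or": lambda a, b: a | b,
           "xor": lambda a, b: a ^ b}
    get = lambda w: w if isinstance(w, int) else val[w]
    for out, (op, a, b) in c.gates.items():
        val[out] = ops[op](get(a), get(b))
    return sum(get(w) << i for i, w in enumerate(outs))

xs = [f"x{i}" for i in range(4)]
generic = Circuit()                                   # weight is a runtime input
g_out = multiply(generic, xs, [f"w{i}" for i in range(4)])
fixed = Circuit()                                     # weight 5 baked in
f_out = multiply(fixed, xs, [1, 0, 1, 0])             # little-endian bits of 5
print(len(generic.gates), "gates generic vs", len(fixed.gates), "with fixed weight")
```

This only counts emitted gates and does nothing cleverer than folding constants; a real flow would apply far more aggressive logic minimization across the whole layer.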
Nobody will do this, though... because developing an ASIC takes 6 months to a year, and the chip would be completely useless for anything else.
You could get close with a huge grid of LUTs that only talk to their neighbors. It could compute the optimized graph from above, or any other, while keeping all the wires short, which keeps the capacitances low and so allows lower power and higher frequency.
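A rough sketch of what such a neighbor-only LUT grid could look like in simulation. The details here are assumptions for illustration: each cell is a 4-input LUT reading the output bits of its north, east, south and west neighbors, every cell updates once per clock, and the names `lut`, `step` and `edge` are hypothetical.

```python
def lut(fn):
    """Pack a function of (n, e, s, w) bits into a 16-entry truth table."""
    table = 0
    for idx in range(16):
        n, e, s, w = idx & 1, idx >> 1 & 1, idx >> 2 & 1, idx >> 3 & 1
        table |= fn(n, e, s, w) << idx
    return table

def step(luts, state, edge):
    """One clock: every cell looks up its new output from its 4 neighbors.

    `edge` maps off-grid coordinates to externally driven input bits;
    any other off-grid neighbor reads as 0.
    """
    rows, cols = len(luts), len(luts[0])

    def read(r, c):
        if 0 <= r < rows and 0 <= c < cols:
            return state[r][c]
        return edge.get((r, c), 0)

    new = []
    for r in range(rows):
        row = []
        for c in range(cols):
            idx = (read(r - 1, c) | read(r, c + 1) << 1
                   | read(r + 1, c) << 2 | read(r, c - 1) << 3)
            row.append(luts[r][c] >> idx & 1)
        new.append(row)
    return new

# A 1x2 grid: cell (0,0) XORs its north and west inputs,
# cell (0,1) just forwards whatever arrives from the west.
luts = [[lut(lambda n, e, s, w: n ^ w), lut(lambda n, e, s, w: w)]]

def run(a, b, clocks=2):
    state = [[0, 0]]
    edge = {(-1, 0): a, (0, -1): b}   # a enters from the north, b from the west
    for _ in range(clocks):
        state = step(luts, state, edge)
    return state[0][1]

print([run(a, b) for a in (0, 1) for b in (0, 1)])
```

Each hop between cells costs one clock in this model, so a deep graph becomes a deep pipeline rather than a long wire.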
by bjourne on 12/18/24, 12:18 AM