by InvisibleUp on 10/9/24, 4:47 AM with 126 comments
by shrubble on 10/9/24, 1:39 PM
Chuck Moore of Forth fame demonstrated taking a value, say 1.6 multiplied by 4.1, doing all the intermediate calculations via integers (16 * 41), and then formatting the output by putting the decimal point back in the "right place". This worked as long as the values, once scaled by 10, still fit in the integer range (65536 for 16-bit integers, for instance). For embedded chips where you have, say, an analog reading with 10 bits of precision to compute quickly multiple times per second, this worked well.
I also recall talking many years ago with a Microsoft engineer who had worked on the Microsoft Streets and Trips program (https://archive.org/details/3135521376_qq_CD1 for a screenshot). They too had managed to fit what would normally be floating point numbers, and the needed calculations, into some kind of packed integer format with only the precision that was actually needed. That was faster on the CPUs of the day, and also more easily compressed to fit on the CD-ROM.
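A minimal C sketch of Moore's fixed-point trick, using the 1.6 * 4.1 example from the comment; the 16-bit widths and scale factor are just illustrative choices, not his actual code:

    #include <stdint.h>
    #include <stdio.h>

    /* Fixed-point sketch: store values scaled by 10, so 1.6 -> 16 and 4.1 -> 41. */
    #define SCALE 10

    int main(void) {
        int16_t a = 16;                    /* 1.6 stored in tenths */
        int16_t b = 41;                    /* 4.1 stored in tenths */
        int32_t product = (int32_t)a * b;  /* 656, now carries a scale of 100 */
        int32_t result  = product / SCALE; /* 65 -> 6.5 in tenths (exact is 6.56) */

        /* Put the decimal point back in the "right place" when printing. */
        printf("%d.%d\n", (int)(result / SCALE), (int)(result % SCALE));
        return 0;
    }

All the arithmetic stays in integer registers; the only cost is keeping track of where the decimal point belongs.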
by visarga on 10/9/24, 6:04 AM
If this were about convolutional nets, then optimizing compute would be a much bigger deal. Transformers are lightweight on compute and heavy on memory; the weakest link in the chain is fetching the model weights into the cores. The 95% and 80% energy reductions cited are for the multiplication operations in isolation, not for the entire inference process.
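A rough back-of-envelope sketch of that memory-bound argument; the 7B-parameter fp16 model and the "hundreds of FLOPs per byte" threshold are illustrative assumptions, not figures from the comment or the paper:

    #include <stdio.h>

    /* Back-of-envelope: for batch-size-1 decoding, every generated token touches
       every weight once.  Roughly 2 FLOPs (one multiply, one add) per parameter
       per token, and 2 bytes per parameter in fp16. */
    int main(void) {
        double params      = 7e9;          /* assumed 7B-parameter model            */
        double bytes_moved = params * 2.0; /* fp16 weight bytes read per token      */
        double flops       = params * 2.0; /* one multiply-add per parameter/token  */

        double intensity = flops / bytes_moved;   /* FLOPs per byte of weights */
        printf("arithmetic intensity ~ %.1f FLOP/byte\n", intensity);
        /* Modern accelerators need on the order of hundreds of FLOPs per byte to be
           compute-bound, so single-stream inference is dominated by weight traffic;
           cheaper multiplies alone don't remove that bottleneck. */
        return 0;
    }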
by tantalor on 10/9/24, 1:22 PM
http://tom7.org/grad/murphy2023grad.pdf
Also in video form: https://www.youtube.com/watch?v=Ae9EKCyI1xU
by js8 on 10/9/24, 8:24 AM
I am asking not to dismiss it; I genuinely feel I don't understand logarithms on a fundamental level (of logic gates etc.). If multiplication can be replaced with a table lookup and addition, then there has to be a circuit that gives you difficult addition and easy multiplication, or any combination of those tradeoffs.
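For what it's worth, the "table lookup and addition" replacement alluded to here is essentially a logarithmic number system: log(a*b) = log(a) + log(b), so a multiply becomes two lookups, one add, and an inverse lookup. A minimal sketch, with an arbitrary table size and a value range limited to [1, 10) purely for illustration:

    #include <math.h>
    #include <stdio.h>

    /* Multiplication via logarithms: look up log(a) and log(b), add them, then
       look the sum back up.  Tables and ranges are arbitrary illustration values. */
    #define N 1024

    static double log_tab[N];   /* maps a value in [1, 10) to its log           */
    static double exp_tab[N];   /* maps a log in [0, log 10) back to the value  */

    int main(void) {
        for (int i = 0; i < N; i++) {
            log_tab[i] = log(1.0 + 9.0 * i / N);
            exp_tab[i] = exp(log(10.0) * i / N);
        }

        double a = 1.6, b = 4.1;   /* only valid while a*b < 10 in this sketch */
        double la = log_tab[(int)((a - 1.0) / 9.0 * N)];
        double lb = log_tab[(int)((b - 1.0) / 9.0 * N)];
        double approx = exp_tab[(int)((la + lb) / log(10.0) * N)];

        printf("approx %.3f vs exact %.3f\n", approx, a * b);  /* close, not exact */
        return 0;
    }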
by jenda23 on 10/20/24, 11:02 PM
by cpldcpu on 10/9/24, 6:15 AM
by pjc50 on 10/9/24, 10:02 AM
(from footnote in method section)
by CGamesPlay on 10/9/24, 6:18 AM
by ein0p on 10/9/24, 6:37 PM
by presspot on 10/9/24, 8:14 PM
by Buttons840 on 10/9/24, 10:49 AM
What about over time? If this L-Mul operation (the matrix operation based on integer addition) proved to be much more energy efficient and became popular, would new hardware be created that was faster?
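For readers unfamiliar with the trick: L-Mul-style multiplication is in the spirit of Mitchell's logarithmic approximation, where adding the raw IEEE-754 bit patterns of two floats (minus one exponent bias) approximates their product. The sketch below shows that general idea; it is not the paper's exact L-Mul definition:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Approximate a float multiply with one integer addition, by treating the
       IEEE-754 bit pattern as a crude fixed-point logarithm (Mitchell-style). */
    static float approx_mul(float a, float b) {
        uint32_t ia, ib, ic;
        memcpy(&ia, &a, sizeof ia);
        memcpy(&ib, &b, sizeof ib);
        /* Adding the bit patterns adds the exponents and (approximately) the
           mantissa logs; subtract one exponent bias (127 << 23) so the bias
           isn't counted twice. */
        ic = ia + ib - 0x3F800000u;
        float c;
        memcpy(&c, &ic, sizeof c);
        return c;
    }

    int main(void) {
        float a = 1.6f, b = 4.1f;
        printf("approx %.3f vs exact %.3f\n", approx_mul(a, b), a * b);
        /* approx 6.500 vs exact 6.560 -- within a few percent */
        return 0;
    }

In hardware, an operation like this swaps a floating-point multiplier for a plain integer adder, which is where the claimed energy savings come from and why dedicated silicon could plausibly make it faster still.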
by cpldcpu on 10/9/24, 10:19 AM
by scotty79 on 10/9/24, 8:21 AM
by concrete_head on 10/9/24, 9:59 AM
by dwrodri on 10/12/24, 12:21 AM
by md_rumpf on 10/9/24, 5:53 AM
by A4ET8a8uTh0 on 10/9/24, 1:37 PM
by m3kw9 on 10/9/24, 9:58 PM
by ranguna on 10/9/24, 9:45 AM
Nvidia funds most research around LLMs, and they also fund other companies that fund other research. If transformers were to use addition and remove all usage of floating point multiplication, there's a good chance the GPU would no longer be needed, or at the least, cheaper ones would be good enough. If that were to happen, no one would need Nvidia anymore and their trillion-dollar empire would start to crumble.
University labs get free GPUs from Nvidia -> University labs don't want to do research that would make said GPUs obsolete because Nvidia won't like that.
If this were true, it would mean that we are stuck on an inefficient research path due to corporate greed. Imagine if this really was the next best thing, and we just don't explore it more because the ruling corporation doesn't want to lose their market cap.
Hopefully I'm wrong.