by eightysteele on 1/17/12, 7:00 PM with 15 comments
by ique on 1/17/12, 8:21 PM
BLAS is not only written in more efficient code, it uses different algorithms altogether. BLAS can apply a lot of optimizations that bring the total FLOP count below what's usually considered required for matrix multiplication (2m*n^2).
[1]: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogram...
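For reference, the 2m*n^2 figure is what the schoolbook triple loop costs. Here is a minimal Python sketch, assuming an m x n matrix times an n x n matrix (the comment doesn't state the shapes) and counting one multiply plus one add per inner step. `naive_matmul_flops` is a hypothetical name for illustration, not a BLAS routine:

```python
def naive_matmul_flops(a, b):
    """Multiply a (m x n) by b (n x p) with the schoolbook triple loop.

    Returns (product, flop_count). Illustrative only: this is the
    baseline being compared against, not how BLAS is implemented.
    """
    m, n, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    flops = 0
    for i in range(m):
        for j in range(p):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                flops += 2  # one multiply and one add per inner step
    return c, flops


if __name__ == "__main__":
    m, n = 3, 4
    a = [[1.0] * n for _ in range(m)]  # m x n matrix of ones
    b = [[1.0 if r == col else 0.0 for col in range(n)] for r in range(n)]  # n x n identity
    product, flops = naive_matmul_flops(a, b)
    print(flops, 2 * m * n ** 2)
```

For m = 3 and n = 4 the loop counts 2*3*4^2 = 96 operations, matching the formula.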
by hogu on 1/17/12, 7:56 PM