by rcshubhadeep on 4/10/22, 10:32 AM with 4 comments
So, to measure the performance, I created a simple comparison benchmark. I created a tensor with dimensions (BATCH, X, Y), like so:
a = torch.randn(10, 20, 30)
Then in Jupyter I timed this:
%%timeit
torch.einsum('b i j -> b j i', a)
AND
%%timeit
a.transpose(1, 2)
-----------------------
These are the results:
5.43 µs ± 63.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) [Einsum]
1.15 µs ± 2.51 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) [transpose]
Am I doing/reading something wrong? Is this a wrong way to benchmark? Or is it really true, as I see here, that einsum is an order of magnitude slower than transpose?
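One likely factor: transpose(1, 2) only swaps strides and returns a view without touching the data, while each einsum call must first parse the subscript string and dispatch before doing the equivalent permutation, so on a tiny tensor the fixed per-call overhead dominates. A minimal sketch of the same comparison, written with NumPy here so it runs standalone (an assumption: NumPy's einsum takes the same subscript notation, and the torch behaviour should be analogous):

```python
import timeit
import numpy as np

# Same shape as in the question: (BATCH, X, Y)
a = np.random.randn(10, 20, 30)

b_einsum = np.einsum('bij->bji', a)   # parses the subscript string, then permutes
b_transpose = a.transpose(0, 2, 1)    # just swaps strides: a view, no data copied

# Both produce the same permuted result.
assert np.array_equal(b_einsum, b_transpose)

# Time both; the absolute numbers will differ from torch's, but the
# shape of the gap (fixed overhead on a tiny op) should be similar.
t_einsum = timeit.timeit(lambda: np.einsum('bij->bji', a), number=10_000)
t_transpose = timeit.timeit(lambda: a.transpose(0, 2, 1), number=10_000)
print(f"einsum:    {t_einsum:.4f} s")
print(f"transpose: {t_transpose:.4f} s")
```

If the gap is mostly parsing/dispatch overhead, it should stay roughly constant as the tensor grows.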
by gus_massa on 4/10/22, 11:15 AM
a = torch.randn(10, 20, 30)
a = torch.randn(20, 40, 60)
a = torch.randn(30, 60, 90)
...
Is the ~4 µs difference constant, or is it proportional to the size of the tensor?
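The suggestion above can be sketched as a loop over increasing sizes; if the gap stays flat it is fixed per-call overhead, if it grows it scales with the data. Again written with NumPy as a stand-in for torch so it runs standalone (an assumption; the torch version would swap in torch.randn, torch.einsum, and a.transpose(1, 2)):

```python
import timeit
import numpy as np

# Time both ops at the three sizes proposed above and print the gap.
for b, x, y in [(10, 20, 30), (20, 40, 60), (30, 60, 90)]:
    a = np.random.randn(b, x, y)
    t_e = timeit.timeit(lambda: np.einsum('bij->bji', a), number=10_000)
    t_t = timeit.timeit(lambda: a.transpose(0, 2, 1), number=10_000)
    print(f"({b:2d}, {x:2d}, {y:2d}): einsum {t_e:.4f}s, "
          f"transpose {t_t:.4f}s, gap {t_e - t_t:.4f}s")
```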