by vanni on 5/8/20, 10:28 PM with 52 comments
by bigcheesegs on 5/9/20, 12:15 AM
X86-64 uses SSE registers for all floating point operations. I'm not sure that the author realized that they were looking at an -O0 binary. -O0 does not do vectorization (or anything else for that matter).
by rwmj on 5/9/20, 12:24 PM
card.cpp:16:2: error: ‘g’ was not declared in this scope
16 | <g;p)t=p,n=v(0,0,1),m=1;for(i k=19;k--;)
| ^
Edit: Yes there is. The ‘<g;’ seems like it should have been the single character ‘<’, perhaps a corrupted HTML escape.
by danielscrubs on 5/9/20, 7:22 AM
I also tried to optimise the code and got great speed increases just by making the vector methods constexpr. I could quickly see that rand was problematic, and then Fabien releases this post with nvcc, which is on another level. Really great blog post!
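For anyone curious what those two changes might look like, here is a rough sketch. It is not the code from the blog post or from my own experiment: `Vec` and `XorShift32` are hypothetical names made up for illustration, and xorshift32 is just one common lightweight substitute for `rand()`, assumed here rather than taken from the article.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the raytracer's vector type (names are illustrative).
// Marking the methods constexpr lets the compiler fold constant vector math
// at compile time and makes the functions implicitly inline.
struct Vec {
    float x, y, z;
    constexpr Vec(float a = 0, float b = 0, float c = 0) : x(a), y(b), z(c) {}
    constexpr Vec operator+(Vec r) const { return Vec(x + r.x, y + r.y, z + r.z); }
    constexpr Vec operator*(float s) const { return Vec(x * s, y * s, z * s); }
    constexpr float dot(Vec r) const { return x * r.x + y * r.y + z * r.z; }
};

// Evaluated entirely at compile time: 1*4 + 2*5 + 3*6 = 32.
static_assert(Vec(1, 2, 3).dot(Vec(4, 5, 6)) == 32.0f, "dot is usable in constant expressions");

// Assumed replacement for rand(): a xorshift32 generator with per-instance
// state, so there is no shared global state to contend on.
struct XorShift32 {
    std::uint32_t state;
    explicit XorShift32(std::uint32_t seed) : state(seed ? seed : 1u) {}  // state must be nonzero

    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }

    // Uniform float in [0, 1): keep the top 24 bits and scale by 2^-24.
    float uniform() { return (next() >> 8) * (1.0f / 16777216.0f); }
};
```

Each thread (or each pixel row) would own its own `XorShift32`, seeded differently, instead of calling the shared `rand()`.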
by ectoplasmaboiii on 5/9/20, 11:49 AM
by blondin on 5/9/20, 12:31 AM
also been experimenting with pure html with an itsy-bitsy amount of css. for months now i wondered how to display code without involving javascript.
that textarea is so perfect! and i bet you when you copy and paste into word or your todo list application they won't even try to be "smart" about knowing what "rich text" is...
that's very cool.
by rrss on 5/9/20, 3:50 PM
> This is correlated with the warning nvcc issued. Because the raytracer uses recursion, it uses a lot of stacks. So much actually that the SM cannot keep more than a few alive.
Stack frame size / "local memory" size doesn't actually directly limit occupancy. There's a list of the limiters here: https://docs.nvidia.com/gameworks/content/developertools/des.... I'm not sure why the achieved occupancy went up after removing the recursion, but I'd guess it was something like the compiler was able to reduce register usage.
by fegu on 5/9/20, 10:49 AM
by ntry on 5/8/20, 11:59 PM
by mianos on 5/9/20, 10:10 AM
by tomsmeding on 5/9/20, 8:03 AM
The initial time is not 101.8 seconds, it's 11.6 seconds.
by lonk on 5/9/20, 12:31 AM