by farzher on 5/1/22, 5:54 PM with 68 comments
by daenz on 5/3/22, 11:56 PM
Am I missing something that makes the video novel?
by xaedes on 5/3/22, 9:33 PM
You really can achieve amazing stuff with plain OpenGL (for example) optimized for your rendering needs. With today's GPU acceleration capabilities we could have town-building games with huge map resolutions and millions of entities. Instead, it's mostly used only to make fancy graphics.
Actually, I am currently trying to build something like that [1]. A big, big world with hundreds of millions of sprites is achievable and runs smoothly; video RAM is the limit. Admittedly it is not optimized to display those hundreds of millions of sprites all at once, maybe just a few million. That would be a bit too chaotic for a game anyway, I guess.
by aappleby on 5/4/22, 3:14 AM
200,000 sprites * 200 fps * 2 tris = 80M tris/sec
200,000 sprites * 200 fps * 32x32 px = ~40 Gpix/sec (if no occlusion culling)
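The arithmetic above is easy to sanity-check; a quick sketch in Go, using only the numbers from the comment:

```go
package main

import "fmt"

func main() {
	const sprites = 200_000
	const fps = 200
	const trisPerSprite = 2     // one textured quad = two triangles
	const pxPerSprite = 32 * 32 // 32x32 sprite, no occlusion culling

	trisPerSec := sprites * fps * trisPerSprite
	pixPerSec := sprites * fps * pxPerSprite

	fmt.Println(trisPerSec, "tris/sec") // 80,000,000
	fmt.Println(pixPerSec, "pix/sec")   // 40,960,000,000, i.e. ~41 Gpix/sec
}
```

Note the exact fill-rate figure is ~41 Gpix/sec; "40" above is a round-down.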
Neither of those numbers are particularly huge for modern GPUs.
I'd wager that a compute shader + mesh shader based version of this could hit 2M sprites at 200 fps, though at some point we'd have to argue about what counts as "cheating" - if I do a clustered occlusion query that results in my pipeline discarding an invisible batch of 128 sprites, does that still count as "rendering" them?
by quadcore on 5/3/22, 11:53 PM
edit: oh they do rabbits in the video as well what a bunny coincidence
edit2: the goroutines weren't issuing draw calls, btw; they were just moving the rabbits. The draw calls were still made using a regular for loop, in case you were wondering.
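That split can be sketched roughly like this (a minimal Go sketch, not the video's actual code; `drawSprite` is a hypothetical stand-in for the real draw call, which must stay on the main thread since GL contexts are thread-bound):

```go
package main

import (
	"fmt"
	"sync"
)

type rabbit struct{ x, y, vx, vy float32 }

// drawSprite is a placeholder for the real batched draw submission.
func drawSprite(r *rabbit) { _ = r }

func main() {
	rabbits := make([]rabbit, 10_000)

	// Goroutines only move the rabbits, each taking a chunk of the slice.
	const workers = 8
	chunk := (len(rabbits) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if hi > len(rabbits) {
			hi = len(rabbits)
		}
		wg.Add(1)
		go func(rs []rabbit) {
			defer wg.Done()
			for i := range rs {
				rs[i].x += rs[i].vx
				rs[i].y += rs[i].vy
			}
		}(rabbits[lo:hi])
	}
	wg.Wait()

	// Draw calls still happen in a regular for loop on the main thread.
	for i := range rabbits {
		drawSprite(&rabbits[i])
	}
	fmt.Println("updated and drew", len(rabbits), "rabbits")
}
```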
by _aavaa_ on 5/1/22, 7:07 PM
How are you finding working with it? Have you done a similar thing in C++ to compare the results and the process of writing it?
200k at 200fps on an 8700k with a 1070 seems like a lot of rabbits. Are there similar benchmarks to compare against in other languages?
by juancn on 5/3/22, 10:39 PM
My guess is that the rendering is not the hardest part, although it's kinda cool.
by chmod775 on 5/1/22, 7:48 PM
The CPU work would be O(n) and the rendering/GPU work O(m*k), where n is the number of bunnies, m is the display resolution and k is the size of our bunny sprite.
The advantage of this (in real applications utterly useless[1]) method is that CPU work only increases linearly with the number of bunnies, you get to discard bunnies you don't care about really early in the process, and GPU work is constant regardless of how many bunnies you add.
It's conceptually similar to rendering voxels, except you're not tracing rays deep, but instead sweeping wide.
As long as your GPU is fine with sampling that many surrounding pixels, you're exploiting the capabilities of both your CPU and GPU quite well. The CPU work can also be parallelized: each thread operates on a subset of the bunnies and on its own texture, and only in the final step are the textures combined into one (which can also be done in parallel!). I wouldn't be surprised if modern CPUs could handle millions of bunnies while modern GPUs would just shrug, as long as the sprite is small.
[1] In reality you don't have sprites at constant sizes, and this method can't properly deal with transparency of any kind. The size of your sprites is directly limited by how many surrounding pixels your shader looks up during rendering, even if you add support for multiple sprites/sprite sizes using other channels on your textures.
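The CPU side of the scheme described above can be sketched as follows (my own minimal sketch of the idea, not code from the video: each worker stamps its subset of bunnies into a private single-channel "texture", and the per-thread textures are merged at the end; the GPU would then sample surrounding pixels of the merged texture to draw each sprite):

```go
package main

import (
	"fmt"
	"sync"
)

const w, h = 256, 256

type bunny struct{ x, y int }

// stamp marks a bunny's position in a single-channel texture; the
// fragment shader would later look up nearby marks to paint the sprite.
func stamp(tex []byte, b bunny) {
	if b.x >= 0 && b.x < w && b.y >= 0 && b.y < h {
		tex[b.y*w+b.x] = 1
	}
}

func main() {
	bunnies := []bunny{{10, 10}, {20, 30}, {200, 100}, {10, 10}}

	// Each worker rasterizes a subset of bunnies into its own texture.
	const workers = 2
	texes := make([][]byte, workers)
	var wg sync.WaitGroup
	for wid := 0; wid < workers; wid++ {
		texes[wid] = make([]byte, w*h)
		wg.Add(1)
		go func(wid int) {
			defer wg.Done()
			for i := wid; i < len(bunnies); i += workers {
				stamp(texes[wid], bunnies[i])
			}
		}(wid)
	}
	wg.Wait()

	// Final step: combine the per-thread textures (parallelizable too).
	combined := make([]byte, w*h)
	for _, tex := range texes {
		for i, v := range tex {
			if v != 0 {
				combined[i] = v
			}
		}
	}

	count := 0
	for _, v := range combined {
		count += int(v)
	}
	fmt.Println("marked pixels:", count) // 3 (two bunnies share a cell)
}
```

Note this sketch also illustrates the footnote's limitation: one mark per pixel means overlapping bunnies collapse into one, so transparency or stacking can't be represented.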
by farzher on 5/1/22, 5:54 PM
i got 200k sprites at 200fps on a 1070 (while recording). i'm not sure anyone could survive that many vampires
by liftm on 5/3/22, 11:43 PM
by sqrt_1 on 5/3/22, 10:35 PM
Curious how you are passing the data to the GPU - are you having a single dynamic vertex buffer that is uploaded each frame?
Is the vertex data a single position and the GPU is generating the quad from this?
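One common answer to that second question (not confirmed to be what the video does) is to upload only one center position per sprite and expand it to a quad in the vertex shader from the vertex index. The corner-derivation logic can be sketched on the CPU in Go, with `quadCorner` as a hypothetical mirror of what the shader would compute from `gl_VertexID`:

```go
package main

import "fmt"

// quadCorner derives a quad corner from a sprite center, a half-size,
// and a vertex index 0..5 (two triangles), so only one position per
// sprite needs to be uploaded each frame.
func quadCorner(cx, cy, half float32, vertexID int) (float32, float32) {
	offsets := [6][2]float32{
		{-1, -1}, {1, -1}, {1, 1}, // triangle 1
		{-1, -1}, {1, 1}, {-1, 1}, // triangle 2
	}
	o := offsets[vertexID%6]
	return cx + o[0]*half, cy + o[1]*half
}

func main() {
	// Expand one sprite centered at (100, 50) with half-size 16.
	for v := 0; v < 6; v++ {
		x, y := quadCorner(100, 50, 16, v)
		fmt.Printf("vertex %d: (%g, %g)\n", v, x, y)
	}
}
```

With this trick the per-frame upload is just one position (plus any per-sprite attributes) per sprite, instead of four or six full vertices.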
by andrewmcwatters on 5/4/22, 12:19 AM
by jaqalopes on 5/4/22, 12:15 AM
by SemanticStrengh on 5/3/22, 10:58 PM
by jancsika on 5/4/22, 4:45 AM
by adanto6840 on 5/4/22, 1:27 AM
Using some slight shader/buffer trickery, and depending on what you're trying to do (as is always the case with games & rendering at this scale), you can easily get multiples of that -- and still stay >100FPS.
I agree, more of this approach is great. And I am totally flabbergasted at how abysmally poor the performance is with SpriteRenderer, Unity's built-in sprite rendering technique.
That said, it's doable to get relatively high performance with existing engines -- and the benefits they come with -- even if you can definitely, easily even, do better by "going direct".