from Hacker News

GpuScan and SSD-To-GPU Direct DMA

by matsuu on 9/18/16, 8:16 AM with 26 comments

  • by exDM69 on 9/18/16, 10:22 AM

    There is no explanation of how it works. Does it work on top of existing APIs in user space, or is there a custom kernel driver that bypasses user space?

    I've done some high-throughput streaming from HD/SSD to GPU before, and it's pretty easy to beat the naive solution, but getting the most out of it would require kernel-space code.

    I was doing random-access streaming of textures, using memory-mapped files for input and memcpy'ing into persistent/coherent mapped pixel buffers on background CPU threads. This was intended to take advantage of the buffer cache (works great when a page is reused) and designed for random access. If I had been working on a sequential/full-file upload, my solution would have been entirely different.
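    The pattern described above can be sketched roughly as follows. This is a minimal, GPU-free illustration: a plain malloc'd buffer stands in for the persistent/coherent mapped pixel buffer (which would really come from something like glMapBufferRange with GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT), and a small temp file stands in for the texture data.

    ```c
    /* Sketch: a background thread memcpy's pages from a memory-mapped
     * input file into a destination buffer.  The kernel's page cache
     * backs the mmap'ed region, so pages touched before are served
     * from RAM without a disk read. */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct copy_job {
        const unsigned char *src;  /* mmap'ed file contents */
        unsigned char *dst;        /* stand-in for the mapped pixel buffer */
        size_t len;
    };

    static void *copy_worker(void *arg)
    {
        struct copy_job *job = arg;
        memcpy(job->dst, job->src, job->len);  /* the streaming copy */
        return NULL;
    }

    int main(void)
    {
        /* Create a small input file to stand in for texture data. */
        const char *path = "/tmp/ssd2gpu_sketch.bin";
        size_t len = 4096;
        FILE *f = fopen(path, "wb");
        if (!f) return 1;
        for (size_t i = 0; i < len; i++)
            fputc((int)(i & 0xff), f);
        fclose(f);

        int fd = open(path, O_RDONLY);
        unsigned char *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        unsigned char *dst = malloc(len);
        if (src == MAP_FAILED || !dst) return 1;

        struct copy_job job = { src, dst, len };
        pthread_t tid;
        pthread_create(&tid, NULL, copy_worker, &job);  /* background copy */
        pthread_join(&tid, NULL);

        printf("copied %zu bytes, dst[255]=%u\n", len, dst[255]);

        munmap(src, len);
        close(fd);
        free(dst);
        unlink(path);
        return 0;
    }
    ```

    With a real persistent/coherent mapping the GPU can read the buffer without an explicit unmap or flush, which is what makes the background-thread memcpy approach viable.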

    Edit: here's the source: https://github.com/kaigai/ssd2gpu

    It has a custom kernel module.

  • by zokier on 9/18/16, 10:11 AM

    This is very interesting in light of AMD's recent announcement of their "Solid State Graphics", i.e. a GPU with an SSD duct-taped on: http://www.anandtech.com/show/10518/amd-announces-radeon-pro...

  • by foobar2020 on 9/18/16, 10:14 AM

    This would be incredibly useful for distributed machine learning - imagine a TensorFlow implementation that almost entirely bypasses the CPU.

  • by witty_username on 9/18/16, 9:44 AM

    So, if I understand correctly, data is being loaded directly from the SSD to the GPU and then filtered by the GPU before the CPU handles the more difficult queries.

    Neat.
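    For intuition, the split described above can be mimicked on the CPU alone. The row layout and predicates below are hypothetical, not PG-Strom's actual code: a cheap "device-side" predicate prunes rows first, and only survivors reach the expensive "host-side" check.

    ```c
    /* CPU-only illustration of a two-stage scan: GPU-style pre-filter,
     * then a costlier check the CPU keeps for itself. */
    #include <stdio.h>

    struct row { int id; double price; };

    /* Stage 1: cheap filter -- on real hardware this would run as a
     * GPU kernel over blocks DMA'ed straight from the SSD. */
    static int cheap_pass(const struct row *r) { return r->price > 100.0; }

    /* Stage 2: costly check handled on the CPU. */
    static int costly_pass(const struct row *r) { return r->id % 7 == 0; }

    int main(void)
    {
        struct row table[] = {
            { 7, 150.0 }, { 8, 150.0 }, { 14, 50.0 }, { 21, 300.0 },
        };
        int n = sizeof table / sizeof table[0];
        int matched = 0;

        for (int i = 0; i < n; i++) {
            if (!cheap_pass(&table[i]))   /* pruned "on the GPU" */
                continue;
            if (costly_pass(&table[i]))   /* finished on the CPU */
                matched++;
        }
        printf("matched %d rows\n", matched);  /* prints "matched 2 rows" */
        return 0;
    }
    ```

    The point of the pre-filter is that far fewer rows cross back over PCIe to the CPU than were read from the SSD.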

  • by justinclift on 9/18/16, 1:02 PM

    This is very awesome. If this is developed further and made into a feasible option for PostgreSQL, it has the potential to do interesting things to TPC benchmarks. :)

  • by nl on 9/18/16, 10:35 AM

    See also https://developer.nvidia.com/gpudirect and to some extent https://en.wikipedia.org/wiki/NVLink.

    NVLink is in the Power9 servers Google is using.

  • by carbocation on 9/18/16, 5:04 PM

    I'm really hoping that Optane delivers on the hype, in which case our durable storage could be just 10x slower than RAM. At least, I imagine that it would be really helpful for speeding up even this approach.

  • by Razengan on 9/18/16, 6:01 PM

    I hope this brings us closer to widespread external GPUs, where you could use a slower-than-PCIe bus like Thunderbolt 3 or USB 3.1 to upload all assets to the eGPU's SSD during a one-time loading screen.

  • by foobarbecue on 9/18/16, 5:25 PM

    Direct Direct Memory Access? That's pretty direct.

  • by musha68k on 9/18/16, 10:00 AM

    Amazing results! We need more of that kind of thinking - GPU/SSD accelerate all the things!

  • by MrBuddyCasino on 9/18/16, 12:22 PM

    Who provides the DMA engine in this case? Does the GPU have access to PCIe device memory?