from Hacker News

GpuScan and SSD-To-GPU Direct DMA

by matsuu on 9/18/16, 8:16 AM with 26 comments

  • by exDM69 on 9/18/16, 10:22 AM

    There is no explanation of how it works. Does it work on top of existing APIs in user space, or is there a custom kernel driver that bypasses user space?

    I've done some high-throughput streaming from HD/SSD to GPU before, and it's pretty easy to beat the naive solution, but getting the most out of it would require kernel-space code.

    I was doing random-access streaming of textures, using memory-mapped files for input and memcpy'ing into persistent/coherent mapped pixel buffers on background CPU threads. This was intended to take advantage of the buffer cache (works great when a page is reused) and designed for random access. If I had been working on a sequential/full-file upload, my solution would have been entirely different.
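    The pattern described above can be sketched roughly as follows. This is a minimal, GPU-free illustration: a plain malloc'd buffer stands in for the persistent/coherent mapped pixel buffer (which would really come from something like glMapBufferRange with GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT), and a small temp file stands in for the texture data.

    ```c
    /* Sketch: a background thread memcpy's pages from a memory-mapped
     * input file into a destination buffer.  The kernel's page cache
     * backs the mmap'ed region, so pages touched before are served
     * from RAM without a disk read. */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct copy_job {
        const unsigned char *src;  /* mmap'ed file contents */
        unsigned char *dst;        /* stand-in for the mapped pixel buffer */
        size_t len;
    };

    static void *copy_worker(void *arg)
    {
        struct copy_job *job = arg;
        memcpy(job->dst, job->src, job->len);  /* the streaming copy */
        return NULL;
    }

    int main(void)
    {
        /* Create a small input file to stand in for texture data. */
        const char *path = "/tmp/ssd2gpu_sketch.bin";
        size_t len = 4096;
        FILE *f = fopen(path, "wb");
        if (!f) return 1;
        for (size_t i = 0; i < len; i++)
            fputc((int)(i & 0xff), f);
        fclose(f);

        int fd = open(path, O_RDONLY);
        unsigned char *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        unsigned char *dst = malloc(len);
        if (src == MAP_FAILED || !dst) return 1;

        struct copy_job job = { src, dst, len };
        pthread_t tid;
        pthread_create(&tid, NULL, copy_worker, &job);  /* background copy */
        pthread_join(&tid, NULL);

        printf("copied %zu bytes, dst[255]=%u\n", len, dst[255]);

        munmap(src, len);
        close(fd);
        free(dst);
        unlink(path);
        return 0;
    }
    ```

    With a real persistent/coherent mapping the GPU can read the buffer without an explicit unmap or flush, which is what makes the background-thread memcpy approach viable.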

    Edit: here's the source: https://github.com/kaigai/ssd2gpu

    It has a custom kernel module.

  • by zokier on 9/18/16, 10:11 AM

    This is very interesting in light of AMD's recent announcement of their "Solid State Graphics", i.e. a GPU with an SSD duct-taped on: http://www.anandtech.com/show/10518/amd-announces-radeon-pro...

  • by foobar2020 on 9/18/16, 10:14 AM

    This would be incredibly useful for distributed machine learning - imagine a TensorFlow implementation that almost entirely bypasses the CPU.

  • by witty_username on 9/18/16, 9:44 AM

    So, if I understand correctly, data is being loaded directly from the SSD to the GPU and then filtered by the GPU before the CPU handles the more difficult queries.

    Neat.
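    For intuition, the split described above can be mimicked on the CPU alone. The row layout and predicates below are hypothetical, not PG-Strom's actual code: a cheap "device-side" predicate prunes rows first, and only survivors reach the expensive "host-side" check.

    ```c
    /* CPU-only illustration of a two-stage scan: GPU-style pre-filter,
     * then a costlier check the CPU keeps for itself. */
    #include <stdio.h>

    struct row { int id; double price; };

    /* Stage 1: cheap filter -- on real hardware this would run as a
     * GPU kernel over blocks DMA'ed straight from the SSD. */
    static int cheap_pass(const struct row *r) { return r->price > 100.0; }

    /* Stage 2: costly check handled on the CPU. */
    static int costly_pass(const struct row *r) { return r->id % 7 == 0; }

    int main(void)
    {
        struct row table[] = {
            { 7, 150.0 }, { 8, 150.0 }, { 14, 50.0 }, { 21, 300.0 },
        };
        int n = sizeof table / sizeof table[0];
        int matched = 0;

        for (int i = 0; i < n; i++) {
            if (!cheap_pass(&table[i]))   /* pruned "on the GPU" */
                continue;
            if (costly_pass(&table[i]))   /* finished on the CPU */
                matched++;
        }
        printf("matched %d rows\n", matched);  /* prints "matched 2 rows" */
        return 0;
    }
    ```

    The point of the pre-filter is that far fewer rows cross back over PCIe to the CPU than were read from the SSD.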

  • by justinclift on 9/18/16, 1:02 PM

    This is very awesome. If this is developed further and made into a feasible option for PostgreSQL, it has the potential to do interesting things to TPC benchmarks. :)

  • by nl on 9/18/16, 10:35 AM

    See also https://developer.nvidia.com/gpudirect and to some extent https://en.wikipedia.org/wiki/NVLink.

    NVLink is in the Power9 servers Google is using.

  • by carbocation on 9/18/16, 5:04 PM

    I'm really hoping that Optane delivers on the hype, in which case our durable storage could be just 10x slower than RAM. At least, I imagine that it would be really helpful for speeding up even this approach.

  • by Razengan on 9/18/16, 6:01 PM

    I hope this brings us closer to widespread external GPUs, where you could use a slower-than-PCIe bus like Thunderbolt 3 or USB 3.1 to upload all assets to the eGPU's SSD during a one-time loading screen.

  • by foobarbecue on 9/18/16, 5:25 PM

    Direct Direct Memory Access? That's pretty direct.

  • by musha68k on 9/18/16, 10:00 AM

    Amazing results! We need more of that kind of thinking - GPU/SSD accelerate all the things!

  • by MrBuddyCasino on 9/18/16, 12:22 PM

    Who provides the DMA engine in this case? Does the GPU have access to PCIe device memory?