from Hacker News

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

by heliophobicdude on 12/21/23, 1:38 PM with 42 comments

  • by mk_stjames on 12/21/23, 8:35 PM

    Oof, the dependency tree on this.

    It uses diff-gaussian-rasterization from the original Gaussian splatting implementation (which is a linked git submodule, so if you are trying to clone that dependency, remember to use --recursive to actually download it).

    But that is written mostly in pure CUDA.

    That part is just used to render the resulting splatted model, and there have been other cross-platform implementations for rendering splats – there was even that web demo a few weeks ago that used WebGL [0]. If one of those were used as the display output in place of the original implementation, I think there is no reason people couldn't use this on non-Nvidia hardware.

    edit: also, device=cuda is hardcoded in the torch portions of the training code (sigh!). This doesn't have to be the case; PyTorch could probably push this onto mps (Metal) just fine – see the sketch below.

    [0] https://github.com/antimatter15/splat?tab=readme-ov-file
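
    (A minimal sketch of the device fallback described above, assuming only standard PyTorch APIs – nothing here is taken from the Splatter Image code itself:)

      import torch

      def pick_device() -> torch.device:
          # Prefer CUDA, fall back to Apple's Metal (mps) backend, then CPU.
          if torch.cuda.is_available():
              return torch.device("cuda")
          if torch.backends.mps.is_available():
              return torch.device("mps")
          return torch.device("cpu")

      device = pick_device()
      x = torch.randn(4, 3, device=device)  # instead of a hardcoded device="cuda"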

  • by catapart on 12/21/23, 4:09 PM

    So if I'm tracking the progress correctly, now we should be able to do: Single Image -> Gaussian Splats -> Object Identification -> [Nearest Known Object | Algo-based shell] Mesh Generation -> Use-Case-Based Retopology -> Style-Trained Mesh Transformation

    Which would produce a new mesh in the style of your other meshes, based on a single photograph of a real-world object.

    ...and, at this speed, you could do that as a real-time(ish) import into a running application/game.

    Gotta say, I'm looking forward to someone putting these puzzle pieces together! But it really does feel like if we wait another month, there might be some new AI that shrinks that pipeline by another one or two steps! It's an exhausting time to be excited!
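
    (Purely illustrative – a stub of that hypothetical pipeline in Python; every function name below is a placeholder, not a real library:)

      from typing import Any

      # Hypothetical stages; each would be backed by a separate model or tool.
      def image_to_splats(image: Any) -> Any: ...        # e.g. Splatter Image
      def identify_object(splats: Any) -> str: ...       # object identification
      def splats_to_mesh(splats: Any, label: str) -> Any: ...  # mesh generation
      def retopologize(mesh: Any, use_case: str) -> Any: ...   # retopology
      def stylize(mesh: Any, style: str) -> Any: ...     # style-trained transform

      def photo_to_styled_mesh(image: Any, style: str = "house_style") -> Any:
          splats = image_to_splats(image)
          mesh = splats_to_mesh(splats, identify_object(splats))
          return stylize(retopologize(mesh, "game"), style)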

  • by joosters on 12/21/23, 5:18 PM

    Probably a dumb question, but is this trained on lots of inputs of similar objects, or is it 'just' estimating from the look of the input image?

    Like, if you have an image of a car viewed at an angle, you can gauge the shape of the 3D object from the image itself. You could then assume that the hidden side of the car is similar to the side you can see, and when you generate a 360° rotation animation of it, it will look pretty good (cars being roughly symmetrical). But if you gave it a flat image of a playing card, showing just the face-up side, how would it reconstruct the reverse side? Would it infer it based on the front, or would it 'know' from training data that playing cards have a very different patterned back?

  • by roflmaostc on 12/21/23, 3:38 PM

    Since it's based on 3D Gaussians in space, is there a way to obtain sharp images? Inherently, Gaussian functions extend infinitely, so images always look blurry. Don't they? Of course, σ can be optimized to be small, but then it converges to some point representation, doesn't it?

    Maybe some CV/ML people can help me understand.
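
    (For what it's worth: if I remember the original 3DGS rasterizer correctly, splats are not rendered out to infinity. Each projected Gaussian contributes an opacity-weighted falloff, roughly

      \alpha_i(x) = o_i \exp\!\left( -\tfrac{1}{2} (x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i) \right)

    and contributions below a small threshold (on the order of 1/255) are skipped, so each splat has a finite footprint in practice and renders can still look sharp.)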

  • by XorNot on 12/21/23, 2:23 PM

    I guess this is how you'd implement that thing in Enemy of the State where they pan around a single-perspective camera view (which I think doesn't come across as absurd in the movie anyway, since the tech guys point out it's basically a clever extrapolation).

  • by rijx on 12/21/23, 2:14 PM

    Now we can finally turn Street View into a game world!

  • by eurekin on 12/21/23, 5:05 PM

    For anybody wanting to take a look at the code: this time the GitHub link does include it - it's not empty, which is typical of those "too good to be true" publications.

  • by lawlessone on 12/21/23, 3:10 PM

    Am I imagining this, or is somebody making a newer and faster one of these every day?

    I'm expecting Overwhelmingly Fast Splatter by January.

  • by teunispeters on 12/21/23, 4:47 PM

    For a change, [code] works, but the [arXiv] link is not present. Have to say, this looks really interesting!

  • by billconan on 12/21/23, 5:57 PM

    The paper link doesn't work for me; the correct link is https://arxiv.org/pdf/2312.13150.pdf

  • by alkonaut on 12/21/23, 3:40 PM

    Wouldn't it be more useful to generate a vector model than a "3D image" (voxel/radiance field/splats/whatever it's called)? Apart from the use case "I want to spin the thing or walk around in it", they feel like they are of limited use.

    Unlike, say, a crude model of a fire hydrant, which you could throw into a game or whatever - maybe if the model were fed some more constraints/assumptions? I think I saw some recent paper that generates meshes now instead of pixels.
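
    (One common route from a density/radiance-style representation to a mesh is isosurface extraction. Below is a minimal sketch using scikit-image's marching cubes on a synthetic density grid; the grid is a stand-in, not output from Splatter Image:)

      import numpy as np
      from skimage import measure

      # Synthetic density field: a solid sphere inside a 64^3 grid.
      n = 64
      zz, yy, xx = np.mgrid[:n, :n, :n]
      density = (((xx - n / 2) ** 2 + (yy - n / 2) ** 2 + (zz - n / 2) ** 2)
                 < (n / 4) ** 2).astype(float)

      # Extract the 0.5 isosurface as a triangle mesh.
      verts, faces, normals, values = measure.marching_cubes(density, level=0.5)
      print(verts.shape, faces.shape)  # vertex positions and triangle indices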

  • by StreetChief on 12/21/23, 4:52 PM

    All I have to say is "ENHANCE!"
  • by amelius on 12/21/23, 5:55 PM

    This would be more powerful if you could optionally feed it more input images for a better result.
  • by anigbrowl on 12/21/23, 10:18 PM

    This could prove useful for autonomous navigation systems as well.
  • by tantalor on 12/21/23, 3:23 PM

    That "GT" method seems even better, we should just use that. /s