
Nerf / Gaussian Splatting. A new way to make world maps?

Discussion in 'General Discussion' started by jackmememe, Sep 6, 2023.

  1. jackmememe

    jackmememe

    Joined:
    Jan 21, 2016
    Posts:
    138
    NeRF has been around for a while, but we are now getting to the point where it can run in real time.
    Right now there is a lot of talk about 3D Gaussian Splatting, since it can run fairly well in real time:





    While it might not be production ready yet, there are already some plugins to run it inside Unreal:
    https://twitter.com/KenjiASABA/status/1698508499691467256

    I can see it working in a year or two; perhaps it could work well for point-and-click games or a game similar to Myst.
     
    neoshaman likes this.
  2. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    NeRF is expensive to run.
    Gaussian splatting is expensive to store.

    I don't think they are game ready, but gaussian splatting greatly simplifies the process (by skipping the neural network), such that we can probably apply or find optimizations for it.

    For those who don't know:
    - "3D Gaussian splatting" is an explicit representation of a scene; it's an extension of photogrammetry (structure from motion) where the point cloud is replaced by gaussian splats. That's why it's expensive to store, though an explicit cloud of positions opens the door to optimizations such as mipmapping and clustering.
    - For the old gamers among you, it's like the rendering technique of Ecstatica, where the solid ellipsoid primitive is replaced by a fuzzy alpha ellipsoid splatted to screen.
    - The difference with NeRF is that we apply the gradient descent process directly to the primitives (orientation, scaling, color+alpha, and their number through splitting and merging) instead of going through a neural network, making it a lot faster, at the cost of storage (see the sketch of a splat's parameters after this list).
    - Also, it's view based, so objects are not really approximated as volumes. Common artifacts are "floaters", like NeRF, because it's approximating rays, not objects. That means it's fillrate intensive, though it's cheap fillrate (no complex BRDF).
    - Lighting is baked into an SH representation, so it's akin to a free-form lightprobe volume.
    - There is an animated extension called "Dynamic 3D Gaussians".
    - They plan to release source code, but it isn't posted yet (as of this date).
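
    As a reference for the bullet above about the primitives' parameters, here is a minimal sketch (my own illustrative names, not the paper's exact layout) of what a single splat stores and how its view-dependent color could be read from degree-0/1 SH:

    Code (Python):
    import numpy as np

    class GaussianSplat:
        """One splat, roughly as described above (field names are illustrative)."""
        def __init__(self, position, scale, rotation, opacity, sh_coeffs):
            self.position = np.asarray(position, dtype=float)    # 3D center of the gaussian
            self.scale = np.asarray(scale, dtype=float)          # per-axis extent of the ellipsoid
            self.rotation = np.asarray(rotation, dtype=float)    # quaternion orienting the ellipsoid
            self.opacity = float(opacity)                        # alpha used when splatting to screen
            self.sh_coeffs = np.asarray(sh_coeffs, dtype=float)  # SH RGB coefficients, shape (N, 3)

    SH_C0 = 0.28209479177387814  # degree-0 real SH constant
    SH_C1 = 0.4886025119029199   # degree-1 real SH constant

    def sh_to_color(sh_coeffs, view_dir):
        """Evaluate degree-0/1 SH to get a view-dependent RGB color."""
        x, y, z = np.asarray(view_dir, dtype=float) / np.linalg.norm(view_dir)
        color = SH_C0 * sh_coeffs[0]
        if len(sh_coeffs) >= 4:  # the degree-1 terms carry the view dependence
            color = color - SH_C1 * y * sh_coeffs[1] + SH_C1 * z * sh_coeffs[2] - SH_C1 * x * sh_coeffs[3]
        return np.clip(color + 0.5, 0.0, 1.0)

    Training then optimizes exactly these fields (position, scale, rotation, opacity, SH) by gradient descent, plus the split/merge heuristics mentioned above.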

    Meshes are still a superior real-time representation, but future work might find ways to extract data from this:
    - We can already augment photogrammetry mesh extraction with NeRF; this follows the same principles.
    - Maybe we can find a way to extract lighting and BRDFs? A shader is a better representation that we can actually manipulate and relight.
    - We can probably extract lightfield information, especially lightprobe volumes for relighting.
    - It would probably work for any angular radiance data, so maybe it's an opening to optimize light-cage data extraction for material and 4D capture, using fewer cameras.
    - Combined with the latest HPG talk about differentiable shaders, it could extract realistic parameter calibrations to use with existing pipelines.


    EDIT:
    Another way to think about gaussian splats:
    - They bring NeRF-type results into a representation similar to traditional real-time rendering,
    - which means we can leverage that experience to learn from it.
    - SH at surface boundaries, to bake light, is something we already do.
    - Triangle meshes are a better approximation of surfaces than gaussian splats.
    - Rasterization is a denser "point cloud" analogue than gaussian splats.
    - The gaussians' fuzziness is a better approximation of the final image, blending details (using overdraw).
    - Splatting and SDFs have similar properties; see PlayStation's Dreams, which is based on those paradigms.
     
    Last edited: Sep 6, 2023
  3. aras-p

    aras-p

    Joined:
    Feb 17, 2022
    Posts:
    73
  4. jackmememe

    jackmememe

    Joined:
    Jan 21, 2016
    Posts:
    138
    I don't think it will replace today's 3D tech, but it will probably have some cool uses. That is a cool project, thanks for the link.
     
  5. Tom_3D

    Tom_3D

    Joined:
    Oct 1, 2014
    Posts:
    7
    Would this be a potential method for more optimized volumetric smoke/debris instead of raymarching?
     
  6. Murgilod

    Murgilod

    Joined:
    Nov 12, 2013
    Posts:
    10,032
    Not unless your debris and smoke are entirely static.
     
  7. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,496
    The problem with all tech like this is that you'll be restricted to real-life objects, meaning that to build a game you'd need to build a real-life set.
     
  8. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    Not really, it's a method of taking multiple views and translating them into the 3D gaussian splatting format. Which means you can translate any 3D object into it, for example baking a Hollywood-level render for real time.

    In fact it would be way easier to translate 3D objects to 3D splatting, because you remove the guessing part by having the solid mesh as an evaluator. And it wouldn't stop at translating 3D objects; it would extract the entire lightfield, which you can bake into a probe.

    3D gaussians have relighting potential because they use spherical harmonics as the light representation, which is easy to manipulate and industry standard. And ultimately, it's a point cloud, so you can still create a mesh approximation for physics using regular photogrammetry techniques.

    EDIT:
    https://github.com/graphdeco-inria/gaussian-splatting

    The GitHub repo is up for those who want to see the code to train from images.

    EDIT2:
    The major issue with using real-life captures for 3D gaussian splatting, aside from the raw memory cost at the moment (so you need to figure out compression on your own), is that light is baked: you would have to create a whole DELIGHTING pipeline to extract light and material approximations, akin to what Unity proposes for its photogrammetry pipeline. The technique is new, no one has done this.

    For 3D object capture, well, this technique doesn't like flat uniform surfaces in the source images, just like photogrammetry, so you will have to devise some new method to use the mesh as an estimator, which no one has done either.
     
    Last edited: Sep 22, 2023
    BabaSiMitraliera likes this.
  9. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    There is an implementation with movement:



    More optimized? I don't know; the bottlenecks are certainly different (memory pressure, overdraw with the naive billboard rendering process). Plus I haven't seen any demo with volumetrics, but that sounds plausible; use the GitHub repo to test on synthetic data yourself, reporting any results would be cool!

    The technique isn't mature yet; it needs a bunch of small optimizations and an automated workflow to be used GENERICALLY in games (compression, relighting, etc...). But as a "matte painting" in a scene with no dynamic light, you might try something.
     
    BabaSiMitraliera likes this.
  10. Murgilod

    Murgilod

    Joined:
    Nov 12, 2013
    Posts:
    10,032
    To say nothing of how literally every paper about gaussian splatting goes "yeah, this isn't and never will be as performant as a mesh"
     
    neginfinity likes this.
  11. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,496
    And in the meantime, NeRF does not have this limitation.
     
  12. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,740
    Luma AI has a NeRF plugin for UE 5.2.

    https://docs.lumalabs.ai/9DdnisfQaLN1sn
     
  13. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    I was replying specifically to the fact that someone said ONLY for static scenes. Nothing more, nothing less.

    That's relative; this is faster than NeRF, like way faster, and not a black box. NeRF has limits in computation and visual quality.

    The issue is size, but people toying with the algorithm made the viewer work on a 4GB GPU (the dataset is 24GB) and it's good enough, minus some visual streaks, proving it degrades gracefully. The main contribution of the algorithm was reducing multiview images to a cloud of points, not necessarily offering an efficient data format. However, it seems there is a lot that can be done to figure out a better format FROM the raw data. It's basically a BMP, we need a JPEG :p

    Gaussian splatting on a 1650 with 4GB of RAM


    Maybe then it will be competitive with meshes? To be frank I don't care; that's not my use case, and it doesn't solve the other problems.

    edit:
    Source code to create your own splats
    https://github.com/graphdeco-inria/gaussian-splatting
    /edit

    NOW, NeRF is used and implemented in engines right now, but I feel like gaussian splatting will be very soon, especially once we solve the artifacts at low sampling of the raw data. Also, it's a point cloud; you can hash the points into a voxel grid to accelerate sorting and make them compatible with NeRF sampling (which traverses a grid and evaluates the density at each point). A rough sketch of that hashing idea is below.
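
    To make that "hash them into a voxel grid" idea concrete, here is a rough sketch (illustrative names, not from the paper) of bucketing splat centers into grid cells so sorting and culling can work per cell instead of over the whole cloud:

    Code (Python):
    from collections import defaultdict
    import numpy as np

    def build_voxel_grid(centers, cell_size):
        """Bucket splat centers into integer voxel coordinates (a simple spatial hash)."""
        grid = defaultdict(list)
        cells = np.floor(np.asarray(centers, dtype=float) / cell_size).astype(np.int64)
        for index, cell in enumerate(map(tuple, cells)):
            grid[cell].append(index)
        return grid

    def cells_near_camera(grid, camera_pos, max_distance, cell_size):
        """Cull whole cells by distance; depth sorting then only happens inside surviving cells."""
        cam_cell = np.asarray(camera_pos, dtype=float) / cell_size
        radius = max_distance / cell_size
        return {cell: indices for cell, indices in grid.items()
                if np.linalg.norm(np.asarray(cell) - cam_cell) <= radius}

    The same grid could double as the traversal structure for NeRF-style density queries, since each cell already knows which splat centers fall inside it.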

    Visualization of Gaussian point clouds



    Random notes.

    I'm really bothered by the elliptical shapes of the gaussians, which are less spatially coherent and make optimizing them (compression, sorting) a bit more troublesome. Now, gaussians have a nice property, I forget the name, of being able to blend correctly together, so maybe we can break the troublesome shapes into smaller, more coherent ones?

    The idea instantly clicked with me because, through my RTGI exploration, I learned about lightfields and their math (the plenoptic function). One stipulation was that we can reduce a 5D sampling (3D position + 2D radial orientation) to a 4D sampling (2D position on a plane + 2D radial orientation).

    A 5D sampling is basically the typical lightprobe array games use, and that hints we can compress it to a 2D plane (guaranteed in convex space; concave space is a bit more complex due to the radial nature of each sampling point). That allows neat tricks like seeing through a sparse bush by reconstructing the scene behind it from the rays that pass through the bush. The trade-off is that a (5D plenoptic) lightprobe volume is O(1) memory sampling while the 4D surface would be O(rays) memory sampling, but that seems acceptable as a decompression step.
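
    For reference, the reduction I mean, in the usual light-field notation (my notation, not from the paper): the full plenoptic function samples radiance at every position and direction, but in free space radiance is constant along a ray, so outside the convex hull of the scene the 5D function collapses to a 4D light field, commonly parameterized by a ray's intersections with two planes:

    L(x, y, z, \theta, \phi) \;\longrightarrow\; L(u, v, s, t)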

    My initial idea was to hollow out a lightprobe volume and only keep the data at the boundary, basically a cubemap of lightprobe data. Then another idea was to voxelize the scene and keep probes on each visible face of the voxels, such that the geometric complexity would be decoupled from the visual complexity. There are a few other ideas, like limiting rays, which are half-infinite lines, to a structure of shared segments, capturing views only within a given segment.

    A 3D gaussian field is basically a set of elliptical probes (grrrr) that accumulate around surfaces. It's close in idea to the voxelization, but the surface is more ill-defined; still, it proves that empty space plus surfaces is enough to approximate geometric complexity. What's left to be seen is whether we can transform the chaotic placement of gaussians into nice probe "screen planes" while keeping the visual properties, for example by capturing the gaussians' contribution onto such a plane by sampling the gaussians in view of that plane.

    My message is simple: as devs, when confronted with a new technique, we can leverage our knowledge to put it into context and find ways to use it, not verbatim, but by adapting it where it matters. I'm probably not going to implement 3D gaussians as presented, but I will certainly be exploring the concept in my own work, as I was converging on similar ideas, since I'm dealing with lightfields in my GI algorithm.

    Saying a technique will replace X is nonsense; that's not how this works, a technique INFORMS how we do stuff. I lament a time when people on this forum could use their intelligence to recontextualize techniques instead of thinking about rote application, replacement, whether X has implemented it in their toolset, or whether popular Y is using it. I presented NeRF here very early, to a similar reaction, and now it's in engines. We are not obligated to be followers of trends, and not every new thing is just a gimmick to be met with spite and worry.

    edit:

    For reference
     
    Last edited: Sep 26, 2023
    aaronsmiles likes this.
  14. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    Someone asked a question about gaussian rendering; I put my answer here as a reference, to retrieve later.
    Technically it's not entirely new: gaussians are equivalent to the lightprobe BAKING used in 3D games, and we use SH (spherical harmonics, directional data used to encode color coming from a direction) even at the texture level. In some way, gaussians are an unstructured placement of SH lightprobes using ellipsoid shapes. It's not completely antithetical to traditional 3D: a textured mesh could be seen as a procedural point cloud projected on screen through the rasterization process, indexing a texture that could contain SH data. That makes the mesh a more compact way to store point cloud data, by interpolating sparse points (the vertices).

    The BIG gotcha is that a single pixel is resolved using the "transparent shapes" of gaussian probes, not a surface. A uniformly green room could be represented by a single probe in the middle of the room, i.e. there is no relation between surfaces and probes; a probe must be seen as encoding LIGHT rays, so that single green probe would be the equivalent of a skybox in a game. And multiple probes contribute to a single pixel being resolved on screen (hence the creation of floater artifacts), which is where the transparency matters. Scenes with meshes are typically rendered using a single point of geometry per pixel; that's the key difference. (A sketch of that per-pixel compositing is below.)
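
    A minimal sketch of that "many transparent probes resolve one pixel" idea, assuming the splats touching the pixel have already been projected and depth sorted (names are illustrative):

    Code (Python):
    def composite_pixel(sorted_splats):
        """sorted_splats: list of (rgb, alpha) pairs for one pixel, nearest splat first."""
        color = [0.0, 0.0, 0.0]
        transmittance = 1.0  # fraction of light still reaching the splats further back
        for rgb, alpha in sorted_splats:
            weight = transmittance * alpha
            color = [c + weight * channel for c, channel in zip(color, rgb)]
            transmittance *= 1.0 - alpha
            if transmittance < 1e-4:  # early out once the pixel is effectively opaque
                break
        return color

    # e.g. a translucent green splat in front of a red one:
    # composite_pixel([((0.0, 1.0, 0.0), 0.6), ((1.0, 0.0, 0.0), 0.9)])

    A mesh renderer would instead keep only the nearest opaque surface per pixel, which is exactly the difference described above.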

    But the question is: can we render directly into 3D gaussians?
    3D gaussians are a type of 3D representation; it doesn't make sense to say we render directly into 3D point clouds or meshes. These are the representation, not the rendering process: we render TO a screen. And it doesn't make sense to say "without point clouds", since 3D gaussian splats ARE a type of point cloud. The training process is how we convert 2D images into the 3D representation. The question should be rephrased as: can we CONVERT any 3D representation to gaussian splatting WITHOUT using 2D images? The answer is probably yes, but the details aren't known, so let's try to think it through.

    The way training works is that, for 2D points in a pair of images, it tries to guess how much a given point has moved from one image to the other (parallax), then triangulates the point to get a 3D position. With a 3D mesh, we already know the point absolutely. But as said, the training doesn't "find" the point; it's a guess based on the ray direction (position within the image) and a guess of the camera position. The algorithm guesses a bunch of possible camera positions, extracts data from the probable intersections of all the rays to find which make sense, places points at those intersections, and then roughly matches a gaussian and a color (SH) to each point so that it reproduces all the source images. The more training there is, the closer it gets to recreating the original image set from the gaussians.

    Given that, in 3D we already know every point and every possible camera position with absolute precision, it would be much easier to create the ray intersection lattice and put gaussians at those positions, by tracing rays and sampling colors into the SH. (A speculative sketch of that is below.)
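
    A purely speculative sketch of that conversion (no one has published this workflow as far as I know; shade_point is a hypothetical hook into your own renderer): seed one gaussian per surface sample, since positions are known exactly, and bake its base color from a few sampled directions.

    Code (Python):
    import numpy as np

    def sample_triangle(v0, v1, v2, count):
        """Uniform barycentric sampling of points on one triangle."""
        r1 = np.sqrt(np.random.rand(count, 1))
        r2 = np.random.rand(count, 1)
        return v0 * (1.0 - r1) + v1 * (r1 * (1.0 - r2)) + v2 * (r1 * r2)

    def seed_splats(triangles, samples_per_triangle, footprint, shade_point):
        """triangles: list of (v0, v1, v2) numpy arrays; shade_point(point, direction) -> rgb."""
        splats = []
        for v0, v1, v2 in triangles:
            for point in sample_triangle(v0, v1, v2, samples_per_triangle):
                # Average the color seen from a handful of directions into the SH DC term;
                # higher-order SH coefficients would capture the view dependence.
                directions = np.random.normal(size=(8, 3))
                directions /= np.linalg.norm(directions, axis=1, keepdims=True)
                base_color = np.mean([shade_point(point, d) for d in directions], axis=0)
                splats.append({"position": point, "scale": footprint, "sh_dc": base_color})
        return splats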
     
  15. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
  16. Murgilod

    Murgilod

    Joined:
    Nov 12, 2013
    Posts:
    10,032
     
    AcidArrow likes this.
  17. ippdev

    ippdev

    Joined:
    Feb 7, 2010
    Posts:
    3,840
    These might be useful for long-range dense forest cover. As well, it seems to me that since the pixels relate to a surface, a mesh, diffuse and normal map could be generated from them. Its utility to me is not generating local game world objects but procedurally generating distant stand-ins. I am also wondering if a compute shader could be fed a gaussian splat and render it; perhaps in such a case lighting could be generated as well. This is another case where replacing the diffusion with a charge field may yield more useful 3D world results, as these end up with a vector field, and more information per node can be generated from higher-dimensional maths that are well established.
     
    Ryiah and neoshaman like this.
  18. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    After a few days, I would say, at least for static synthetic objects:
    - We can relight them; these are lightprobes (SH with a position in space), and we know how to add light to lightprobes (a rough sketch follows this list).
    - We can extract the baked material BRDF LUT at any point: basically we start at a point in space, then project "view pixels" from a sphere surface around that point. Either we have the point on a surface mesh (synthetic) or we reconstruct a mesh using photogrammetry.
    - It's good for complex and dense scenery, not just turning around objects, especially distant LODs (lower fidelity, therefore lower footprint); baking high-end offline rendering works.
    - It's fast enough for static objects; it's just particle data. We have been doing particles since the PS2, and if the PS2 had the memory it could probably run this natively on the VU2 (someone needs to try). Properly handled, it has the potential to run on potatoes.
    - It's been "implemented" in engines already (basically importing particles). And NeRF, which was less useful, still made its way into game use cases (ask Nvidia, as a GI query structure).
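
    A rough sketch of the relighting idea from the first bullet, assuming the splat stores degree-0/1 SH and setting aside the fact that the baked lighting would ideally be removed (delighted) first:

    Code (Python):
    import numpy as np

    SH_C0 = 0.28209479177387814  # degree-0 real SH constant
    SH_C1 = 0.4886025119029199   # degree-1 real SH constant

    def directional_light_sh(light_rgb, light_dir):
        """Project a directional light into degree-0/1 SH (delta-light approximation)."""
        x, y, z = np.asarray(light_dir, dtype=float) / np.linalg.norm(light_dir)
        light_rgb = np.asarray(light_rgb, dtype=float)
        return np.stack([
            SH_C0 * light_rgb,       # DC term
            -SH_C1 * y * light_rgb,  # degree-1 terms, same sign convention as the evaluation
            SH_C1 * z * light_rgb,
            -SH_C1 * x * light_rgb,
        ])

    def relight_splat(baked_sh, light_rgb, light_dir, strength=1.0):
        """Add the new light on top of the baked coefficients (shape (4, 3))."""
        return np.asarray(baked_sh, dtype=float) + strength * directional_light_sh(light_rgb, light_dir)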

    Current Problems:
    - Authoring workflows and good practices are non-existent for getting the best practical results. 3DGS don't look as good as meshes from every view; they need specific angles to avoid some artifacts (hence the LOD use case), which means there must be a proper way to capture them and a way to guarantee a valid viewing volume.
    - An actual shareable standard format that supports compression. I mean, a bag of raw unstructured points is cool and all (it can be imported verbatim into Unreal's Niagara), but that takes up space and we don't want everyone reinventing the wheel constantly. The format should also support an LOD scheme that works like a mipmap slider, dropping unnecessary points for extra performance (see the sketch after this list).
    - Actual proof-of-concept game projects. It works well with synthetic scenes; there is no reason not to do a high-fidelity render in any 3D software and import the result, THEN merge and augment with regular meshes where appropriate.
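
    And an illustrative take on the "mipmap slider" LOD from the second bullet: rank the splats by a cheap contribution estimate (opacity times a size proxy) and keep only the top fraction per LOD level.

    Code (Python):
    import numpy as np

    def lod_select(opacities, scales, keep_fraction):
        """Return the indices of the splats kept at this LOD level, most important first."""
        footprint = np.prod(np.asarray(scales, dtype=float), axis=1)  # crude per-splat size proxy
        importance = np.asarray(opacities, dtype=float) * footprint   # cheap contribution estimate
        keep = max(1, int(len(importance) * keep_fraction))
        return np.argsort(importance)[::-1][:keep]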

    It's NOT a replacement for meshes or any existing workflow, just a nice complement.

    Other opportunities:
    - Bring back neural networks. I don't know where exactly, but there is probably a use case for compressing FROM the point cloud instead of resolving TO it, lol. Also, generative AI is possible (I know of two papers that already explored that). Or maybe just neural relighting from capture (for real scenes).
     
    Ryiah likes this.