
HXGI Realtime Dynamic GI

Discussion in 'Tools In Progress' started by Lexie, May 24, 2017.

  1. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    It's obvious NVIDIA is branching RTX toward movie/VFX/audiovisual shops, not games. They said so in their presentation through the Unreal presenter. Same with their work on digital clones; no game will accept the overhead of that fancy vellus hair.

    Doesn't the SVOGI paper highlight the need for a lot of hand-placed volumes to control light leaking? It doesn't seem to work out of the box.

    I was wondering if a simplified implementation of Lexie's GI could handle simple AO (no bounce) or distant shadows for an open world with time of day.
     
  2. Vagabond_

    Vagabond_

    Joined:
    Aug 26, 2014
    Posts:
    1,148
    - Check the video name and description.

    Seems it's DX12.1 and NVIDIA only, plus a high-end Volta GPU, for now!
    NVIDIA promises games using RTX this year.

    But it still looks to me like an approximate method such as HXGI is the better choice until consumer hardware that can run RTX comes out!
     
    neoshaman and buttmatrix like this.
  3. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    612
    Those videos really need a side-by-side comparison to make sense. In the video above, for example, you can still see the screen-space reflection artifacts on the water at 0:45 (fading out toward the edge of the screen). So what are they even raytracing?

    I'm much more interested in GI than in ray-traced reflections. Fully dynamic GI opens many options for game design. Even something a third of the way to full GI, like a skybox occlusion approximation (I can't find any sources on the internet about it; what do I google? :/ ), would help a ton of games.

    Let's hope Lexie finds a scalable way!

    EDIT: I'm an idiot, the description of the video says the GI is raytraced. But technically Enlighten is also realtime raytraced, I suppose, just with precomputed transport.
    EDIT 2: Look at the lamps on the table at 1:20 :S
     
  4. elbows

    elbows

    Joined:
    Nov 28, 2009
    Posts:
    2,502
    Yeah, since there are so many different ways to use rays, I'll be waiting for the details on that one. A straightforward analysis of NVIDIA ray tracing via RTX versus other ray-tracing methods would also be helpful, especially since we are talking about NVIDIA stuff that's effectively a black box, not just the API/pipeline side of things that is more visible. Plus, DirectX has had plenty of aspects in the past that didn't really catch on, and I've already seen some people questioning whether a dedicated pipeline for raytracing is really the way to go.
     
  5. Grimreaper358

    Grimreaper358

    Joined:
    Apr 8, 2013
    Posts:
    789
    Zuntatos likes this.
  6. zenGarden

    zenGarden

    Joined:
    Mar 30, 2013
    Posts:
    4,538
    Yep, it will be available in their game.
    This means Unity could also integrate RTX GI in the HD pipeline soon :)
     
  7. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    In every other presentation it's heavily implied it's a glimpse into the future of rendering (i.e. not ready yet, lol).
    IMHO it's marketing here, i.e. it's there for the back-cover quote but not substantial enough to make a real difference, and probably heavily aided by existing techniques to meet the visual target, which wouldn't be the first time tech has been used that way. I mean, you barely see particles in games like Infamous: Second Son on PS4, "next-gen hair" is still absent from most games, etc. A controlled-environment implementation is a marketing back quote.
     
    theANMATOR2b likes this.
  8. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    I might have to put the new version of my GI on hold for a while. Our project is going through some larger changes to how the world is constructed, which might have an impact on how lighting can/needs to be handled. It's also going in a direction that looks suited to runtime-baked lighting anyway.

    I showed off some of my older attempts at realtime GI at GDC and got a lot of people asking for that version.
    So I'm going to put some finishing touches on the single-volume propagation method and release that in some capacity in the next few weeks, so at least something usable comes out of all this for now.

    The first version might be limited to diffuse GI lighting in deferred. I'll try to clean up some of the specular attempts, but from what I remember they were a little slow. I might be able to modify the base Unity shaders so forward rendering is supported, but I need to look into that again. I'll have a better idea of all the included features shortly; just trying to re-familiarize myself with the code base.

    I tried using my octree with LPV, but I needed to dilate the tree too much for it to be worth the speed trade-off of accessing the octree. Still exploring a couple more methods for this, but I don't think it's going to work.
    The octree method is better suited to ray/path tracing and storing the results from all those rays in the octree, which is the direction my newest version is going. But that has a long way to go before it's finished and might be limited to runtime baking.

    Anyway, I just thought I'd let everyone know. Will have some stuff to show soon.
     
    Last edited: Apr 7, 2018
    knup_, tapawafo, scheichs and 13 others like this.
  9. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    505
    Understood. Because it won't be sparse, do you expect it to only support a fairly small volume in terms of grid size/length, or is this the version that included the cascades for larger worlds?
     
  10. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    Size is limited to 64-256 per axis; you can assign a 256x128x256 volume though, as you normally don't need the full height.
    Cascades really didn't work very well, so I won't be adding those. Voxelizing the larger cascades was too expensive; that was one of the reasons I tried out the sparse data structure, as it could voxelize the area over time.

    I'll have another look over the cascade version and see if there is anything worth salvaging. But honestly, the light bleeding and light popping as you traveled the world was a little insane with any slightly complex structures.
     
    MarkusGod likes this.
  11. OnlyVR

    OnlyVR

    Joined:
    Oct 5, 2015
    Posts:
    55
    I have to confess I'm addicted to GI. Give it to me, please!!! :)
     
  12. buttmatrix

    buttmatrix

    Joined:
    Mar 23, 2015
    Posts:
    609
    Last edited: Apr 11, 2018
    castor76 likes this.
  13. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
  14. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Ken Silverman (creator of the Build engine) released the source code of his super-fast CPU voxel renderer:
    http://advsys.net/ken/voxlap/pnd3d.htm

    In case it's any use.

    Here are his comments on optimization tricks (a short C# sketch of the node layout and cover buffer follows the quote):
    Rendering tricks:

    I often get asked what makes PND3D's rendering so fast. You think you can do
    as good as me? Well, you better have a thorough understanding of assembly
    language, multithread techniques, and GPU shaders. I will do you a favor and
    summarize all the tricks used in PND3D:

    * While not necessarily a trick, it should first be stated that PND3D uses a
    standard 8-way octree. Each non-leaf node is a 64-bit structure containing:
        char chi:       child byte mask: 1 = has children (must recurse),
                        0 = pure air or pure solid (don't recurse)
        char sol:       solid byte mask: 1 = pure solid inside, 0 = not pure solid inside
        char mrk, mrk2: dummy bytes for structure alignment
        int ind:        an index pointer to the first child node. There can be
                        anywhere from 0 to 8 children depending on the child mask.
    During rendering, the octree is visited in front-to-back order, depth-first.
    This 'sorting' helps speed up rendering by allowing occlusion queries to
    skip large sections. Again, this is probably nothing new here, but it needs
    to be stated.

    * I calculate bounding rectangles on screen using a single 'divps' instruction
    during projection. Each cube's bounding box is determined by only 4 of the 8
    vertices of a cube. These 4 vertices can be known in advance based on which
    rectangular tile of the screen it is inside.

    * The screen is divided into large rectangular tiles (typically around 16-48
    tiles per screen). This is done for 2 reasons:

    1. To calculate bounding boxes of cubes more quickly. The brute force
    method to calculate a bounding box of a cube is to transform and project
    all 8 points and then find the minimum and maximum x and y coordinates.
    This can be sped up considerably if you happen to know which 4 of the 8
    vertices actually contribute to an edge of the bounding box in advance.
    In fact, it turns out large sections of the screen have the same 4
    vertices contributing to the same edges. Even better, these sections are
    axis-aligned rectangular tiles on the screen.

    These tiles are determined by the 3 'vanishing' points of the voxel grid
    (and on the front half of the screen space). To calculate a vanishing
    point, imagine trying to render a cube really far away on each axis,
    such as at voxel grid location: (0,0,-1e9), (0,+1e9,0), etc.. If you
    projected these to screen coordinates, you would get 6 sets of x/y
    coordinates. Simply cut the screen horizontally and vertically (whenever
    they happen to be within the bounds of the screen - most aren't), and
    you will have your screen nicely divided into areas where a cube has the
    same 4 vertices generating the same 4 edges of the bounding box.

    2. For multithread optimization. I make a few additional horizontal cuts
    (about 16). Each tile is rendered as an independent job, clipped to the
    tile like it was a viewport, and then rendered from the root of the
    octree. So because of these cuts / tiles, some voxels along the edge can
    actually be rendered multiple times, but it's well worth it.

    * I do an occlusion query for every non-leaf node of the tree that contains
    some solid. If the projected bounding box is larger than 32 pixels of width,
    I exit early and simply assume it's visible and visit its children anyway.
    Note that there are not very many large or nearby octree nodes, but this
    does skip a lot of needless screen area when processing the query.

    For a bounding box of 32 pixels of width or less, I have a bit array (cover
    buffer) - 1 bit per pixel, where each bit determines whether that pixel has
    been rendered or not. Then using an 'and' instruction, I can test 32 pixels
    simultaneously. If the region is found to be fully occluded, I skip it and
    all its children of the octree.

    * For leaf nodes only (i.e. solid surface voxels being rendered), I visit each
    pixel of the bounding rectangle and perform 6 dot products in the inner loop
    using 2 'addps' and 2 'movmskps' instructions (25% waste) to determine
    whether the pixel actually falls inside the projected cube. Note that a
    cube, when projected to screen, results in a 4- or 6- sided convex polygon.

    * The above steps are all done on the CPU. A rendered pixel on the screen is
    not your typical 32-bit ARGB color. Instead, texture mapping is done during
    a second pass. I write a 64-bit pixel, which contains 12 bits x, 12 bits y,
    12 bits z, and a 28-bit index to the leaf node which holds information about
    the surface voxel. The GPU is given the voxel position and a pointer to its
    surface structure. The shader basically raytraces to a cube which it already
    knows it will hit. The GPU determines what face of the cube with a few
    simple cases. The GPU then does the work of texture mapping with fancy
    filters like bilinear and mip mapping.
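
    For anyone who wants to poke at it, here's a minimal C# translation of that node layout plus the 32-pixel cover-buffer test. The exact packing and the single-word fast path are my assumptions from Ken's description, not his code.

    Code (CSharp):
    using System.Runtime.InteropServices;

    // 64-bit non-leaf node, fields named after Ken's description above.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct OctNode
    {
        public byte chi;        // child mask: bit i set => child i exists, recurse
        public byte sol;        // solid mask: bit i set => child i is pure solid
        public byte mrk, mrk2;  // dummy bytes for 8-byte alignment
        public int ind;         // index of the first child; 0..8 children follow
    }

    static class CoverBuffer
    {
        // 1 bit per rendered pixel; one uint word covers 32 horizontal pixels.
        static uint[] bits;
        static int wordsPerRow;

        public static void Init(int width, int height)
        {
            wordsPerRow = (width + 31) / 32;
            bits = new uint[wordsPerRow * height];
        }

        // True if every pixel of the w x h rect at (x, y) is already covered,
        // so the node and all its children can be skipped. Simplified to the
        // case where the rect fits in one 32-bit word per row (w <= 32 and
        // (x % 32) + w <= 32), mirroring the "32 pixels or less" fast path.
        public static bool FullyCovered(int x, int y, int w, int h)
        {
            uint mask = (w == 32) ? 0xFFFFFFFFu : ((1u << w) - 1u) << (x & 31);
            int word = y * wordsPerRow + (x >> 5);
            for (int row = 0; row < h; row++, word += wordsPerRow)
                if ((bits[word] & mask) != mask)
                    return false;   // an uncovered pixel: node is visible
            return true;
        }
    }

    During traversal you'd test each node's projected bounding rectangle with FullyCovered and recurse only when it fails.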

    Meanwhile I'm still stuck trying to understand how he did his 3D wave-surfing with column scans in his old Voxlap engine :(
     
    brisingre and arnoob like this.
  15. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    Thanks for the link. Interesting system, but I don't see how tracing on the CPU is faster than the GPU. I feel like standard octree tracing on the GPU would be faster than this system. Also, if you're making a game, you need the CPU to handle the game code, not the render loop; your GPU is sitting there doing simple texture mapping. Seems like a huge waste.

    The CPU is good for making the octree structure take up as little memory as possible, though. Might be worth sending the voxel data to the CPU to pack into an octree and sending the packed data back to the GPU.
     
  16. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    I can't comment on the viability, this is way above my skills, but since it renders very fine voxels at 1080p at 60fps, which some struggle to achieve even on the GPU, and since it's mofo Ken Silverman, known for his insane optimization (down to assembly), I assumed it was worth a look. GPUs are 1.4 to 2.25x slower than a single CPU thread for this class of problem (trees).

    I think most of the performance comes from the assembly tricks and low-level optimization, not really from the high-level structure. People on r/VoxelGameDev are just baffled by the performance; using the same tricks, they barely achieve interactive framerates on the CPU.
    https://www.reddit.com/r/VoxelGameD...silverman_releases_successor_to_voxlap_pnd3d/

    The idea is that if it works for a fine voxel grid, it might be viable for your much coarser grid (it's already below budget with the fine grid).

    But the catch is that he only renders visible voxels, and you probably need more than that to render bounced light from sources not on screen; also, said insane optimizations can make the code opaque to read.
     
    Last edited: Apr 17, 2018
  17. dahuilang

    dahuilang

    Joined:
    Jun 5, 2014
    Posts:
    32
    Do you know "Sky Master" on the Asset Store? It has a GI system; what's the difference between yours and it?
     
  18. chiapet1021

    chiapet1021

    Joined:
    Jun 5, 2013
    Posts:
    605
    Sky Master Ultimate includes a GI proxy system, which dynamically manages additional point lights within the camera's FOV to fake global illumination. It's not the same as true GI and does not provide nearly the same quality of lighting nuance, although it is, at least, 100% dynamic.

    I know Nasos has been working on SEGI optimization to incorporate it into Sky Master. I'm not sure if that will be in his upcoming v4.0 update.
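
    To make the idea concrete, a toy version of such a proxy system might look like this. This is my sketch of the general approach, not Sky Master's actual code; the ray budget and intensity handling are made up.

    Code (CSharp):
    using UnityEngine;

    public class GIProxyBounce : MonoBehaviour
    {
        public Light sun;               // the direct light we fake one bounce for
        public int rayCount = 16;       // proxy light budget
        public float bounceIntensity = 0.3f;
        Light[] proxies;

        void Start()
        {
            proxies = new Light[rayCount];
            for (int i = 0; i < rayCount; i++)
            {
                var go = new GameObject("BounceProxy" + i);
                proxies[i] = go.AddComponent<Light>();
                proxies[i].type = LightType.Point;
                proxies[i].range = 5f;
                proxies[i].enabled = false;
            }
        }

        void Update()
        {
            // Scatter rays through the camera's view; wherever one lands, park
            // a dim point light tinted by the surface's material color.
            var cam = Camera.main;
            for (int i = 0; i < rayCount; i++)
            {
                Ray view = cam.ViewportPointToRay(new Vector3(Random.value, Random.value, 0f));
                if (Physics.Raycast(view, out RaycastHit hit, 100f))
                {
                    var rend = hit.collider.GetComponent<Renderer>();
                    Color albedo = rend ? rend.sharedMaterial.color : Color.white;
                    proxies[i].transform.position = hit.point + hit.normal * 0.25f;
                    proxies[i].color = sun.color * albedo;
                    proxies[i].intensity = bounceIntensity *
                        Mathf.Max(0f, Vector3.Dot(hit.normal, -sun.transform.forward));
                    proxies[i].enabled = true;
                }
                else proxies[i].enabled = false;
            }
        }
    }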
     
    MarkusGod and Lex4art like this.
  19. dahuilang

    dahuilang

    Joined:
    Jun 5, 2014
    Posts:
    32
    So there is no usable realtime GI for now?
     
  20. brisingre

    brisingre

    Joined:
    Nov 8, 2009
    Posts:
    277
    There is no usable realtime GI solution for Unity right now.
     
    buttmatrix likes this.
  21. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    505
    @Lexie I am completely unsure of the right balance between "give the man some space" and "show your enthusiasm for his work", so I err on the side of caution and stay mostly quiet for fear of flooding the thread, haha. I have a feeling others might be the same?

    But right now I'd like to show my enthusiasm for your work...

    Is everything going well? If I were you, I'd probably have felt burned out several times along this journey. I've been tempted to start digging into LPV experiments myself, but seeing how much time you've spent on it, it feels like the smarter move is to just buy a solution from you, lol.

    I've also been wondering if you plan to release source along with the asset (assuming that you do end up releasing an asset). I purchased SEGI back in the day and had to make some adjustments to the voxelization step so that it would work with my game, hence my interest.
     
    skyecrack and DMeville like this.
  22. yohami

    yohami

    Joined:
    Apr 19, 2009
    Posts:
    124
    Sonic Ether's solution is usable, though unfinished

     
  23. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    505
    I feel it's only really usable in some very restricted situations, such as very small scenes. I spent a lot of time trying to make it usable for my situation but ultimately gave up. But perhaps it is usable for a few people.
     
    buttmatrix likes this.
  24. TchPowDog85

    TchPowDog85

    Joined:
    Dec 29, 2015
    Posts:
    124
    This is awesome. I love indirect lighting and ambient-lit scenes; it adds a whole new level of realism. What's the performance like on this so far?
     
  25. brisingre

    brisingre

    Joined:
    Nov 8, 2009
    Posts:
    277
    Usable is a subjective term, I suppose, but that doesn't seem to be the consensus. (I don't have a ton of personal experience with SEGI, just 20 minutes with the demo deciding it couldn't do what I needed.)
     
    zenGarden likes this.
  26. Mauri

    Mauri

    Joined:
    Dec 9, 2010
    Posts:
    2,665
    It's been a while. Any news, Lexie? :)
     
  27. ekergraphics

    ekergraphics

    Joined:
    Feb 22, 2017
    Posts:
    257
    Ouch. We were waiting for a forward solution that worked well with VR.

    Is this a case like with the Shader Forge author, that you'd rather be a game developer than an asset store developer? I'm not putting any judgement at all in that question, I just want to know what to expect, since it was a bit unclear earlier.

    It seems you're planning to release a version now without any support?

    (For context, I'm one of those enterprise developers who would gladly have accepted a $200 price tag, with no issue, but we already had that discussion.)
     
  28. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    I've been working on a simpler version that just supports 2D GI. Some of the speed improvements I'm trying to add are kinda complex for me to work with; doing a 2D version first is a lot easier.

    There are still some optimizations to do, but it's already working at realtime frame rates. Next step is to work on some optimizations that can also apply to the 3D version. I can use this as a test bed to see what kind of speed improvements some of these optimizations give me, and then work on a 3D method for any that offer good results.



    Everything in this scene is dynamic. A signed distance field is generated to speed up path tracing.
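
    As an illustration of why the SDF helps (hypothetical code, not Lexie's): each sample of the field tells a ray how far it can safely jump, so marching takes a handful of texture reads instead of a texel-by-texel crawl.

    Code (CSharp):
    using UnityEngine;

    public static class SphereTrace2D
    {
        // Marches a ray through a pre-built 2D distance field. sdf[x, y] holds
        // the distance (in texels) from that texel to the nearest solid texel.
        public static bool March(float[,] sdf, Vector2 origin, Vector2 dir,
                                 float maxDist, out Vector2 hit)
        {
            const float eps = 0.5f;          // hit threshold, in texels
            float t = 0f;
            while (t < maxDist)
            {
                Vector2 p = origin + dir * t;
                int x = Mathf.Clamp((int)p.x, 0, sdf.GetLength(0) - 1);
                int y = Mathf.Clamp((int)p.y, 0, sdf.GetLength(1) - 1);
                float d = sdf[x, y];
                if (d < eps) { hit = p; return true; }
                t += Mathf.Max(d, eps);      // safe jump: nothing is closer than d
            }
            hit = Vector2.zero;
            return false;
        }
    }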
     
    ftejada, MarkusGod, Martin_H and 12 others like this.
  29. brisingre

    brisingre

    Joined:
    Nov 8, 2009
    Posts:
    277
    That's totally sweet-looking! I love it.

    How do more complicated shapes affect performance? Is generating the distance field really expensive with things other than rectangles? Or is it pretty much just per-pixel?
     
  30. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    It's generated per pixel and works with any shape; it's a set cost to generate regardless of the number of solid pixels.
    Extremely complex scenes would make the SDF less useful at speeding up the tracing. The plan is to use a method called global lines instead. The SDF might be used to speed up generating the global lines, so I wanted to build that first.

    The SDF costs around 1.5ms to generate for a 512x512 texture on my GTX 660.
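
    Lexie doesn't say how the field is generated, but the classic fixed-cost, per-pixel approach is the Jump Flooding Algorithm: log2(n) passes over every pixel, independent of how many pixels are solid. A CPU sketch of the idea (hypothetical, not HXGI's code; on the GPU each pass would be a fullscreen shader over ping-ponged textures):

    Code (CSharp):
    using UnityEngine;

    public static class JumpFlood
    {
        // seeds[x, y] == true marks solid pixels. Returns each pixel's distance
        // to its (approximately) nearest solid pixel, in pixels.
        public static float[,] BuildSDF(bool[,] seeds)
        {
            int n = seeds.GetLength(0);            // assumes a square input
            var none = new Vector2Int(-1, -1);
            var read = new Vector2Int[n, n];       // best-known seed per pixel
            var write = new Vector2Int[n, n];
            for (int y = 0; y < n; y++)
                for (int x = 0; x < n; x++)
                    read[x, y] = seeds[x, y] ? new Vector2Int(x, y) : none;

            for (int step = n / 2; step >= 1; step /= 2)   // log2(n) passes
            {
                for (int y = 0; y < n; y++)
                    for (int x = 0; x < n; x++)
                    {
                        var self = new Vector2Int(x, y);
                        var best = read[x, y];
                        // Check 8 neighbors 'step' pixels away, keep the
                        // closest seed seen so far.
                        for (int dy = -1; dy <= 1; dy++)
                            for (int dx = -1; dx <= 1; dx++)
                            {
                                int sx = x + dx * step, sy = y + dy * step;
                                if (sx < 0 || sy < 0 || sx >= n || sy >= n) continue;
                                var cand = read[sx, sy];
                                if (cand == none) continue;
                                if (best == none ||
                                    (cand - self).sqrMagnitude < (best - self).sqrMagnitude)
                                    best = cand;
                            }
                        write[x, y] = best;
                    }
                var tmp = read; read = write; write = tmp; // ping-pong buffers
            }

            var dist = new float[n, n];
            for (int y = 0; y < n; y++)
                for (int x = 0; x < n; x++)
                    dist[x, y] = read[x, y] == none ? float.MaxValue
                        : Vector2Int.Distance(new Vector2Int(x, y), read[x, y]);
            return dist;
        }
    }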
     
  31. Adam-Bailey

    Adam-Bailey

    Joined:
    Feb 17, 2015
    Posts:
    232
    That looks great, I want someone to build a Breakout clone using it. :D
     
  32. PassivePicasso

    PassivePicasso

    Joined:
    Sep 17, 2012
    Posts:
    100
    I'm really excited about all your work here, Lexie. I hope you can release some version of HXGI for testing.

    Lighting is such an important concept, and since I'm personally only looking at doing procedural content, my options right now are exceptionally limited in both quality and scope. HXGI is the latest dream tech for me, and I've been eagerly awaiting its arrival in any form it takes.
     
  33. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    There is a new Ubisoft talk on the GDC YouTube channel about global illumination. It's baked, but uses the simpler Half-Life 2 style "probes". Does anyone have technical insight about it relative to the solutions presented in this thread?

    The thing I was wondering while watching: could we use a single probe (cubemap) per "convex space" (instead of full voxelization), like we do for box-projected reflections, but feed the result on the environment back into the probe to get cheap GI?
     
  34. zenGarden

    zenGarden

    Joined:
    Mar 30, 2013
    Posts:
    4,538
    You mean this tech used in AC4?
    https://bartwronski.files.wordpress.com/2014/03/ac4_gdc_notes.pdf
    Or Frostbite?
    https://media.contentapi.ea.com/con...-precomputedgiobalilluminationinfrostbite.pdf
     
  35. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    The Division:
     
    buttmatrix and Mauri like this.
  36. Adam-Bailey

    Adam-Bailey

    Joined:
    Feb 17, 2015
    Posts:
    232
    Are you describing something like LPPVs?
     
  37. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    An LPPV is a grid of SH probes; I'm talking about something with RT cubemap probes in two buffers, some extra data, and a feedback loop in the light rendering of the box projection.
     
  38. zenGarden

    zenGarden

    Joined:
    Mar 30, 2013
    Posts:
    4,538
    This is also what has been used in the recent Far Cry games. Unfortunately, no indies are thinking of working on a GI solution based on an advanced probe system, so you are on your own.
     
  39. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    8-hour bake time. Just have to convince our player base to wait while the level loads.
     
    xVergilx and neoshaman like this.
  40. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    Overgrowth used a system like that; check out their videos.
     
    neoshaman likes this.
  41. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    The only video (and post) that reference cubemaps and GI are the a219 changelog and an 8-year-old blog post, and they use them (respectively) for global ambient and local ambient. It's akin to what Unity did with the old SH system (tetrahedral); nothing with realtime rendering feedback, as far as I can tell.
    http://blog.wolfire.com/2016/03/Overgrowth-a219-video-changelog
    http://blog.wolfire.com/2010/03/Image-based-ambient-lighting

    Also, am I wrong in thinking that all the baking is essentially needed for the light integration?
     
  42. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    Sure, it's used in a baked way for Overgrowth, but there is nothing stopping them from updating X probes per frame to get realtime GI using the same system.
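
    As a sketch of what "updating X probes per frame" could look like with Unity's built-in reflection probes (my illustration; assumes realtime probes with refresh mode set to ViaScripting):

    Code (CSharp):
    using UnityEngine;

    public class ProbeRoundRobin : MonoBehaviour
    {
        public ReflectionProbe[] probes;  // probes covering the level
        public int updatesPerFrame = 2;   // the "X" in "X probes per frame"
        int cursor;

        void Update()
        {
            for (int i = 0; i < updatesPerFrame && probes.Length > 0; i++)
            {
                probes[cursor].RenderProbe();          // queue a re-render
                cursor = (cursor + 1) % probes.Length;
            }
        }
    }

    One bounce of indirect light then takes roughly probes.Length / updatesPerFrame frames to propagate, which is the usual trade-off with time-sliced probe updates.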
     
  43. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    If it's true for them, it's also true for Unity; in fact we shared a video that did that, in a way. The system I was proposing is a bit different, though.

    But anyway, I was kind of fishing for information to design a system I could build myself, even if it's not as complex, accurate, or complete as yours or SEGI, lol :p And since you haven't released anything, I can't study it either, lol. I try to read everything about it so I can start something at my own level.
     
  44. Reanimate_L

    Reanimate_L

    Joined:
    Oct 10, 2009
    Posts:
    2,788
    http://codeflow.org/entries/2012/aug/25/webgl-deferred-irradiance-volumes/
    this?
     
  45. zenGarden

    zenGarden

    Joined:
    Mar 30, 2013
    Posts:
    4,538
    You don't want to wait, and your game is an outdoor open world; CryEngine's SVOGI is one solution with absolutely no baking, and it looks good.
     
  46. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    My game is indoors; SVOGI does not work well for that. We use a lot of emissive surfaces, so SVOGI has way too many issues for our use case. Also, it's slow.
     
    ekergraphics likes this.
  47. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
  48. zenGarden

    zenGarden

    Joined:
    Mar 30, 2013
    Posts:
    4,538
    The best games I played lately are outdoor games: Uncharted Legacy, God of War, Horizon, Zelda: Breath of the Wild...
    Outdoor games use different techniques, from SVOGI to advanced probe systems; for indoor games you can still bake lightmaps with Unity's new lightmapper.

    SVOGI indoors is not perfect, but it looks better than no global illumination at all, there is absolutely no baking, and performance is good.

    Unigine's global illumination is also amazing indoors.

    I tried Godot 3 and the global illumination looks great and works pretty well; it's realtime, based on voxel cone tracing, and performant.

    Your goal is indoor-quality GI without baking, so I think that's harder to make good and performant than an approximate outdoor GI solution with no baking.

    Anyway, I'm not waiting for the tech to come to Unity; it is already available elsewhere, and I use something existing and proven when I need global illumination with no baking.
     
    Last edited: Jun 7, 2018
    buttmatrix likes this.
  49. Lexie

    Lexie

    Joined:
    Dec 7, 2012
    Posts:
    646
    I've tried Godot's GI; it runs at under 30 fps on my machine, so it's not performant. I need the world to be dynamic, and as soon as you make the volume update its data, it's way too slow.

    Also, all those scenes are specifically set up to hide issues with the GI they use. I am making a procedural game:
    I can't spend 8 hours baking the lighting.
    I can't ensure that our complex level-gen AI doesn't create situations that cause the GI to show its flaws.
    I can't have light bleeding from one room to another.

    All these GI solutions you keep bringing up have problems that make them unsuited to my game. I have already tried them; they don't work for me.

    As for "the best games I played lately are outdoor games": I don't see how that is relevant at all.
     
    Last edited: Jun 8, 2018
  50. SemaphoreStudios

    SemaphoreStudios

    Joined:
    Jan 14, 2010
    Posts:
    111
    Hello Lexie,

    We are in the same boat as you. Our game is an infinite procedurally generated survival game:


    and looking for a suitable GI solution that handles the cases you mentioned is what led us to this thread about a year ago. We are excited to try out HXGI with it, since you mentioned a while back that you would release some version of it to the public.

    Our game is coming to PS4 and Xbox One, but depending on performance, I would expect HXGI to be reserved for the Ultra quality setting on PC.

    If you ever need to test HXGI with our use case, please let us know.
     
    Quique-Martinez likes this.