
SEGI (Fully Dynamic Global Illumination)

Discussion in 'Assets and Asset Store' started by sonicether, Jun 10, 2016.

  1. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Well, I've now got compute buffers sort of working... One demo scene works... Everything else looks like an acid trip through it... But progress is progress.
     
    hopeful and Shinyclef like this.
  2. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    Well, the lpv shadertoy code I shared earlier does the whole 1d index to 3d texture thing :D
    https://forum.unity.com/threads/segi-fully-dynamic-global-illumination.410310/page-44#post-3828637
    the code is visible in the spoiler
     
    Shinyclef likes this.
  3. bluesixteen

    bluesixteen

    Joined:
    Nov 17, 2016
    Posts:
    28
    Just wanted to say thanks a lot for your effort to make a better product for everyone @Ninlilizi
     
    hopeful, neoshaman, Ninlilizi and 2 others like this.
  4. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    I was thinking I should comment the code I have highlighted ... :confused:

    The basics are simple, so let's start with a 2d example: how do we find the unique 1d position (or index) of a cell in a 2d 10x10 grid? Start with the first row: choose a corner (say upper left) as the starting point and a reading direction (toward the right), and simply increase the index, so the first cell is 0 and the last is 9. Moving to the second row in the same reading direction, its first cell is 10 and its last is 19; the third row starts at 20 and ends at 29.

    Starting to see a pattern? Assuming integers, a cell's index is simply: index = column number + (row number * total columns)
    Therefore we can find
    - column number = index - (row number * total columns)
    - row number = (index - column number) / total columns
    A faster way is to use the remainder with modulo (assuming integer division)
    - column number = index mod total columns
    - row number = index / total columns
    In 3d it's the same but with an added slice size for the height value (we count cell by cell, then row by row, and finally slice by slice).

    therefore
    Code (CSharp):
    vec4 fetch_lpv(vec3 p, float channel) {
        p = clamp(p, vec3(0.5), lpvsize - 0.5);

        float posidx = packfragcoord3(p, lpvsize) + channel * (lpvsize.x * lpvsize.y * lpvsize.z);

        vec2 uv = unpackfragcoord2(posidx, iChannelResolution[0].xy) / iChannelResolution[0].xy;

        return texture(iChannel0, uv);
    }
    This code basically samples the LPV texture, that is the 3d volume, except shadertoy has none of that, so it goes through a set of conversions from 1d to 2d to 3d. I'm not sure why there is the need for 1d as of now, I haven't thought about it, but I guess it's a similar reason to why Ninlilizi suddenly needs this!

    Anyway, posidx is the one dimensional position. We know it's a float, and the function that feeds it returns a float, so we can deduce that packfragcoord3 basically translates a float3 .xyz point to a single value using the size of the 3d volume. Depending on your needs, the order of importance of each of x, y and z might need some swizzling permutation. The channel * lpvsize part is simply an offset into the source texture, probably because the 3d data is stored sequentially, channel being the selector of the offset. uv uses the calculated index to sample another texture by translating the 1d position to a 2d position (it's using procedural volume objects); my guess is that it's simply the 3d volume unfolded onto 2d slices.

    This should give you a clue what the pack and unpack functions do: pack simply turns the given 3d position into the index, and unpack takes the index back to the 2d or 3d position needed. They all take the coordinate to convert as the first parameter and the size of the target conversion data as the second.
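    If it helps, the pack/unpack pair boils down to something like this in C# (my own rough sketch with my own names, not the actual shadertoy functions):
    Code (CSharp):
    using UnityEngine;

    public static class IndexPacking
    {
        // Flatten a 3d cell coordinate into a single 1d index:
        // cell by cell, then row by row, then slice by slice (same logic as the 2d example above).
        public static int Pack3D(Vector3Int p, Vector3Int size)
        {
            return p.x + p.y * size.x + p.z * size.x * size.y;
        }

        // Recover the 3d coordinate from the 1d index.
        public static Vector3Int Unpack3D(int index, Vector3Int size)
        {
            int x = index % size.x;            // column = remainder
            int y = (index / size.x) % size.y; // row
            int z = index / (size.x * size.y); // slice
            return new Vector3Int(x, y, z);
        }
    }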


    I have been thinking about why the SDF approach failed; my conclusion is that we do not clearly understand what the occlusion accumulation actually does.

    upload_2018-11-8_3-17-5.png
    source: http://fumufumu.q-games.com/archives/TheTechnologyOfTomorrowsChildrenFinal.pdf

    - Voxel data represents the statistical geometry contribution to the light data: instead of storing the geometry, we store its lighting result, i.e. in each voxel the data represents the sum of the contributions of all intersecting geometry (triangles) with direct lighting, but we lose the geometric data itself. This data is directionless; the direction is given by the tracing and by the locality of the data relative to the tracing source. It's more obvious with mipmaps, where more data gets merged together into a single value.

    - The alpha stores partial occlusion, aka statistical visibility. Basically, since voxels are only partially filled, we need to keep track of the potential coverage as the alpha value, so as we pass through voxels the statistical coverage increases until it's likely to cover all light. My hypothesis is that the alpha accumulation is the reason why the SDF failed: by jumping immediately to a potential end point, we miss the correct coverage, but I'm not sure how ...
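    To make the accumulation part concrete, the usual front-to-back rule during a cone march looks roughly like this (my own C# sketch of the textbook version, not SEGI's actual shader code; SampleVoxel is just a placeholder for the mip-mapped voxel fetch):
    Code (CSharp):
    using UnityEngine;

    public static class ConeMarchSketch
    {
        // Placeholder for the mip-mapped voxel fetch (mip level grows with cone width).
        static Color SampleVoxel(Vector3 pos, float coneWidth) { return Color.clear; }

        public static Color TraceCone(Vector3 origin, Vector3 dir, float maxDist)
        {
            Color light = Color.clear; // accumulated bounce light (rgb)
            float occlusion = 0f;      // accumulated statistical coverage (alpha)
            float dist = 0.1f;

            while (dist < maxDist && occlusion < 1f)
            {
                Color s = SampleVoxel(origin + dir * dist, dist);
                // a voxel only contributes to the part of the cone not yet covered,
                // and it adds its own coverage -- skipping voxels skips coverage too
                light += (1f - occlusion) * s.a * s;
                occlusion += (1f - occlusion) * s.a;
                dist += Mathf.Max(0.1f, dist * 0.5f); // step grows as the cone widens
            }
            return light;
        }
    }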

    STUPID notes (these depend on cracking occlusion)
    - if we want to separate rays and lighting, instead of the result of the lighting (rgb) we would store the albedo (rgb) + emission (rgb) + world normal (rgb) (+ bounce?), i.e. the render data, and then do the lighting when necessary in the cone tracing function. We wouldn't need to re-voxelize to change lights, but effectively, as Ninlilizi said, more VRAM usage.
    - Even more expensive would be to trace, and for every traced source store the resulting render data on the source, that is render data x number of rays; then we can render lighting ONLY by sampling the stored render data, which is faster (no trace). That's kinda the same as caching the result of tracing, which Ninlilizi already did, but with full data instead of static lighting.
    - Using SH is simple too: an SH1 (4 floats, SH2 is 7 floats) would be enough to discriminate the 6 faces of a cube; at light gathering we would use the normal of the geometry (or the vector from voxel center to triangle centroid?) and apply a
    - I wonder if there's some representation and optimization that would make it viable ...
    - observation: a 3d texture footprint of size 256 = 2d texture footprint of size 4096
    http://fumufumu.q-games.com/archives/TheTechnologyOfTomorrowsChildrenFinal.pdf
     
    Shinyclef likes this.
  5. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Had a measure of success today.
    The path trace cache now uses structured compute buffers for its data, which nets a small performance benefit. It's now in the repo head. It runs faster on my system... but after having to learn about GPU cache structures and the importance of keeping data structures aligned to the cache, it dawns on me it might not be faster on GPUs that lay out their cache differently. So I'm curious to hear other people's experiences with this version.
    There's no change in VRAM usage... I discovered a frustrating quirk: maintaining cache alignment means that encoding the data structure into something smaller would void the alignment and trash performance. But as the cache now stays in VRAM and doesn't touch system RAM, it should use a little less system RAM.
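    For the curious, the shape of it is roughly this: keep the element size an explicit, clean multiple of 16 bytes (padding rather than packing smaller) so elements line up with the cache. A simplified sketch, not the actual SEGI struct:
    Code (CSharp):
    using System.Runtime.InteropServices;
    using UnityEngine;

    public class PathCacheBufferSketch : MonoBehaviour
    {
        // 32 bytes per element, so elements never straddle a 128-byte cache line.
        struct CacheEntry
        {
            public Vector4 radiance;  // rgb + sample count
            public Vector3 position;
            public float pad;         // explicit padding instead of encoding smaller
        }

        ComputeBuffer buffer;

        void OnEnable()
        {
            const int count = 64 * 64 * 64;
            int stride = Marshal.SizeOf(typeof(CacheEntry)); // 32 bytes
            buffer = new ComputeBuffer(count, stride, ComputeBufferType.Structured);
            // computeShader.SetBuffer(kernel, "_PathCache", buffer); // bind in the compute shader
        }

        void OnDisable()
        {
            if (buffer != null) buffer.Release();
        }
    }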
     
    ksam2 likes this.
  6. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    > I'm not sure why there is the need for 1d as of now

    Basically, compute buffers only allow 1d data structures. And having the option to put stuff in a buffer is advantageous, as the memory remains on the GPU, as opposed to a RWTexture, which causes a CPU readback when it's finished that eats CPU time, GPU time and bus bandwidth. Which is why I embarked on a trip through buffer territory. I'd imagine the benefit is going to be a bigger deal on lower end systems than the meagre benefit on my over-the-top workstation.
     
  7. elbows

    elbows

    Joined:
    Nov 28, 2009
    Posts:
    2,502
    Are you sure about that? I have a fluid sim that uses a whole bunch of 3D rendertextures in compute shaders, and although I haven't tried to detect that specific bottleneck, I really doubt it would perform as well as it does if it was sending all that data back to the CPU every frame in the manner you are suggesting. Also, if this was happening automatically all the time and could not be disabled, what would be the point of the AsyncGPUReadback stuff, which is designed to let people send data from GPU to CPU without any stalling?
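    For reference, the async route is explicit: nothing comes back to the CPU unless you actually ask for it, something like this (a quick sketch, not production code):
    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;

    public class ReadbackSketch : MonoBehaviour
    {
        public RenderTexture source;

        void Update()
        {
            // Explicitly request a copy; the data shows up a few frames later, without a stall.
            AsyncGPUReadback.Request(source, 0, request =>
            {
                if (request.hasError) { Debug.LogWarning("GPU readback error"); return; }
                var pixels = request.GetData<Color32>(); // NativeArray<Color32> on the CPU side
                // ... use pixels on the CPU ...
            });
        }
    }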
     
  8. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Interesting... I remember reading about performance differences somewhere recently... though I can't find the source now... Or maybe it's just a Graphics.Blit related issue? Guess I need to do more research into the subject.
     
  9. ksam2

    ksam2

    Joined:
    Apr 28, 2012
    Posts:
    1,079
  10. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Hmmm... OK, maybe I'm confusing CPU and cache readbacks?
    Though I know rendertextures are automatically read back at /some point/.

    Though when used in the right way, buffers can be faster aside from that. You're skipping texture samplers, and provided it's in the GPU's local cache and not global memory, that's saving some cycles.
     
  11. elbows

    elbows

    Joined:
    Nov 28, 2009
    Posts:
    2,502
    Dunno, there is so much detail and architecture to understand and I only understand a fraction of it. I do know that when I converted the fluid sim to use textures instead of buffers, it got faster, not slower, but that might be because I was able to take advantage of the sampler's ability to do bilinear filtering rather than having to use compute shader code to do the filtering manually.
     
  12. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    I was reading up on that just this morning, after my initial buffer implementation was overall slower.
    Then I came across this, which tipped me off that I should be learning about cache and memory architecture, and after making a few tweaks to fit around the cache it boosted my performance back to where I'd expected it to be.

    It's certainly a rabbit hole of ever expanding things you need to understand to not shoot yourself in the foot when working with buffers.
     
    Shinyclef likes this.
  13. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    New mini feature... As it turns out, the visual popping of the GI (which is, let's face it, pretty jarring) is caused by the rendering of the sun depth texture across the cascades, and that problem is hard to fix. I've added a little temporal filtering to the sun depth texture. It doesn't solve the problem, but in the meantime, filtering over 60 frames causes all but the worst offending glitches to fade in and out smoothly... I will of course implement a proper fix later on, once I've figured out how.
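    The idea in rough form (not the exact code; the blend shader here is hypothetical): keep a history copy of the sun depth texture and blend each new frame into it by about 1/60, so changes ease in instead of popping.
    Code (CSharp):
    using UnityEngine;

    public class SunDepthTemporalSketch : MonoBehaviour
    {
        public Material blendMaterial; // hypothetical shader doing lerp(_History, current, _Blend)
        RenderTexture history;

        public RenderTexture Filter(RenderTexture currentSunDepth)
        {
            if (history == null)
                history = new RenderTexture(currentSunDepth.descriptor);

            blendMaterial.SetTexture("_History", history);
            blendMaterial.SetFloat("_Blend", 1f / 60f); // roughly 60 frame convergence

            var blended = RenderTexture.GetTemporary(currentSunDepth.descriptor);
            Graphics.Blit(currentSunDepth, blended, blendMaterial);
            Graphics.Blit(blended, history);            // result becomes next frame's history
            RenderTexture.ReleaseTemporary(blended);
            return history;
        }
    }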
     
    hopeful, Shinyclef and neoshaman like this.
  14. Vagabond_

    Vagabond_

    Joined:
    Aug 26, 2014
    Posts:
    1,148
    Getting black scenes :D
     
  15. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    I dun goofed.
     
  16. Vagabond_

    Vagabond_

    Joined:
    Aug 26, 2014
    Posts:
    1,148
    Hmm, I restarted and it was the same until I pressed play!
    Then it started working!
     
  17. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Yeah... It takes the initial bunch of frames rendered to converge. Clicking in the game view and wiggling the mouse for a second will also bring it to life.

    Also, I dun goofed... Working on improving for the next iteration already.
     
    Shinyclef and Vagabond_ like this.
  18. IanStanbridge

    IanStanbridge

    Joined:
    Aug 26, 2013
    Posts:
    334
    Hi Ninlilizi

    I think the only thing you need to worry about with GPU buffers is that they are an exact multiple of 128 bytes. Any GPU will be happy with that, as it is the smallest texture size they expect to work with, so all GPUs will be optimized for it. The GPU also wants to be sent a large number of jobs of the same type at the same time, so it has as few different types of jobs to work on as possible, but ones that are massively parallel. That's why a command buffer is in effect an array of jobs of the same type and size for it to do at the same time.

    Also, a render texture is stored on the GPU, not the CPU. The only thing it writes back to the CPU is a value to say it has finished updating the render texture, rather than the texture itself. The CPU will still stall while waiting for the GPU to finish the render texture, though, unless it has been set to be asynchronous. Also, if you are working on data with the CPU that is using a render texture, it should ideally be in a form that both GPU and CPU can interpret directly. There are some texture formats that only the GPU can process, which need to be converted into a different format on the CPU before modification, and that is very bad for performance.

    If you are working with textures, a render texture is fastest; if you are working on mathematical calculations, a command buffer is fastest.
     
  19. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Understood. Thanks :)
     
  20. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Right... Small improvement to the temporal glitch filtering.

    I reckon I'll still be finding things to learn when I'm 90 with this shadery rabbit hole :p

    Totally picking the low hanging fruit of other problems right now.... That voxelizer is intimidating enough of a beast that the urge to procrastinate on window dressing is strong.
     
    Shinyclef and neoshaman like this.
  21. Mauri

    Mauri

    Joined:
    Dec 9, 2010
    Posts:
    2,663
    (inb4: nothing to discourage you :) Just my thoughts.)

    The more I read, the more I'm asking myself whether it would be worth starting SEGI entirely from scratch (or even trying to implement other solutions, like VXGI from NVIDIA, in Unity). You're doing great, but sometimes it feels like a ship that's under constant cannonball fire and you've just applied a provisional fix to 1 out of the 100 other holes that are already there...
     
  22. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Eventually I was going to head in that direction... My plan was to get SEGI to a point where it sucks the least possible for what it is... then embark in a more radical direction with it... I originally picked up SEGI to work with as a learning experience, so figuring out how to patch the zillions of holes is providing me with valuable experience in how to do stuff.

    Regarding VXGI... I've actually been going through it, among other things. The hardest part there is figuring out how to feed it the data it needs.
     
    Shinyclef and neoshaman like this.
  23. cheeze_12

    cheeze_12

    Joined:
    Aug 25, 2017
    Posts:
    3
  24. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    yeah, SEGI was for minecraft originally, but his spatialized dynamic voxel audio system is what made me drool
     
  25. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    What amazes me... is that so many people have built their own GI as part of their thesis or other educational projects, and apart from a bit of open source code for other engines... nobody who did so in Unity thought there'd be a big market for releasing it as an actual project.

    With this in mind, last night I took a break from SEGI to proof-of-concept using what is out there as a basis to build my own GI from scratch (don't worry, I promised to keep working on SEGI too, and Nin keeps her promises).... And last night I built my own Reflective Shadow Map... So I reckon... given that only took me a few hours, I'm feeling I've maybe learnt enough that I'm approaching the requisite skill level to produce my own LPPV solution.

     
    ftejada, PROE_, Shinyclef and 4 others like this.
  26. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    LPV (light propagation volume) or LPPV (light probe proxy volume)? :p

    Yeah, I know, it doesn't matter, they are almost the same thing anyway (a grid of SH)

    /jk

    You will be the queen of GI soon :)
     
    Ninlilizi likes this.
  27. jefferytitan

    jefferytitan

    Joined:
    Jul 19, 2012
    Posts:
    88
    Hey, just curious, I'm looking at the demo scenes from the Github repository. The Sponza Atrium is very dark, and the SEGI Low Poly scene very visibly updates the lighting as you move around. Are these known issues, or are the scenes set up a bit wonky? Or have you set up the scenes as worst cases for testing?
     
  28. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    These are known issues... And hard to solve as they are... Which is why I'm investigating other approaches to try and better understand how to solve such things.
     
    Shinyclef likes this.
  29. jefferytitan

    jefferytitan

    Joined:
    Jul 19, 2012
    Posts:
    88
    I'm guessing the Low Poly one is something to do with revoxelisation and cascades? How about the Atrium one, is that an issue with directional lights?

    What's an optimal situation for the code as it stands? Limited distances and all lighting by emissives?
     
  30. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    The low poly suffers for being entirely vertex coloured... Having textures in place goes a long way to masking that issue.

    The Atrium... I'm not sure... Are you using the latest release package or did you pull from the repo head?
     
    Shinyclef likes this.
  31. jefferytitan

    jefferytitan

    Joined:
    Jul 19, 2012
    Posts:
    88
    Yeah, I can see that vertex colouring wouldn't help. Would the distance issue be smoothed a bit if, when rendering with the final cascade, you lerped to a fixed ambient?

    Good point, I believe I was using the head. I'll try the last release later.
     
  32. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    502
    It amazes me also. Plenty of blogs and youtube vids of solutions out there, but having none on the asset store is pretty surprising.
    I like :). Very nice. A lot of people are interested in what you have to offer :).
     
    chiapet1021 likes this.
  33. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    612
    Typically the research versions come with significant problems in one or more areas, like limited render range, bad performance, artifacts with moving objects, artifacts with moving lights, artifacts with both, too high memory usage, or requiring too much work to integrate (doesn't work with other shaders / rendering pipelines, needs everything custom).
     
    hopeful likes this.
  34. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Still, I'm finding them of educational value... So there's that.
    I managed to build a mostly working implementation over the last few days. I just lack inspiration for sane ways of getting data into the volumes at the start of the process.
     
    Shinyclef and neoshaman like this.
  35. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Shinyclef, Vagabond_ and neoshaman like this.
  36. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    502
  37. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    scheichs, ftejada, hopeful and 6 others like this.
  38. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    502
    Stable AND fast? Yes please! But too good to be true?
     
  39. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Too good to be true?
    One way to find out. Worth the gamble of a few days on the chance it's golden.
     
    neoshaman and Shinyclef like this.
  40. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    502
    Keep us updated! But I know you will heh.
     
  41. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    So... I merged this new voxelizer with the SEGI lighting calculations and my cone cache thing.... This voxelizer actually understands and works with actual lights in the scene, rather than being limited to emissives, which is nice... With some tweaking of the algorithm, this conglomeration of code produces results like this: (also, note my GPU's clocked down some because it's that much faster)

     
    soleron, ftejada, chiapet1021 and 7 others like this.
  42. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Also... 90FPS in VR at 70% load on a GTX1060.
     
    soleron, ftejada and Shinyclef like this.
  43. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    502
    Any popping as the camera moves?
     
  44. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    Nope... Lighting resolves and adjusts smooth as butter... Mostly because the 'science of a finely tuned volume knob' is a thing and I dig psychology as much as code.
     
    neoshaman and Shinyclef like this.
  45. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    502
    Very much looking forward to trying this, that sunlight popping was the deal breaker for me. Let's talk min specs... What are the VRAM requirements on, say, a 'high' and a 'low' setting, roughly speaking?
     
  46. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    VRAM usage is a little less... But mostly because it needs fewer render textures to function this way.
     
    Shinyclef likes this.
  47. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    502
    How many MB of VRAM are we talking for, say, medium settings? And what is performance like now in terms of milliseconds?
     
  48. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    I've not done the exact math at this point beyond noting that, yep... faster and a bit less RAM hungry... It's still in way too experimental a stage to give metrics... Though... think twice as fast with about 2/3rds the VRAM.
     
    elias_t, neoshaman and Shinyclef like this.
  49. jefferytitan

    jefferytitan

    Joined:
    Jul 19, 2012
    Posts:
    88
    Is it at a stage it can be peeked at, or should we hold off?
     
  50. Ninlilizi

    Ninlilizi

    Joined:
    Sep 19, 2016
    Posts:
    294
    It's still far too experimental... More a proof-of-concept... to see if it's worth pursuing, than a workable solution.
    It's basically a week's work... if I don't get up and wander away from my desk... to turn it into something ready to show off... Unless you really want to live dangerously?