Exploration of custom diffuse RTGI "approximation" for open gl es 2.0 and weak machine

Discussion in 'Global Illumination' started by neoshaman, Sep 29, 2019.

  1. neoshaman


    Feb 11, 2011

    I have been conceptualizing a model of slow RTGI. Originally I was just looking for mock time-of-day shadows and volumetric hair rendering hacks, but exploring those two subjects made me realize I could reuse the ideas to get a GI approximation. I want to explore several implementations to see how they translate visually and how useful they can be.

    There is one key concept: MAGIC :p (Mapping Approximation of Global Illumination Compute). Basically we store surfel-like elements in a texture; they compute light, then distribute their data to each other through "visibility textures" that store the addresses of the other surfels visible from a given point. This forms a cyclic graph that propagates lighting recursively. It turns the whole problem into a sampling issue: the simplest implementation needs only two texture fetches per ray (address, then data), so that's 4 rays on OpenGL ES 2.0.
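
    The two-fetch idea can be sketched on the CPU. This is only an illustration under assumptions: the names `gather`, `propagate`, `visibility`, `radiance` and `albedo` are hypothetical, light is a single float per surfel, and a missed ray contributes nothing.

```python
# Hypothetical CPU-side sketch of the two-fetch gather; all names are illustrative.

def gather(surfel, visibility, radiance, rays):
    """Average the radiance seen by `surfel` along its first `rays` directions.

    visibility[surfel][ray] -> address of the surfel hit by that ray (fetch 1),
                               or None when the ray leaves the scene.
    radiance[address]       -> light currently stored at that surfel (fetch 2).
    """
    total = 0.0
    for ray in range(rays):
        address = visibility[surfel][ray]   # fetch 1: which surfel do I see?
        if address is None:
            continue                        # miss: no contribution in this sketch
        total += radiance[address]          # fetch 2: what light does it hold?
    return total / rays


def propagate(visibility, direct, albedo, rays, bounces):
    """Repeat the gather; each pass adds one more bounce of indirect light."""
    radiance = list(direct)
    for _ in range(bounces):
        gathered = [gather(s, visibility, radiance, rays) for s in range(len(direct))]
        radiance = [direct[s] + albedo[s] * gathered[s] for s in range(len(direct))]
    return radiance


# Two surfels facing each other; only surfel 0 is lit directly.
visibility = [[1, None, 1, None], [0, 0, None, None]]
direct = [1.0, 0.0]
albedo = [0.5, 0.5]
result = propagate(visibility, direct, albedo, rays=4, bounces=2)
```

    Because surfel 1 sees surfel 0 and vice versa, the graph is cyclic, and each extra pass feeds the previous result back in, which is what allows the work to be spread over frames.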

    Both textures are approximations of the geometry, and there are many ways to translate these ideas. It can be as accurate as we want, given multiple practical trade-offs, like precomputing quality visibility offline. The main limitation is that all the geometry to update must "more or less" fit in a single texture (there are ways around that). It's also mostly environment-to-environment lighting (there is a way around that too, using UV probes that let dynamic objects sample the lightmap). Since it's designed for weak machines, expect rough results. It has, however, the benefit that we can spread the compute over many frames, asynchronously from the frame rate. It also renders to a texture, so it's "baked" once nothing changes.

    The first implementation I want to try is MAGICAL :rolleyes: (MAGIC Applied by Lightprobe), where the visibility is stored in box-projected address probes. It's a solution that doesn't need offline baking and is compatible with procedural generation. There are multiple ways to implement it, but I wanted to find a way to place the light probes automatically. Researching that, I found a way to do a voxelization pass on OpenGL ES 2.0 that stores the occupancy of cells in bits. OpenGL ES 2.0 doesn't have bit testing, but I found a way to do that without an expensive LUT.

    This implementation of MAGICAL has expected shortcomings, most notably because it ruthlessly approximates the environment through box projection. As seen in the schema below, it can be wildly inaccurate.

    box projection GI 02.png
    The sample here misses an obvious occlusion. Worse, due to the visibility of the cubemap, the ray actually BENDS and goes to a piece of geometry instead of the sky ... There are probably ways to slightly mitigate that: offline baking of bent cones (which disqualifies procedural generation), or designing environments that match the limitation (which on a weak machine would be better than nothing). We could also spend cycles raymarching the depth stored in the cubemap to check for false positives, but then it's limited to the current cubemap; if the ray escapes the visibility we should probably hash again to another visibility, and the technique is no longer cheap by those standards ... There are probably other ways to reduce the approximation.

    We just hope it's good enough for the target machines, and for artistic rendering that doesn't want its visuals to devolve into flat ambient (for PCG) with harsh shadows, or that wants livelier time-of-day updates. As it is async and renders to a texture, it can also bake ambience and lighting at initialization time (it can be a preprocess before the level), which allows skipping light computation in the shader and using only sampling. That probably makes it fast enough, if good enough, if devoid of anything fancy.

    The recipe is as follows.
    - I must ensure I can correctly generate data from the geometry into a lightmap-unwrapped texture, which will store the surfels (albedo color, world normal, world position, shadow masking).
    - That surfel data will be used to create a lightmap G-buffer that will compute direct lighting.
    - I'll try to place probes manually first, that is, design a test environment around a grid of probe positions. Voxelization will need to be tested at a later date, as a way to automate placement. Ideally each pixel of the lightmap can hash the cell it will query data from, or store it in an index channel (up to 256). The difficulty is how to manage the index of pixels inside a non-empty cell, as a voxel is orders of magnitude bigger than a pixel.
    - Each probe must compute the UV lightmap projection of the scene, with the UV color being the address of each point, creating a visibility of each surfel. I need to find a way to encode missed rays (not part of the scene, probably by reserving 0.0). The lightmap normal will be used to compute the sampling rays over its hemisphere, which will be spread over time with a cycle counter.
    - The cubemap will also store the depth of each point of the geometry to compute an attenuation.
    - I need to see if I can project a temporary 6-face cubemap texture to an atlas of 2D octahedral maps, to make it easier for the target machine.
    - I need to figure out if I can correctly accumulate lighting (direct and sampled) in a target GI lightmap that will be sampled by objects.
    - Test adding a far-field cubemap (position 0.0) for samples that go beyond the region of the lightmap (missed rays). It's not a skybox per se: it stores the lighting of the surrounding scene (maybe other scene tiles). If those scenes update their GI and are then captured by the far field, they effectively transmit their GI lighting to the scene. That's a minor solution to the locality of the lightmap.
    - In theory, we can do the full BRDF on hit samples to get higher quality light. I'll avoid it at first in this implementation for simplicity. It also increases the number of samples needed (querying the albedo and normal, so 2 more fetches), which is limiting on OpenGL ES 2.0.
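
    The recipe mentions spreading the hemisphere rays over time with a cycle counter but does not fix a sampling pattern. A minimal sketch, assuming a Fibonacci spiral (my choice for illustration, not stated in the post) and a hypothetical helper `hemisphere_ray`, where frame number `cycle` picks one direction out of `total_cycles`:

```python
import math

def hemisphere_ray(normal, cycle, total_cycles):
    """Direction number `cycle` of `total_cycles`, spread over the hemisphere around the unit `normal`."""
    # Fibonacci spiral in local space (local z is the hemisphere axis).
    z = (cycle + 0.5) / total_cycles                 # in (0, 1), never on the horizon
    r = math.sqrt(1.0 - z * z)
    phi = cycle * math.pi * (3.0 - math.sqrt(5.0))   # golden angle increment
    lx, ly, lz = r * math.cos(phi), r * math.sin(phi), z

    # Build a tangent frame around the normal.
    nx, ny, nz = normal
    hx, hy, hz = (1.0, 0.0, 0.0) if abs(nx) < 0.9 else (0.0, 1.0, 0.0)
    # tangent = normalize(cross(helper, normal))
    tx, ty, tz = hy * nz - hz * ny, hz * nx - hx * nz, hx * ny - hy * nx
    inv = 1.0 / math.sqrt(tx * tx + ty * ty + tz * tz)
    tx, ty, tz = tx * inv, ty * inv, tz * inv
    # bitangent = cross(normal, tangent)
    bx, by, bz = ny * tz - nz * ty, nz * tx - nx * tz, nx * ty - ny * tx

    # Rotate the local direction into world space.
    return (lx * tx + ly * bx + lz * nx,
            lx * ty + ly * by + lz * ny,
            lx * tz + ly * bz + lz * nz)

# One ray per frame for a surfel facing up, over a 64-frame cycle.
rays = [hemisphere_ray((0.0, 1.0, 0.0), k, 64) for k in range(64)]
```

    Each frame would evaluate only the ray for the current counter value and add it to the accumulation lightmap, so a full hemisphere is covered once the counter wraps.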

    This first implementation of MAGICAL will be limited in precision:
    - to a lightmap texture of 256x256, because I will use 8-bit UVs for the probes. GI lightmaps are traditionally very low frequency; Enlighten precomputed GI recommends 1 m per pixel.
    There are actually 5 RGB maps (albedo, normal, world position, direct light, GI accumulation), roughly equivalent to one 512 map + one 256 then. It remains to be seen whether merging direct lighting and accumulation is needed. Having separate direct lighting is useful to avoid losing data in the update and to avoid recomputing the light to inject; it can be a kind of cache, I think? EDIT: The count is probably 6 RGB maps, as we probably need double buffering on the color accumulation.
    - to a cubemap atlas of 2048x2048 with 256 cubemaps (each cubemap being 128x128, arranged in a 16x16 grid). The projecting cubemap will sample the scene at 64x64 resolution per face.
    - The first example will probably not use all 256 cubemap slots; that's probably overkill.

    Assuming we need all slots, the upper bound for the initialization update is 256 x 6 x 64 x 64 = 6 291 456 pixels, which is EXACTLY one 2048² texture + two 1024² ones, but we project back to the atlas, which is just 2048². The problem is that it needs up to 1536 renders to fill all slots! And we are oversampling by half! A face size of 32 would undersample instead, and I'm not sure how well projecting to the octahedron conserves accuracy, since the data must be exact as it stores addresses. Fortunately we can do the renders one at a time, spread over time, and once it's done we don't need to update unless the scene changes; and if we move, we would only need to update the edges.
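
    The budget above can be checked with a few lines of arithmetic. The constant names are only labels for the numbers quoted in the post.

```python
PROBES = 256      # 16 x 16 atlas slots
FACES = 6         # faces per temporary cubemap
FACE_RES = 64     # capture resolution per cubemap face
TILE_RES = 128    # octahedral tile resolution in the atlas

captured = PROBES * FACES * FACE_RES ** 2     # pixels rendered to fill every slot
equivalent = 2048 ** 2 + 2 * 1024 ** 2        # one 2048² texture plus two 1024² ones
renders = PROBES * FACES                      # one render per cubemap face
atlas = PROBES * TILE_RES ** 2                # texels actually stored in the atlas

ratio_64 = captured / atlas                          # captured samples per stored texel
ratio_32 = PROBES * FACES * 32 ** 2 / atlas          # same ratio with 32x32 faces

print(captured, equivalent, renders, atlas, ratio_64, ratio_32)
```

    With these numbers the atlas is exactly 2048², 64x64 faces give 1.5 captured samples per stored texel, and 32x32 faces give 0.375.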

    Then we need to assess whether the visual artifacts are severe enough, and see if we can evolve the solution to take care of them or if we need to design around them, which is the price to pay for doing GI on weak machines.

    My workstation has a GT 705; for reference it's a bit weaker than a Wii U, and it's the weakest of the current GeForce line. My laptop has a weaker Radeon R2, and my mobile device is a Logicom Tab 741 with a Mali-400 MP1, an 800x480 screen and 512 MB of RAM (expecting 32 for actual comfortable use; I'm a madman, I want to do an open world on that!).

    I just need to reinstall Unity (had some issues) and GitHub ... :D
    Last edited: Sep 29, 2019
  2. neoshaman


    Feb 11, 2011
    I started making the prototype a while ago, but I have some questions about custom render textures; there isn't as much documentation and as many examples as I expected from the community:
    Right now my brain isn't conceptualizing a solution :confused: It feels like it should be simple. However, I should spread this computation over many frames, which is kind of a problem because that's de facto 256 frames! That's roughly 5 s, and I'm not sure waiting 5 s every time we want to rebuild the atlas is great. Not a problem on startup, but come on! We want the full course meal ... eventually :p

    I also figured out I should know more about bandwidth issues; I realized I don't have a complete grasp of the implications.

    Anyway, my brain is currently foggy. I can't even visualize the code I need to write right now; I can only work on stuff I already have a habit of doing. Well, I hope the prototype can help me land a job somewhere.
    Last edited: Oct 26, 2019
    JoNax97 likes this.
  3. neoshaman


    Feb 11, 2011
    It's working as intended now. It takes ~800 ms to generate the 1500 views on a GT 705; this card is roughly in the Switch's power class, but weaker!

    Octahedron mapping to atlas

    Cubemap UV view to atlas

    Next is generating the Lightmap G-buffer (LMGB), which means I have to do a bit of level creation to have something distinct, and also to test whether the unwrapping shader is correctly set up.
    keeponshading, AcidArrow and BenzzzX like this.
  4. Demhaa


    Jun 23, 2018
    Man, you've been wanting to do this since forever. How does it feel to finally get cracking on this concept?
    keeponshading, neoshaman and Martin_H like this.
  5. neoshaman


    Feb 11, 2011
    It feels fun. I just realized that what was holding me back was simply being underfed (missing two thirds of what's needed); otherwise I would have done it earlier. Now I'm also working on optimizing a basic diet to meet the daily target, instead of just buying whatever I can. I'm not there yet, still compiling data.

    Anyway. Right now I'm planning the layout and overthinking it. To do the LMGB I need a level design, then to unwrap it. Back-of-the-envelope calculation:
    - Each cubemap tile is 128² px (16 384 px, one ray each).
    - The basic implementation is limited by the UVs being 8 bits, so I can only index a size of 256, which means the LMGB is limited to that size.
    - Currently I'm just naively laying out the 256 probes on a 16x16 grid for testing, so the level must take that into account, that is, walls must go around the probes.
    - Since the LMGB is limited to 256², we have 65 536 surfel points to unwrap to. We are surface limited: the more complex the level, the more surfaces share the surfels, that is, fewer surfels per surface. We need to consider level design carefully.
    - If the level is just a plane below that encompasses the grid, we get 16² pixels per probe (256 points).
    - I'm trying to find a simple way to get shadow masking. Ideally it would be real time, but I don't want to code and debug a shadow map. I'm thinking of baking shadows, then using analytical projection of primitives, but that might not actually save time.
    - I'm not implementing the far field yet.

    BRB, I need to look at ProBuilder tutorials.
    keeponshading likes this.
  6. neoshaman


    Feb 11, 2011
    About shadow masking: despite not encoding depth in the cubemap yet, I realized I could just check a light's visibility using the light probe data, comparing the position of the light against the depth. Using multiple samples I could make all lights area lights too, but for now a binary check will do. It saves me from doing a regular shadow map pass.

    My problem now is to find a strategy to encode depth. Currently I think 16 bits is the minimum; 8-bit depth is way too coarse. But the probe currently uses RG for UV data and B for sky/far-field masking, so I have only 1 channel left. Since B is basically binary, I would probably reserve a number (like 0 or max) to encode the far field.
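
    A 16-bit depth needs two 8-bit channels. A minimal sketch of the split, assuming depth is already normalized to [0, 1]; `pack_depth16` and `unpack_depth16` are hypothetical helper names, and the channel assignment is left open because the post has not settled it.

```python
def pack_depth16(depth01):
    """Split a depth in [0, 1] into two 8-bit channel values (high byte, low byte)."""
    q = min(65535, max(0, int(round(depth01 * 65535.0))))
    return q // 256, q % 256

def unpack_depth16(high, low):
    """Rebuild the normalized depth from the two 8-bit channel values."""
    return (high * 256 + low) / 65535.0

# Step size for a probe range of 64 units (the range is an arbitrary example value).
step_8bit = 64.0 / 255.0       # about 0.25 units per level
step_16bit = 64.0 / 65535.0    # about 0.001 units per level
```

    One code, for example the maximum (255, 255), could be reserved to mean far field, which would free the B channel as suggested above.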
    keeponshading likes this.
  7. neoshaman


    Feb 11, 2011
    Hey, after some hardware misadventures (the dev machine; the laptop is dying), here is a quick update:

    The black stuff in the scene and game views is the unwrap shader test; the rainbow texture at the bottom right is the array of cubemaps capturing the scene UVs correctly.

    So what's next is to:
    - properly bake the LMGB data.
    - Write the direct lighting shader
    - Write the GI accumulation
    - Write object lighting.
    Last edited: Jan 30, 2020
    Arthur-LVGameDev likes this.
  8. Demhaa


    Jun 23, 2018
    This is pretty much GIA, but realtime...
  9. neoshaman


    Feb 11, 2011

    I'm a bit skeptical of the data, as I don't see sRGB settings in the texture parameters, but at least it correctly unwraps the LMGB. The model should be made to the texture size specification, but for now the pixel UV bleeding will do >.>

    Let's try to make the proper code for the direct light baking and then correct from there.
  10. neoshaman


    Feb 11, 2011
    I was a bit stressed that Unity seemingly didn't have a structure (in the built-in pipeline) to query lights. After 3 days of reading the whole scripting reference line by line, I finally found the relevant part ...

    Code (CSharp):
    Debug.Log(FindObjectsOfType<Light>().Length);
    seems to confirm the job, thankfully

    ... at least I don't have to manually redo a whole lighting structure :confused:

    I'm doing meshless lighting by having all the mesh data baked into textures, so I can't just use the automatic culling Unity does, and I probably need to manually reimplement all the light types too :( Though I'm open to alternatives ...
  11. neoshaman


    Feb 11, 2011
    So I have been investigating what it means to do direct lighting in LMGB space and the overall implications for optimization.

    The naive method would simply be to get the light list and do one light pass per light on the LMGB. A naive optimization would be to separate each light type into its own list and pass that as a single list; it seems we are guaranteed 128 float4 at worst, which is roughly 64 directional/point lights per pass.

    But since the goal is a coarse GI approximation, I have to account for occlusion, basically shadows. That's where things get a bit costly, i.e. rasterizing shadow maps and storing the maps. We can separate those into two major maps, directional and local lights (spot and point), the idea being that local lights have less range and need less precision, so we can store two local shadow maps in a single map by packing them as 16 bits each (2 channels), which could potentially also allow testing two shadow maps in a single operation.

    But if we go back to the basic idea of MAGIC, it's that we reduce GI to solving the light at a single geometric point through the integration of its hemisphere visibility, with the intervisibility between points making up a light graph.

    So is there a way to do away with costly shadow maps? For each geometric point we consider every visible light contribution, so by definition we don't get non-visible lights; shadows would be implicit in the visibility, and dependent on the quality of the visibility structure.

    I alluded to that by adding a depth buffer to the (approximate) visibility structure. It's "theoretically" not necessary given that the LMGB already has the position of the geometry, so we could compare the sample position to the light and infer occlusion, BUT the depth buffer was basically an optimization, as we wouldn't need to sample the position map. BUT given that the visibility structure we use (cubemap) is a very coarse approximation (one that actually bends light), shadows would probably be discontinuous; they would mostly qualify as a coarse occlusion factor and wouldn't produce quality shadows.

    The direct light map is an optimization; the goal is to separate the light computation from the integration. For a geometric point, integrating means sampling surfel data from the LMGB. Each sample means sampling the PVS (indirection) and 3 or 4 LMGB data maps (albedo, position, normal, baked shadow), and running a light computation for each sample, which limits the number of samples possible in a single pass. Given that each geometric point will already have computed its own light, we can decouple the computation on each surfel and then just sample the result (reduced to 2 samples). It also decouples the light update from the GI update. It's kind of like the light pre-pass methodology.

    The main issue with this optimization is that it encodes light per geometric point and doesn't have any concept of occlusion, which means that occluded points WILL get light, and this will break the GI effect. We can use the PVS as occlusion, that is, compare the light position (or direction, for directional lights) to the position of the PVS sample in that light direction (for local lights; skybox/far-field occlusion for directional lights), obtaining the occlusion factor.
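
    The occlusion test for a local light can be sketched as a distance comparison. This is a simplified illustration: `probe_depth` is a hypothetical lookup standing in for the probe's depth data, box projection is ignored, and the `bias` value is an arbitrary assumption.

```python
import math

def light_occlusion(point, light_pos, probe_depth, bias=0.05):
    """Return 1.0 if the point light is visible from `point`, else 0.0.

    probe_depth(direction) is a hypothetical lookup returning the distance to the
    first surface stored in the visibility probe along a unit direction,
    or math.inf for sky / far field.
    """
    to_light = [l - p for l, p in zip(light_pos, point)]
    distance = math.sqrt(sum(c * c for c in to_light))
    direction = [c / distance for c in to_light]
    # The light is visible when the stored surface is not closer than the light.
    return 1.0 if probe_depth(direction) + bias >= distance else 0.0

# Toy probe: a wall 2 units away toward +x, open sky everywhere else.
def wall_probe(direction):
    return 2.0 if direction[0] > 0.5 else math.inf

behind_wall = light_occlusion((0.0, 0.0, 0.0), (5.0, 0.0, 0.0), wall_probe)
before_wall = light_occlusion((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), wall_probe)
overhead = light_occlusion((0.0, 0.0, 0.0), (0.0, 5.0, 0.0), wall_probe)
```

    For a directional light the same lookup would only need to answer whether the direction hits sky or far field.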

    I also looked at tiled and clustered lighting optimizations, to see if they are applicable to the lighting. These techniques take advantage of spatial locality, in screen and world space, to discriminate which light is applied to which pixel, based on a spatial partition. There are two ways to think about it for LMGB lighting: in world space using voxel groups, or in lightmap space.

    BUT lightmap space doesn't offer the proper spatial coherence. Given that we use surfels, the lightmap layout is really just a convenient one for the PVS used in this particular implementation (rasterizing indirection to a cubemap). Each surfel is addressed absolutely through the indirection texture; surfels are technically independent of each other and can be packed using any other method (than cubemap rasterization). Also, lightmap layouts are coherent at the surface level but not at the geometry level: surfaces can be spatially close in lightmap space but not in world space. Local light volumes imply world-space coherence.

    Voxel space allows more coherence. Basically you store a list of all non-empty voxels, then for every non-empty voxel you keep track of its light list. For every geometric point, you hash the position to get the key to the voxel and test only the lights in that list. The problem is that lighting is done at the pixel level, so each pixel would have different lights and a different list length; since we treat all pixels at the same time, it doesn't solve much. We could then use tiles in lightmap space, but then the same coherence problem remains, this time in complexity instead of space, though they are linked, as spatially close pixels would share similar light lists.
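
    The voxel light list can be sketched with a dictionary keyed by cell. This is a CPU-side illustration with hypothetical names; for simplicity it registers each light in every voxel touched by its bounding box, rather than only in non-empty voxels as described above.

```python
import math
from collections import defaultdict

def voxel_key(position, cell_size):
    """Hash a world position to an integer voxel coordinate."""
    return tuple(math.floor(c / cell_size) for c in position)

def build_light_lists(lights, cell_size):
    """lights: list of (position, radius). Returns {voxel key: [light indices]}."""
    lists = defaultdict(list)
    for index, (pos, radius) in enumerate(lights):
        lo = voxel_key([c - radius for c in pos], cell_size)
        hi = voxel_key([c + radius for c in pos], cell_size)
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    lists[(x, y, z)].append(index)
    return lists

def lights_for_point(point, lists, cell_size):
    """Only the lights registered in the point's voxel need to be evaluated."""
    return lists.get(voxel_key(point, cell_size), [])

lights = [((2.0, 2.0, 2.0), 1.0), ((10.0, 2.0, 2.0), 1.0)]
lists = build_light_lists(lights, cell_size=4.0)
```

    The varying list length per voxel is exactly what makes this awkward in a fragment shader that processes every lightmap pixel in the same pass.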

    Since geometric points are independent, we could pack them coherently to take advantage of tiles, but then we lose the simplicity of rasterization with a cubemap as PVS, which works at surface scope. Same problem if we try a method to prioritize pixels in view space. A different packing structure, other than cubemap rasterization of geometric lightmap addresses, would probably help (MAGIC instead of MAGICAL), but one that is faster and more efficient (than cubemap rasterization) at run time instead of offline packing, especially on weak machines or for PCG, has yet to be found.

    Given that the method favors geometric surfaces, emissive surfaces are trivial to add: we can just rasterize the emissive channel directly into the direct light map, and the intensity will be picked up when the GI update is computed. Using emissives might be the optimal way to add some local lights, and they are area lights by default. If we want the emissives to be dynamic, it's probably a good idea to render them to another map sampled at the same time as the direct light, for the cost of another sample, or to pack them as special surfaces. Given that the lightmap and PVS resolutions are fixed, it's good to remember that the more surfaces there are, the more pixels they have to share, and that the PVS might not actually pick up the full extent of a surface, potentially leading to some waste.

    For the prototype it's unlikely I'll try anything complex. I anticipate that the directional light would mostly be singular (the sun) and could probably be rolled into the skybox evaluation during the GI update. I'll probably just do the occlusion factor and not bother with accurate shadows for local lights, and local lights might just be an emissive light map (no need to render direct light?). Then we will evaluate the value of the result artistically.
    Last edited: Mar 3, 2020
  12. neoshaman


    Feb 11, 2011
    Things are getting a bit harsh on my side; if anyone is interested in the result, a modest financial contribution can go a long way toward accelerating the dev. I'll also be happy with material contributions.
  13. Mauri


    Dec 9, 2010
    Why not set up a Patreon or Ko-fi?
  14. neoshaman


    Feb 11, 2011
    It's not ready anyway, I don't have the proof of concept. It was mostly because many people try to leech code in DMs, so I put that there to let them know they can help monetarily if they want to see progress, at which point they disappear lol

    I'm stuck debugging the octahedral cubemap atlas, which is like the true last challenge, and I don't understand yet what's wrong:

    It seems I select the right part, but I don't know how to check for bad offsets and sizing. I tried using a 16x16 plain color map for the hash position, which seems correct, and the box projection seems to work, but the sampling seems off :(

    Code (CSharp):
    Shader "MAGIC/BoxTEST"
    {
        Properties
        {
            _MainTex ("Cubemap Atlas", 2D) = "white" {}
        }
        SubShader
        {
            Tags { "RenderType"="Opaque" }
            LOD 100
            Cull Off
            Pass
            {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"
                #include "MAGIC.cginc"

                struct d
                {
                    float4 vertex    : POSITION;
                    float2 uv        : TEXCOORD1;
                    fixed4 color    : COLOR;
                    fixed3 normal   : NORMAL;
                };

                struct v2f
                {
                    float4 vertex    : POSITION;
                    float4 wpos     : TEXCOORD1;
                    fixed4 color    : COLOR;
                    fixed3 wnormals : NORMAL;
                };
                //------------------------------------

                v2f vert (d v)
                {
                    v2f o;

                    //vertex world position
                    o.wpos = mul(unity_ObjectToWorld, v.vertex);

                    //vertex screen position
                    o.vertex = UnityObjectToClipPos(v.vertex);

                    //normal to world normal
                    o.wnormals =UnityObjectToWorldNormal(v.normal);

                    //o.vertex = UnWrapToScreenSpace(float2 v.uv, float4 v.vertex);
                    o.color = float4(v.uv, 0,1);// v.color;
                    return o;
                }

                sampler2D _MainTex;
                //float4 _MainTex_ST;

                fixed4 frag (v2f i) : COLOR
                {
                    //set size
                    const float size = 4;
                    const float2 cuberange = float2(16,16);
                    //hash position to read the right cubemap in the atlas
                    float4 hashpos = floor(i.wpos/size); //select the proper cell
                    float2 hash_id = max(float2(0,0), min(hashpos.xz, cuberange)); // range limit to keep inside
                    float3 hash_offset = hashpos*size;// float3(hash_id.xy,hashpos.y) * size; //start position of each cell

                    float3 cubecenter =  hash_offset + (size/2) ;
                    float3 mincube = float3(0,0,0) + hash_offset;
                    float3 maxcube = float3(size,size,size) + hash_offset;//boxproject(wpos,wnormal, cubecenter, cubemin,cubemax)
                    float3 projected = BoxProjectVector(i.wpos,i.wnormals, cubecenter, mincube, maxcube);

                    //get the oct position on the cubemap
                    //-first get the id
                    //-reduce the range to the size of the atlas unit (1/range)
                    //-offset by id

                    float2 octnormal = PackNormalToOct(projected);
                    //transform oct to hashed cubemap
                    float2 samplepos = (hash_id + octnormal/2)/cuberange;//(size*hash_id)/64 + octnormal;
                    float4 cubesample = tex2D(_MainTex, samplepos);//sample the cubemap in the direction (world to oct)
                    return cubesample;//return hashpos/16; //float4(hash_offset,1)/64;
                }
                ENDCG
            }
        }
    }
  15. neoshaman


    Feb 11, 2011
    damn it

    after all these months of debugging
    it was wrong all along :(
    since the third post :confused:
    since the very beginning, I was curing the symptom :oops:
    the repeated motif is supposed to be an unwrapping of a sphere's normals onto a square folded as an octahedron
    so the corners should all be the same color :rolleyes:
    they obviously aren't :eek:
    so when they are joined they do S*** o_O

    Code (CSharp):
    float3 UnpackNormalFromOct(float2 f){
        float3 n = float3(f.x, f.y, 1.0 - abs(f.x) - abs(f.y));
        float t = max(-n.z, 0.0);
        n.xy += n.xy >= 0.0 ? -t.xx : t.xx;
        return normalize(n);
    }
    This isn't it?
    I mean I pass the UV coordinate to translate it into normal space, to get a full sphere unwrap (UV 0 to 1 in oct space)
    I didn't write that, I picked it up elsewhere
    Code (CSharp):
    fixed4 frag (v2f i) : COLOR
    {
        float3 normal = UnpackNormalFromOct(i.uv);
        return float4 (normal,1);
        //texCUBE(_Cube, normal);
    }
    where is the flaw?
  16. neoshaman


    Feb 11, 2011
    First rule of asking for help: if you ask, you solve it the second after ...
    I had an intuition that the UV range should probably be rescaled to include negatives
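
    The effect of the rescale can be checked on the CPU with a Python port of the UnpackNormalFromOct snippet posted earlier. `uv_to_oct` is a hypothetical helper name for the [0, 1] to [-1, 1] remap. In a correct octahedral unwrap all four corners of the square decode to the same direction.

```python
import math

def unpack_normal_from_oct(f):
    """Python port of the UnpackNormalFromOct snippet; expects f in [-1, 1] x [-1, 1]."""
    x, y = f
    z = 1.0 - abs(x) - abs(y)
    t = max(-z, 0.0)
    x += -t if x >= 0.0 else t
    y += -t if y >= 0.0 else t
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

def uv_to_oct(uv):
    """Rescale a texture coordinate from [0, 1] to [-1, 1]."""
    return (uv[0] * 2.0 - 1.0, uv[1] * 2.0 - 1.0)

corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
raw = {unpack_normal_from_oct(c) for c in corners}                  # UV fed in directly
rescaled = {unpack_normal_from_oct(uv_to_oct(c)) for c in corners}  # UV rescaled first
```

    Without the rescale the four corners decode to four different directions; with it they all decode to (0, 0, -1), which is why the tiles only join correctly after the fix.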

    MAGICAL .png

    correct looking wrapping


    correct looking projection

    So this is done. There is a harsh transition when going from one cell to another, obviously, since I hash and sample per fragment. I'll blend it in the polish phase though. I finally get to code the light sampling ...
  17. neoshaman


    Feb 11, 2011

    Applied to a non-sphere mesh, it yields mostly black, which doesn't make sense; upward normals should at least be blue in most cases due to sampling the sky ..... :confused:
  18. neoshaman


    Feb 11, 2011

    Looking at the box-projected normals shows some artifacts, and I have no idea where they come from; they are also spatially coherent with the mesh .... :eek: WTF
  19. neoshaman


    Feb 11, 2011

    Adding an epsilon to the world normals solved it ....
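
    The thread does not show BoxProjectVector, so the following is a guess at why an epsilon helps, based on the usual box projection: the ray is divided by each direction component, and an axis-aligned normal has components that are exactly zero. The function name `box_project` and the clamp-to-epsilon variant are assumptions for illustration.

```python
def box_project(pos, direction, center, box_min, box_max, eps=1e-5):
    """Intersect a ray from `pos` with an axis-aligned box; return the hit relative to `center`."""
    # Keep every component away from exactly zero so the divisions below stay finite.
    d = [c if abs(c) >= eps else eps for c in direction]
    # Per-axis distance to the box plane the ray is heading toward.
    t = [((box_max[i] if d[i] > 0.0 else box_min[i]) - pos[i]) / d[i] for i in range(3)]
    nearest = min(t)
    hit = [pos[i] + d[i] * nearest for i in range(3)]
    return [hit[i] - center[i] for i in range(3)]

# A perfectly vertical normal inside a 4-unit cell: two components are exactly zero.
up = box_project((1.0, 1.0, 1.0), (0.0, 1.0, 0.0), (2.0, 2.0, 2.0), (0.0, 0.0, 0.0), (4.0, 4.0, 4.0))
down = box_project((1.0, 1.0, 1.0), (0.0, -1.0, 0.0), (2.0, 2.0, 2.0), (0.0, 0.0, 0.0), (4.0, 4.0, 4.0))
```

    Flat walls and floors have many such normals, which would be consistent with artifacts that follow the mesh.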

    occlusion factor.png
    The box-projected occlusion factor, i.e. basically masking the skybox with the box-projected mask

    I'll try to make a demo with dynamic light. Right now, for testing, I encode the light as
    Code (CSharp):
    float3 fakelight = normalize(float3(0.5,0.5,0.5));
  20. neoshaman


    Feb 11, 2011
    It's worth noting that the image above doesn't have the level properly aligned to the hash grid vertically, which is why the walls have their bottom "lit": they cross two hash cells.
  21. neoshaman


    Feb 11, 2011
    Shader with offset to snap the spatial hash to the mesh bound.
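
    Snapping the hash to the mesh bound amounts to subtracting an origin before dividing by the cell size. A minimal sketch with hypothetical names (`probe_cell`, `atlas_uv`, `grid_origin`); it assumes the octahedral coordinate is already in [0, 1] and clamps to the last valid cell index, which is my assumption about the intended behavior.

```python
import math

def probe_cell(world_pos, grid_origin, cell_size, grid=(16, 16)):
    """Hash a world position to a probe cell on the XZ grid, snapped to `grid_origin`."""
    cx = math.floor((world_pos[0] - grid_origin[0]) / cell_size)
    cz = math.floor((world_pos[2] - grid_origin[2]) / cell_size)
    # Clamp to the last valid cell index (grid size minus one).
    cx = max(0, min(cx, grid[0] - 1))
    cz = max(0, min(cz, grid[1] - 1))
    return cx, cz

def atlas_uv(cell, oct_uv, grid=(16, 16)):
    """Map an octahedral coordinate in [0, 1] to the cell's tile in the atlas."""
    return ((cell[0] + oct_uv[0]) / grid[0], (cell[1] + oct_uv[1]) / grid[1])

unsnapped = probe_cell((3.5, 0.0, 3.5), (0.0, 0.0, 0.0), 4.0)
snapped = probe_cell((3.5, 0.0, 3.5), (-1.0, 0.0, -1.0), 4.0)
```

    The same point lands in a different cell once the origin moves, which is how a misaligned grid makes one wall straddle two probes.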

    Here is the bent light shadow,
    i.e. sampling occlusion in the direction of the box-projected light.
    The artifacts are what's expected; the shadow also moves smoothly per pixel inside the hashed space.

    Here is direct light,
    i.e. sampling without box projection.
    We lose resolution, as every pixel in a hash cell samples the same angular pixel; the angular transition is therefore harsh and per hash cell rather than per pixel.

    Here is direct light with mipmap sampling and bilinear sampling
    (depth 4).
    This makes the transition smoother, as we see the grayscale evolve, but we still have the low resolution per hash size.

    Spherical occlusion factor,
    i.e. depth 7: the whole cubemap is a single pixel, the light direction doesn't matter anymore, and both bent and direct light mipmaps converge to this.
    Basically it's the total visibility factor of a single probe.
    I don't call this last one ambient occlusion, because ambient occlusion is generally defined on a surface using hemisphere occlusion; this is a whole sphere and therefore lacks directionality.
    BUT given that the sky contribution is generally hemispherical and aligned to the up vector, we can probably boost it selectively to reconstruct AO, using a dot product of the surface normal with the up vector to fake an effect that's closer.
    While we can't discriminate the directional contribution, surfaces with open sky and distant occluders are likely to be brighter than those with close occlusion, no matter where it is on the hemisphere. Downward-facing surfaces will have the ground blocking anyway. Side-facing ones only receive part of the sky contribution, across their horizontal plane.

    While the box projection hack doesn't allow mipmaps due to the nature of the baked data (the UV address), I have activated them for shadows as a test, as there is a mask to signal non-UV pixels, which also happens to double as sky occlusion. We can't rely on it for a final shadow solution. THEREFORE smooth sampling is a no-go, unless we bake the result into a supplemental texture; using mipmaps will then smooth both the spatial hash and the pixel details. We already have a shadow mask texture, so we will have to experiment with that. I have already done the meshless direct lighting that samples the baked world normal and does the NdotL lighting.

    Next is the light accumulation gathering operation, which will open the way to proper per-pixel AO and GI.

    The plan now is to bake the light accumulation texture, then make a shader that blends the direct light and the light accumulation for final rendering.
    PROE_, jjejj87 and JoNax97 like this.
  22. neoshaman


    Feb 11, 2011
  23. neoshaman


    Feb 11, 2011
    Okay I have a few gotcha:

    1. I no longer have access to my work pc, os I'm now on weaker laptop with a weaker GPU, I wasn't able to save the last work, though it's not lost, just locked away on the work pc.

    It's an opportunity to redo things cleanly as the code, being experimental, had a lot of inefficiency and was messily organized. A lot of stuff was redundant and implemented in different way, that's an opportunity to rationalize that.

    What that meant is the creation of surfaceCanvas, ie a kind of partial reimplementation of CustomRenderTexture that can do more than one update and don't rely on the main renderer, it's simple, it's just code that takes a surface or mesh, to render it onto a texture, using a painter shader.

    Okay now unity tell me what's the deal with DrawMeshNow? Also I have to pay attention to how matrix work properly now ....

    2. I wanted to use Unity's shadow maps to test the proof of concept, but they proved to be more hassle than they're worth; I'll need to test with a fixed light and baked shadows. This means I had to reconsider how to implement shadows, and by that I really mean light occlusion:

    a. The obvious solution would be to try the technique I mentioned of storing depth in the light probes, then checking the position of the light against the light probe. There is an issue with bent light, obviously, but it's also complex, as it means implementing a whole pipeline just to test whether the technique works.

    b. A directional light just needs binary occlusion, but I could probably have semi-working shadows with bent light attenuated by the depth, to get a sort of light probe contact shadow, and I want to explore mixing that with the direct light occlusion from the probe.

    The problem is that the nice soft low-frequency shadow of the direct probe light, made by sampling mipmaps, wouldn't work due to pollution by random data around the occlusion. I can't use MRT either, and I would like to keep texture memory low. If I store a 16-bit depth per probe, the 4 channels are all taken once the UV data is added on top; the occlusion would be a fifth channel, which would then take a whole extra texture, i.e. 4 channels, just for binary data. That's not optimal.

    Of course the method can be extended for better hardware, where those constraints wouldn't apply. I have many ideas for extensions to scale the technique, but it's not useful to lay them out here.

    c. I'm considering baking an atlas of lightmap shadows to mimic time-of-day lighting, but just for the proof of concept. It's just a texture read, so the implementation wouldn't be complex; also, my end goal for the shadow masking was to use zonal harmonics to get baked shadowmap compression.

    d. I can still try to implement a regular shadow map, but I need to evaluate in depth how it interacts with the setup to future-proof it.

    3. Also, I hit a snafu with the "gathering over time" buffer. One idea of the method is to use a texture to accumulate light, spreading the compute over multiple frames. The problem is that, with the texture being 8-bit RGB, at 64 rays each ray can only contribute data compressed into 4 values (256 levels / 64 rays).

    I had to find a way to store the data correctly. I finally settled on storing the luminance and chroma (hue, sat) separately, each as a 16-bit value, which basically trades accuracy for precision for light intensity and the reverse for "color". I haven't implemented it yet, though, since I have to resolve the shadow situation first.
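
    To make the precision problem in point 3 concrete, here is a minimal sketch in Python (not the shader code). The function names are hypothetical, and the high-byte/low-byte split is only one common way to spread a 16-bit value over two 8-bit channels; it is an assumption, not necessarily the encoding that will end up in the project.

```python
RAYS = 64          # rays accumulated into one texel
LEVELS_8BIT = 256  # distinct values in one 8-bit channel

def levels_per_ray(levels=LEVELS_8BIT, rays=RAYS):
    """How many distinct values each ray gets if all rays share one channel."""
    return levels // rays

def pack16(value):
    """Quantize a [0, 1] float to 16 bits and split it into (high, low) bytes."""
    q = min(65535, max(0, round(value * 65535)))
    return q >> 8, q & 0xFF

def unpack16(high, low):
    """Rebuild the [0, 1] float from the two bytes written by pack16."""
    return ((high << 8) | low) / 65535

print(levels_per_ray())              # 8-bit channel shared by 64 rays
print(levels_per_ray(65536, RAYS))   # 16-bit value shared by 64 rays
```

    With 16 bits the same 64 rays get 1024 levels each instead of 4, which is the motivation for giving luminance its own 16-bit value.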
  24. neoshaman


    Feb 11, 2011
    One consideration I had for shadow mapping is that I want to think through the implementation of real-time adjustment. One problem with real-time adjustment would be evaluating shadows outside of the GI scene, since the system is rather self-contained.

    But if I want to make an open world, I need to consider the "far field" contribution to the "active scene". Previously I figured out that the light probes already discriminate skybox vs. GI scene; therefore, if there was no UV hit, I would simply sample the far-field cubemap instead of the light accumulation map. But that also means the far field didn't compute GI itself.

    I figured out that I can actually cascade the GI fairly simply: a LOD scene computes its own GI with reduced detail, then the far-field skybox is refreshed by sampling back that scene. This means contribution flows from low-detail to high-detail scenes only. Given the current implementation limits maps to 256², we can have 4 cascades in a 512² texture. On top of that, the mesh of the LOD isn't needed, as the system is mostly mesh-free. Though I have been thinking of reintroducing mesh data, as sampling normal and position takes samplers away in the shader.
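
    As a rough illustration of the "4 cascades in a 512² texture" layout, here is a small Python sketch. The row-major 2x2 arrangement and the function names are assumptions made for the example; the post does not say how the cascades would actually be laid out.

```python
TILE = 256                # per-cascade map size (current implementation limit)
ATLAS = 512               # size of the texture holding all cascades
PER_ROW = ATLAS // TILE   # 2 cascade maps per row

def cascade_count():
    """Number of 256x256 cascade maps that fit in the 512x512 atlas."""
    return PER_ROW * PER_ROW

def cascade_uv(cascade, u, v):
    """Map a cascade index and its local [0, 1] UV to atlas UV (row-major)."""
    col = cascade % PER_ROW
    row = cascade // PER_ROW
    return (col + u) / PER_ROW, (row + v) / PER_ROW

print(cascade_count())
print(cascade_uv(1, 0.5, 0.5))
```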

    About the baking of shadowmaps, I did some back-of-the-envelope calculation: given we are limited to 256² textures, and that shadowmaps are grayscale or binary, I realized a 2048² texture could contain up to 256 shadowmaps. If we do pseudo bit tricks we can store 8192 binary shadowmaps, accessible in batches of 32 per texture pixel, so we can do extra computation to blend them... Multiply by 4 if you have access to 4096² textures (that is 1024 grayscale maps and 32 768 binary; with an 8K texture that goes to 4096 and 131 072). Seems legit. I want to see how that compares to zonal harmonics compression.

    hyper atlasing:
    2048² / 256² = 8² (64 tiles)
    x 4 channels = 256 shadowmap (grayscale)
    x32 bitplanes = 8192 shadowmap (binary)
    *4 if using 4096² instead (32 768 map)

    I made a mistake: I multiplied each channel by 32 bits, but 32 bits is for all channels together; each channel is only 8 bits, therefore we need to divide by 4:
    - 8192 is really just 2048
    - 32 768 is 8192
    - 131 072 is 32 768
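
    The corrected numbers can be double-checked with a few lines of Python. This is just the arithmetic from the post (tiles x channels x bits per channel); the function name is made up for the example.

```python
def atlas_capacity(atlas_size, tile_size=256, channels=4, bits_per_channel=8):
    """Return (tiles, grayscale_maps, binary_maps) for a square atlas."""
    tiles = (atlas_size // tile_size) ** 2   # 256x256 tiles in the atlas
    grayscale = tiles * channels             # one grayscale map per channel
    binary = grayscale * bits_per_channel    # one binary map per bit
    return tiles, grayscale, binary

for size in (2048, 4096, 8192):
    print(size, atlas_capacity(size))
```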
    Last edited: Aug 23, 2020
  25. neoshaman


    Feb 11, 2011
    The ongoing main test code
    Code (CSharp):
    Shader "MAGIC/BoxTEST"
    {
        Properties
        {
            _MainTex ("Cubemap Atlas", 2D) = "white" {}
            _DirectLightMap ("Direct Lighting", 2D) = "white" {}
            //_MainLight ("Main Light", Vector) = (1,1,1,1)
            //_Origin ("Origin", Vector) = (0,0,0,0)
        }
        SubShader
        {
            Tags { "RenderType"="Opaque" }
            LOD 100
            //Cull Off
            Pass
            {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"
                #include "MAGIC.cginc"

                struct d
                {
                    float4 vertex    : POSITION;
                    float2 uv        : TEXCOORD1;
                    fixed4 color     : COLOR;
                    fixed3 normal    : NORMAL;
                };
                struct v2f
                {
                    float4 vertex    : POSITION;
                    float4 wpos      : TEXCOORD1;
                    fixed4 color     : COLOR;
                    fixed3 wnormals  : NORMAL;
                };

                v2f vert (d v)
                {
                    v2f o;
                    o.wpos = mul(unity_ObjectToWorld, v.vertex);     //world position
                    o.vertex = UnityObjectToClipPos(v.vertex);       //screen position
                    o.wnormals = UnityObjectToWorldNormal(v.normal); //normal to world normal
                    o.color = float4(v.uv, 0, 1);                    // v.color;
                    return o;
                }

                sampler2D _MainTex;
                float4    _MainLight;
                float4    _Origin;
                //lightcolor,
                //mesh position to align with grid potentially in vertex clamping using the bounding box
                //pass the size and grid range, compute cell size
                //pass number of samples over hemisphere

                fixed4 frag (v2f i) : COLOR
                {
                    //set size
                    const float size       = 4;
                    const float2 cuberange = float2(16,16);
                    float  epsilon      = 0.000001;
                    //NOTE: the right-hand sides of origin and pos were lost in the post;
                    //_Origin.xyz and i.wpos.xyz are assumed from the declarations above
                    float3 origin       = _Origin.xyz;
                    float3 worldnorm    = normalize(i.wnormals) + epsilon;
                    float3 pos          = i.wpos.xyz - origin + 0.001;

                    //hash position to read the right cubemap in the atlas
                    float3 hashpos      = floor(pos / size);
                    float3 hash_offset  = hashpos * size;
                    float2 hash_id      = max(float2(0,0), min(hashpos.xz, cuberange));

                    //box projection
                    float3 cubecenter   = hash_offset + (size / 2);
                    float3 mincube      = hash_offset + 0;
                    float3 maxcube      = hash_offset + size;
                    float3 projected    = BoxProjectVector(pos, worldnorm, cubecenter, mincube, maxcube);

                    //sampling the atlas
                    float2 octnormal    = (PackNormalToOct(projected) + 1) / 2;
                    float2 samplepos    = (octnormal + hash_id) / cuberange;

                    //light test
                    float3 light        = normalize(_MainLight.xyz);
                    //NOTE: the first dot() argument was lost in the post; light is assumed
                    float  ndotl        = saturate(dot(light, worldnorm));
                    float  skyocclusion = saturate(dot(float3(0,1,0), worldnorm));
                    //skyocclusion *= skyocclusion;

                    //shadow sampling, box projected and direct
                    float3 lightproj    = BoxProjectVector(pos, light, cubecenter, mincube, maxcube);
                    float2 lightbox     = (PackNormalToOct(lightproj) + 1) / 2;
                    float2 shadowbox    = (lightbox + hash_id) / cuberange;

                    float2 lightdirect  = (PackNormalToOct(light) + 1) / 2;
                    float2 shadowdirect = (lightdirect + hash_id) / cuberange;

                    //gather loop (work in progress: the kernel is computed but not traced yet)
                    const float PI = 3.14159265359;
                    const float phi = 1.618033988;
                    const float gAngle = phi * PI * 2.0;
                    const int numSamples = 64;
                    float4 gi;
                    float4 traceResult = float4(0,0,0,0); //was uninitialized before accumulation
                    //loop counter renamed from i to s so it no longer shadows the v2f input
                    for (int s = 0; s < numSamples; s++)
                    {
                        float fi = (float)s;
                        float fiN = fi / numSamples;
                        float longitude = gAngle * fi;
                        float latitude = asin(fiN * 2.0 - 1.0);

                        float3 kernel;
                        kernel.x = cos(latitude) * cos(longitude);
                        kernel.z = cos(latitude) * sin(longitude);
                        kernel.y = sin(latitude);
                        kernel = normalize(kernel + worldnorm);
                        if (s == 0){
                            kernel = float3(0.0, 1.0, 0.0);
                        }
                        traceResult += 1;// ConeTrace(,,;
                    }
                    traceResult /= numSamples;
                    //gi.rgb = traceResult.rgb;
                    //gi.rgb *= 4.3;
                    //gi.rgb += traceResult.a * 1.0 * SEGISkyColor;
                    //float4 result = float4(gi.rgb, 2.0);

                    //cubemap result
                    float4 cubesample   = tex2D   ( _MainTex, samplepos );
                    float4 boxshadow    = tex2Dlod( _MainTex, float4(shadowbox,0,7));//tex2D(_MainTex, shadowtest);
                    float4 directlight  = tex2Dlod( _MainTex, float4(shadowdirect,0,4));
                    float4 occlufactor  = tex2Dlod( _MainTex, float4(shadowdirect,0,7));
                    float4 occlusion    = occlufactor.b * (skyocclusion + 1.0);
                    return occlusion;
                }

                //stub: planned steps for the per-ray gather, not implemented yet
                float4 traceRays(){
                    //box sample cubemap to get uv
                    //use uv to sample direct lightmap to get illumination
                    //mask sample by skybox mask
                    //sample skybox
                    //mask skybox by skybox mask
                    //return sample + skybox
                    return float4(0,0,0,0); //placeholder so the stub compiles
                }
                ENDCG
            }
        }
    }
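
    For readers who want to look at the sample distribution of the gather loop outside Unity, here is a plain Python port of that loop, as a sketch only. The function name is hypothetical, and the fallback for a degenerate (near zero length) direction is an addition of this sketch that the shader does not have.

```python
import math

PHI = 1.618033988              # golden ratio constant, as in the shader
G_ANGLE = PHI * math.pi * 2.0  # longitude step between consecutive samples

def gather_kernel(normal, num_samples=64):
    """Return num_samples unit directions biased toward the unit vector `normal`."""
    dirs = []
    for i in range(num_samples):
        if i == 0:
            # The shader overrides the first sample with straight up.
            dirs.append((0.0, 1.0, 0.0))
            continue
        longitude = G_ANGLE * i
        latitude = math.asin(i / num_samples * 2.0 - 1.0)
        # Point on the unit sphere, pushed toward the normal before normalizing.
        x = math.cos(latitude) * math.cos(longitude) + normal[0]
        y = math.sin(latitude) + normal[1]
        z = math.cos(latitude) * math.sin(longitude) + normal[2]
        length = math.sqrt(x * x + y * y + z * z)
        if length < 1e-6:
            # Sketch-only guard: the sphere point cancelled the normal.
            dirs.append(tuple(normal))
            continue
        dirs.append((x / length, y / length, z / length))
    return dirs

if __name__ == "__main__":
    for d in gather_kernel((0.0, 1.0, 0.0))[:4]:
        print(d)
```

    Adding the normal to a point on the full sphere and normalizing keeps every direction in the hemisphere around the normal, which is why no explicit hemisphere test is needed.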
    Last edited: Aug 26, 2020
  26. neoshaman


    Feb 11, 2011
    Had to stop to earn money to invest in some equipment, like a graphic screen for drawing, and to ramp up for my game.

    Here is the current schema of the refactoring. The bold items are the things that still need to be coded or are in the process of being coded (and for which I need to research how to do it).

    Really, the important part before the next test is the chroma/lum and fp16 texture encoding. I might update this post with sources that tell me how to do it in the future lol :D

    ...why are the pictures only showing in the post preview?
    Last edited: Mar 7, 2021
  27. neoshaman


    Feb 11, 2011
    I haven't given up on this yet. I still need to find a way to generate the shadow atlas easily, and to implement and find a way of testing the fp16 codec. I also wanted to unify the texture generation code into surface canvas, but there are 3 very distinct ways (cubemap, UV unwrap, stamping), so I'm trying to consolidate the redundant code into a cginc. Online shadows also prove to be a hassle to conceptualize, though I can probably do the atlas again: 2048/256 = 8, so 64 TOD shadows at 32-bit precision, 128 for 16-bit.

    I need to limit the scope of the tech demo: all GI tiles only 4 meters high, a 16x16 grid 64 m on a side, origin at zero with only positive coordinates.
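
    Under those constraints (4 m tiles, 16x16 grid, origin at zero, positive coordinates), the spatial hash from the test shader reduces to a floor and a clamp. Here is a Python sketch of it; the function names are hypothetical, and clamping to the last valid tile (15) rather than to the grid size is a choice of this sketch.

```python
import math

TILE_SIZE = 4.0   # metres per GI tile
GRID = 16         # 16 x 16 tiles, so 64 m per side

def tile_id(pos):
    """Hash a world position (x, y, z) to its (column, row) tile on the XZ plane."""
    x, _, z = pos
    ix = min(GRID - 1, max(0, math.floor(x / TILE_SIZE)))
    iz = min(GRID - 1, max(0, math.floor(z / TILE_SIZE)))
    return ix, iz

def atlas_uv(pos, oct_uv):
    """Offset an octahedral UV in [0, 1] into the probe tile that owns `pos`."""
    ix, iz = tile_id(pos)
    return (oct_uv[0] + ix) / GRID, (oct_uv[1] + iz) / GRID

print(tile_id((63.9, 2.0, 4.0)))
print(atlas_uv((4.0, 0.0, 8.0), (0.5, 0.5)))
```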

    For future exploration: the UV packing is wasteful of texture space, as sampling points not covered by the UVs are simply wasted. If we could find an equivalent projection from a proxy geometry made of quads, we could have better sampling, as quads cover the texture space better. Also, I was trying to emulate cone sampling by jittering the sampling ray or sampling the mipmap, but it's difficult and expensive against arbitrary UVs. Quads, on the other hand, are more predictable: we could find the boundaries analytically. Also, instead of uniquely identifying a single point, we can sample the quad ID and figure out the barycentric point to sample, which means more points: instead of the 65536 limit (256x256 addresses), we get that many quad IDs instead. Using voxel-like surface proxies, i.e. axis-aligned quads of the same size, would simplify things a lot. Instead of a single color like traditional voxels, they would hold a snapshot of the nearby geometry. The barycentric coordinates would also just be a spatial hash of the sample using ray intersection, as we know we get a hit and the grid is spatially regular. The question then is how to generate that snapshot.
    Last edited: Oct 21, 2021
  28. neoshaman


    Feb 11, 2011
    I have been working on massively refactoring the code to future-proof it, so I can implement variants of MAGIC beyond MAGICAL, like MAGIC HAT (MAGIC with Hemisphere Atlas Tiling) and MAGIC TRICK (MAGIC with Tiled Ray Indirection in Compressed Kernel). They require offline baking of data, though. HAT uses a per-pixel hemisphere, basically a hemicubemap that holds all the ray sampling hits. TRICK is basically HAT but with compressed data, i.e. we sort rays by contribution and add a weight value to bundle similar rays together, to reduce waste from weaker rays.
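
    One possible reading of the TRICK compression, sketched in Python. Everything here is an assumption made for illustration: "similar rays" is taken to mean rays that hit the same address, the weight is whatever contribution measure the bake produces, and the function name is made up.

```python
from collections import defaultdict

def compress_rays(rays, keep):
    """Bundle rays hitting the same address, keep the `keep` strongest bundles.

    rays: list of (address, weight) pairs, e.g. ((u, v), contribution).
    Returns a list of (address, normalized_weight), strongest first.
    """
    merged = defaultdict(float)
    for address, weight in rays:
        merged[address] += weight            # bundle similar rays
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    total = sum(weight for _, weight in ranked)
    if total == 0:
        return []
    # Renormalize so the kept bundles still sum to 1.
    return [(address, weight / total) for address, weight in ranked]

rays = [((1, 2), 0.5), ((1, 2), 0.25), ((3, 4), 0.25), ((5, 6), 0.0625)]
print(compress_rays(rays, keep=2))
```

    The weakest bundle is dropped and its energy is redistributed by the renormalization, which is the "reduce waste from weaker rays" part.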

    The architecture is rather simple: the goal is to have a rolling chunk array of scene objects, whose data you pass to a GIscene that is in charge of rendering the lightdata textures for that scene and updating them.

    So you have:

    - a (currently MonoBehaviour) class MAGICManager that holds the global light and an array of GIscene. It is responsible for attaching each GIscene to a game scene (not currently; it is attached to the test gameobject scene), and for initializing and updating the GIscene data.

    - GIscene is a (scriptable object) class that holds the various lightdata textures (probes, GI buffers), attaches them to their root gameobject scene, and initializes and updates the data through shaders to produce the illumination in a lightmap. (It is currently a scriptable object for convenience, since there isn't yet a proper scene object; it has a field that holds the prefab of the scene gameobject.)

    The lightdata classes are responsible for generating, initializing and updating their data. These are basically copy-pasted from parts of the spaghetti code, cleaned up and tested:

    - There is the lightprobe data, which spatially hashes the scene and creates an array of probes stored as an atlas of octahedral cubemaps. I simply capture a regular cubemap at every hash position with an internal camera, and transfer the data to the atlas.

    - The LMGB simply holds graphic buffer data (albedo, shadow, normal and position) in lightmap format using direct rendering; these are the inputs to the GI solver.

    - The MAGICAL class is the GI solver. It generates a direct lightmap texture cache using the LMGB, accumulates one ray sample of color per frame in a double-buffered accumulation texture, and swaps the textures after one round of accumulation.
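
    The double-buffered accumulation described for the solver can be sketched on the CPU like this. It is a Python illustration with a hypothetical class name and plain lists standing in for textures, not the actual MAGICAL class.

```python
class Accumulator:
    """One ray per call goes into the back buffer; buffers swap after a full round."""

    def __init__(self, texel_count, rays_per_round=64):
        self.rays_per_round = rays_per_round
        self.front = [0.0] * texel_count   # read by the final lighting pass
        self.back = [0.0] * texel_count    # currently being accumulated
        self.ray = 0

    def add_ray(self, samples):
        """Add one ray's sample per texel; return True when the buffers swapped."""
        for texel, sample in enumerate(samples):
            self.back[texel] += sample / self.rays_per_round
        self.ray += 1
        if self.ray == self.rays_per_round:
            # Round complete: publish the result and start a fresh back buffer.
            self.front, self.back = self.back, [0.0] * len(self.back)
            self.ray = 0
            return True
        return False

acc = Accumulator(texel_count=2, rays_per_round=4)
for frame in range(4):
    swapped = acc.add_ray([1.0, 0.5])
print(swapped, acc.front)
```

    Because the front buffer only changes on a swap, the renderer never reads a half-accumulated round, at the cost of the lighting lagging by one round.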

    However, I'm hitting a big blocker that has nothing to do with the GI procedure; I need help figuring out what is happening.
    See here:
  29. neoshaman


    Feb 11, 2011
    Currently I have an LMGB (lightmap graphic buffer), which allows meshless lighting resolution. For now I'm still rendering the mesh as a UV unwrap to save texture samples on OpenGL ES 2.0.

    But the points in the LMGB don't have to be spatially correlated, and neither does the indirection data in the atlas of ray caches (AL, HAT, TRICK), which is why TRICK can be implemented as a compression of HAT. This allows using other, more efficient distributions and a more pragmatic use of the texture surface, for example assuming each point is a surfel.

    But we can also use spatial coherence to get more efficient sampling and texture surface utilization. For example, having square surface proxies allows a bigger sampling area (the entire surface is a single ID to reference), allows an approximation of cone tracing using mipmaps, and allows jittering to temporally spread rays over the surface to get more data. Another solution is using voxels.

    The issue with decorrelating the surfaces from the mesh is that we lose the obvious mapping between the two, which means we need to introduce another indirection texture that maps the proxy addresses to the mesh mapping, and which would be used to illuminate the mesh by sampling the proxy map. How to map mesh and proxy is an area of research TBD.

    The last evolution would be to do a DDGI style of global illumination, where the raytracing is replaced by the indirection cache. It might even be possible to introduce some form of actual raytracing on slow hardware by taking advantage of the cache to update as infrequently as possible, by marking changes in the raytracing structure and only tracing those changes (which would cut the tracing time down a lot, to just a few instances).
  30. neoshaman


    Feb 11, 2011
    Last edited: Jun 11, 2022
  31. hippocoder


    Digital Ape Moderator

    Apr 11, 2010
    Er.... since when was "Macbook Air M1" a weak processor? haha. But lovely looking results aren't they? Wonder how much work would be needed to polish it up.
  32. neoshaman


    Feb 11, 2011
    I'm finally at a point where it bugs in predictable ways, because some formulas are purposely undercooked. I'm finally out of the tunnel of stupid bugs.
