Question How to create a Lens Flare effect without Physics?

Discussion in 'General Graphics' started by Peter77, Feb 7, 2021.

  1. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    I'm using Unity 2019.4 with the built-in forward renderer. I want to implement a lens flare effect for a 3D mobile project (Android, iOS). The project does not use Unity's physics engine.

    I found that the Lens Flare and Flare Layer components allow Unity to produce a lens flare effect. However, Unity's lens flare system requires Unity's physics engine and colliders, which the project doesn't use. The Lens Flare implementation in Unity is also not great; it seems to shoot a single ray to determine whether the line of sight to the sun is occluded.

    In past, non-Unity projects I used GPU occlusion queries to determine how many of the sun's pixels can be drawn (are not occluded), which I used to adjust the lens flare opacity.

    Unfortunately, I can't find occlusion queries in Unity and I'm stuck on how one would create a lens flare effect otherwise. Any ideas?
     
  2. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    If you're targeting shader model 5 hardware, you can write to UAVs (RWStructuredBuffer) from pixel shaders. You could then draw the sun disc and have each visible pixel increment a counter in a UAV to tell you the number of visible pixels, then use that buffer in the flare shader to adjust its size and opacity.

    You can also do it without SM5 by reading depth values from the screen space rect which contains the sun disc after the opaque pass, and checking how many pixels are smaller than the max depth.
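    For the SM5 route, the counting shader could look roughly like this (untested sketch; _VisiblePixels is a placeholder name, and the buffer would be bound from C# with Graphics.SetRandomWriteTarget):
    Code (CSharp):
    uniform RWStructuredBuffer<uint> _VisiblePixels : register(u1);

    // Force early depth testing so occluded fragments never run
    // and therefore never bump the counter.
    [earlydepthstencil]
    fixed4 frag(v2f i) : SV_Target
    {
        InterlockedAdd(_VisiblePixels[0], 1u);
        return fixed4(0, 0, 0, 0);
    }
    The flare shader then reads _VisiblePixels[0] and maps the count to opacity and size.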
     
    Peter77 likes this.
  3. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thanks for your reply! Preferably I'd want to get it to run with SM3. How exactly would the second option work? Things that are unclear to me:
    • Do I need to render an additional depth texture, or can I use the "hardware depth"?
    • When I render the sun disc, where would I store the count of pixels that are smaller than the max depth?
    • How would I transfer the "how many pixels are smaller than the max depth" data from the GPU to CPU? Would I use Rendering.AsyncGPUReadback.Request?
     
  4. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    734
    Another option would be to just change the sun flare position / opacity / size in a compute shader that reads the depth buffer.
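    Something along these lines (untested sketch; names like _SunRect are placeholders, and a real version would use a parallel reduction rather than a single thread):
    Code (CSharp):
    #pragma kernel CountVisible

    Texture2D<float> _Depth;           // scene depth, bound from script
    RWStructuredBuffer<float> _Result; // _Result[0] = fraction of sun pixels visible
    int4 _SunRect;                     // x,y = top-left pixel; z,w = width,height
    float _SunDepth;                   // sun disc depth (reversed-Z on most APIs)

    [numthreads(1,1,1)]
    void CountVisible(uint3 id : SV_DispatchThreadID)
    {
        // One thread walks the (small) screen rect around the sun and counts
        // texels where the scene is farther away than the sun itself.
        int visible = 0;
        for (int y = 0; y < _SunRect.w; y++)
        for (int x = 0; x < _SunRect.z; x++)
        {
            float sceneDepth = _Depth.Load(int3(_SunRect.x + x, _SunRect.y + y, 0));
            // Reversed-Z: smaller stored depth = farther from the camera.
            if (sceneDepth < _SunDepth)
                visible++;
        }
        _Result[0] = visible / max(1.0, (float)(_SunRect.z * _SunRect.w));
    }
    Dispatch it after the opaques render, then have the flare shader read _Result to scale its size and opacity.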
     
  5. BrandyStarbrite

    BrandyStarbrite

    Joined:
    Aug 4, 2013
    Posts:
    2,076
    If you want, you could make your own PNG sun flare in Photoshop and use it in Unity.
     
  6. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Ideally, you should avoid reading back into the CPU. The idea would be to have a fragment or compute shader read from the depth texture and store the results into a 1x1 texture buffer, which you then read in the flare vertex and fragment shaders. Compute shaders would require SM3.1, and sampling textures in vertex shaders requires SM3.0.

    For SM2.0 there's no alternative other than using async readback to get the data to the CPU. You could mask the ~3-frame readback delay by gradually interpolating between the current value and the new one, so the flare gracefully fades out shortly after being occluded.
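    The interpolation side could look something like this in C# (untested sketch; assumes the target device supports AsyncGPUReadback, and all names are placeholders):
    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;

    public class FlareOcclusionReadback : MonoBehaviour
    {
        public ComputeBuffer occlusionBuffer; // holds one float: visible fraction

        float m_Target;  // most recent value read back from the GPU
        float m_Smooth;  // eased value actually applied to the flare

        void Update()
        {
            // Kick off a readback; the callback fires a few frames later.
            AsyncGPUReadback.Request(occlusionBuffer, request =>
            {
                if (!request.hasError)
                    m_Target = request.GetData<float>()[0];
            });

            // Ease toward the delayed result so the flare fades instead of popping.
            m_Smooth = Mathf.MoveTowards(m_Smooth, m_Target, Time.deltaTime * 4f);
            // m_Smooth would then drive the flare material's opacity/size.
        }
    }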

    Trivia: this is pretty much how The Legend of Zelda: Ocarina of Time did lens flares on the Nintendo 64. Since the CPU could freely access GPU memory, the game would read the depth value at the light source's screen-space position and compare it against its actual depth to determine whether it was occluded, without ray casting. For a long time, this was very troublesome to emulate on PCs.
     
    Peter77 likes this.
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    For SM2.0 I would start adding collision shapes / meshes to your objects to do single traces against on the CPU. :p Anything else is going to be way, way too slow. There's no async readback on those devices, so you're going to be waiting several ms each time, and the CPU is way too slow to iterate over the texture.

    You could conceivably render a focused view of each flare as a white circle to an R8, downscale it, and use the smallest mip level to get the brightness w/o needing vertex texture sampling. Not sure that'd be terribly efficient though.
     
    Last edited: Feb 9, 2021
    Neto_Kokku and richardkettlewell like this.
  8. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thanks again for the help! That sounds like a slick approach. I'm going to give it a try at the weekend and post an update when I have something to show.

    I've always loved the trickery developers used to create effects on older hardware.
     
  9. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thank you for the reply. I don't know what "render a focused view" means exactly. How would that work?
     
  10. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    "Short" version is:
    Create a camera from script, disabled, as a child of your main camera.

    Assign it a render texture that's something small like 32x32, R8 format, 16-bit depth buffer, and with mip maps enabled.
    https://docs.unity3d.com/ScriptReference/RenderTexture-ctor.html
    https://docs.unity3d.com/ScriptReference/RenderTexture-useMipMap.html

    Assign a replacement shader to the camera that draws everything as solid black
    https://docs.unity3d.com/Manual/SL-ShaderReplacement.html
    https://docs.unity3d.com/ScriptReference/Camera.SetReplacementShader.html

    Except for a quad sized to your flare's visibility area and facing the camera. Have it use a shader that renders as a white circle (it needs a custom RenderType tag matched by the circle shader in the replacement shader), and make it visible only to that camera. You can do that either with layers & the camera's culling mask, or you can enable it just before you call Camera.Render(), as you're going to be doing most of this from C# before the scene renders.

    Align the camera to point at that quad, and adjust the FOV so it fills the view:
    https://forum.unity.com/threads/fit...of-view-focus-the-object.496472/#post-3229700
    Then call Render() on the camera.

    That render texture's smallest mip now holds how occluded the flare is, and can be sampled from the fragment shader of your lens flare with tex2Dbias(_FlareOcclusionTex, float4(0.5, 0.5, 0.0, -10.0)).r;


    For one or maybe two flares, this is totally plausible, if overly complicated.
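    Rough, untested sketch of the whole setup (the names and the FOV math are placeholders to adapt):
    Code (CSharp):
    using UnityEngine;

    public class FlareOcclusionCamera : MonoBehaviour
    {
        public Transform flareQuad;       // the white-circle quad
        public Shader replacementShader;  // draws everything solid black
        public Material flareMaterial;    // samples _FlareOcclusionTex

        Camera m_Cam;
        RenderTexture m_RT;

        void Start()
        {
            // Small R8 target with mips, so the lowest mip is the average brightness.
            m_RT = new RenderTexture(32, 32, 16, RenderTextureFormat.R8);
            m_RT.useMipMap = true;
            m_RT.autoGenerateMips = true;
            m_RT.Create();

            var go = new GameObject("Flare Occlusion Camera");
            go.transform.SetParent(transform, false);
            m_Cam = go.AddComponent<Camera>();
            m_Cam.enabled = false; // rendered manually below
            m_Cam.targetTexture = m_RT;
            m_Cam.clearFlags = CameraClearFlags.SolidColor;
            m_Cam.backgroundColor = Color.black;
            m_Cam.SetReplacementShader(replacementShader, "RenderType");
        }

        void LateUpdate()
        {
            // Point at the quad and fit the FOV so the quad fills the view.
            m_Cam.transform.LookAt(flareQuad);
            float dist = Vector3.Distance(m_Cam.transform.position, flareQuad.position);
            float halfSize = 0.5f * flareQuad.localScale.y;
            m_Cam.fieldOfView = 2f * Mathf.Atan2(halfSize, dist) * Mathf.Rad2Deg;

            m_Cam.Render();
            flareMaterial.SetTexture("_FlareOcclusionTex", m_RT);
        }
    }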
     
    Peter77 likes this.
  11. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    After playing with some of the approaches mentioned here (thank you KokkuHub and bgolus for the write-up), I decided to go with what I thought to be the simpler route and give RWStructuredBuffer a try.

    However, I can't get it to work. I created a simple test project that I'm trying to get running (also attached to this post). I have two shaders, both using the same ComputeBuffer. On the C# side I create the ComputeBuffer and assign it to both materials:
    Code (CSharp):
    void Start()
    {
        m_BufferData = new float[1];
        m_ComputeBuffer = new ComputeBuffer(m_BufferData.Length, sizeof(float) * 1, ComputeBufferType.Default);
        Graphics.SetRandomWriteTarget(1, m_ComputeBuffer);

        m_ComputeBuffer.SetData(m_BufferData);

        m_SunDiscMaterial.SetBuffer("_BufferData", m_ComputeBuffer);
        m_LensFlareMaterial.SetBuffer("_BufferData", m_ComputeBuffer);
        //Shader.SetGlobalBuffer("_BufferData", m_ComputeBuffer);
    }
    The sun disc shader writes 1 to _BufferData in the fragment shader:
    Code (CSharp):
    uniform RWStructuredBuffer<float> _BufferData : register(u1);

    fixed4 frag(v2f i) : SV_Target
    {
        _BufferData[0] = 1;
        return fixed4(1,1,0,1);
    }
    My assumption was that the fragment shader is only executed for pixels that are visible (not z-rejected), so if no pixel is visible, the fragment shader would not write to the buffer. Is this assumption wrong?

    The lens flare shader then reads the _BufferData[0] value, which should be 0 or 1, and uses it as the color:
    Code (CSharp):
    uniform RWStructuredBuffer<float> _BufferData : register(u1);

    fixed4 frag(v2f i) : SV_Target
    {
        return _BufferData[0] * 0.1;
    }
    On the C# side, I set _BufferData to 0 every frame:
    Code (CSharp):
    void Update()
    {
        m_BufferData[0] = 0;
        m_ComputeBuffer.SetData(m_BufferData);
    }
    I assumed this would cause _BufferData[0] to be 0 when the entire sun disc is occluded, and 1 if even a single fragment of the sun disc is visible. But _BufferData[0] seems to always be 1, regardless of whether the sun disc is occluded by another object that writes to the z-buffer. PS: Writing to m_BufferData[0] on the C# side does make it to the shader side; I tested this.

    The yellow sphere is the sun disc. The transparent white squares are particles rendered with the flare shader.

    However, moving the sun disc behind a green occluder doesn't cause particles to disappear, because _BufferData[0] is still 1.


    What am I doing wrong?
     

    Attached Files:

    Last edited: Feb 21, 2021
  12. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    734
    Hmm, I may be wrong... But I think IMRs (immediate mode renderers, like desktop GPUs) will execute the fragment shader even on occluded fragments, as they don't have built-in depth culling.

    EDIT: Didn't realize you were targeting mobile, so none of the above is relevant...
     
    Last edited: Feb 22, 2021
  13. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    The fragment shader will absolutely not be run if it's fully occluded.*


    * Assuming nothing is preventing occlusion from working.

    If the shader you're using to write to the buffer uses ZTest Always, or modifies the vertex shader's output clip space position z so it's always close to the camera, or writes to SV_Depth in the fragment shader, all of these will skip early depth culling, meaning the fragment shader will run for every pixel all the time. So make sure you didn't leave any of those in from the original lens flare shader. Obviously you're not writing to the depth, but make sure you don't have ZTest Always. There's also the issue of pixel quads: for each 2x2 group of pixels, if one pixel is visible, the fragment shader for all 4 will be run. There's no way to avoid that last issue that I'm aware of.

    Also, be mindful that this technique will limit you to mid- to high-end phones from the last 5-6 years.
     
  14. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,289
    I think you should run this through RenderDoc. It can give you a step-by-step view of what the GPU is doing on specific frames (on PC).

    A question I have that RenderDoc could answer: is the sun being rendered *after* all the geometry?
     
  15. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    734
    One more thing: I would try to see if it renders any differently on device compared to the editor. I have a feeling it might.
     
  16. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    As much as I love RenderDoc, it isn't quite accurate enough for this specific case. RenderDoc doesn't differentiate between pre- and post-fragment depth rejection, which is important here. It also doesn't show you anything to do with pixel quad behavior. It's just showing you what the ideal rejection for the merge is, and it doesn't always even match that. Use something like SV_DepthLessEqual with an out-of-range value (like 0.9999) and you can see this very easily.

    Code (CSharp):
    Shader "Unlit/BadDepthLessEqual"
    {
        Properties
        {
        }
        SubShader
        {
            Tags { "Queue"="Transparent" }

            Pass
            {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag

                #pragma target 5.0

                #include "UnityCG.cginc"

                float4 vert (float4 vertex : POSITION) : SV_POSITION
                {
                    return UnityObjectToClipPos(vertex);
                }

                fixed4 frag (out float oDepth : SV_DepthLessEqual) : SV_Target
                {
                    oDepth = 0.9999;
                    return 0;
                }
                ENDCG
            }
        }
    }
    The purple area is where RenderDoc thinks the quad's fragments are going to be limited to. The blocky black artifacts on the interior of the sphere are where the GPU is still running the fragment shader.
     
  17. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thank you everybody for the help so far. I recorded a video where I go over the RenderDoc capture. Sorry for my bad English.
     
  18. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    Nothing obviously wrong there. Though I do have two thoughts.

    In the past I've had problems with modifying compute buffers directly from script being delayed by a frame. It's possible something like that is happening here. You could try having another shader clear the buffer to 0 rather than doing it from script, or ping-ponging between two buffers so you're zeroing out the one that's not going to be used that frame (rough sketch below).
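    Untested sketch of the ping-pong idea, reusing the material names from your script:
    Code (CSharp):
    using UnityEngine;

    public class FlareBufferPingPong : MonoBehaviour
    {
        public Material m_SunDiscMaterial;
        public Material m_LensFlareMaterial;

        ComputeBuffer[] m_Buffers;
        readonly float[] m_Zero = new float[1];
        int m_Frame;

        void Start()
        {
            m_Buffers = new ComputeBuffer[2];
            for (int i = 0; i < 2; i++)
                m_Buffers[i] = new ComputeBuffer(1, sizeof(float));
        }

        void Update()
        {
            int write = m_Frame & 1; // buffer the shaders use this frame
            int clear = 1 - write;   // idle buffer, safe to reset from script

            m_Buffers[clear].SetData(m_Zero);
            Graphics.SetRandomWriteTarget(1, m_Buffers[write]);
            m_SunDiscMaterial.SetBuffer("_BufferData", m_Buffers[write]);
            m_LensFlareMaterial.SetBuffer("_BufferData", m_Buffers[write]);
            m_Frame++;
        }

        void OnDestroy()
        {
            foreach (var b in m_Buffers)
                b?.Release();
        }
    }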

    RenderDoc showing GreaterEqual is not a bug; that is actually correct. Unity uses a reversed-Z depth on non-OpenGL platforms, so it flips ZTest LessEqual to GreaterEqual for you automagically.
     
    Peter77 likes this.
  19. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thanks again for the quick help. I was really thinking this would fix the issue, but it doesn't. Here is an update:


    I see, that makes sense.
     

    Attached Files:

  20. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    It appears we can add "writing to a RWStructuredBuffer" to the list of "things that disable early depth rejection".

    Good news: you can tell the GPU to eff off and force early depth rejection by modifying the shader a tiny bit.
    Code (CSharp):
    [earlydepthstencil]
    fixed4 frag(v2f i) : SV_Target
    {
        _BufferData[0] = 1;
        return fixed4(1,1,0,1);
    }
    And voilà.
     
  21. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Holy moly, excellent find! Thank you for the help! I guess I can get it to work from here on. I'll post an update if I have something to show :)
     
  22. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,289
    Ah yes nice find bgolus!

    Great stuff getting this working!
     
    Peter77 likes this.
  23. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    I've made some progress :D

     
    Antony-Blackett and bgolus like this.
  24. dotsquid

    dotsquid

    Joined:
    Aug 11, 2016
    Posts:
    224
    @Peter77
    Great that you made it work!
    Here is the approach I described back in 2019, when I was solving the same problem for our game, which was released on Android, iOS, tvOS, Nintendo Switch (and PC of course). I started out thinking the very same way you did: I tried to find how to use GPU occlusion queries in Unity and failed as well. So then I started to figure out how I could mimic this technique and make it work on a wide variety of GPUs, including old/weak ones like the Apple TV gen 4.
    So maybe you or someone else may be interested: http://dotsquid.com/2019/06/26/simple-gpu-occlusion-for-lens-flares/
     
    Last edited: Mar 9, 2021
    Peter77, JoNax97 and bgolus like this.
  25. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thank you for the link; it's an interesting read. bgolus suggested this approach (link) too.

    The reason why I didn't implement it this way is that I'd need to render the objects in the "occlusion view frustum" to the occlusion buffer, which means it increases the overall vertex count and thus works against my vertex budget.

    Unfortunately, my example works in the editor only; it does not work on my Samsung Galaxy S6. The phone supports SM5, but I think it doesn't support the [earlydepthstencil] thingy.

    So... I actually need a different solution. Perhaps I should just give this "render objects to occlusion buffer" approach a try and measure how the vertex count adds up and whether it's still in the budget.

    Thanks again for the link!
     
  26. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    @dotsquid 's example is using an orthographic camera for the main scene rendering, which simplifies things a bit for their setup and means the example script can't be used as is. But ultimately it shouldn't be that bad of a vertex count increase if you use a tight FOV, as most objects will hopefully be frustum culled.
     
  27. dotsquid

    dotsquid

    Joined:
    Aug 11, 2016
    Posts:
    224
    But this may burden the CPU, so the effectiveness of this approach really depends on the scene and the number of objects / vertices.

    BTW, I used an orthographic camera for this effect only because the whole game uses an orthographic projection. It wasn't some clever deliberate choice :)
     
  28. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    I've now implemented the "render occlusion texture" approach, which works with no issues on my older Android phone. It works without any GPU-to-CPU readback and requires SM2 or newer.

    I put all this on GitHub, and it can be installed through Unity's Package Manager:
    https://github.com/pschraut/UnityOcclusionLensFlare

    There is most likely room for improvement, but I wanted to get most of this done this weekend. If feedback comes in, I'll integrate it over the next few weekends.

    Thanks again for the help!



     
    Last edited: Mar 7, 2021
    richardkettlewell likes this.