Question How to create a Lens Flare effect without Physics?

Discussion in 'General Graphics' started by Peter77, Feb 7, 2021.

  1. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    I'm using Unity 2019.4 with the built-in forward renderer. I want to implement a lens flare effect for a 3D mobile project (Android, iOS). The project does not use Unity's physics engine.

    I found that the Lens Flare and Flare Layer components allow Unity to produce a lens flare effect. However, Unity's lens flare system requires Unity's physics engine and colliders, which the project doesn't use. The Lens Flare implementation in Unity is also not great; it seems to shoot a single ray to determine whether the line of sight to the sun is occluded.

    In past, non-Unity projects I used GPU occlusion queries to determine how many of the sun's pixels can be drawn (are not occluded), which I used to adjust the lens flare opacity.

    Unfortunately, I can't find occlusion queries in Unity and I'm stuck on how one would create a lens flare effect otherwise. Any ideas?
     
  2. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    If you're targeting shader model 5 hardware, you can write to UAVs (RWStructuredBuffer) from pixel shaders. You could then draw the sun disc and have each visible pixel increment a counter in a UAV to tell you the number of visible pixels, then use that buffer in the flare shader to adjust its size and opacity.

    You can also do it without SM5 by reading depth values from the screen space rect which contains the sun disc after the opaque pass, and checking how many pixels are smaller than the max depth.
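    For the SM5 route, the counting shader could look roughly like this (untested sketch; _VisiblePixels is a placeholder name, and the buffer would be bound from C# with Graphics.SetRandomWriteTarget):
    Code (CSharp):
    uniform RWStructuredBuffer<uint> _VisiblePixels : register(u1);

    // Force early depth testing so occluded fragments never run
    // and therefore never bump the counter.
    [earlydepthstencil]
    fixed4 frag(v2f i) : SV_Target
    {
        InterlockedAdd(_VisiblePixels[0], 1u);
        return fixed4(0, 0, 0, 0);
    }
    The flare shader then reads _VisiblePixels[0] and maps the count to opacity and size.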
     
    Peter77 likes this.
  3. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thanks for your reply! Preferably I'd want to get it to run with SM3. How exactly would the second option work? Things that are unclear to me:
    • Do I need to render an additional depth texture, or can I use the "hardware depth"?
    • When I render the sun disc, where would I store the count of pixels that are smaller than the max depth?
    • How would I transfer the "how many pixels are smaller than the max depth" data from the GPU to CPU? Would I use Rendering.AsyncGPUReadback.Request?
     
  4. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    734
    Another option would be to just change the sun flare position / opacity / size in a compute shader that reads the depth buffer.
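    Something along these lines (untested sketch; names like _SunRect are placeholders, and a real version would use a parallel reduction rather than a single thread):
    Code (CSharp):
    #pragma kernel CountVisible

    Texture2D<float> _Depth;           // scene depth, bound from script
    RWStructuredBuffer<float> _Result; // _Result[0] = fraction of sun pixels visible
    int4 _SunRect;                     // x,y = top-left pixel; z,w = width,height
    float _SunDepth;                   // sun disc depth (reversed-Z on most APIs)

    [numthreads(1,1,1)]
    void CountVisible(uint3 id : SV_DispatchThreadID)
    {
        // One thread walks the (small) screen rect around the sun and counts
        // texels where the scene is farther away than the sun itself.
        int visible = 0;
        for (int y = 0; y < _SunRect.w; y++)
        for (int x = 0; x < _SunRect.z; x++)
        {
            float sceneDepth = _Depth.Load(int3(_SunRect.x + x, _SunRect.y + y, 0));
            // Reversed-Z: smaller stored depth = farther from the camera.
            if (sceneDepth < _SunDepth)
                visible++;
        }
        _Result[0] = visible / max(1.0, (float)(_SunRect.z * _SunRect.w));
    }
    Dispatch it after the opaques render, then have the flare shader read _Result to scale its size and opacity.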
     
  5. BrandyStarbrite

    BrandyStarbrite

    Joined:
    Aug 4, 2013
    Posts:
    2,076
    If you want, you could make your own PNG sun flare in Photoshop and use it in Unity.
     
  6. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Ideally, you should avoid reading back into the CPU. The idea would be to have a fragment or compute shader read from the depth texture and store the results into a 1x1 texture buffer, which you then read in the flare vertex and fragment shaders. Compute shaders would require SM3.1, and sampling textures in vertex shaders requires SM3.0.

    For SM2.0 there's no alternative other than using async readback to get the data to the CPU. You could mask the ~3-frame readback delay by gradually interpolating between the current value and the new one, so the flare gracefully fades out shortly after being occluded.
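    The interpolation side could look something like this in C# (untested sketch; assumes the target device supports AsyncGPUReadback, and all names are placeholders):
    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;

    public class FlareOcclusionReadback : MonoBehaviour
    {
        public ComputeBuffer occlusionBuffer; // holds one float: visible fraction

        float m_Target;  // most recent value read back from the GPU
        float m_Smooth;  // eased value actually applied to the flare

        void Update()
        {
            // Kick off a readback; the callback fires a few frames later.
            AsyncGPUReadback.Request(occlusionBuffer, request =>
            {
                if (!request.hasError)
                    m_Target = request.GetData<float>()[0];
            });

            // Ease toward the delayed result so the flare fades instead of popping.
            m_Smooth = Mathf.MoveTowards(m_Smooth, m_Target, Time.deltaTime * 4f);
            // m_Smooth would then drive the flare material's opacity/size.
        }
    }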

    Trivia: this is pretty much how The Legend of Zelda: Ocarina of Time did lens flares on the Nintendo 64. Since the CPU could freely access GPU memory, the game would read the depth value at the light source's screen-space position and compare it against its actual depth to determine whether it was occluded, without ray casting. For a long time, this was very troublesome to emulate on PCs.
     
    Peter77 likes this.
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    For SM2.0 I would start adding collision shapes / meshes to your objects to do single traces against on the CPU. :p Anything else is going to be way, way too slow. There's no async readback on those devices, so you're going to be waiting several ms each time, and the CPU is way too slow to iterate over the texture.

    You could conceivably render a focused view of each flare as a white circle to an R8, downscale it, and use the smallest mip level to get the brightness w/o needing vertex texture sampling. Not sure that'd be terribly efficient though.
     
    Last edited: Feb 9, 2021
    Neto_Kokku and richardkettlewell like this.
  8. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thanks again for the help! That sounds like a slick approach. I'm going to give it a try at the weekend and post an update when I have something to show.

    I've always loved the trickery developers used to create effects on older hardware.
     
  9. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thank you for the reply. I don't know what "render a focused view" means exactly. How would that work?
     
  10. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    "Short" version is:
    Create a camera from script, disabled, as a child of your main camera.

    Assign it a render texture that's something small like 32x32, R8 format, 16-bit depth buffer, and with mip maps enabled.
    https://docs.unity3d.com/ScriptReference/RenderTexture-ctor.html
    https://docs.unity3d.com/ScriptReference/RenderTexture-useMipMap.html

    Assign a replacement shader to the camera that draws everything as solid black
    https://docs.unity3d.com/Manual/SL-ShaderReplacement.html
    https://docs.unity3d.com/ScriptReference/Camera.SetReplacementShader.html

    Except for a quad sized to your flare's visibility area and facing the camera. Have it use a shader that renders as a white circle (it needs a custom RenderType tag matched by the circle shader in the replacement shader), and make it visible only to that camera. You can do that either with layers & the camera's culling mask, or you can enable it just before you call Camera.Render(), as you're going to be doing most of this from C# before the scene renders.

    Align the camera to point at that quad, and adjust the FOV so it fills the view:
    https://forum.unity.com/threads/fit...of-view-focus-the-object.496472/#post-3229700
    Then call Render() on the camera.

    That render texture's smallest mip now holds how occluded the flare is, and can be sampled from the fragment shader of your lens flare with tex2Dbias(_FlareOcclusionTex, float4(0.5, 0.5, 0.0, -10.0)).r;


    For one or maybe two flares, this is totally plausible, if overly complicated.
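    Rough, untested sketch of the whole setup (the names and the FOV math are placeholders to adapt):
    Code (CSharp):
    using UnityEngine;

    public class FlareOcclusionCamera : MonoBehaviour
    {
        public Transform flareQuad;       // the white-circle quad
        public Shader replacementShader;  // draws everything solid black
        public Material flareMaterial;    // samples _FlareOcclusionTex

        Camera m_Cam;
        RenderTexture m_RT;

        void Start()
        {
            // Small R8 target with mips, so the lowest mip is the average brightness.
            m_RT = new RenderTexture(32, 32, 16, RenderTextureFormat.R8);
            m_RT.useMipMap = true;
            m_RT.autoGenerateMips = true;
            m_RT.Create();

            var go = new GameObject("Flare Occlusion Camera");
            go.transform.SetParent(transform, false);
            m_Cam = go.AddComponent<Camera>();
            m_Cam.enabled = false; // rendered manually below
            m_Cam.targetTexture = m_RT;
            m_Cam.clearFlags = CameraClearFlags.SolidColor;
            m_Cam.backgroundColor = Color.black;
            m_Cam.SetReplacementShader(replacementShader, "RenderType");
        }

        void LateUpdate()
        {
            // Point at the quad and fit the FOV so the quad fills the view.
            m_Cam.transform.LookAt(flareQuad);
            float dist = Vector3.Distance(m_Cam.transform.position, flareQuad.position);
            float halfSize = 0.5f * flareQuad.localScale.y;
            m_Cam.fieldOfView = 2f * Mathf.Atan2(halfSize, dist) * Mathf.Rad2Deg;

            m_Cam.Render();
            flareMaterial.SetTexture("_FlareOcclusionTex", m_RT);
        }
    }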
     
    Peter77 likes this.
  11. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    After playing with some of the approaches mentioned here (thank you KokkuHub and bgolus for the write-up), I decided to go with what I thought to be the simpler route and give RWStructuredBuffer a try.

    However, I can't get it to work. I created a simple test project that I'm trying to get running (also attached to this post). I have two shaders, both using the same ComputeBuffer. On the C# side I create the ComputeBuffer and assign it to both materials:
    Code (CSharp):
    void Start()
    {
        m_BufferData = new float[1];
        m_ComputeBuffer = new ComputeBuffer(m_BufferData.Length, sizeof(float) * 1, ComputeBufferType.Default);
        Graphics.SetRandomWriteTarget(1, m_ComputeBuffer);

        m_ComputeBuffer.SetData(m_BufferData);

        m_SunDiscMaterial.SetBuffer("_BufferData", m_ComputeBuffer);
        m_LensFlareMaterial.SetBuffer("_BufferData", m_ComputeBuffer);
        //Shader.SetGlobalBuffer("_BufferData", m_ComputeBuffer);
    }
    The sun disc shader writes 1 to _BufferData in the fragment shader:
    Code (CSharp):
    uniform RWStructuredBuffer<float> _BufferData : register(u1);

    fixed4 frag(v2f i) : SV_Target
    {
        _BufferData[0] = 1;
        return fixed4(1,1,0,1);
    }
    My assumption was that the fragment shader is only executed for pixels that are visible (not z-rejected), so if no pixel is visible, the fragment shader would not write to the buffer. Is this assumption wrong?

    The lens flare shader then reads the _BufferData[0] value, which should be 0 or 1, and uses it as the color:
    Code (CSharp):
    uniform RWStructuredBuffer<float> _BufferData : register(u1);

    fixed4 frag(v2f i) : SV_Target
    {
        return _BufferData[0] * 0.1;
    }
    On the C# side, I set _BufferData to 0 every frame:
    Code (CSharp):
    void Update()
    {
        m_BufferData[0] = 0;
        m_ComputeBuffer.SetData(m_BufferData);
    }
    I assumed this would cause _BufferData[0] to be 0 when the entire sun disc is occluded, and 1 if even a single fragment of the sun disc is visible. But _BufferData[0] seems to always be 1, regardless of whether the sun disc is occluded by another object that writes to the z-buffer. PS: Writing to m_BufferData[0] on the C# side does make it to the shader side; I tested this.

    The yellow sphere is the sun disc. The transparent white squares are particles rendered with the flare shader.

    However, moving the sun disc behind a green occluder doesn't cause particles to disappear, because _BufferData[0] is still 1.


    What am I doing wrong?
     

    Attached Files:

    Last edited: Feb 21, 2021
  12. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    734
    Hmm, I may be wrong... But I think IMRs (immediate mode renderers, like desktop GPUs) will execute the fragment shader even on occluded fragments, as they don't have built-in depth culling.

    EDIT: Didn't realize you were targeting mobile, so none of the above is relevant...
     
    Last edited: Feb 22, 2021
  13. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    The fragment shader will absolutely not be run if it's fully occluded.*


    * Assuming nothing is preventing occlusion from working.

    If the shader you're using to write to the buffer uses ZTest Always, or modifies the vertex shader's output clip space position z so it's always close to the camera, or writes to SV_Depth in the fragment shader, all of these will skip early depth culling, meaning the fragment shader will run for every pixel all the time. So make sure you didn't leave any of those in from the original lens flare shader. Obviously you're not writing to the depth, but make sure you don't have ZTest Always. There's also the issue of pixel quads: for each 2x2 group of pixels, if one pixel is visible, the fragment shader for all 4 will be run. There's no way to avoid that last issue that I'm aware of.

    Also, be mindful that this technique will limit you to mid- to high-end phones from the last 5-6 years.
     
  14. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,289
    I think you should run this through RenderDoc. It can give you a step-by-step view of what the GPU is doing on specific frames (on PC).

    A question I have that RenderDoc could answer: is the sun being rendered *after* all the geometry?
     
  15. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    734
    One more thing: I would try to see if it renders any differently on device compared to the editor. I have a feeling it might.
     
  16. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    As much as I love RenderDoc, it isn't quite accurate enough for this specific case. RenderDoc doesn't differentiate between pre- and post-fragment depth rejection, which is important here. It also doesn't show you anything to do with pixel quad behavior. It's just showing you what the ideal rejection for the merge is, and it doesn't always even match that. Use something like SV_DepthLessEqual with an out-of-range value (like 0.9999) and you can see this very easily.

    Code (CSharp):
    Shader "Unlit/BadDepthLessEqual"
    {
        Properties
        {
        }
        SubShader
        {
            Tags { "Queue"="Transparent" }

            Pass
            {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag

                #pragma target 5.0

                #include "UnityCG.cginc"

                float4 vert (float4 vertex : POSITION) : SV_POSITION
                {
                    return UnityObjectToClipPos(vertex);
                }

                fixed4 frag (out float oDepth : SV_DepthLessEqual) : SV_Target
                {
                    oDepth = 0.9999;
                    return 0;
                }
                ENDCG
            }
        }
    }
    The purple area is where RenderDoc thinks the quad's fragments are going to be limited to. The blocky black artifacts on the interior of the sphere are where the GPU is still running the fragment shader.
     
  17. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thank you everybody for the help so far. I recorded a video where I go over the RenderDoc capture. Sorry for my bad English.
     
  18. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    Nothing obviously wrong there. Though I do have two thoughts.

    In the past I've had problems with modifying compute buffers directly from script being delayed by a frame. It's possible something like that is happening here. You could try having another shader clear the buffer to 0 rather than doing it from script, or ping-ponging between two buffers so you're zeroing out the one that's not going to be used that frame (rough sketch below).
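    Untested sketch of the ping-pong idea, reusing the material names from your script:
    Code (CSharp):
    using UnityEngine;

    public class FlareBufferPingPong : MonoBehaviour
    {
        public Material m_SunDiscMaterial;
        public Material m_LensFlareMaterial;

        ComputeBuffer[] m_Buffers;
        readonly float[] m_Zero = new float[1];
        int m_Frame;

        void Start()
        {
            m_Buffers = new ComputeBuffer[2];
            for (int i = 0; i < 2; i++)
                m_Buffers[i] = new ComputeBuffer(1, sizeof(float));
        }

        void Update()
        {
            int write = m_Frame & 1; // buffer the shaders use this frame
            int clear = 1 - write;   // idle buffer, safe to reset from script

            m_Buffers[clear].SetData(m_Zero);
            Graphics.SetRandomWriteTarget(1, m_Buffers[write]);
            m_SunDiscMaterial.SetBuffer("_BufferData", m_Buffers[write]);
            m_LensFlareMaterial.SetBuffer("_BufferData", m_Buffers[write]);
            m_Frame++;
        }

        void OnDestroy()
        {
            foreach (var b in m_Buffers)
                b?.Release();
        }
    }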

    RenderDoc showing GreaterEqual is not a bug; that is actually correct. Unity uses a reversed-Z depth on non-OpenGL platforms, so it flips ZTest LessEqual to GreaterEqual for you automagically.
     
    Peter77 likes this.
  19. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thanks again for the quick help. I was really thinking this would fix the issue, but it doesn't. Here is an update:


    I see, that makes sense.
     

    Attached Files:

  20. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    It appears we can add "writing to a RWStructuredBuffer" to the list of "things that disable early depth rejection".

    Good news: you can tell the GPU to eff off and force early depth rejection by modifying the shader a tiny bit.
    Code (CSharp):
    [earlydepthstencil]
    fixed4 frag(v2f i) : SV_Target
    {
        _BufferData[0] = 1;
        return fixed4(1,1,0,1);
    }
    And voilà.
     
  21. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Holy moly, excellent find! Thank you for the help! I guess I can get it to work from here on. I'll post an update if I have something to show :)
     
  22. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,289
    Ah yes nice find bgolus!

    Great stuff getting this working!
     
    Peter77 likes this.
  23. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    I've made some progress :D

     
    Antony-Blackett and bgolus like this.
  24. dotsquid

    dotsquid

    Joined:
    Aug 11, 2016
    Posts:
    224
    @Peter77
    Great that you made it work!
    Here is the approach I described back in 2019, when I was solving the same problem for our game, which was released on Android, iOS, tvOS, Nintendo Switch (and PC of course). I started out thinking the very same way you did: I tried to find how to use GPU occlusion queries in Unity and failed as well. So then I started to figure out how I could mimic this technique and make it work on a wide variety of GPUs, including old/weak ones like the Apple TV gen 4.
    So maybe you or someone else may be interested: http://dotsquid.com/2019/06/26/simple-gpu-occlusion-for-lens-flares/
     
    Last edited: Mar 9, 2021
    Peter77, JoNax97 and bgolus like this.
  25. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    Thank you for the link; it's an interesting read. bgolus suggested this approach (link) too.

    The reason why I didn't implement it this way is that I'd need to render the objects in the "occlusion view frustum" to the occlusion buffer, which means it increases the overall vertex count and thus works against my vertex budget.

    Unfortunately, my example works in the editor only; it does not work on my Samsung Galaxy S6. The phone supports SM5, but I think it doesn't support the [earlydepthstencil] thingy.

    So... I actually need a different solution. Perhaps I should just give this "render objects to occlusion buffer" approach a try and measure how the vertex count adds up and whether it's still in the budget.

    Thanks again for the link!
     
  26. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,375
    @dotsquid 's example is using an orthographic camera for the main scene rendering, which simplifies things a bit for their setup and means the example script can't be used as is. But ultimately it shouldn't be that bad of a vertex count increase if you use a tight FOV, as most objects will hopefully be frustum culled.
     
  27. dotsquid

    dotsquid

    Joined:
    Aug 11, 2016
    Posts:
    224
    But this may burden the CPU, so the effectiveness of this approach really depends on the scene and the number of objects / vertices.

    BTW, I used an orthographic camera for this effect only because the whole game uses an orthographic projection. It wasn't some clever deliberate choice :)
     
  28. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,625
    I've now implemented the "render occlusion texture" approach, which works with no issues on my older Android phone. It works without any GPU-to-CPU readback and requires SM2 or newer.

    I put all this on GitHub, and it can be installed through Unity's Package Manager:
    https://github.com/pschraut/UnityOcclusionLensFlare

    There is most likely room for improvement, but I wanted to get most of this done this weekend. If feedback comes in, I'll integrate it over the next few weekends.

    Thanks again for the help!



     
    Last edited: Mar 7, 2021
    richardkettlewell likes this.