Search Unity

"Fixing" Screen Space Directional Shadows and Anti-Aliasing

Discussion in 'Shaders' started by bgolus, Jan 16, 2016.

  1. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    Unity has always had an issue with anti-aliasing and directional light shadows, specifically that they don't seem to respect the anti-aliasing and undo the work the scene anti-aliasing is attempting to do. The reason for this is fairly straightforward once you understand it but the ways to fix the issue are not really obvious. Directional shadows are rendered out in a full screen buffer using the scene depth, and the scene depth is not anti-aliased, so the resulting scene shadows aren't either.

    That said I came up with a hacky solution that with some more massaging might be made useful so I'm putting this here to see if anyone else wants to take a gander at it.

    The below is a forward rendered scene with 8x MSAA. Note the edges against the skybox are smooth, but the shadow edges are chunky in the top image, but everything is smooth in the bottom one.
    AAShadows.png

    Now a valid question might be "why not just use a post process AA"? The answer to that is post process AA techniques don't address temporal aliasing (flickering of pixels from one frame to another) and that's the kind of aliasing that VR has the biggest problem with. The flicker is still apparent even if the edge is blurred.

    Code (ShaderLab):
    1. Shader "Custom/Screen Space Shadow AA Correction"
    2. {
    3.     Properties
    4.     {
    5.     }
    6.     SubShader
    7.     {
    8.         Tags { "RenderType"="Opaque" }
    9.         LOD 100
    10.  
    11.         Pass
    12.         {
    13.             Tags {"LightMode"="ForwardBase"}
    14.             CGPROGRAM
    15.             #pragma vertex vert
    16.             #pragma fragment frag
    17.             #pragma multi_compile_fwdbase nolightmap nodirlightmap nodynlightmap novertexlight
    18.          
    19.             #include "UnityCG.cginc"
    20.             #include "AutoLight.cginc"
    21.  
    22.             struct appdata
    23.             {
    24.                 float4 vertex : POSITION;
    25.                 float2 uv : TEXCOORD0;
    26.             };
    27.  
    28.             struct v2f
    29.             {
    30.                 float4 pos : SV_POSITION;
    31.                 float2 uv : TEXCOORD0;
    32.                 SHADOW_COORDS(1)
    33.             };
    34.  
    35.             sampler2D _MainTex;
    36.             float4 _MainTex_ST;
    37.  
    38.             sampler2D_float _CameraDepthTexture;
    39.             float4 _CameraDepthTexture_TexelSize;
    40.          
    41.             v2f vert (appdata v)
    42.             {
    43.                 v2f o;
    44.                 o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
    45.                 o.uv = TRANSFORM_TEX(v.uv, _MainTex);
    46.                 TRANSFER_SHADOW(o)
    47.                 return o;
    48.             }
    49.          
    50.             fixed4 frag (v2f i) : SV_Target
    51.             {
    52.                 #ifdef SHADOWS_SCREEN
    53.  
    54.                 float2 screenUV = i._ShadowCoord.xy / i._ShadowCoord.w;
    55.                 fixed shadow = tex2D(_ShadowMapTexture, screenUV).r;
    56.  
    57.                 // early out, shows off "standard" screen space shadows
    58.                 if(frac(_Time.x) > 0.5)
    59.                     return shadow;
    60.  
    61.                 float fragDepth = i._ShadowCoord.z / i._ShadowCoord.w;
    62.                 float depth_raw = tex2D(_CameraDepthTexture, screenUV).r;
    63.  
    64.                 float depthDiff = abs(fragDepth - depth_raw);
    65.                 float diffTest = 1.0 / 100000.0;
    66.  
    67.                 if (depthDiff > diffTest)
    68.                 {
    69.                     float2 texelSize = _CameraDepthTexture_TexelSize.xy;
    70.                     float4 offsetDepths = 0;
    71.  
    72.                     float2 uvOffsets[5] = {
    73.                         float2(1.0, 0.0) * texelSize,
    74.                         float2(-1.0, 0.0) * texelSize,
    75.                         float2(0.0, 1.0) * texelSize,
    76.                         float2(0.0, -1.0) * texelSize,
    77.                         float2(0.0, 0.0)
    78.                     };
    79.  
    80.                     offsetDepths.x = tex2D(_CameraDepthTexture, screenUV + uvOffsets[0]).r;
    81.                     offsetDepths.y = tex2D(_CameraDepthTexture, screenUV + uvOffsets[1]).r;
    82.                     offsetDepths.z = tex2D(_CameraDepthTexture, screenUV + uvOffsets[2]).r;
    83.                     offsetDepths.w = tex2D(_CameraDepthTexture, screenUV + uvOffsets[3]).r;
    84.  
    85.                     float4 offsetDiffs = abs(fragDepth - offsetDepths);
    86.  
    87.                     float diffs[4] = {offsetDiffs.x, offsetDiffs.y, offsetDiffs.z, offsetDiffs.w};
    88.  
    89.                     int lowest = 4;
    90.                     float tempDiff = depthDiff;
    91.                     for (int i=0; i<4; i++)
    92.                     {
    93.                         if(diffs[i] < tempDiff)
    94.                         {
    95.                             tempDiff = diffs[i];
    96.                             lowest = i;
    97.                         }
    98.                     }
    99.  
    100.                     shadow = tex2D(_ShadowMapTexture, screenUV + uvOffsets[lowest]).r;
    101.                 }
    102.  
    103.                 return shadow;
    104.  
    105.                 #endif //SHADOWS_SCREEN
    106.  
    107.                 return 1;
    108.             }
    109.             ENDCG
    110.         }
    111.     }
    112.     FallBack "Diffuse"
    113. }
    114.  

    This is just a proof of concept and not optimized or even really usable code. The shader is essentially an unlit shader that samples the screen space shadows.
     

    Attached Files:

    forestrf, jvo3dc, NeatWolf and 2 others like this.
  2. Reanimate_L

    Reanimate_L

    Joined:
    Oct 10, 2009
    Posts:
    2,372
    Hey thanks for the sample code :). seems pretty interesting
    How's the performance? just curious.
     
  3. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    I haven't done enough testing to gauge performance; in my test scene there's no perceivable difference in performance, maybe ~0.2ms @ 1920x1080, but this scene isn't exactly indicative of a real game scene and the additional depth samples this shader does are going to start to cost more in a scene that actually uses textures. How much more I couldn't say without extra testing. The shader is doing up to 5 depth samples for each fragment sample. With 4x MSAA this means an additional 1 texture sample to up to worst case of 20 additional texture samples per pixel, with 8x MSAA this goes up to 40, but it'll be exceedingly rare as every coverage sample of the MSAA would have to be of a different polygon. It's plausible for complex geometry with 4x MSAA, but if you're hitting that with 8x MSAA I would say you need to rethink your mesh density.
     
  4. Ippokratis

    Ippokratis

    Joined:
    Oct 13, 2008
    Posts:
    1,513
    Hi @bgolus, I have seen some of your answers in the shader section and i really appreciate the fact that you share all this knowledge.

    I wonder about some choices you did in this shader.

    line 65
    float diffTest = 1.0 / 100000.0;

    Why not using
    const float diffTest = 0.00001;

    lines 72-78, uvOffsets declaration
    Why not declaring it in the vert instead of the frag ?

    Thanks for sharing.
     
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    1.0 / 100000.0 is because that form is a little easier to tweak and when the shader is compiled it stores the number as 0.00001 for me since both values are constant. That is basically a dumb magic number anyway and needs some work to be a slightly less dumb magic number.

    I'm doing the uv offsets in the pixel shader because this is just a prototype and because isn't intended for mobile. I'll explain.

    Most of the time people think you should do as much work in the vertex shader as possible because there are fewer vertices than pixels and the cost of the math will be reduced. This is true, but there's a cost to moving data from the vertex shader to the pixel shader that most people don't realize, and GPUs are really fast at calculations. The savings of calculating something fewer times might be lost if it's a lot of data or not a lot of calculation. In this case it's transferring 10 float values vs the cost of 4 multiplies. With some optimization that could be 8 floats vs 2 multiplies. Either way doing 4 multiplies is nearly free, but 10 floats are not. If there were two or three times as much math involved it might start to make sense.

    On mobile calculating UVs in the vertex shader specifically is still a huge win as it means the gpu can cache the texture values before running the fragment shader and the performance of the data transfer to calculation speed isn't quite as far apart as it is on the desktop. So if I was writing this for a mobile device I would be doing a lot more in the vertex shader.
     
  6. Ippokratis

    Ippokratis

    Joined:
    Oct 13, 2008
    Posts:
    1,513
    Thanks for explaining.
     
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    After posting this I realized this technique has a lot of similarities to Inferred Rendering, which made me realize it could be abused to allow limited shadowing on transparency. I have this working with AlphaToMask right now, but only one layer at a time.
     
  8. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,453
    Really fascinating work you're doing @bgolus - do you have a blog or something?
     
  9. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    Nope, too lazy for that. :)
     
  10. DmitryAndreevMel

    DmitryAndreevMel

    Joined:
    Mar 7, 2017
    Posts:
    12
    this is not correct: with MSAA the fragment shader is executed only once for all subsamples inside a pixel, so the number of executed fragment functions is always the same as if no MSAA were applied
     
  11. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    You are correct that MSAA only runs the fragment shader once per pixel, and in the best case it is identical to no MSAA. But it's once per pixel per triangle, and a different triangle can be rendered per subsample. So for 4x MSAA up to four fragment shaders may be executed per pixel. And that's ignoring potential overdraw.

    For example, if the subpixel samples all hit a different triangle, like in the case of a vertex shared by 4 triangles being right at the center of the pixel, then the fragment shader will be executed 4 times. 4 invocations * 5 depth texture samples = 20 additional texture samples.
     
  12. DmitryAndreevMel

    DmitryAndreevMel

    Joined:
    Mar 7, 2017
    Posts:
    12
    true. Though it affects the overall number of pixel shader invocations slightly on average scene. It's misleading to assume worst case to assess performance implications of MSAA
     
  13. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    And I'm not assuming the worst, I'm simply stating the possible range. I even said the worst case is exceedingly rare. Plus there are plenty of people new to real time rendering importing their high res models directly into Unity so it's probably less rare than we would hope. ;)

    Though in a lot of ways it is always much worse than most people expect, even when they understand it only runs on tri edges. If a single subsample in a pixel quad is on a tri, then all 4 pixels in the quad have to run the shader. My "worst case" estimate is actually a low ball. The worst case for 4x MSAA is actually 16 shader invocations per pixel, assuming best case overdraw and no transparency. It's easily worse than this with complex models due to overdraw too, even without the perfect tri corner case. Mobile and Nvidia GPUs save a bit by being tile-based, reducing the impact of overdraw, so that's less of a concern.

    There were several talks near the end of the Xbox 360 lifespan on using MSAA with deferred rendering that had some good "holy sh*t!" images when they visualized how many pixels the GPU was multi-sampling.

    This is from an old Nvidia example with the top being all tri edges the GPU would be multi-sampling vs a custom detection method based on depth discontinuities (which is closer to what most humans would likely expect). And I think you can agree that that boxy corner of Sponza is far less geometric detail than most games would have.
     
    NeatWolf likes this.
  14. DmitryAndreevMel

    DmitryAndreevMel

    Joined:
    Mar 7, 2017
    Posts:
    12
    wait a second, I didn't get it... This is how I think it works, please correct me if I'm wrong:
    let's assume we are in MSAAx4 mode, no overdraw on our scene, no transparency, no depth-writes in shader, no alpha-to-coverage and we are talking exclusively about fragment shader further on.
    if a pixel quad is fully inside a triangle, then all 4 subsamples share the same color data which is gathered from one shader invocation and this invocation is for exact pixel center position and does not correlate to any subsample positions. In case if pixel quad intersects the triangle the procedure is the same for one triangle: shader invoked only once, again for exact pixel center position but this time the bitmask to write into MSAA-texture is calculated by checking which of the four subsamples are inside the triangle. In both cases for one pixel the result of one shader invocation and a (coverage) bitmask is passed from fragment shader stage and the hardware knows how to interpret the result and writes (after depthtest, see below) the same color to all texture subsamples with corresponding bit in bitmask set to 1.
    Now let's look at depth: during the rasterization stage depth values for all four subsamples are calculated based on the triangle's plane and subsamples positions. All four values do participate in four different depth-tests and the depth-test result (bitmask) is combined with coverage bitmask to decide whether to write color from shader invocation to texture subsample color data or not. (of course there is an early-out if during depth test stage no one of all 4 tests passed - in such a case no shader invocations occurs).
    TLDR: for one pixel and one triangle the hardware in any case executes pixel shader once (pixel center coordinates), computes depth for all 4 subsamples in rasterization stage, performs 4 depth tests, computes coverage bitmask in rasterization stage (4 point-triangle-intersection tests), combines depth- and coverage-bitmasks - and this is it. One color vector, 4-bit bitmask and four depth values are passed to texture write hardware (if we do not take into account hardware data compression)
    In case the pixel is on the edge of any triangle - it will be processed the same number of times as how many triangles its quad intersects. At maximum only 4 times we'll go further than depth test stage - so at maximum only 4 shader invocations will occur.
    Now let's take into account that most of GPUs process not individual pixels but 2x2 tiles of pixels at once. So even if one subsample out of 16 in this tile is inside a triangle - all 4 pixels will invoke a shader program, but only one result will be used... but this is not so different from no-MSAA mode where the same rules applied - furthermore all 4 shader invocations are performed in parallel so can not be considered as "3 wasted-performance invocations".

    Of course the worst-case 4x multiplier will go up in case of overdraw regions, alpha-to-coverage mode, transparency, explicit depth calculations in fragment shader (in such a case by the way all four depth values are the same for pixel's subsamples and MSAA does not help with aliasing in any way) which disables early-Z-out.
     
  15. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    Note that when I say "pixel quad" I'm explicitly referring to the 2x2 pixel tiles. Otherwise, yes. Single invocation per pixel of the shader at the center of each pixel if all subsamples in the quad are the same triangle. Whether or not that correlates to a subsample depends on the implementation, but generally for 4x it's the rotated grid / 4 rooks pattern in which case yes, there is no correlation.

    But apart from terminology, I think we're in agreement.

    As you said, GPUs work in 2x2 tiles of pixels (which I called a quad). With 4x MSAA each of those pixels has 4 subsamples, making 16 possible subsamples. If any one of those subsamples is covered by a triangle, all 4 pixels render the fragment shader for that triangle. That makes the worst case 16 fragment shader invocations per pixel in the unlikely chance that all 16 subsamples in a 2x2 tile of pixels are each sampling a unique triangle. You are also correct that without MSAA the same rules for the 2x2 tiles exist, so it's possible for 4 fragment shader invocations per pixel to occur with MSAA disabled.

    Your contention seems to be that:
    A) I am counting those additional invocations due to the 2x2 tile, which I'll admit is making things more confusing than it needs to be, though technically correct. 4x MSAA is not 16x more shader invocations.
    B) That I am bringing up the worst case at all since it is so exceptionally rare. However if you have something as simple as the default Unity sphere mesh small enough on screen to only be a few pixels wide, this will absolutely be the case as every subsample is likely to be of a different triangle since the individual triangles are significantly smaller than a single pixel. I see this all the time where people take the sphere and use it to put a dot on screen because they don't know any better or it got left there from prototyping. The last game I shipped we at one point had berry bushes with 1600 poly spheres for each of the 8 berries on the otherwise <100 poly bush. So I contend this case happens far more often than one might expect.


    Even in AAA games there is a great story of a game's framerate & memory budget suddenly having problems and it being tracked down to a box of bullets ... where each bullet has its own unique set of 2k textures and a 30k mesh and the entire box is actually filled with 30 bullets, not just the top few that are visible...
     
    Last edited: Nov 10, 2017
    AcidArrow likes this.
  16. DmitryAndreevMel

    DmitryAndreevMel

    Joined:
    Mar 7, 2017
    Posts:
    12
    I see, yes, fully agree now.
    Never met 'pixel quad' term before, misunderstood your post. Thanks for clearing it out!
     
  17. IgnisIncendio

    IgnisIncendio

    Joined:
    Aug 16, 2017
    Posts:
    114
    Does the new LWRP fix this issue?
     
  18. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    Sadly, no. It did in some early versions, and then it didn't by the time non-beta builds of Unity were required to run it. Well, that's not entirely true. As of LWRP 2.0.4 if you disable shadow cascades the main directional light shadows are sampled in the forward pass instead of the "Screenspace shadow resolve" as they call it in the change notes.
    https://docs.unity3d.com/Packages/c...ines.lightweight@2.0/changelog/CHANGELOG.html

    For mobile platforms, Unity's forward renderer has never used the screen space shadows, and in some versions of Unity it was possible to disable this even for desktop / console builds — and by some versions I don't mean major versions: in Unity 5.3 you could disable screen space shadows from the quality settings (using a setting not exposed to the inspector), but in 5.4 it was removed, and in some versions afterward disabling the screen space shadow shader would cause it to fall back to sampling the shadows in the forward pass, and in others it would just stop rendering the shadows entirely. For all of these cases only hard shadows were an option.

    With LWRP 2.0.4 you can still use soft shadows for non-screen space shadows, which is a nice improvement over the built in forward renderer, but there are some issues still. Since you can't use cascades for non-screen space shadows you generally need to set the shadow distance fairly small, and currently there is a hard edge where the shadow map ends rather than a soft fade like the built in renderer. When using soft shadows and a larger range the light bias has to be quite high leading to a lot of light bleeding. The light bias settings have to be set much higher in the LWRP compared to the built in renderers even with cascades as they use the same bias for all cascades rather than properly increasing the bias for each, so even there it's a bit of a qualitative drop from the built in renderer. This means either the shadows in the distance show significant shadow acne, or your bias is set for the largest cascade which again produces significant light bleeding.

    The most annoying part is the early versions of the LWRP did soft shadows and cascades in the forward passes, even on transparencies, which is pretty much exactly what many of us doing VR want. It was removed because the other techniques are more efficient and they were experiencing an explosion of shader variants as the LWRP is being designed to cover a broad range of platforms and uses, now even more so.

    Also, Unity for their part has also acknowledged the current setup isn't great for VR, and internally there are plans to look into this more.
     
    NeatWolf and IgnisIncendio like this.
  19. IgnisIncendio

    IgnisIncendio

    Joined:
    Aug 16, 2017
    Posts:
    114
    I see! Very very informative, thank you! I'm actually doing a school VR game right now (due next month), and the shadows, while aren't that visible in VR, kind of bug me a bit. Would your shader be able to solve the issue?
     
  20. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    The concept shown in the shader, yes. The shader itself isn't doing any shading, just sampling the shadow map and displaying that directly. To use this in a real environment would require some additional work to override the functions in AutoLight.cginc, or use a completely custom lighting model not using Unity's shading code or surface shaders. I've shipped two games using variations on this technique, though I haven't gotten it working properly for my current project since Unity is making it harder for me as they keep making modifications to the AutoLight.cginc. ;)
     
    NeatWolf and Ippokratis like this.
  21. PUCKERS

    PUCKERS

    Joined:
    Jan 8, 2016
    Posts:
    2
    Hey bgolus, just resurrecting this as I have been having a similar issue with water shaders using the depth texture to show edge foam.

    At first I took the depth like this:
    Code (CSharp):
    1. float existingDepth = tex2Dproj(_CameraDepthTexture, UNITY_PROJ_COORD(i.screenPosition)).r;
    Which resulted in this (you can see the slight outlines):

    upload_2019-5-1_23-36-36.png

    But I used your technique and instead sampled the depth texture at 4 points and took the min value:

    Code (CSharp):
    1. float2 texelSize = _CameraDepthTexture_TexelSize.xy;
    2.                 float2 screenUV = i.screenPosition.xy / i.screenPosition.w;                  
    3.        
    4.                 float d1 = tex2D(_CameraDepthTexture, screenUV + float2(1.0, 0.0) * texelSize).r;
    5.                 float d2 = tex2D(_CameraDepthTexture, screenUV + float2(-1.0, 0.0) * texelSize).r;
    6.                 float d3 = tex2D(_CameraDepthTexture, screenUV + float2(0.0, 1.0) * texelSize).r;
    7.                 float d4 = tex2D(_CameraDepthTexture, screenUV + float2(0.0, -1.0) * texelSize).r;
    8.  
    9. float existingDepth = min(d1, min(d2, min(d3, d4)));
    This seems to work perfectly so far:

    upload_2019-5-1_23-38-43.png

    But now I am thinking ... how is it that simple. There must be a catch? Is this technique just a very basic blurring of the depth texture?

    Thanks for all the shader help in the forums :D
     
    forestrf likes this.
  22. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    7,128
    It's a search, not really a blur. You're not averaging the values, but finding the one that best fits your needs. In the shadow case it's finding the depth closest to the current fragment. For your water case it's finding the depth furthest away from the camera. You should also be checking against the current pixel's depth, and not just the 4 surrounding points.

    The catch for your technique is you might have some false "negatives", like if you have a pillar stuck in the water the left and right most pixels won't have edge foam when they should, but that'll be much more rare and less obvious than the alternative you had before.
     
    Ippokratis likes this.
  23. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    1,899