
DecodeDepthNormal/Linear01Depth/LinearEyeDepth explanations

Discussion in 'Shaders' started by mahdiii, Jan 6, 2019.

  1. mahdiii

    mahdiii

    Joined:
    Oct 30, 2014
    Posts:
    856
    Hi. I do not know exactly what the inputs and outputs of these functions are, or how they relate to each other. Thank you
    DecodeDepthNormal
    DECODE_EYEDEPTH
    LinearEyeDepth
    Linear01Depth

    SAMPLE_DEPTH_TEXTURE
    DecodeFloatRG

    In the SAMPLE_DEPTH_TEXTURE function, the output is the depth (z coordinate) in screen space.
    In DecodeDepthNormal, i.scrPos.xy has been used instead of i.uv. Why?
    Code (CSharp):
    float depth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, uv);  // sample from depth texture
    depth = Linear01Depth(depth);
    ////
    DecodeDepthNormal(tex2D(_CameraDepthNormalsTexture, i.scrPos.xy), depthValue, normalValues)

    I am confused. Another code is like below:

    Code (CSharp):
    float4 depthnormal = tex2D(_CameraDepthNormalsTexture, i.uv);

    //decode depthnormal
    float3 normal;
    float depth;
    DecodeDepthNormal(depthnormal, depth, normal);
    He/she has written DecodeDepthNormal with i.uv
     
    Last edited: Jan 6, 2019
  2. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Let’s start off with the easy question.
    Using the screen position is guaranteed to match up, but for post process effects (using Blit() or similar) the UVs of the quad / tri being drawn are often set up to match, saving some math.

    Now on to the rest of it.

    Some basics. The depth value stored in the depth buffer when rendering with a perspective projection matrix is a non-linear value between 0.0 and 1.0. Traditionally 1.0 is the far plane and 0.0 is the near plane, but Unity uses a reversed Z, so that's flipped. That's another conversation. However, understand that a depth buffer value of 0.5 is not half way between the near and far plane, but actually quite close to the camera, even with a fairly distant far plane.

    The _CameraDepthTexture stores the depth value exactly as it would be in the depth buffer as a 32 bit float texture. The SAMPLE_DEPTH_TEXTURE macro is just a simple tex2D call on most platforms, but you only ever want a single channel from the texture read, and sometimes special handling of floating point textures is required, so it’s a macro.

    Converting this floating point value from that non-linear 1.0 to 0.0 range to a linear range is what the two most commonly seen functions, LinearEyeDepth and Linear01Depth, do. The DECODE_EYEDEPTH macro just calls the first function.

    LinearEyeDepth takes the depth buffer value and converts it into world scaled view space depth. The original depth texture 0.0 will become the far plane distance value, and 1.0 will be the near clip plane. So now with the value you get from the linear eye depth function, 1 is a surface that is 1 unit from the camera’s pivot along the camera’s z axis. A value of 100 is 100 units, 200 is 200 units, you get the idea.

    Linear01Depth mostly just makes the non-linear 1.0 to 0.0 range be a linear 0.0 to 1.0, so 0.5 really is half way between the camera and far plane.
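    For reference, a minimal fragment shader sketch of the above (assuming screenUV and the _CameraDepthTexture sampler are already set up):
    Code (CSharp):
    float rawDepth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, screenUV); // non-linear, 1.0 to 0.0 with reversed Z
    float eyeDepth = LinearEyeDepth(rawDepth); // world scaled view depth, near plane distance to far plane distance
    float depth01  = Linear01Depth(rawDepth);  // linear 0.0 to 1.0, where 1.0 is the far plane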


    The _CameraDepthNormalsTexture is a different beast. This actually has nothing to do with the depth buffer at all. It stores the linear view depth and view normals in a single RGBA32 texture. The normals are stored in view space using stereographic projection. Basically the x and y values are stretched a bit so it can represent values facing away from the camera without needing a third component. This is because with perspective projections you can sometimes see view space normals that aren't facing along the camera's forward axis. It also means that the normals are of slightly lower quality. The depth is even more curious. It is a view space linear depth with a range of 0.0 to 1.0, encoded as a 16 bit float stored in two channels of the RGBA32. This means 0.5 is half way between the camera and the far plane, just like Linear01Depth gets you when using the _CameraDepthTexture, but it means the quality of this depth is very, very poor.

    That's where DecodeDepthNormal and DecodeFloatRG come in. They take that 8 bit per channel RGBA texture and extract the 16 bit float depth and the view space normal (with a reconstructed Z).
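    A minimal sketch of the decode (assuming a screen UV and the _CameraDepthNormalsTexture sampler are already declared):
    Code (CSharp):
    float4 enc = tex2D(_CameraDepthNormalsTexture, screenUV);
    float depth01;     // linear 0.0 to 1.0 view depth, low precision
    float3 viewNormal; // view space normal with reconstructed Z
    DecodeDepthNormal(enc, depth01, viewNormal);
    // or, if only the depth is needed, the decode that DecodeDepthNormal uses internally:
    float depthOnly = DecodeFloatRG(enc.zw);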


    So, you might ask, “why are there two depth textures, especially if one is so bad?” Because a lot of post process effects can get away with not needing a high quality depth buffer. And, more importantly, for effects that need both the depth and normal, you’re getting both in a single, cheap, texture sample. This mattered a lot more several years ago than it does today, but can still be a decent performance increase if you’re doing a lot of depth or normal samples, like when doing SSAO, or edge detection.
     
    LeterAE, Gamba04, Metthatron and 26 others like this.
  3. mahdiii

    mahdiii

    Joined:
    Oct 30, 2014
    Posts:
    856
    The names are so awful.
     
  4. hakobkira

    hakobkira

    Joined:
    Mar 28, 2017
    Posts:
    1
    Hi.
    How can I find some documentation about these macros?
     
  5. a436t4ataf

    a436t4ataf

    Joined:
    May 19, 2013
    Posts:
    1,933
    Ha! Found another classic @bgolus thread after posting a related question (https://forum.unity.com/threads/what-is-eye-space-in-unity-shaders-nb-its-not-view-space.797775/) - Where you say:

    "LinearEyeDepth takes the depth buffer value and converts it into world scaled view space depth."

    and Unity docs say:

    "LinearEyeDepth(i): given high precision value from depth texture i, returns corresponding eye space depth."

    I believe both are incorrect (although yours is close), the key being your earlier statement:

    "The depth value stored in the buffer ... Traditionally 1.0 is the far plane, and 0.0 is the near plane" (fits with my experience pre-Unity)

    i.e. it's not the eye-space/world-space depth. Your description, that it's world-scaled (arguably: Unity meant to write: "eye-space-scaled"?) is accurate and close, but it's actually:

    "world-scaled 'depth from near plane' "

    (If I'd not trusted Unity docs, I would have had no problem: I'd have directly sampled a depth texture and known that it's a measure from near to far planes. But Unity's function claimed to be eye-space depth, which I thought was the same as view-space, i.e. origin is at the eye, not at the near-plane?)
     
  6. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    So, the near and far planes are defined in view space (aka eye space). A near value of 0.1 is 0.1 world scale units from the camera. So the extremes (0.0 and 1.0) of the depth buffer should remap to those near and far values.

    Let’s look at the code for that function.
    Code (CSharp):
    // Z buffer to linear depth
    inline float LinearEyeDepth( float z )
    {
        return 1.0 / (_ZBufferParams.z * z + _ZBufferParams.w);
    }
    And here’s what those _ZBufferParams mean:
    Code (CSharp):
    // Values used to linearize the Z buffer (http://www.humus.name/temp/Linearize%20depth.txt)
    // x = 1-far/near
    // y = far/near
    // z = x/far
    // w = y/far
    // or in case of a reversed depth buffer (UNITY_REVERSED_Z is 1)
    // x = -1+far/near
    // y = 1
    // z = x/far
    // w = 1/far
    float4 _ZBufferParams;
    So, let’s do some simple math to test it. Let’s have a hypothetical camera with a near of 0.1 and far of 100, and let’s just apply the above calculations to the 0.0 far plane and 1.0 near plane with a reversed Z.

    z = 0.0 is easy enough, it basically turns into 1 / (1/far), which is equal to the far plane, so that works out.

    But let’s calculate the _ZBufferParams.z and w so we can do a z of 1.0.

    _ZBufferParams.z = (-1+far/near)/far
    far/near = 100/0.1 = 1000
    -1 + 1000 = 999
    999 / 100 = 9.99
    So _ZBufferParams.z = 9.99

    And _ZBufferParams.w = 1/100 = 0.01

    When z = 1.0:
    1 / (_ZBufferParams.z * z + _ZBufferParams.w)
    1 / (9.99 * 1.0 + 0.01) = 1 / 10 = 0.1

    And we're left with the near clip plane depth, which is 0.1 units from the camera. If it were "depth from near plane" that should have been 0.0, but it's not; it's exactly the same value as the input near plane. Thus it is not "world scaled depth from the near plane", it is "world scaled view depth from the camera", which is how I and the documentation described it. It should also be noted that the view matrix only ever has rotation and translation, never scale (unless explicitly overridden), so view, eye, and world scale are always the same.
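    Written out as shader style arithmetic, just to sanity check the numbers above (nothing Unity specific here):
    Code (CSharp):
    // near = 0.1, far = 100, reversed Z
    float zbufZ = (-1.0 + 100.0 / 0.1) / 100.0; // 9.99
    float zbufW = 1.0 / 100.0;                  // 0.01
    float atFar  = 1.0 / (zbufZ * 0.0 + zbufW); // 100.0 = the far plane distance
    float atNear = 1.0 / (zbufZ * 1.0 + zbufW); // 0.1   = the near plane distance, not 0.0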

    edit: One caveat is I think there’s an issue with OpenGL on mobile devices where they are going from near to far in almost world scale units, and 0.0 in the depth buffer is decoded as 0.0 in the eye space depth. There used to be some comments about this in a few places in the code, but I can’t find it right now. Not sure if they fixed this, or I’m just not finding the comment.
     
    Last edited: Dec 21, 2019
  7. a436t4ataf

    a436t4ataf

    Joined:
    May 19, 2013
    Posts:
    1,933
    Thanks - so I'd been using the right interpretation originally.

    And it was just coincidence that I found my calculations were incorrect by EXACTLY the same distance as the camera near-plane (this appears to have been one of those "one in a million" chances).

    After a couple more hours of intense debugging, I found:

    1. In 2019.x, the new scene-camera controls (drop down next to the Gizmos drop down in titlebar of the SceneView window) silently set the clip-planes to some VERY weird values, unless you disable the auto-clip-distance and set them to reasonable ones (so ... doing that is a good idea forever, in all projects)
    2. I had one place in the code where instead of taking: "length( camera-to-point-on-object )", which would be xyz coords, I was passing it the homogeneous coords for that point, i.e. xyzw. This makes almost no difference except for small values of xyz, where it starts to make a huge difference due to the extra +1 from the w coord
    3. <-- this is the source of my out-by-0.3 error. Coincidentally, my camera was at a position where the +1 from w added +0.3 to the calculated length.
    TL;DR: I was wrong, and I've managed to show that even in my project where it seemed that LinearEyeDepth returned something different, it does in fact do what it says, and what bgolus's original description said. My bad :).
     
    vsugrob likes this.
  8. J_Kost

    J_Kost

    Joined:
    Feb 16, 2021
    Posts:
    32
    This post is still extremely helpful, thanks! Just for the sake of clarification: Is "the camera’s pivot" synonymous with the (camera/view/)eye space origin or do you mean something different?
     
  9. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Yes, the "camera's pivot" and the view space origin are the same position, mainly because the view space origin is defined by the camera's pivot.
     
    atomicjoe and J_Kost like this.
  10. J_Kost

    J_Kost

    Joined:
    Feb 16, 2021
    Posts:
    32
    Thanks!
    I got a little confused because I'm used to referring to that as the aperture (in the sense of a pinhole). Before asking in the first place, I googled the definition of a camera's pivot in the context of computer vision, but mainly got results pertaining to stereo camera setups or pivoting camera mounts of various descriptions. Could you point me to a source where I can familiarize myself with the lingo?
     
  11. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    This isn't really a lingo thing, this is a how-Unity-defines-view-space thing. When I say "camera pivot" I'm referring to the pivot of the game object the Camera component is on in the Unity scene. Really it's all arbitrary and there's no reason the "camera pivot" has to be the view space origin, though it generally is for reasons of sanity. Technically you can even override the view space matrix to be anything you want from c#. And when doing stereoscopic rendering the view space is offset from the game object's pivot to be wherever the eyes would be.

    When it comes to the various transform names, one of the reasons why even within Unity's own code they call the same transform space "camera", "view", and "eye" is because there is no one canonical set of names for the various transforms in computer graphics. If someone claims that one specific name is the "correct" one, they're wrong, because there's now 40 years of people using all of them interchangeably. To add to the confusion, there's sometimes a difference between the name of the space and the name of the transform matrix for converting to that space. For example:

    Spaces:
    • "Model space" - Also sometimes called local or object space.
    • "World space" - Also sometimes called scene or game space, or omitted altogether. Gets extra confusing when there are multiple versions of "world space", like the HDRP which has "absolute world space" and "world space" and they're not the same thing. The later is more accurately the "camera relative world oriented space", which is a mouthful.
    • "View space" - Also sometimes called camera or eye space. This is also confusing because in some uses Unity defines view and camera space as different things, where view space is -Z forward and camera space is +Z forward, but other times they use the term "camera" when they really mean "view".
    • "Clip space" - Also more correctly called homogeneous clip space, or less accurately projection space, or incorrectly called screen space.
    • "Normalized Device Coordinate space" - Usually just called NDC space. This is probably the only one that's ever 100% consistent. It's also a space most people don't really think about because it's usually hidden by the GPU.
    • "Viewport Space" - Sometimes called screen or normalized window or just "window" space. Not to be confused with view space. Also sometimes referred to as "screen space UVs" as that's where they show up the most often. 0.0 to 1.0 range for x and y for what's visible on screen.
    • "Viewport Space" - Sometimes called screen, pixel or window space. Not to be confused with ... wait ... Yes, I did just type the same name twice. No, that was not a mistake. I did this to highlight how inconsistent the terminology is. This is the on screen pixel coordinate space, which is totally different than the normalized window space.* (* Debatable...)

    Transforms:
    • "Model matrix" - Transforms from local to world space. Also sometimes called the object, or world, or object to world matrix.
    • "View matrix" - Transforms from world to view space. Also sometimes called the camera matrix, though again be wary of Unity's -Z "view" vs +Z "camera" forward stuff.
    • "Projection matrix" - Transforms from view to clip space. Also sometimes called the perspective matrix. Unity usually has multiple versions of this matrix depending on if it should be used with the "view" or "camera" matrix as it needs to correctly handle the Z sign. In Unity's shaders it tends to call the projection matrix that handles +Z the "camera projection" matrix.
    There are then combinations of all of these transform matrices, like the MV, VP, and MVP matrices. These let you go from one space to another skipping any in between which can be faster and reduce numerical inaccuracies ... sometimes. And then there are the inverse of these matrices for going "back" to another space. Usually there aren't fancy unique names for these. You'll notice there aren't any matrices for converting from clip space to the NDC or "viewport" spaces, and that's because while these are unique spaces the conversions don't require a matrix multiply. HClip to NDC for example is a divide. NDC to either viewport space (normalized window or pixel space) is a scale and offset.

    And then there's the fun of the depth, which is what this whole thread started talking about, and which I haven't even discussed. OpenGL, and only OpenGL, uses a -1.0 to 1.0 range for the NDC z, which gets converted to a 0.0 to 1.0 range with a scale and offset when converted to either viewport space and is what appears as the final depth value. All other APIs use 0.0 to 1.0 (or really 1.0 to 0.0) for the NDC z, and there's no conversion between that and the final depth value.
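    As a rough sketch of those last non-matrix steps (hypothetical variable names; the divide by w is normally done per fragment or by the GPU, and per-platform y flips are ignored here):
    Code (CSharp):
    float4 clipPos = UnityObjectToClipPos(v.vertex); // homogeneous clip space
    float3 ndc = clipPos.xyz / clipPos.w;            // NDC, xy in -1..+1 for what's on screen
    float2 viewportUV = ndc.xy * 0.5 + 0.5;          // normalized window space / "screen UV", 0..1
    float2 pixelPos = viewportUV * _ScreenParams.xy; // pixel ("window") coordinates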
     
  12. J_Kost

    J_Kost

    Joined:
    Feb 16, 2021
    Posts:
    32
    Wow, that's hugely helpful. It would have saved me the best part of today's research in one comment!
    I have a few unanswered questions that tie right into this.

    I need the world space coordinates for each pixel in a RenderTexture. I think I'm almost there, but a few things still confuse me.
    1. "Viewport/Screen Space" and how to Sample the _CameraDepthTexture
      There seem to be two equivalent (are they?) ways of sampling depth in the fragment shader assuming a vertex shader like this:
      Code (CSharp):
      struct appdata {
          float4 vertex : POSITION;
      };

      //the data that's used to generate fragments and can be read by the fragment shader
      struct v2f {
          float4 clipPos : SV_POSITION;
          float4 screenPos : TEXCOORD0;
          float4 screenPosModified : TEXCOORD1;
      };

      //the vertex shader
      v2f vert(appdata v) {
          v2f o;
          //convert the vertex positions from object space to clip space so they can be rendered
          o.clipPos = UnityObjectToClipPos(v.vertex);
          o.screenPos = ComputeScreenPos(o.clipPos);
          o.screenPosModified = ComputeScreenPos(o.clipPos);
          COMPUTE_EYEDEPTH(o.screenPosModified.z);
          return o;
      }
      Variant A (using tex2D),
      Code (CSharp):
      float2 screenUV = i.screenPos.xy / i.screenPos.w;
      float depth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, screenUV);
      and Variant B (using tex2Dproj):
      Code (CSharp):
      float depth = SAMPLE_DEPTH_TEXTURE_PROJ(_CameraDepthTexture, UNITY_PROJ_COORD(i.screenPosModified));
      Which space exactly does ComputeScreenPos convert Clip Space coordinates to? Is it some kind of screen space with an extra w dragged along from clip space because we were not allowed to do the perspective division before the rasterizing step?

      What's the deal with COMPUTE_EYEDEPTH(o.screenPosModified.z)? I see this everywhere and assume it's required, but what I don't get is: tex2Dproj wants uv coordinates and a w.(*) What does it care about the value in z? Clearly, I must be fundamentally misunderstanding something here.

      (*)from the HLSL docs: "tex2Dproj: Samples a 2D texture using a projective divide; the texture coordinate is divided by t.w before the lookup takes place".

    2. The projection matrix
      Is Unity's Camera.projectionMatrix the same as the shader-side UNITY_MATRIX_P?
      Sometimes, the term "projection matrix" is used when referring to what would be MVP.
      Also: Is there a way to access the MVP matrix from C#? In a further use case, I would like to be able to calculate real (physical, not Unity) world points P using real camera image points P' = (x,y,depth) and a (same-size) TOF image T as:
      P = inv_MVP x (P'.x, P'.y, TOF(x,y))

      It seems more straightforward to do this in C# than somehow getting all the data passed into a shader to do it there.

    3. Actually computing the world point coordinates for a pixel in the fragment shader
      As you mention in your post here, most people use some variation of a normalized ray direction to calculate the world position corresponding to a pixel. Something along these lines:
      Code (CSharp):
      v2f vert(appdata v) {
          ...
          o.ray = wPos - _WorldSpaceCameraPos;
          return o;
      }

      fixed4 frag(v2f i) : SV_Target {
          ...
          float sceneZ = LinearEyeDepth(SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, screenUV));
          float3 worldPos = normalize(i.ray) * sceneZ + _WorldSpaceCameraPos;
          ...
      }
      You propose doing this instead:
      Code (CSharp):
      // calculate the view plane vector
      // note: Something like normalize(i.camRelativeWorldPos.xyz) is what you'll see other
      // examples do, but that is wrong! You need a vector that has a 1 unit view depth, not
      // a 1 unit magnitude.
      float3 viewPlane = i.camRelativeWorldPos.xyz / dot(i.camRelativeWorldPos.xyz, unity_WorldToCamera._m20_m21_m22);
      // calculate the world position
      // multiply the view plane by the linear depth to get the camera relative world space position
      // add the world space camera position to get the world space position from the depth texture
      float3 worldPos = viewPlane * sceneZ + _WorldSpaceCameraPos;
      worldPos = mul(unity_CameraToWorld, float4(worldPos, 1.0));
      There are a few things about this that I don't quite understand yet:

      A. Why multiply worldPos with unity_CameraToWorld in the last line again?
      B. What is "unity_WorldToCamera._m20_m21_m22" ? I read this as the first through third element of the third row of this matrix, but what is contained there?
      C. What is "viewPlane"? Could you expand on "You need a vector that has a 1 unit view depth, not a 1 unit magnitude."? Why is a normalized ray direction in world coordinates not correct?
     
    Propagant likes this.
  13. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Yes, they are equivalent. SAMPLE_DEPTH_TEXTURE calls tex2D() to sample the depth texture, and SAMPLE_DEPTH_TEXTURE_PROJ calls tex2Dproj(), which just does that divide by w before itself calling tex2D(). On very old GPUs tex2Dproj() existed "for real" and there was special hardware that handled the divide, or in the case of some no-longer-supported hardware, a divide by z instead (because the tex2Dproj() function took a float3 instead of a float4 for the UV). These days they're equivalent and you can use either option. They'll compile to identical shaders in the end.

    Homogeneous screen space UVs.

    In homogeneous clip space, the xy values have a -w to +w range for what will appear on screen, where w is literally the w of the float4 clip space value. NDC is clip space divided by the w, so the on screen values are now between -1.0 and +1.0. ComputeScreenPos adjusts the xy values so that they're in a 0.0 to +w range, so that after the divide they'll be in a 0.0 to 1.0 range.

    You can try reading up on homogeneous coordinates if you want to try to make sense of why, but the short version for computer graphics is that the xyz is multiplied by w to handle perspective correct interpolation for things like UV mapping on GPUs. But we don't want perspective correction for screen space UVs, so passing the value as a homogeneous coordinate lets you "undo" the perspective correction. On modern GPUs you could also tell the shader to not perspective correct certain values, but Unity's shader code was written to work on old GPUs that don't support doing that.
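    A rough sketch of what ComputeScreenPos effectively does (simplified; the real function also handles a per-platform y flip via _ProjectionParams.x and single pass stereo, and the function name here is just a stand-in):
    Code (CSharp):
    float4 MyComputeScreenPos(float4 clipPos)
    {
        float4 o = clipPos;
        // remap xy from the -w..+w on-screen range to 0..+w,
        // so the xy / w done in the fragment shader lands in 0..1
        o.xy = clipPos.xy * 0.5 + clipPos.w * 0.5;
        return o;
    }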

    Nope.* That is the unity_CameraProjection matrix though. See the documentation on that variable:
    https://docs.unity3d.com/ScriptReference/Camera-projectionMatrix.html
    Using GL.GetGPUProjectionMatrix(camera.projectionMatrix) gets you the UNITY_MATRIX_P.

    * The one caveat is the C# projection matrix is the OpenGL form of the projection matrix, so when you're using OpenGL they are the same, because GL.GetGPUProjectionMatrix() doesn't modify the projection matrix in that case.


    Because I f****d up and posted a version of the shader with an extra line that shouldn't be there? I was probably hacking something, modified that shader to check something, and forgot to delete the line afterwards before I pasted it there. I wonder how many other places I've posted that wrong ...

    That's the forward vector. I'm using the camera matrix rather than the view matrix to not have to deal with the -Z, but you could also use -UNITY_MATRIX_V._m20_m21_m22.

    Because depth isn't distance. You already liked a post where I showed this image, but I'll post it here again for others.

    The above image is the cross section of the camera's view frustum, and the dot is some position in front of the camera. There's a line going from the camera to the point labelled Distance, and the curved line is that distance swept across the frustum. Any point on that curved line has the same distance from the camera. The line labelled Z Depth is the depth, which goes to a plane parallel to the near plane, aka the view plane. Any point on that vertical line has the same depth.
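    In code form, the relationship the image describes (worldPos here is a hypothetical point in front of the camera):
    Code (CSharp):
    float3 toPoint = worldPos - _WorldSpaceCameraPos;   // camera to the point
    float  dist    = length(toPoint);                   // the curved "Distance" line
    float3 camFwd  = unity_WorldToCamera._m20_m21_m22;  // camera forward axis in world space
    float  depth   = dot(toPoint, camFwd);              // the straight "Z Depth" line
    // depth == dist only when the point lies exactly along the camera's forward axis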
     
    nan_nz and lilacsky824 like this.
  14. J_Kost

    J_Kost

    Joined:
    Feb 16, 2021
    Posts:
    32
    I think I got most of it now. Thanks a bunch. I failed to put your diagram and your shader code together in my head, for which I feel kind of stupid in hindsight. I dissected the whole thing here for anyone who might be interested:
    depth_vs_distance.png

    Accordingly, the shader code should look something like the example below. I noticed that you never normalized the ray (or view direction) vector in your code example. Was this also a mistake or are my calculations wrong? Also, is the forward vector (unity_WorldToCamera._m20_m21_m22) already unit length or does it need normalizing too?

    Code (CSharp):
    CGPROGRAM
    #pragma vertex vert
    #pragma fragment frag
    #include "UnityCG.cginc"
    struct appdata
    {
        float4 vertex : POSITION;
    };
    struct v2f
    {
        float4 pos : SV_POSITION;
        float4 projPos : TEXCOORD0;
        float3 camRelativeWorldPos : TEXCOORD1;
    };
    UNITY_DECLARE_DEPTH_TEXTURE(_CameraDepthTexture);
    v2f vert (appdata v)
    {
        v2f o;
        o.pos = UnityObjectToClipPos(v.vertex);
        o.projPos = ComputeScreenPos(o.pos);
        //view direction vector -> not normalized
        o.camRelativeWorldPos = mul(unity_ObjectToWorld, float4(v.vertex.xyz, 1.0)).xyz - _WorldSpaceCameraPos;
        return o;
    }
    float4 frag (v2f i) : SV_Target
    {
        float2 screenUV = i.projPos.xy / i.projPos.w;

        float depth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, screenUV);

        float sceneZ = LinearEyeDepth(depth);

        float3 rayNorm = normalize(i.camRelativeWorldPos.xyz);

        //unity_WorldToCamera._m20_m21_m22 is z_cam
        float3 rayUnitDepth = rayNorm / dot(rayNorm, unity_WorldToCamera._m20_m21_m22);

        float3 worldPos = rayUnitDepth * sceneZ + _WorldSpaceCameraPos;

        //pad for rendering value directly into ARGBFloat RenderTexture
        float4 worldPosPaddedXYZA32 = float4(worldPos, 1.0);

        return worldPosPaddedXYZA32;
    }
    ENDCG
     
    Last edited: Nov 2, 2021
  15. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    The view forward vector doesn't need to be normalized because it should already be unit length. The unity_WorldToCamera and UNITY_MATRIX_V transform matrices never have any scale, so the axis vectors should already be normalized values. The only time they might not be unit length is if something is manually overriding the view transform, and at that point all bets are off as it could be literally anything, but that's going to be very unusual.

    I named the "view direction" vector in the vertex shader the camRelativeWorldPos because it explicitly should not be normalized when output by the vertex shader and interpolated for the fragment shader. A normalized vector will not be accurately interpolated, and you'll see the world position start to "swim" if you normalize that vector.

    As for the ray direction in the fragment shader, it doesn't need to be normalized. We don't ever want or need a normalized ray, we just need a ray that has a depth of 1. The divide by the dot product gives us that regardless of the ray's length. Again we're not working with a distance, we're working with depth. A dot product of an arbitrary vector and a normalized vector gives you the "depth" of the arbitrary vector along the normalized vector. The math makes a little more sense if we do stuff in camera space.

    Code (csharp):
    float3 worldSpacePos = // the world position of this object's surface

    // transform the world position into view space
    float3 viewSpacePos = mul(UNITY_MATRIX_V, float4(worldSpacePos, 1.0)).xyz;

    // viewSpacePos is the unnormalized view direction
    // but we want a vector that has a "depth" of 1 in the view space forward z (which is -1)
    // so we divide by abs(z)
    float3 viewSpaceUnitDepthViewPlane = viewSpacePos.xyz / abs(viewSpacePos.z);

    // multiply that "unit plane" by the linear depth
    float3 viewSpacePosFromDepth = viewSpaceUnitDepthViewPlane * LinearEyeDepth(rawDepth);

    // transform the camera space position back to world space with the inverse view matrix
    float3 worldSpacePosFromDepth = mul(UNITY_MATRIX_I_V, float4(viewSpacePosFromDepth, 1.0)).xyz;
    The code you have w/ the dot product is doing the same thing as the above, but it's avoiding two matrix multiplies by doing everything in world space.
     
    nan_nz likes this.
  16. J_Kost

    J_Kost

    Joined:
    Feb 16, 2021
    Posts:
    32
    I guess I was just not flexible enough in my way of thinking about this. Multiplying a unit length "ray" vector by the correct depth-equivalent distance (see "d" in my image) is the same as multiplying a "unit depth" ray vector by LinearEyeDepth. Two ways of looking at the same thing. So yes, the vector r indeed does not need to be normalized. It is a quirk of the special case at hand though, and the two vectors in the final equation have to be consistent: either both normalized or both not normalized; mixing them would throw off the scale.
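    Written out (with ray, d, sceneZ, and camFwd as shorthand for the quantities discussed above), the two forms are:
    Code (CSharp):
    // d = distance from the camera to the point along the ray
    // sceneZ = LinearEyeDepth(rawDepth), the view depth of that same point
    float3 posA = normalize(ray) * d + _WorldSpaceCameraPos;                // unit length ray * distance
    float3 posB = (ray / dot(ray, camFwd)) * sceneZ + _WorldSpaceCameraPos; // unit depth ray * depth
    // these are identical, because d = sceneZ / dot(normalize(ray), camFwd)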

    This is all still pretty new to me so please excuse my short-sightedness.
     
    Last edited: Nov 3, 2021
  17. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    So here I am again fighting with different depth formats...
    I want to convert the depth from the _CameraDepthNormalsTexture to the same format as the depth in _CameraDepthTexture.
    Is there any formula to do so?

    Apparently the DepthNormalTexture depth is already converted using Linear01Depth (once decoded using DecodeFloatRG), so I would need to invert that operation:
    Code (CSharp):
    // Z buffer to linear 0..1 depth
    inline float Linear01Depth( float z )
    {
        return 1.0 / (_ZBufferParams.x * z + _ZBufferParams.y);
    }
    Sadly I'm half retarded and can't manage to do it.
    Could someone help me?
    I summon the mighty @bgolus ! :D
     
  18. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    Ok, after struggling for hours with this and posting a desperate call for help, I once again found the solution mere minutes after posting this... :rolleyes:
    Yeah, I'm really that retarded... :confused:

    So, for future reference, the formula to convert the depth from _CameraDepthNormalsTexture to the format in _CameraDepthTexture is this:

    First, we decode the depth from 2 channels:
    Code (CSharp):
    float camDepth = dot(tex2D(_CameraDepthNormalsTexture, uv).zw, float2(1.0, 1.0 / 255.0));
    And then we apply the inverse of the Linear01Depth method:
    Code (CSharp):
    camDepth = ((1.0 / camDepth) - _ZBufferParams.y) / _ZBufferParams.x;
    That's it.
    Now the depth from _CameraDepthNormalsTexture is in the same format as the one on _CameraDepthTexture.
    Very handy for image effects that could use both textures with the same code.
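    Or wrapped up as a single helper (just a sketch; the function name is mine, not Unity's):
    Code (CSharp):
    // depth from _CameraDepthNormalsTexture re-encoded into the raw,
    // non-linear format of _CameraDepthTexture
    float DepthNormalsToRawDepth(float2 uv)
    {
        float linear01 = dot(tex2D(_CameraDepthNormalsTexture, uv).zw, float2(1.0, 1.0 / 255.0));
        return ((1.0 / linear01) - _ZBufferParams.y) / _ZBufferParams.x;
    }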

    Also, just for reference, to convert the depth from _CameraDepthTexture to the format of _CameraDepthNormalsTexture, you just have to sample the depth texture and then apply the Linear01Depth method:
    Code (CSharp):
    float depth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, uv);  // sample from depth texture
    depth = Linear01Depth(depth);
     
    bailknight and AlejMC like this.
  19. AlejMC

    AlejMC

    Joined:
    Oct 15, 2013
    Posts:
    149
    Hi!
    Just wanted to mention that this helped me a lot with an idea I wanted to try: reading the _CameraDepthTexture, averaging the view space positions of the surrounding pixels, and writing the result back into it.

    The equivalent for linear eye depth back to _CameraDepthTexture values would be:
    Code (CSharp):
    inline float UnlinearizeDepth(float linearEye)
    {
        return ((1.0f / linearEye) - _ZBufferParams.w) / _ZBufferParams.z;
    }
    If anybody has any ideas:
    For some reason I couldn't modify the contents of the _CameraDepthTexture itself: I could set another render target, calculate the average positions, set BuiltinRenderTextureType.Depth back as the target (CamDepth is this one) and try to write back to it, and it wouldn't budge.

    For it to work I had to create a completely new RenderTexture and set it as the global texture "_CameraDepthTexture", so that the rest of the rendering would use that one.
     
  20. backwheelbates

    backwheelbates

    Joined:
    Jan 14, 2014
    Posts:
    232
    In case it helps anyone, after digging into the cgincludes and other hlsl files, here are a couple of helper functions for shadergraph that can turn the depth texture into something useful.

    Code (CSharp):
    #ifndef LINEAREYEDEPTH_INCLUDED
    #define LINEAREYEDEPTH_INCLUDED

    // depth to linear 0-1
    void Linear01Depth_float(float InDepth, float NearClip, float FarClip, out float OutDepth){
        float x, y, z, w;
        x = (FarClip - NearClip) / NearClip;
        y = 1.0f;
        z = (FarClip - NearClip) / (NearClip * FarClip);
        w = 1.0f / FarClip;
        OutDepth = 1.0 / (x * InDepth + y);
    }

    // depth to linear distances
    void LinearEyeDepth_float(float InDepth, float NearClip, float FarClip, out float OutDepth){
        float x, y, z, w;
        x = (FarClip - NearClip) / NearClip;
        y = 1.0f;
        z = (FarClip - NearClip) / (NearClip * FarClip);
        w = 1.0f / FarClip;
        OutDepth = 1.0 / (z * InDepth + w);
    }

    #endif
    One tip: this is based on the clipping planes. By default the scene view has separate ones from your render cam, so in my case I just set the scene view clipping planes to be the same as my render cam's, and it all looks the same now.
     
    Metthatron likes this.