Question: World position from depth image

Discussion in 'General Graphics' started by Franzi_Good, Sep 10, 2022.

  1. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    Hello!

    This is my first post and I am very new to Unity and computer graphics in general,
    so please do not be too harsh. :D

    I am trying to reconstruct the world position from a depth image.

    (screenshots attached)

    My shader:

    Code (CSharp):
        // Upgrade NOTE: replaced 'mul(UNITY_MATRIX_MVP,*)' with 'UnityObjectToClipPos(*)'

        Shader "Hidden/Depth"
        {
            Properties
            {
                _MainTex ("Base (RGB)", 2D) = "white" {}
                _DepthLevel ("Depth Level", Range(1, 3)) = 1
            }
            SubShader
            {
                Pass
                {
                    CGPROGRAM
                    #pragma vertex vert
                    #pragma fragment frag
                    #include "UnityCG.cginc"

                    uniform sampler2D _MainTex;
                    uniform sampler2D _CameraDepthTexture;
                    uniform fixed _DepthLevel;
                    uniform half4 _MainTex_TexelSize;

                    struct input
                    {
                        float4 pos : POSITION;
                        half2 uv : TEXCOORD0;
                    };

                    struct output
                    {
                        float4 pos : SV_POSITION;
                        half2 uv : TEXCOORD0;
                    };

                    output vert(input i)
                    {
                        output o;
                        o.pos = UnityObjectToClipPos(i.pos);
                        o.uv = MultiplyUV(UNITY_MATRIX_TEXTURE0, i.uv);
                        // Flip the image
                        #if UNITY_UV_STARTS_AT_TOP
                        if (_MainTex_TexelSize.y < 0)
                            o.uv.y = 1 - o.uv.y;
                        #endif
                        return o;
                    }

                    fixed4 frag(output o) : COLOR
                    {
                        float depth = UNITY_SAMPLE_DEPTH(tex2D(_CameraDepthTexture, o.uv));
                        depth = pow(Linear01Depth(depth), _DepthLevel);
                        return depth;
                    }
                    ENDCG
                }
            }
        }
    I am not trying to shoot rays through the pixels, because I want to learn more
    about camera parameters and matrices (or at least, I do not think I want to).

    What I actually want to do is use the camera parameters and the depth I get from
    the image. One idea to easily get the parameters I need would be to use
    the view and inverse projection matrices. I do not even know if I need the view matrix.

    Can I calculate an imaginary line with these parameters (a start point, an end point from the depth
    value, and an angle from one of the matrices) to work backwards from
    screen/image space (my depth image) to the world position?
    That would probably be my ultimate goal.

    Is it bad that my camera has a world position with x = 5 and not 0?
    I just realized that while writing this.

    What I also tried was multiplying the vector built from the pixel coordinate
    in image/view space and the z from my depth values with the inverse
    projection matrix.

    I tried to reverse the perspective divide, which, from what I understand,
    takes place after the projection from world to image/screen space. Then I multiplied that
    vector with the inverse projection matrix of the camera and added my depth value.

    I know that we use the w component of the Vec4 to keep the original z value
    of the vertex for further operations with textures and so on.

    The "zValue" I am using here, though, is the (presumably linear) depth from my image
    (0 to 1, from the RGB value(s) of the pixel(s)).
    So I multiplied it with my far clip plane, as shown below, to get back
    the actual distance in world space (w).
    Do I even need to do this, or can I also just use the
    projection matrix somehow for this?

    However, the x and y coordinates are way off after that. Only the z value is legit
    (since I am reading it from my depth image).


    Code (CSharp):
    1. Vector3 ScreenToWorld(Vector3 screenPos, float zValue)
    2.     {
    3.         Matrix4x4 matInv = _camera.projectionMatrix.inverse;
    4.         screenPos.x *= zValue * _camera.farClipPlane; // Trying to reverse the perspective divide here
    5.         screenPos.y *= zValue * _camera.farClipPlane;
    6.         Vector4 WorldPos = matInv * screenPos;
    7.         WorldPos.z = zValue * _camera.farClipPlane;
    8.  
    9.         return WorldPos;
    10.     }
    This is all done with a camera in Unity and a depth image generated from the scene with a shader,
    because I cannot get my hands on, say, a Kinect, nor do I want to yet.
    But that would be the project after this one: calculating from actual camera parameters.

    Could I also use the "real" camera parameters from the camera in Unity? From what I know,
    it would be handier to use matrices for now.

    Just to make it clear, I do not want to use anything in the scene itself.
    Only code to do the image processing to get the world position.

    I do not want to use camera.ScreenToWorldPoint to shoot a ray through
    the pixel and get the point at the depth value to get the world position.
    I already did this and it works.
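    Roughly like this (writing it from memory, so the exact names might differ):

    Code (CSharp):
        // x, y: pixel coordinates; z: distance from the camera in world units
        Vector3 worldPos = _camera.ScreenToWorldPoint(new Vector3(x, y, depthInWorldUnits));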

    Maybe I want to write something similar to it, though, with the help of the
    camera parameters/matrices.


    Help with both solutions would be very much appreciated!

    It would also be nice if someone could help me understand the theory behind
    all of this a little better (maybe even for both solutions).

    Thank you so much! <3
     
    Last edited: Sep 10, 2022
  2. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    Do you want to do this in a shader or in C# code?

    This is how it works in a shader (writing it from memory, could be buggy):
    Code (JavaScript):
        sampler2D _CameraDepthTexture;
        float4x4 _ScreenToWorldSpaceMatrix;

        float4 frag(float4 pos : SV_Position) : SV_Target
        {
            float depth = _CameraDepthTexture[int3(pos.x, pos.y, 0)].r;

            float4 screenPosition = float4(pos.x, pos.y, depth, 1.0);
            float4 worldPosition = mul(_ScreenToWorldSpaceMatrix, screenPosition);
            worldPosition /= worldPosition.w;

            return worldPosition;
        }

        // where

        Matrix4x4 ScreenToWorldSpaceMatrix =
            camera.cameraToWorldMatrix *
            GL.GetGPUProjectionMatrix(camera.projectionMatrix, camera.targetTexture != null).inverse *
            Matrix4x4.Translate(new Vector3(-1.0f, -1.0f, 0.0f)) *
            Matrix4x4.Scale(new Vector3(2.0f / camera.pixelWidth, -2.0f / camera.pixelHeight, 1.0f));  // not sure about the minus
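    If you run this as an image effect, you also have to push the matrix to the material every frame. Something like this (again from memory, names assumed, could be buggy):

    Code (CSharp):
        using UnityEngine;

        // Hypothetical component sitting on the camera that drives the shader above.
        public class ScreenToWorldEffect : MonoBehaviour
        {
            public Material material;   // material using the shader above
            Camera cam;

            void Start() { cam = GetComponent<Camera>(); }

            void OnRenderImage(RenderTexture src, RenderTexture dst)
            {
                Matrix4x4 screenToWorld =
                    cam.cameraToWorldMatrix *
                    GL.GetGPUProjectionMatrix(cam.projectionMatrix, cam.targetTexture != null).inverse *
                    Matrix4x4.Translate(new Vector3(-1.0f, -1.0f, 0.0f)) *
                    Matrix4x4.Scale(new Vector3(2.0f / cam.pixelWidth, -2.0f / cam.pixelHeight, 1.0f));

                material.SetMatrix("_ScreenToWorldSpaceMatrix", screenToWorld);
                Graphics.Blit(src, dst, material);
            }
        }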
     
  3. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    @c0d3_m0nk3y

    Thank you so much for your reply!

    I think I actually want it as a C# script.
    But seeing it in a shader will probably also help me understand
    how this stuff works.

    Would you be so kind as to show me how this would look in a script?
    I think I need the values in a script because I want to let a virtual robot
    with inverse kinematics point at the coordinates later on.
    I also do not want to have to write the coordinates into the fragments as RGB.

    I want to try to get all the coordinates of, say, a plane object representing a person
    and then find their average to point at its middle, even when it is moving.

    The last part I think I can do on my own. I just need the world positions.

    I also already have the depth values in the script from the depth image, if
    that makes it easier, and I think I want to use these values for z, so there
    would be no need to get them again. But if possible, it would also be interesting
    to see. I just want to take them from my image, because that is pretty much what
    I am trying to achieve with it in the future.


    Thank you! :)
     
    Last edited: Sep 10, 2022
  4. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    Well, theoretically in C# it's the same

    Code (CSharp):
        Vector3 ScreenToWorld(Vector2 screenPos, float zValue)
        {
            Vector4 worldPosition = ScreenToWorldSpaceMatrix * new Vector4(screenPos.x, screenPos.y, zValue, 1.0f);
            return worldPosition / worldPosition.w;
        }
    using the ScreenToWorldSpaceMatrix from above.

    However, you'd have to read the depth buffer pixels back to get them on the CPU, which is slow. The CPU can be up to 3 frames ahead of the GPU, so reading the pixels back would stall the CPU and you'd lose all parallelism between CPU and GPU. Also, you'd have the depth value of the last frame, but you probably need the value for the current frame, which doesn't exist yet because the frame hasn't been rendered.

    On the CPU, using raycasts is actually the way to go.
     
  5. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    @c0d3_m0nk3y

    Thank you!

    Would there be a way for me to get the world position from the shader into a script, though?
     
  6. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    Sorry, I don't understand what you mean. Can you elaborate?

    Do you mean calculating the world position in a shader, storing it in a render target, and then reading the render target on the CPU?

    It's the same problem: you'd either stall the CPU (to get 1-frame-old data) or get 3-frames-old data with async readback.
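    For completeness, async readback would look roughly like this (a sketch, names assumed; the data you get back is a few frames old):

    Code (CSharp):
        using UnityEngine;
        using UnityEngine.Rendering;

        public class DepthReadback : MonoBehaviour
        {
            public RenderTexture depthRT;   // hypothetical RFloat texture the depth shader renders into

            void Update()
            {
                // Queue a non-blocking readback; the callback fires a few frames later.
                AsyncGPUReadback.Request(depthRT, 0, TextureFormat.RFloat, OnReadback);
            }

            void OnReadback(AsyncGPUReadbackRequest request)
            {
                if (request.hasError) return;
                var data = request.GetData<float>();   // one float per pixel, row by row
                // data[y * depthRT.width + x] is the depth from the frame the request was issued in.
            }
        }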
     
    Last edited: Sep 11, 2022
  7. Franzi_Good (Joined: May 27, 2022, Posts: 13)
        float4 frag(float4 pos : SV_Position) : SV_Target
        {
            float depth = _CameraDepthTexture[int3(pos.x, pos.y, 0)].r;

            float4 screenPosition = float4(pos.x, pos.y, depth, 1.0);
            float4 worldPosition = mul(_ScreenToWorldSpaceMatrix, screenPosition);
            worldPosition /= worldPosition.w;

            return worldPosition;
        }
    I can't get this return value from the shader into a script, right?

    I also tried the script version:
    once written by myself from what you wrote in the shader, and once the one you just posted.

    But it gives me very weird numbers.

    I could probably use camera.ScreenToWorldPoint(). I tried it and it seems to work.
    But then I would understand pretty much nothing of what is actually going on.
    Too bad Unity is not open source.

    Basically, I would need a self-written ScreenToWorldPoint() and then add my depth afterwards.
    Or go with the shader, if possible. But if the result of the matrix multiplication is already weird,
    then I don't know where to start.
     
  8. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    Now ScreenToWorldPoint also gives me some weird numbers. I really don't know where to start anymore. :D
     
  9. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    The thing is, I cannot shoot rays into the scene, because I technically don't have a scene.
    I could not check for collisions.
    I could, however, probably calculate a ray and get its point at the depth value.
    The thing is, though, I cannot just draw a ray through the pixel, because it would calculate
    the angle by itself.

    I need that angle, though. Any idea how I would get that from the matrices and/or the
    camera's intrinsic and extrinsic parameters?

    I will have to do it with these parameters and the matrices of a physical camera later on
    anyway.
     
  10. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    Totally forgot about Camera.ScreenToWorldPoint.

    The difference between Camera.ScreenToWorldPoint and my ScreenToWorld is that Camera.ScreenToWorldPoint takes a world-space depth and my version takes an NDC-space depth.

    As I said, the problem is not the calculation itself, it's where to get the depth value from.
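    To make the difference concrete, something like this (a sketch, assuming the ScreenToWorld helper and matrix from my earlier post; could be buggy):

    Code (CSharp):
        // Camera.ScreenToWorldPoint expects the distance from the camera in world units:
        Vector3 a = cam.ScreenToWorldPoint(new Vector3(px, py, eyeDepth));

        // My ScreenToWorld expects the non-linear NDC depth instead, which you can derive
        // from the eye depth via the GPU projection matrix (Unity view space looks down -Z):
        Matrix4x4 gpuProj = GL.GetGPUProjectionMatrix(cam.projectionMatrix, cam.targetTexture != null);
        Vector4 clip = gpuProj * new Vector4(0.0f, 0.0f, -eyeDepth, 1.0f);
        float ndcDepth = clip.z / clip.w;
        Vector3 b = ScreenToWorld(new Vector2(px, py), ndcDepth);   // a and b should roughly agree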
     
  11. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    Okay, let me show you what I am doing.

    Code (CSharp):
        Vector3 ScreenToWorld(Vector2 screenPos, float zValue)
        {
            Matrix4x4 ScreenToWorldSpaceMatrix =
                _camera.cameraToWorldMatrix *
                GL.GetGPUProjectionMatrix(_camera.projectionMatrix, _camera.targetTexture != null).inverse *
                Matrix4x4.Translate(new Vector3(-1.0f, -1.0f, 0.0f)) *
                Matrix4x4.Scale(new Vector3(2.0f / _camera.pixelWidth, -2.0f / _camera.pixelHeight, 1.0f));  // not sure about the minus

            Vector4 worldPosition = ScreenToWorldSpaceMatrix * new Vector4(screenPos.x, screenPos.y, zValue, 1.0f);
            Debug.Log("World position of pixel: " + "x: " + worldPosition.x + "y: " + worldPosition.y + "z: " + worldPosition.z);

            return worldPosition / worldPosition.w;
        }
    And then:

    Code (CSharp):
        ScreenToWorld(new Vector2(x, y), _screenShot.GetPixel(x, y).r);
    _screenShot.GetPixel(x,y).r is my NDC depth.

    The result is:
    (screenshots of the console output attached)

    The weirdest thing is, if I move the cube, x still stays around 5.

    The camera has an 84x84 resolution at the moment.

    (screenshot attached)
     
    Last edited: Sep 11, 2022
  12. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    How do you take the screenshot? Are you sure .r is NDC depth? Can you log it out?
     
  13. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    Code (CSharp):
        Debug.Log("NDC depth: " + _screenShot.GetPixel(x, y).r);
        Debug.Log("World depth: " + _screenShot.GetPixel(x, y).r * _camera.farClipPlane);
    (screenshots of the console output attached)

    Since 14.5 is pretty much off-center by 0.5, this should be correct.
    Or do I need the non-linearized depth?

    I pretty much just get the depth in the shader and assign it to RGB (shader code from the original post).
    Then I read a screenshot into a Texture2D and get the pixel.

    Thank you for helping me out again!

    I am going to get dinner now but will be right back.
     
    Last edited: Sep 11, 2022
  14. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    Unity uses reversed depth by default. Also, NDC depth is non-linear, so you can't just multiply it with the far plane distance, as you already assumed.

    Call LinearEyeDepth(ndcDepth) in the shader and store it in a floating-point texture. (You can also do the calculation on the CPU, but this is easier.)

    That should give you the view-space depth that you can pass to Camera.ScreenToWorldPoint().

    Because of the lag, this will only be correct if neither the camera nor the objects are moving.
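    Since your current shader stores Linear01Depth, the CPU-side version is just a multiplication by the far plane. A sketch (names assumed, and assuming _DepthLevel is 1 and a perspective camera):

    Code (CSharp):
        // Hypothetical helper: convert a stored Linear01 depth back to view-space depth
        // and let Unity do the unprojection.
        Vector3 PixelToWorld(Camera cam, int x, int y, float linear01Depth)
        {
            float eyeDepth = linear01Depth * cam.farClipPlane;           // view-space distance
            return cam.ScreenToWorldPoint(new Vector3(x, y, eyeDepth));  // z = distance from camera
        }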

    Also, just to be sure: where do you get the NDC depth from in the shader?
     
    Last edited: Sep 11, 2022
  15. Franzi_Good (Joined: May 27, 2022, Posts: 13)
        fixed4 frag(output o) : COLOR
        {
            float depth = UNITY_SAMPLE_DEPTH(tex2D(_CameraDepthTexture, o.uv));
            depth = pow(Linear01Depth(depth), _DepthLevel);
            return depth;
        }
    From here. :)
     
  16. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    Ok, so you've already linearized the depth. Make sure that _DepthLevel is 1, otherwise this won't work.

    What's the texture format of the RenderTexture that you are using?

    In which render pass do you render this?
     
  17. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    I actually do this in a script.

    In Start():
    Code (CSharp):
        void Start()
        {
            _camera = GetComponent<Camera>();
            _rect = new Rect(0, 0, _camera.pixelWidth, _camera.pixelHeight);
            _renderTexture = new RenderTexture(_camera.pixelWidth, _camera.pixelHeight, 24);
            _screenShot = new Texture2D(_camera.pixelWidth, _camera.pixelHeight, TextureFormat.RGBA32, false);

            _camera.targetTexture = _renderTexture;
        }

    Then:
    Code (CSharp):
        private void GetValues()
        {
            _camera.Render();
            RenderTexture.active = _renderTexture;
            _screenShot.ReadPixels(_rect, 0, 0);
            _camera.targetTexture = null;
            RenderTexture.active = null;

            Boolean pixelsFound = false;
            _pixelVals = new List<Vector3>();

            for (int y = _camera.pixelHeight - 1; y >= 0; y--)
            {
                for (int x = _camera.pixelWidth - 1; x >= 0; x--)
                {
                    if (_screenShot.GetPixel(x, y) != Color.white)
                    {
                        Debug.Log("pixel found: " + x + "   " + y);
                        Vector3 worldPoint = ScreenToWorld(new Vector3(x, y, _screenShot.GetPixel(x, y).r));
        ...

            Destroy(_renderTexture);
        }
     
    Last edited: Sep 11, 2022
  18. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    Code (CSharp):
        Vector3 worldPoint = _camera.ScreenToWorldPoint(new Vector3(x, y, _screenShot.GetPixel(x, y).r * _camera.farClipPlane));
    This seems to work:
    (screenshot attached)
    This would be a fine outcome.

    Well... or it sometimes works, and sometimes:
    (screenshot attached)
    :D

    The problem still is, though, that I have no idea what it is doing. :D
     
  19. Franzi_Good (Joined: May 27, 2022, Posts: 13)
    It would be nice if someone could write me the ScreenToWorldPoint function so I could see what it is doing. :D

    Also, what is happening here? :D
    (screenshots attached)

    Ohhhh...:
    (screenshot attached)
     
  20. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 560)
    I strongly recommend using a floating-point texture format for both the render texture and the screenshot. 8 bits per channel is not enough:
    RenderTextureFormat.RFloat
    TextureFormat.RFloat
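    For example (a sketch based on your Start() code; only the red channel is needed for depth):

    Code (CSharp):
        _renderTexture = new RenderTexture(_camera.pixelWidth, _camera.pixelHeight, 24, RenderTextureFormat.RFloat);
        _screenShot = new Texture2D(_camera.pixelWidth, _camera.pixelHeight, TextureFormat.RFloat, false);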
     
  21. Franzi_Good (Joined: May 27, 2022, Posts: 13)