
Question on Unity3D's projection matrix and 3D texture

Discussion in 'Shaders' started by zhutianlun810, Jun 11, 2019.

  1. zhutianlun810

    Joined: Sep 17, 2017
    Posts: 165

    Hello,

    I am implementing a GPU voxelization algorithm in Unity3D, following the tutorial at https://developer.nvidia.com/content/basics-gpu-voxelization . In general, it uses an orthographic camera to rasterize the scene.

    I have read some implementations by others, and there is something I can't understand.
    This is the geometry stage of the shader:
    Code (CSharp):
    void geom(triangle v2g input[3], inout TriangleStream<g2f> triStream)
    {
        v2g p[3];
        // Transform each vertex from object space to world space.
        for (int i = 0; i < 3; i++)
        {
            p[i] = input[i];
            p[i].pos = mul(unity_ObjectToWorld, p[i].pos);
        }

        // Face normal from the cross product of two triangle edges.
        float3 realNormal = float3(0.0, 0.0, 0.0);

        float3 V = p[1].pos.xyz - p[0].pos.xyz;
        float3 W = p[2].pos.xyz - p[0].pos.xyz;

        realNormal.x = (V.y * W.z) - (V.z * W.y);
        realNormal.y = (V.z * W.x) - (V.x * W.z);
        realNormal.z = (V.x * W.y) - (V.y * W.x);

        float3 absNormal = abs(realNormal);

        // Pick the dominant axis of the face normal; rasterizing along
        // that axis maximizes the triangle's projected area, so the
        // fewest voxels are missed.
        int angle = 0;
        if (absNormal.z > absNormal.y && absNormal.z > absNormal.x)
        {
            angle = 0; // project along z (front)
        }
        else if (absNormal.x > absNormal.y && absNormal.x > absNormal.z)
        {
            angle = 1; // project along x (left)
        }
        else if (absNormal.y > absNormal.x && absNormal.y > absNormal.z)
        {
            angle = 2; // project along y (top)
        }
        else
        {
            angle = 0;
        }

        for (int i = 0; i < 3; i++)
        {
            // View matrix for the chosen axis, then the camera's
            // orthographic projection matrix.
            if (angle == 0)
            {
                p[i].pos = mul(SEGIVoxelViewFront, p[i].pos);
            }
            else if (angle == 1)
            {
                p[i].pos = mul(SEGIVoxelViewLeft, p[i].pos);
            }
            else
            {
                p[i].pos = mul(SEGIVoxelViewTop, p[i].pos);
            }

            p[i].pos = mul(UNITY_MATRIX_P, p[i].pos);

            // Adjust z for the platform's depth convention.
            #if defined(UNITY_REVERSED_Z)
            p[i].pos.z = 1.0 - p[i].pos.z;
            #else
            p[i].pos.z *= -1.0;
            #endif

            p[i].angle = (float)angle;
        }

        triStream.Append(p[0]);
        triStream.Append(p[1]);
        triStream.Append(p[2]);
    }
    The first loop transforms the input vertices to world space. The shader then computes which axis is best suited for projection (the one that yields the most fragments), and passes the data on to the fragment shader. Notice that p.pos has been transformed to clip space (p[i].pos = mul(UNITY_MATRIX_P, p[i].pos)).

    Then, in the fragment shader, the author computes the voxel-space coordinate of each fragment by:
    int3 coord = int3((int)(input.pos.x), (int)(input.pos.y), (int)(input.pos.z * VoxelResolution));

    This is also the coordinate used to index the 3D texture.

    I can't understand this part. After mul(UNITY_MATRIX_P, p[i].pos), the vertices should be in clip space, so x, y, and z should all be in the range [-1, 1].

    I have a few questions. If my voxel grid is 256*256*256:
    1. What is the correct index (UVW) for a RWTexture3D with dimensions 256*256*256 in a shader? [0, 1], [0, 255], or [-127, 128]?
    2. Whatever the index range is, the coord calculation looks strange to me. Why is only the z value multiplied by VoxelResolution (256 in this case)? It seems the ranges of x and y after projection are (0, 256) while z is (0, 1). How can this happen? How can the camera know my voxel resolution?

    I feel I only have a blurry understanding of these things. I truly appreciate any help.
     
    Last edited: Jun 11, 2019
  2. bgolus

    Joined: Dec 7, 2012
    Posts: 12,329

    In the vertex shader, the range is actually unbounded. For vertices within the frustum, x and y are going to be within the range -w to +w rather than "-1 to 1", and z depends on whether you're using OpenGL (-w to +w) or any other graphics API (w at the near plane, 0.0 at the far plane). If .pos uses SV_Position as its semantic, then in the fragment shader x and y are target pixel positions (roughly ((pos.xy / pos.w) * 0.5 + 0.5) * target resolution), and z is the non-linear depth buffer value (pos.z / pos.w).
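    To illustrate, here is a rough sketch (convention-level pseudo-HLSL with illustrative names like targetResolution; not actual Unity or SEGI code) of the fixed-function viewport transform the GPU applies between the clip-space output and what SV_Position delivers to the fragment shader:
    Code (CSharp):
        // clipPos is the float4 the vertex/geometry shader writes to SV_POSITION.
        float3 ndc = clipPos.xyz / clipPos.w;  // perspective divide; for ortho, w == 1

        // Map NDC xy from [-1, 1] to pixel coordinates (a y flip may occur,
        // depending on the graphics API and render target convention).
        float2 pixelXY = (ndc.xy * 0.5 + 0.5) * targetResolution;

        // On D3D-style APIs ndc.z is already 0..1; OpenGL remaps -1..1 to 0..1 here.
        float depth = ndc.z;

        // What the fragment shader then reads from SV_Position:
        //   input.pos == float4(pixelXY, depth, clipPos.w)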
     
  3. zhutianlun810

    Joined: Sep 17, 2017
    Posts: 165

    Thank you. I am still confused about the space transforms.
    This is the code for the vertex shader:
    Code (CSharp):
    struct v2g
    {
        float4 pos : SV_POSITION;
        half4 uv : TEXCOORD0;
        float3 normal : TEXCOORD1;
        float angle : TEXCOORD2;
    };

    struct g2f
    {
        float4 pos : SV_POSITION;
        half4 uv : TEXCOORD0;
        float3 normal : TEXCOORD1;
        float angle : TEXCOORD2;
    };

    v2g vert(appdata_full v)
    {
        v2g o;

        float4 vertex = v.vertex;

        o.normal = UnityObjectToWorldNormal(v.normal);
        float3 absNormal = abs(o.normal);

        // The position is passed through untouched, still in object space;
        // the geometry shader does all of the space transforms.
        o.pos = vertex;

        o.uv = float4(TRANSFORM_TEX(v.texcoord.xy, _MainTex), 1.0, 1.0);

        return o;
    }
    And the geometry shader is shown in my original post.
    Then the fragment shader:
    Code (CSharp):
    float4 frag (g2f input) : SV_TARGET
    {
        int3 coord = int3((int)(input.pos.x), (int)(input.pos.y), (int)(input.pos.z * VoxelResolution));
        ..........
    }
    The output of the vertex shader is still in local space, right? I don't see any space transform in the code above. In the geometry shader, the vertices are first transformed to world space:
    p[i].pos = mul(unity_ObjectToWorld, p[i].pos);
    Then they are multiplied by UNITY_MATRIX_P, and finally passed to the fragment shader. You said "For vertices within the frustum, x and y are going to be within the range -w to +w rather than "-1 to 1"". For an orthographic camera, w is 1, right? So x and y should be in the range -1 to 1.

    And the most confusing part is: "If .pos uses SV_Position as its semantic, then in the fragment shader x and y are target pixel positions (roughly ((pos.xy / pos.w) * 0.5 + 0.5) * target resolution), and z is the non-linear depth buffer value (pos.z / pos.w)." So do you mean there are internal transforms happening between the geometry shader and the fragment shader? And how can the shader know the target resolution? The camera's setup is:
    Code (CSharp):
    voxelCameraGO = new GameObject("SEGI_VOXEL_CAMERA");
    voxelCameraGO.hideFlags = HideFlags.HideAndDontSave;

    voxelCamera = voxelCameraGO.AddComponent<Camera>();
    voxelCamera.enabled = false;
    voxelCamera.orthographic = true;
    voxelCamera.orthographicSize = voxelSpaceSize * 0.5f;
    voxelCamera.nearClipPlane = 0.0f;
    voxelCamera.farClipPlane = voxelSpaceSize;
    voxelCamera.depth = -2;
    voxelCamera.renderingPath = RenderingPath.Forward;
    voxelCamera.clearFlags = CameraClearFlags.Color;
    voxelCamera.backgroundColor = Color.black;
    voxelCamera.useOcclusionCulling = false;
    If I want to voxelize my scene at a resolution of 256*256*256, I must ensure the orthographic camera renders at 256*256. But I don't see any setting here that controls it.

    I know this is a long post. Thanks for your patience in reading it.
     
  4. bgolus

    Joined: Dec 7, 2012
    Posts: 12,329

    Yes, it should be. Unless you’re using a custom orthographic projection matrix where it’s not.

    Before the fragment shader stage, the GPU transforms the position from clip space to window space. The final outcome of that transformation is what the fragment shader gets from the SV_Position semantic. The shader doesn't know the target resolution unless the application passes that data in. Unity normally provides _ScreenParams, though that only holds the x and y, so you'd still have to pass in a 3D texture's depth dimension manually.

    The camera doesn't know, nor care, what the resolution is. That's controlled by the render target.
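    For example (a minimal C# sketch with illustrative names, not the actual SEGI source): the x/y resolution comes from whatever render target the voxel camera draws into, and the 3D texture's depth has to be handed to the shader yourself:
    Code (CSharp):
        int voxelResolution = 256;

        // A dummy 2D target: its width/height determine how many fragments the
        // rasterizer generates, i.e. the x/y resolution of the voxelization.
        RenderTexture dummyTarget = new RenderTexture(voxelResolution, voxelResolution, 0);
        voxelCamera.targetTexture = dummyTarget;

        // The shader can't infer the 3D texture's depth, so pass it in manually.
        Shader.SetGlobalFloat("VoxelResolution", (float)voxelResolution);

        // Bind the 3D texture as a UAV so the fragment shader can write voxels
        // (assumes voxelVolume is a 3D RenderTexture created with
        // enableRandomWrite = true), then render with the voxel camera.
        Graphics.SetRandomWriteTarget(1, voxelVolume);
        voxelCamera.Render();
        Graphics.ClearRandomWriteTargets();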
     
  5. zhutianlun810

    Joined: Sep 17, 2017
    Posts: 165

    Thank you. One more question: what is pos.z in window space?
     
  6. bgolus

    Joined: Dec 7, 2012
    Posts: 12,329

    It's the value used by the depth buffer. For a perspective projection it's a non-linear 0.0 to 1.0 range; for an orthographic projection it's a linear 0.0 to 1.0. For OpenGL, 0.0 is at the near plane; for everything else, 1.0 is at the near plane (reversed Z).
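    So for the orthographic voxel camera above (near plane 0, far plane voxelSpaceSize), window-space z runs linearly through the whole volume, which is why scaling it by the grid depth yields a slice index. A sketch, using the same names as the code earlier in the thread:
    Code (CSharp):
        // input.pos comes from SV_Position, so with this ortho camera:
        //   input.pos.xy = pixel coordinates, already 0..255 for a 256x256 target
        //   input.pos.z  = linear 0..1 depth through the voxel volume (the
        //                  geometry shader's z flip keeps its direction
        //                  consistent across platforms)
        int slice = (int)(input.pos.z * VoxelResolution);  // 0 .. VoxelResolution-1
        int3 coord = int3((int)input.pos.x, (int)input.pos.y, slice);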
     
  7. zhutianlun810

    Joined: Sep 17, 2017
    Posts: 165

    I see. Thank you.