Official Feedback Wanted: Mesh Compute Shader access

Discussion in 'Graphics Experimental Previews' started by Aras, Apr 20, 2021.

  1. cecarlsen

    Joined:
    Jun 30, 2006
    Posts:
    864
    @Aras, is this a feature we can hope to see implemented one day? Variable index count in meshes generated on the GPU? Or will we always have to rely on DrawProceduralIndirect for this?
     
  2. Aras

    Unity Technologies

    Joined:
    Nov 7, 2005
    Posts:
    4,770
    I don't know TBH. Well, I mean -- for any "regular" rendering (like what Meshes are for), the CPU has to know the number of vertices/indices, so it has to read it back to the CPU. At which point, you can do that yourself just as well; read it back, do mesh.SetSubMesh with the result.

    The alternative would be changing the Mesh class so that it can "not know" the number of vertices/indices on the CPU, and can somehow get into a mode where it does an indirect draw behind the scenes. But that's piling more complexity into an already complex Mesh class, and it also potentially has a lot of unintended ripple effects (across the whole engine there are probably a million places that assume the CPU can get the vertex count of a Mesh).
     
  3. cecarlsen

    Yes, that's exactly what I'm talking about! I can imagine the Mesh class complexity is growing, and that it is highly interconnected with the engine ... but ... on the user side it would greatly simplify rendering of any GPU procedure that generates varying index count (for example a particle system). I am super happy that we are finally able to build meshes on the GPU - but without varying index count, we are left with the old tedious method of having to write a new specialized shader and renderer using an IndirectArguments buffer with DrawProceduralIndirect, and again being locked out of easy integration with ShaderGraph. This use case happens quite often for me.

    I wonder what Unity is using mesh index count on the CPU for.
     
  4. Aras

    Alternative I could see is some sort of ability to create a "custom renderer component", i.e. kinda what you can achieve from code via Graphics.Draw* today, but as an actual component that does not have to be emitted from C# every frame.

    Anyway, either way it would be up to the graphics team(s) to decide and do, I'm just a casual bystander :)
     
  5. cecarlsen

    Haha :) I am sure the graphics team listens up when you drop ideas.

    That sounds like a new pattern :eek::confused: The last thing we need are new patterns.

    I understand that emitting graphics calls from C# is a performance bottleneck. But in this case, one would typically emit an entire particle system (or the like), so that is hardly a problem. I would much prefer a solution that allows you to simply mark a GraphicsBuffer containing indices as dynamic, and then lets the Mesh handle the magic work of copying the index count on the GPU to an IndirectArguments buffer and calling DrawProceduralIndirect ... so that standard shaders as well as ShaderGraph can actually be used.

    Writing renderers and shaders for all render pipelines is such a pain. This could be a remedy.
     
    Last edited: Oct 19, 2021
  6. korzen303

    Joined:
    Oct 2, 2012
    Posts:
    223
    Hi @Aras, I am trying to hook up to the vertex buffer of a skinned mesh with a compute shader to refit a custom BVH tree. In other words, I simply need to copy vertex positions from skinnedMeshVertexBuffer to bvhVertexBuffer<float4>.

    The VertexAttribsDesc are as follows:

    (attr=Position fmt=Float32 dim=3 stream=0)
    (attr=Normal fmt=Float32 dim=3 stream=0)
    (attr=Tangent fmt=Float32 dim=4 stream=0)
    (attr=TexCoord0 fmt=Float32 dim=2 stream=1)
    (attr=BlendWeight fmt=Float32 dim=4 stream=2)
    (attr=BlendIndices fmt=UInt32 dim=4 stream=2)

    My compute shader is:

    Code (HLSL):
    [numthreads(32, 1, 1)]
    void UpdateVertexBufferFromMesh(uint3 id : SV_DispatchThreadID)
    {
        if (id.x < bvhLeavesCount)
        {
            // Load3 takes a byte address: vertex index times the stride in bytes.
            uint3 vraw = meshVertexBuffer.Load3(id.x * meshVertexBufferStride);
            float3 v = asfloat(vraw);
            bvhVertexBuffer[id.x] = float4(v, 0);
        }
    }
    and

    Code (CSharp):
    // Request raw (ByteAddressBuffer) access before fetching the buffer.
    skinnedMeshRenderer.vertexBufferTarget |= GraphicsBuffer.Target.Raw;
    GraphicsBuffer vertexBuf = skinnedMeshRenderer.GetVertexBuffer();
    if (vertexBuf != null)
    {
        colCompute.SetInt("meshVertexBufferStride", this.vertexBufferStride);
        colCompute.SetBuffer(kernelId, ShaderParam.bvhVertexBuffer, this.bvhVertexBuffer);
        colCompute.SetBuffer(kernelId, "meshVertexBuffer", vertexBuf);
        // One thread per BVH leaf, 32 threads per group.
        colCompute.Dispatch(kernelId, Mathf.CeilToInt((float)bvhLeavesIds.Count / 32), 1, 1);
        vertexBuf.Dispose();
    }
    The problem is that neither meshVertexBufferStride = 40 (10 floats in stream 0) nor 80 (20 four-byte components in total) works.
    I tried with a regular Mesh with a different layout and it worked as expected.

    Thanks for your help
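    The two candidate strides in this post can be checked with a little arithmetic. The following is a minimal, engine-independent sketch; VertexLayoutSketch, Attr and StreamStrides are hypothetical names (not Unity API), and it assumes the attributes within a stream are tightly packed, which the engine does not guarantee in every case.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the Unity API: sums attribute sizes per
// vertex stream, assuming attributes within a stream are tightly packed.
public static class VertexLayoutSketch
{
    public struct Attr
    {
        public string Name;
        public int BytesPerComponent; // 4 for Float32 and UInt32
        public int Dimension;         // number of components
        public int Stream;            // vertex stream index

        public Attr(string name, int bytesPerComponent, int dimension, int stream)
        {
            Name = name;
            BytesPerComponent = bytesPerComponent;
            Dimension = dimension;
            Stream = stream;
        }
    }

    // The attribute list quoted in the post above.
    public static readonly Attr[] PostedLayout =
    {
        new Attr("Position",     4, 3, 0),
        new Attr("Normal",       4, 3, 0),
        new Attr("Tangent",      4, 4, 0),
        new Attr("TexCoord0",    4, 2, 1),
        new Attr("BlendWeight",  4, 4, 2),
        new Attr("BlendIndices", 4, 4, 2),
    };

    // Returns a map from stream index to stride in bytes.
    public static Dictionary<int, int> StreamStrides(IEnumerable<Attr> attributes)
    {
        var strides = new Dictionary<int, int>();
        foreach (Attr a in attributes)
        {
            // current is 0 when the stream has not been seen yet.
            strides.TryGetValue(a.Stream, out int current);
            strides[a.Stream] = current + a.BytesPerComponent * a.Dimension;
        }
        return strides;
    }
}
```

    For the posted layout this gives 40 bytes for stream 0, 8 for stream 1 and 32 for stream 2, 80 in total, which are the two values tried above.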
     
  7. korzen303

    Actually, meshVertexBufferStride = 40 is correct; the problem was that the vertices in skinnedMeshVertexBuffer are rotated and translated like this:
    upload_2021-11-22_12-50-26.png

    Despite this, the animations seem to work fine. Any idea how to align the raw vertices with the visual mesh? I tried invoking the above compute shader in OnPreRender, but it did not help.

    Cheers
     
    Last edited: Nov 22, 2021
  8. Aras

    Probably missing a matrix transform of the root game object, or something? Just my guess.
     
  9. korzen303

    Hi @Aras,

    yes, this was the root transform problem indeed. Thanks!

    However, I am experiencing yet another problem with the new Mesh API.

    When I set up my GPU collision detection compute data with BakeMesh and bvhVertexBuffer.SetData(), things work as expected (although very slowly).

    When I try to avoid unnecessary memory transfers and get the data directly via SkinnedMeshRenderer.GetVertexBuffer(), it seems that the data in that buffer is one frame behind what is actually being rendered.
    I am running all the updates in LateUpdate and tried with the skinned mesh's Motion Vectors checkbox both on and off (same result).

    This results in a nasty "visual penetration" of the underlying skinned mesh through the cloth.
    Please have a look at the arms at around 0:54:



    Getting SkinnedMeshRenderer.GetVertexBuffer() to work would drastically simplify my code. In the current version of my simulation I am doing all the avatar skinning/blending again in compute just to get the data for the collision system, which is a pure waste of resources.

    Cheers

    PS: I am on 2021.2.3f1
     
  10. Aras

    Could it be an issue that at LateUpdate time, the skinning has not happened yet? So effectively you are getting data from the previous frame. If you want to access data after skinning has happened, use Camera.onPreRender (see earlier post in this same thread, https://forum.unity.com/threads/feedback-wanted-mesh-compute-shader-access.1096531/#post-7363490) or RenderPipelineManager.beginFrameRendering (another post in this thread https://forum.unity.com/threads/feedback-wanted-mesh-compute-shader-access.1096531/#post-7415966)
     
  11. korzen303

    Thanks @Aras for the quick reply. I moved the whole GPU physics simulation to the PreRender callback. Unfortunately, it still does not work as expected:
    upload_2021-11-24_20-48-15.png

    Any other ideas? If not, I guess I will need to fall back to my own skinning compute...

    PS: Also, in PreRender, it did not work at all using OpenGL32 (Win64)
     
  12. Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    868
    @Aras I have some issues with writing to the VertexBuffer of SkinnedMeshRenderer. I am calculating vertex positions and normals as advised in "RenderPipelineManager.beginFrameRendering". This works in principle; however, the SkinnedMeshRenderer still seems to transform the vertices based on the configured "RootBone". It also seems to reverse the scale of the mesh.

    If I set "forceMatrixRecalculationPerRender" on the SMR to true, all seems good, but this seems VERY expensive.
    Even more strangely, if I don't call GetVertexBuffer() every frame, I get flickering between the correct vertices and the transformed ones, probably related to the "PreviousVertexBuffer"; BUT if I change any property in the Inspector during runtime, everything seems to render correctly again. Sadly, I am not able to replicate this behavior by changing properties from script.
    (Update: These two options only invalidate the vertex buffer, so what I saw rendered correctly was the unmanipulated mesh, not my manipulation.)

    I cannot wrap my head around this behavior. If I render the same way to a MeshFilter, everything is correct, but I don't get the "SkinnedMotionVectors" I need.
    Any help would be appreciated; I have been really stuck for about a day now.
     
    Last edited: Nov 25, 2021
  13. Qleenie

    How would we decide in that case which buffer to use on which frame?
     
  14. Qleenie

    I somehow got SkinnedMotionVectors to work with a mesh generated by compute shaders; however, this was a horrible experience, with many different workarounds for issues I encountered:

    • It seems to matter WHEN you grab the VertexBuffer; in some cases it works if I get it in LateUpdate, in other cases in "beginFrameRendering". The compute shader, however, always needs to be invoked in "beginFrameRendering", as stated before. If I grab the VertexBuffer at the wrong time, very strange things happen (see my post above).
    • It seems that the VertexBuffer gets nullified as soon as I switch to the Scene View in the Editor, although it passes a null check one line before, which makes the Editor unusable (I needed to build a workaround for the case where the camera is the Scene camera).
    • I did not manage to keep the buffers in memory and swap them, as described by @Aras before, as there seems to be no way to tell which one is the current one for the current frame. Just flipping frame by frame does not seem to work, as the order seems to change from time to time, especially when working in the Editor.

    It seems that there are still issues with the concept of manipulating the VertexBuffer of a SkinnedMesh, but maybe it's due to my lack of understanding. An example of how it was intended to be used would be very helpful; I was not able to find any, only for the much simpler Mesh case.
     
  15. Przemyslaw_Zaworski

    Joined:
    Jun 9, 2017
    Posts:
    328
    Now we don't need to use tessellation shaders. Here is GPU-based Phong tessellation created with the new API (example created by Przemyslaw Zaworski):

    Tessellation.cs
    Code (CSharp):
    using Unity.Collections;
    using UnityEngine;
    using UnityEngine.Rendering;
    using System;

    public class Tessellation : MonoBehaviour
    {
        public ComputeShader TessellationCS;
        [Range(1,128)] public int TessellationFactor = 16;
        [Range(0f,1f)] public float Phong = 0.5f;

        int _VertexCount = 0;
        ComputeBuffer _ComputeBuffer;
        GraphicsBuffer _GraphicsBuffer;
        Mesh _Mesh;
        string _Name;
        bool _Recalculate = false;
        Attribute[] _Attributes;
        Bounds _Bounds;

        // One source vertex, uploaded to the compute shader as 9 consecutive floats.
        struct Attribute
        {
            public Vector4 Vertex;
            public Vector3 Normal;
            public Vector2 TexCoord;
        }

        // Packs each Attribute as 9 floats (position.xyzw, normal.xyz, uv.xy) and returns the raw bytes.
        byte[] ToByteArray(Attribute[] vectors)
        {
            float[] floats = new float[vectors.Length * 9];
            for (int i = 0; i < vectors.Length; i++)
            {
                int o = i * 9;
                floats[o + 0] = vectors[i].Vertex.x;
                floats[o + 1] = vectors[i].Vertex.y;
                floats[o + 2] = vectors[i].Vertex.z;
                floats[o + 3] = vectors[i].Vertex.w;
                floats[o + 4] = vectors[i].Normal.x;
                floats[o + 5] = vectors[i].Normal.y;
                floats[o + 6] = vectors[i].Normal.z;
                floats[o + 7] = vectors[i].TexCoord.x;
                floats[o + 8] = vectors[i].TexCoord.y;
            }
            byte[] bytes = new byte[floats.Length * sizeof(float)];
            Buffer.BlockCopy(floats, 0, bytes, 0, bytes.Length);
            return bytes;
        }

        void Start()
        {
            Mesh mesh = GetComponent<MeshFilter>().sharedMesh;
            if (mesh == null) mesh = Resources.GetBuiltinResource<Mesh>("Sphere.fbx");
            _Name = mesh.name;
            _Bounds = mesh.bounds;
            // Cache the arrays once: every access to these Mesh properties returns a new copy.
            int[] triangles = mesh.triangles;
            Vector3[] vertices = mesh.vertices;
            Vector3[] normals = mesh.normals;
            Vector2[] uvs = mesh.uv;
            _Attributes = new Attribute[triangles.Length];
            Vector3 normal = new Vector3(0.0f, 0.0f, 1.0f);
            Vector2 uv = new Vector2(0.0f, 0.0f);
            for (int i = 0; i < triangles.Length; i++)
            {
                int index = triangles[i];
                Vector3 p = vertices[index];
                if (normals.Length > 0) normal = normals[index];
                if (uvs.Length > 0) uv = uvs[index];
                _Attributes[i].Vertex = new Vector4(p.x, p.y, p.z, 1.0f);
                _Attributes[i].Normal = normal;
                _Attributes[i].TexCoord = uv;
            }
            _ComputeBuffer = new ComputeBuffer(9 * _Attributes.Length, sizeof(float), ComputeBufferType.Raw);
            _ComputeBuffer.SetData(ToByteArray(_Attributes));
        }

        void Update()
        {
            _VertexCount = TessellationFactor * TessellationFactor * _Attributes.Length;
            if (_Mesh == null || _Recalculate)
            {
                _Recalculate = false;
                Release();
                _Mesh = new Mesh();
                _Mesh.name = _Name;
                // Allow compute shaders to bind the mesh buffers as raw (byte address) buffers.
                _Mesh.vertexBufferTarget |= GraphicsBuffer.Target.Raw;
                _Mesh.indexBufferTarget |= GraphicsBuffer.Target.Raw;
                VertexAttributeDescriptor[] attributes = new []
                {
                    new VertexAttributeDescriptor(VertexAttribute.Position, VertexAttributeFormat.Float32, 3, stream:0),
                    new VertexAttributeDescriptor(VertexAttribute.Normal, VertexAttributeFormat.Float32, 3, stream:0),
                    new VertexAttributeDescriptor(VertexAttribute.TexCoord0, VertexAttributeFormat.Float32, 2, stream:0),
                };
                _Mesh.SetVertexBufferParams(_VertexCount, attributes);
                _Mesh.SetIndexBufferParams(_VertexCount, IndexFormat.UInt32);
                // Trivial index buffer: every vertex is used exactly once.
                NativeArray<int> indexBuffer = new NativeArray<int>(_VertexCount, Allocator.Temp);
                for (int i = 0; i < _VertexCount; ++i) indexBuffer[i] = i;
                _Mesh.SetIndexBufferData(indexBuffer, 0, 0, indexBuffer.Length, MeshUpdateFlags.DontRecalculateBounds);
                indexBuffer.Dispose();
                SubMeshDescriptor submesh = new SubMeshDescriptor(0, _VertexCount, MeshTopology.Triangles);
                submesh.bounds = _Bounds;
                _Mesh.SetSubMesh(0, submesh);
                _Mesh.bounds = submesh.bounds;
                GetComponent<MeshFilter>().sharedMesh = _Mesh;
                _GraphicsBuffer = _Mesh.GetVertexBuffer(0);
                TessellationCS.SetInt("_VertexCount", _VertexCount);
                TessellationCS.SetInt("_TessellationFactor", TessellationFactor);
                TessellationCS.SetFloat("_Phong", Phong);
                TessellationCS.SetBuffer(0, "_GraphicsBuffer", _GraphicsBuffer);
                TessellationCS.SetBuffer(0, "_ComputeBuffer", _ComputeBuffer);
                TessellationCS.GetKernelThreadGroupSizes(0, out uint x, out _, out _);
                // One thread per output vertex. The kernel is one-dimensional, so Y and Z use a single group
                // (the group counts are not the thread-group sizes returned above).
                int threadGroupsX = Mathf.Min((_VertexCount + (int)x - 1) / (int)x, 65535);
                TessellationCS.Dispatch(0, threadGroupsX, 1, 1);
            }
        }

        void Release()
        {
            if (_Mesh != null) Destroy(_Mesh);
            if (_GraphicsBuffer != null)
            {
                _GraphicsBuffer.Release();
                _GraphicsBuffer = null;
            }
        }

        void OnDestroy()
        {
            Release();
            if (_ComputeBuffer != null) _ComputeBuffer.Release();
        }

        void OnValidate()
        {
            _Recalculate = true;
        }
    }
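    The group count passed to Dispatch in Update is a ceiling division of the vertex count by the kernel's thread-group width, clamped to 65535. A minimal, engine-independent sketch of that arithmetic follows; DispatchMath and its members are hypothetical helpers, not Unity API. It also makes explicit that the clamp leaves vertices unprocessed once the count exceeds 65535 groups' worth of threads.

```csharp
using System;

// Hypothetical helper (not part of Unity): thread-group arithmetic for a 1D dispatch.
public static class DispatchMath
{
    // The same per-dimension limit the script above clamps to.
    public const int MaxGroupsPerDimension = 65535;

    // Ceiling division: the smallest group count whose threads cover every item.
    public static int GroupsNeeded(int itemCount, int threadsPerGroup)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException(nameof(itemCount));
        if (threadsPerGroup <= 0) throw new ArgumentOutOfRangeException(nameof(threadsPerGroup));
        // Use long for the intermediate sum so large counts cannot overflow.
        return (int)(((long)itemCount + threadsPerGroup - 1) / threadsPerGroup);
    }

    // Number of items a clamped dispatch would leave untouched.
    public static long ItemsNotCovered(int itemCount, int threadsPerGroup)
    {
        int groups = Math.Min(GroupsNeeded(itemCount, threadsPerGroup), MaxGroupsPerDimension);
        long covered = (long)groups * threadsPerGroup;
        return Math.Max(0L, itemCount - covered);
    }
}
```

    With 64 threads per group, as in the kernel of this example, everything up to 65535 × 64 = 4,194,240 vertices is covered; beyond that, a second dispatch or a 2D group layout would be needed.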
    Tessellation.compute
    Code (HLSL):
    #pragma kernel CSMain

    ByteAddressBuffer  _ComputeBuffer;    // source: 9 floats per vertex (position.xyzw, normal.xyz, uv.xy)
    RWByteAddressBuffer _GraphicsBuffer;  // mesh vertex buffer: 8 floats per vertex (position, normal, uv)
    int _TessellationFactor, _VertexCount;
    float _Phong;

    // GPU PRO 3, Advanced Rendering Techniques, A K Peters/CRC Press 2012
    // Chapter 1 - Vertex shader tessellation, Holger Gruen
    void Tessellation (uint id, out float3 position, out float3 normal, out float2 texcoord)
    {
        uint subtriangles = (_TessellationFactor * _TessellationFactor);
        float triangleID = float (( id / 3 ) % subtriangles);
        float row = floor (sqrt( triangleID ));
        uint column = triangleID - ( row * row );
        float incuv = 1.0 / _TessellationFactor;
        float u = ( 1.0 + row ) / _TessellationFactor;
        float v = incuv * floor (float(column) * 0.5);
        u -= v;
        float w = 1.0 - u - v;
        // Index of the first vertex of the source triangle this thread belongs to.
        uint address = id / (3u * subtriangles) * 3u;
        // Byte address = (vertex * 9 + float offset) << 2.
        float3 p1 = asfloat(_ComputeBuffer.Load4(((address + 0) * 9) << 2)).xyz;
        float3 p2 = asfloat(_ComputeBuffer.Load4(((address + 1) * 9) << 2)).xyz;
        float3 p3 = asfloat(_ComputeBuffer.Load4(((address + 2) * 9) << 2)).xyz;
        float3 n1 = asfloat(_ComputeBuffer.Load3((((address + 0) * 9) + 4) << 2));
        float3 n2 = asfloat(_ComputeBuffer.Load3((((address + 1) * 9) + 4) << 2));
        float3 n3 = asfloat(_ComputeBuffer.Load3((((address + 2) * 9) + 4) << 2));
        float2 t1 = asfloat(_ComputeBuffer.Load2((((address + 0) * 9) + 7) << 2));
        float2 t2 = asfloat(_ComputeBuffer.Load2((((address + 1) * 9) + 7) << 2));
        float2 t3 = asfloat(_ComputeBuffer.Load2((((address + 2) * 9) + 7) << 2));
        uint vertexID = ((id / 3u) / subtriangles) * 3u + (id % 3u);
        // Shift the barycentric coordinates depending on which corner of the sub-triangle this is.
        switch(vertexID % 3)
        {
            case 0u:
                if ((column & 1u) != 0)
                {
                    v += incuv; u -= incuv;
                }
                break;
            case 1u:
                if ((column & 1u) == 0)
                {
                    v += incuv; u -= incuv;
                }
                else
                {
                    v += incuv; u -= incuv;
                    w += incuv; u -= incuv;
                }
                break;
            case 2u:
                if ((column & 1u) == 0)
                {
                    u -= incuv; w += incuv;
                }
                else
                {
                    w += incuv; u -= incuv;
                }
                break;
        }
        normal = float3(u * n1 + v * n2 + w * n3);
        texcoord = float2(u * t1 + v * t2 + w * t3);
        float3 location = float3(u * p1 + v * p2 + w * p3);
        // Phong tessellation: project the flat position onto each corner's tangent plane and blend.
        float3 d1 = location - n1 * (dot(location, n1) - dot(p1, n1));
        float3 d2 = location - n2 * (dot(location, n2) - dot(p2, n2));
        float3 d3 = location - n3 * (dot(location, n3) - dot(p3, n3));
        position = _Phong * (d1 * u + d2 * v + d3 * w) + (1.0 - _Phong) * location;
    }

    [numthreads(64, 1, 1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        if ((int)id.x >= (_VertexCount)) return;
        float3 position = 0;
        float3 normal = 0;
        float2 texcoord = 0;
        Tessellation(id.x, position, normal, texcoord);
        // Output layout: 8 floats per vertex (position at 0, normal at 3, uv at 6).
        _GraphicsBuffer.Store3((id.x * 8) << 2, asuint(position));
        _GraphicsBuffer.Store3((id.x * 8 + 3) << 2, asuint(normal));
        _GraphicsBuffer.Store2((id.x * 8 + 6) << 2, asuint(texcoord));
    }
    Original mesh (tessellation factor = 1):
    upload_2021-11-28_21-35-37.png

    Phong factor = 0.0:
    upload_2021-11-28_21-28-24.png

    Phong factor = 0.5:
    upload_2021-11-28_21-29-10.png
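
    The Load and Store calls in the kernel above take byte addresses, built as (vertex index × floats per vertex + float offset) << 2. A small sketch of that addressing, written in plain C# so it can be checked outside the engine; RawBufferAddressing is a hypothetical name, and the two record sizes are the ones used in this example (9 floats in the source buffer, 8 in the mesh vertex buffer).

```csharp
// Hypothetical helper (not part of Unity): byte addresses for raw buffers
// that store tightly packed 32-bit floats.
public static class RawBufferAddressing
{
    public const int SourceFloatsPerVertex = 9; // position.xyzw, normal.xyz, uv.xy
    public const int OutputFloatsPerVertex = 8; // position.xyz, normal.xyz, uv.xy

    // Byte address of the float at floatOffset inside record vertexIndex.
    // Shifting left by 2 multiplies by 4, the size of one float in bytes.
    public static int ByteAddress(int vertexIndex, int floatsPerVertex, int floatOffset)
    {
        return (vertexIndex * floatsPerVertex + floatOffset) << 2;
    }
}
```

    For instance, the normal of output vertex 1 starts at ByteAddress(1, 8, 3), which is 44 bytes into the vertex buffer, and the normal of source vertex 2 starts at ByteAddress(2, 9, 4), which is 88.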
     
  16. korzen303

    Thanks Przemek, great example!

    I have been looking for something like this for a while now...
     
  17. burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    That's awesome, but have you tested the performance? I'd expect this to be much slower than fixed-function tessellation (although maybe on modern GPUs it's basically just turned into this in the UMD? Is dedicated tessellation hardware still a thing?)
     
  18. kyriew

    Joined:
    Sep 10, 2019
    Posts:
    7
    Awesome! Can I use it to develop my own rasterization pipeline?
     
  19. hoshos

    Joined:
    Mar 7, 2015
    Posts:
    920
    Hello!
    First of all, thank you for adding the wonderful features Mesh.GetVertexBuffer() and SkinnedMeshRenderer.GetVertexBuffer().
    In my tests both of these features work fine.
    However, there is one strange behavior with Mesh.GetVertexBuffer().
    If you get the buffer with GetVertexBuffer() for the sharedMesh of a MeshFilter and rewrite it with a compute shader, the original mesh asset is updated for some reason.
    I thought the buffer was just GPU-side content and that changes to it wouldn't be written to the asset.
    This problem can be avoided by getting an instance with MeshFilter.mesh and working on that, but is this the intended behavior?
    If possible, I would like to use sharedMesh to transform the mesh without creating an instance mesh.
    SkinnedMeshRenderer.GetVertexBuffer() doesn't do that and works as expected.
     
  20. Invertex

    Joined:
    Nov 7, 2013
    Posts:
    1,550
    Did you actually query the mesh data on the CPU side to see that this was the case?
    Shared Mesh is exactly as the name says, "shared". Any objects that are referencing that same shared mesh will be rendering using the same GPU-side buffer as well, because this saves on memory and bandwidth and allows for better instancing. So if you make a change to that buffer, then all objects using that shared mesh will render with that changed data. It's not modifying the original source (cpu-side), just the GPU-side buffer that mesh links to.

    The reason SkinnedMeshRenderer doesn't "seem" to do that is because each frame the GPU-side buffer has to be updated with the bone and blendshape transformations. So any changes you make aren't going to persist.

    Current behaviour for non-skinned meshes is as expected and intended. Unity would make performance much worse if it created a unique buffer each time you called GetVertexBuffer. It's just meant to give you a native reference to the data so you can efficiently modify what Unity has put there. There is no reason for Unity to update the GPU buffer each frame for meshes that aren't being deformed; that would be incredibly wasteful. Imagine a huge landscape or city where all that mesh data has to be sent to the GPU every frame... No bueno. It's only when you upload changed data from the CPU side (for example by setting the vertex data and calling Mesh.UploadMeshData) that the buffer will be updated, in case you do want to change the CPU-side source.

    So, if an object is using a shared mesh, but you want alterations to its shape only for that object... Then yes, you should be accessing .mesh to get a unique instance, since you are trying to do unique modifications to one object.
     
    Last edited: Dec 18, 2021
  21. hoshos

    Yes, I understand that working with a sharedMesh changes all the meshes that reference it.
    I may not have explained it well because of the translation.
    My point is that if you modify the sharedMesh through GetVertexBuffer() while the game is running, even the mesh asset in Unity's Project window is changed after the game is stopped.
    In other words, the mesh file itself appears to be changed.
    I've known for a long time that this happens when you call Mesh.SetVertices() on a sharedMesh.
    I was wondering because I thought this wouldn't happen with the new GetVertexBuffer().
    The buffer lives on the GPU, so I didn't expect changes to it to affect the project's asset files.
     
  22. Invertex

    The asset in the project is not changed; the modified mesh is merely still in memory. If you restarted Unity those changes would be lost. Unity is not pulling the GPU-side data and serializing it to your source asset.
     
  23. hoshos

    Thank you, I finally understand the reason.
    Restarting the editor did indeed bring the mesh assets back.
    In other words, the shared mesh cached in memory is changed while the editor is running, and even if you stop running the game, the change remains until you close the editor.
    I was able to confirm this with a simple experiment.
    Even after I stopped the game, the mesh asset displayed in the inspector was deformed, so I thought that the file had been rewritten.
    Thank you for your explanation!
     
  24. Invertex

    No problem :)

    Also technically it should go back to normal if you load a scene that isn't referencing that asset in any way (except maybe if editing Unity built-in primitives but can't confirm), as it will be Garbage Collected and unloaded from vram. It makes sense for it to stay even after play stops, as it will allow you to more quickly go back into play mode whenever you need to by avoiding that GPU upload overhead each time.
     
  25. iddqd

    Joined:
    Apr 14, 2012
    Posts:
    501
    GPU mesh modification is really great. But after modifying, how would I update the collider? I generally use the jobified BakeMesh https://docs.unity3d.com/ScriptReference/Physics.BakeMesh.html but the resulting collider doesn't have the GPU updates. I'm guessing I must first asynchronously read the mesh changes back to the CPU before baking, but I didn't find any info regarding this. Any help?

    Thanks
     
  26. iddqd

    OK, so this seems to work:
    Get the buffer with Mesh.GetVertexBuffer, use AsyncGPUReadbackRequest to read it into a NativeArray, apply that with Mesh.SetVertexBufferData, and then call Physics.BakeMesh.
     
  27. Przemyslaw_Zaworski

    Recently I made a NURBS surface generator with Unity Mesh API compute shader access.
    It is optimized for 16 control points, but anyone can modify the source code a bit and use a different number of points
    (in that case remember the rules of NURBS splines: the number of control points depends on the surface degree and the knot vector length).
    Press play, then press the R key to load the default point values, move the transforms to change the control point positions, and press Space to save the mesh.

    Example image (control points visible as gizmos and mesh as shaded wireframe):

    upload_2022-3-15_22-40-3.png

    Full source code:

    Code (CSharp):
    using Unity.Collections;
    using UnityEngine;
    using UnityEngine.Rendering;
    using System.Collections.Generic;

    [RequireComponent(typeof(MeshFilter))]
    [RequireComponent(typeof(MeshRenderer))]
    public class NurbsSurface : MonoBehaviour
    {
        public ComputeShader NurbsSurfaceCS;
        [Range(1, 1024)] public int TessellationFactor = 64;
        [System.Serializable] public struct ControlPoint {public Transform Transform; public float Weight;};
        [System.Serializable] public struct Vertex {public Vector3 Position; public Vector3 Normal; public Vector2 Texcoord;};
        public ControlPoint[] ControlPoints;

        private List<Vector4> _ControlPoints = new List<Vector4>();
        private GraphicsBuffer _GraphicsBuffer;
        private Mesh _Mesh;
        private bool _Recalculate = false;
        private int _VertexCount = 0;

        // Creates 16 child transforms laid out as a 4x4 grid, each with weight 1.
        void LoadDefaultSettings()
        {
            Vector3[] vectors = new Vector3[]
            {
                new Vector3(00.0f, 00.0f, 00.0f), new Vector3(10.0f, 00.0f, 00.0f), new Vector3(20.0f, 00.0f, 00.0f), new Vector3(30.0f, 00.0f, 00.0f),
                new Vector3(00.0f, 00.0f, 10.0f), new Vector3(10.0f, 10.0f, 10.0f), new Vector3(20.0f, 10.0f, 10.0f), new Vector3(30.0f, 00.0f, 10.0f),
                new Vector3(00.0f, 00.0f, 20.0f), new Vector3(10.0f, 10.0f, 20.0f), new Vector3(20.0f, 10.0f, 20.0f), new Vector3(30.0f, 00.0f, 20.0f),
                new Vector3(00.0f, 00.0f, 30.0f), new Vector3(10.0f, 00.0f, 30.0f), new Vector3(20.0f, 00.0f, 30.0f), new Vector3(30.0f, 00.0f, 30.0f)
            };
            ControlPoints = new ControlPoint[vectors.Length];
            for (int i = 0; i < vectors.Length; i++)
            {
                GameObject element = new GameObject();
                element.name = "ControlPoint" + (i + 1).ToString();
                element.transform.parent = this.transform;
                element.transform.position = vectors[i];
                ControlPoints[i] = new ControlPoint() {Transform = element.transform, Weight = 1.0f};
            }
        }

        /* Example for 16 control points (grid 4x4) - fill grid 4x4 into larger collection (max 64)
        HLSL arrays need to have a constant (predefined) size, so for another number of control points
        you need to modify the code a bit...
            ########
            ########
            ########
            ########
            ****####
            ****####
            ****####
            ****####
        */
        void LoadDefaultCollection()
        {
            _ControlPoints.Clear();
            _ControlPoints.TrimExcess();
            int index = 0;
            for (int i = 0; i < 64; i++) // max 64 elements, because _ControlPoints[8][8] from compute shader
            {
                if (index > ControlPoints.Length - 1) continue;
                // Only the first 4 columns of the first 4 rows hold real control points; the rest is padding.
                if (i % 8 > 3 || i > 31)
                {
                    _ControlPoints.Add(Vector4.zero);
                }
                else
                {
                    Vector3 p = ControlPoints[index].Transform.position;
                    _ControlPoints.Add(new Vector4(p.x, p.y, p.z, ControlPoints[index].Weight));
                    index++;
                }
            }
        }

        void ExportMesh()
        {
            #if UNITY_EDITOR
                // vertexCount avoids allocating a copy of the vertex array just to read its length.
                Vertex[] points = new Vertex[_Mesh.vertexCount];
                _GraphicsBuffer.GetData(points);
                Mesh mesh = new Mesh();
                mesh.indexFormat = UnityEngine.Rendering.IndexFormat.UInt32;
                List<Vector3> vertices = new List<Vector3>();
                List<int> triangles = new List<int>();
    84.             List<Vector3> normals = new List<Vector3>();
    85.             List<Vector2> uvs = new List<Vector2>();
    86.             for (int i = 0; i < points.Length; i++)
    87.             {
    88.                 vertices.Add(points[i].Position);
    89.                 triangles.Add(i);
    90.                 normals.Add(points[i].Normal);
    91.                 uvs.Add(points[i].Texcoord);
    92.             }
    93.             mesh.vertices = vertices.ToArray();
    94.             mesh.triangles = triangles.ToArray();
    95.             mesh.normals = normals.ToArray();
    96.             mesh.uv = uvs.ToArray();
    97.             string fileName = System.Guid.NewGuid().ToString("N");
    98.             UnityEditor.AssetDatabase.CreateAsset(mesh, "Assets/" + fileName + ".asset");
    99.             GameObject target = new GameObject();
    100.             target.name = fileName;
    101.             target.AddComponent<MeshFilter>().sharedMesh = mesh;
    102.             MeshRenderer renderer = target.AddComponent<MeshRenderer>();
    103.             renderer.sharedMaterial = UnityEditor.AssetDatabase.GetBuiltinExtraResource<Material>("Default-Material.mat");
    104.             UnityEditor.PrefabUtility.SaveAsPrefabAsset(target, "Assets/" + fileName + ".prefab");
    105.         #endif
    106.     }
    107.  
    108.     void Start()
    109.     {
    110.         #if UNITY_EDITOR
    111.             Material material = this.GetComponent<Renderer>().sharedMaterial;
    112.             if (material == null)
    113.             {
    114.                 material = UnityEditor.AssetDatabase.GetBuiltinExtraResource<Material>("Default-Material.mat");
    115.                 this.GetComponent<Renderer>().sharedMaterial = material;
    116.             }
    117.         #endif
    118.     }
    119.  
    120.     void Update()
    121.     {
    122.         _VertexCount = TessellationFactor * TessellationFactor * 6;
    123.         if (_Mesh == null || _Recalculate)
    124.         {
    125.             Release();
    126.             _Recalculate = false;
    127.             _Mesh = new Mesh();
    128.             _Mesh.name = "NURBS Surface";
    129.             _Mesh.vertexBufferTarget |= GraphicsBuffer.Target.Raw;
    130.             _Mesh.indexBufferTarget |= GraphicsBuffer.Target.Raw;
    131.             VertexAttributeDescriptor[] attributes = new []
    132.             {
    133.                 new VertexAttributeDescriptor(VertexAttribute.Position,  VertexAttributeFormat.Float32, 3, stream:0),
    134.                 new VertexAttributeDescriptor(VertexAttribute.Normal,    VertexAttributeFormat.Float32, 3, stream:0),
    135.                 new VertexAttributeDescriptor(VertexAttribute.TexCoord0, VertexAttributeFormat.Float32, 2, stream:0),
    136.             };
    137.             _Mesh.SetVertexBufferParams(_VertexCount, attributes);
    138.             _Mesh.SetIndexBufferParams(_VertexCount, IndexFormat.UInt32);
    139.             NativeArray<int> indexBuffer = new NativeArray<int>(_VertexCount, Allocator.Temp);
    140.             for (int i = 0; i < _VertexCount; ++i) indexBuffer[i] = i;
    141.             _Mesh.SetIndexBufferData(indexBuffer, 0, 0, indexBuffer.Length, MeshUpdateFlags.DontRecalculateBounds | MeshUpdateFlags.DontValidateIndices);
    142.             indexBuffer.Dispose();
    143.             SubMeshDescriptor subMeshDescriptor = new SubMeshDescriptor(0, _VertexCount, MeshTopology.Triangles);
    144.             subMeshDescriptor.bounds = new Bounds(Vector3.zero, new Vector3(1e5f, 1e5f, 1e5f));
    145.             _Mesh.SetSubMesh(0, subMeshDescriptor);
    146.             _Mesh.bounds = subMeshDescriptor.bounds;
    147.             GetComponent<MeshFilter>().sharedMesh = _Mesh;
    148.             _GraphicsBuffer = _Mesh.GetVertexBuffer(0);
    149.         }
    150.         LoadDefaultCollection();
    151.         NurbsSurfaceCS.SetInt("_VertexCount", _VertexCount);
    152.         NurbsSurfaceCS.SetInt("_TessellationFactor", TessellationFactor);
    153.         NurbsSurfaceCS.SetBuffer(0, "_GraphicsBuffer", _GraphicsBuffer);
    154.         NurbsSurfaceCS.SetVectorArray("_ControlPoints", _ControlPoints.ToArray());
    155.         NurbsSurfaceCS.GetKernelThreadGroupSizes(0, out uint x, out uint y, out uint z);
    156.         int threadGroupsX = Mathf.Min((_VertexCount + (int)x - 1) / (int)x, 65535);
    157.         int threadGroupsY = (int)y;
    158.         int threadGroupsZ = (int)z;
    159.         NurbsSurfaceCS.Dispatch(0, threadGroupsX, threadGroupsY, threadGroupsZ);
    160.         if (Input.GetKeyDown(KeyCode.R)) LoadDefaultSettings();
    161.         if (Input.GetKeyDown(KeyCode.Space) && (_Mesh != null)) ExportMesh();
    162.     }
    163.  
    164.     void Release()
    165.     {
    166.         if (_Mesh != null) Destroy(_Mesh);
    167.         if (_GraphicsBuffer != null) _GraphicsBuffer.Release();
    168.     }
    169.  
    170.     void OnDestroy()
    171.     {
    172.         Release();
    173.     }
    174.  
    175.     void OnValidate()
    176.     {
    177.         _Recalculate = true;
    178.     }
    179. }
    Code (CSharp):
    #pragma kernel CSMain

    float4 _ControlPoints[8][8];
    RWByteAddressBuffer _GraphicsBuffer;
    uint _TessellationFactor, _VertexCount;

    static float KnotVectors[8][2] = // knotsDim = (8, 8)
    {
        {0.0, 0.0},
        {0.0, 0.0},
        {0.0, 0.0},
        {0.0, 0.0},
        {1.0, 1.0},
        {1.0, 1.0},
        {1.0, 1.0},
        {1.0, 1.0}
    };

    // L. Piegl, W. Tiller, "The NURBS Book", Springer Verlag, 1997
    // http://nurbscalculator.in/
    float3 NurbsSurface (float4 cps[8][8], int2 cpsDim, float knots[8][2], int2 knotsDim, float2 uv)
    {
        const int2 degree = int2(3, 3);
        #define msize max(degree.x, degree.y)
        for (int x = 0; x < cpsDim[0]; x++)
        {
            for (int y = 0; y < cpsDim[1]; y++)
            {
                cps[x][y].xyz *= cps[x][y].w;
            }
        }
        int2 spans = int2(0, 0);
        float4 p = 0;
        [unroll(2)] for (int i = 0; i < 2; i++)
        {
            int n = knotsDim[i] - degree[i] - 2;
            if (uv[i] == (knots[n + 1][i])) spans[i] = n;
            int low = degree[i];
            int high = n + 1;
            int mid = (int)floor((low + high) / 2.0);
            [unroll(16)]
            while (uv[i] < knots[mid][i] || uv[i] >= knots[mid + 1][i])
            {
                if (uv[i] < knots[mid][i])
                    high = mid;
                else
                    low = mid;
                mid = (int)floor((low + high) / 2.0);
            }
            spans[i] = mid;
        }
        float N     [msize + 1][2];
        float left  [msize + 1][2];
        float right [msize + 1][2];
        N[0][0] = N[0][1] = 1.0;
        [loop] for (int h = 0; h < 2; h++)
        {
            float saved = 0.0, temp = 0.0;
            [loop] for (int j = 1; j <= degree[h]; j++)
            {
                left[j][h] = (uv[h] - knots[spans[h] + 1 - j][h]);
                right[j][h] = knots[spans[h] + j][h] - uv[h];
                saved = 0.0;
                [loop] for (int r = 0; r < j; r++)
                {
                    temp = N[r][h] / (right[r + 1][h] + left[j - r][h]);
                    N[r][h] = saved + right[r + 1][h] * temp;
                    saved = left[j - r][h] * temp;
                }
                N[j][h] = saved;
            }
        }
        for (int m = 0; m <= degree[1]; m++)
        {
            float4 t = 0;
            for (int k = 0; k <= degree[0]; k++) t += cps[spans[0] - degree[0] + k][spans[1] - degree[1] + m] * N[k][0];
            p += t * N[m][1];
        }
        return (p.w != 0) ? p.xyz / p.w : p.xyz;
    }

    float3 NurbsSurfaceNormal (float4 cps[8][8], int2 cpsDim, float knots[8][2], int2 knotsDim, float2 uv)
    {
        for (int x = 0; x < cpsDim[0]; x++) for (int y = 0; y < cpsDim[1]; y++) cps[x][y].xyz *= cps[x][y].w;
        const int order = 1; // order of the derivative
        const int2 degree = int2(3, 3); // surface degrees
        #define msize max(degree.x, degree.y)
        int2 spans = int2(0, 0);
        [unroll(2)] for (int i = 0; i < 2; i++) // find spans
        {
            int n = knotsDim[i] - degree[i] - 2;
            if (uv[i] == (knots[n + 1][i])) spans[i] = n;
            int low = degree[i];
            int high = n + 1;
            int mid = (int)floor((low + high) / 2.0);
            [unroll(16)] while (uv[i] < knots[mid][i] || uv[i] >= knots[mid + 1][i])
            {
                if (uv[i] < knots[mid][i])
                    high = mid;
                else
                    low = mid;
                mid = (int)floor((low + high) / 2.0);
            }
            spans[i] = mid;
        }
        float basis[order + 1][msize + 1][2];
        float left[msize + 1][2];
        float right[msize + 1][2];
        float ndu[msize + 1][msize + 1][2];
        ndu[0][0][0] = ndu[0][0][1] = 1.0;
        [unroll(2)] for (int q = 0; q < 2; q++) // derivatives of the basis functions
        {
            [loop] for (int j = 1; j <= degree[q]; j++)
            {
                left[j][q] = uv[q] - knots[spans[q] + 1 - j][q];
                right[j][q] = knots[spans[q] + j][q] - uv[q];
                float saved = 0.0;
                [loop] for (int r = 0; r < j; r++)
                {
                    ndu[j][r][q] = right[r + 1][q] + left[j - r][q];
                    float temp = ndu[r][j - 1][q] / ndu[j][r][q];
                    ndu[r][j][q] = saved + right[r + 1][q] * temp;
                    saved = left[j - r][q] * temp;
                }
                ndu[j][j][q] = saved;
            }
            [loop] for (int m = 0; m <= degree[q]; m++) basis[0][m][q] = ndu[m][degree[q]][q];
            float a[2][msize + 1][2];
            for (int r = 0; r <= degree[q]; r++)
            {
                int s1 = 0;
                int s2 = 1;
                a[0][0][q] = 1.0;
                [unroll(order)] for (int k = 1; k <= order; k++)
                {
                    float d = 0.0;
                    int rk = r - k;
                    int pk = degree[q] - k;
                    int j1 = 0;
                    int j2 = 0;
                    if (r >= k)
                    {
                        a[s2][0][q] = a[s1][0][q] / ndu[pk + 1][rk][q];
                        d = a[s2][0][q] * ndu[rk][pk][q];
                    }
                    j1 = (rk >= -1) ? 1 : -rk;
                    j2 = (r - 1 <= pk) ? k - 1 : degree[q] - r;
                    [unroll(order)] for (int j = j1; j <= j2; j++)
                    {
                        a[s2][j][q] = (a[s1][j][q] - a[s1][j - 1][q]) / ndu[pk + 1][rk + j][q];
                        d += a[s2][j][q] * ndu[rk + j][pk][q];
                    }
                    if (r <= pk)
                    {
                        a[s2][k][q] = -a[s1][k - 1][q] / ndu[pk + 1][r][q];
                        d += a[s2][k][q] * ndu[r][pk][q];
                    }
                    basis[k][r][q] = d;
                    int s3 = s1;
                    s1 = s2;
                    s2 = s3;
                }
            }
            float f = degree[q];
            [unroll(order)] for (int k = 1; k <= order; k++)
            {
                for (int h = 0; h <= degree[q]; h++) basis[k][h][q] *= f;
                f *= degree[q] - k;
            }
        }
        float4 derivatives[order + 1][order + 1] = {{0..xxxx, 0..xxxx}, {0..xxxx, 0..xxxx}};
        int du = min(order, degree[0]);
        int dv = min(order, degree[1]);
        float4 temp[degree[1] + 1];
        [unroll(4)] for (int w = 0; w <= du; w++) // derivatives of a B-spline surface
        {
            [unroll(4)] for (int s = 0; s <= degree[1]; s++)
            {
                temp[s] = (float4) 0;
                [unroll(4)] for (int r = 0; r <= degree[0]; r++)
                {
                    float4 pw = cps[spans[0] - degree[0] + r] [spans[1] - degree[1] + s];
                    temp[s] += pw * basis[w][r][0];
                }
            }
            int dd = min(order - w, dv);
            [unroll(4)] for (int l = 0; l <= dd; l++)
            {
                [loop] for (int s = 0; s <= degree[1]; s++) derivatives[w][l] += temp[s] * basis[l][s][1];
            }
        }
        float3 SKL[order + 1][order + 1];
        int BIN[4] = {1,1,1,1}; // binomial coefficients
        [unroll(4)] for (int k = 0; k < (order + 1); ++k) // derivatives of a NURBS surface
        {
            [unroll(4)] for (int l = 0; l < order - k + 1; ++l)
            {
                float3 v = derivatives[k][l].xyz;
                [unroll(4)] for (int z = 1; z < l + 1; ++z)
                {
                    if (z > l) continue;
                    [unroll(4)] for (int a = 1; a <= z; ++a) BIN[0] *= (l + 1 - a) / a;
                    v -= SKL[k][l - z] * derivatives[0][z].w * BIN[0];
                }
                [unroll(4)] for (int i = 1; i < k + 1; ++i)
                {
                    if (i > k)
                        BIN[1] = 0;
                    else
                        [unroll(4)] for (int b = 1; b <= i; ++b) BIN[1] *= (k + 1 - b) / b;
                    v -= SKL[k - i][l] * derivatives[i][0].w * BIN[1];
                    float3 tmp = (float3) 0;
                    [unroll(4)] for (int j = 1; j < l + 1; ++j)
                    {
                        if (j > l) continue;
                        [unroll(4)] for (int c = 1; c <= j; ++c) BIN[2] *= (l + 1 - c) / c;
                        tmp -= SKL[k - 1][l - j] * derivatives[i][j].w * BIN[2];
                    }
                    if (i > k) continue;
                    [unroll(4)] for (int d = 1; d <= i; ++d) BIN[3] *= (k + 1 - d) / d;
                    v -= tmp * BIN[3];
                }
                SKL[k][l] = v / derivatives[0][0].w;
            }
        }
        return normalize(cross(SKL[1][0], SKL[0][1]));
    }

    float Mod (float x, float y)
    {
        return x - y * floor(x/y);
    }

    void GenerateSurface (uint id : SV_VertexID, uint tess, float4 cps[8][8], float kv[8][2], out float3 position, out float3 normal, out float2 texcoord)
    {
        int instance = int(floor(id / 6.0));
        float x = sign(Mod(20.0, Mod(float(id), 6.0) + 2.0));
        float y = sign(Mod(18.0, Mod(float(id), 6.0) + 2.0));
        float u = (float(instance / tess) + x) / float(tess);
        float v = (Mod(float(instance), float(tess)) + y) / float(tess);
        float2 uv = float2(u,v);
        position = NurbsSurface (cps, int2(4, 4), kv, int2(8, 8), uv);
        normal = NurbsSurfaceNormal (cps, int2(4, 4), kv, int2(8, 8), uv);
        texcoord = uv;
    }

    [numthreads(64, 1, 1)]
    void CSMain(uint3 threadID : SV_DispatchThreadID)
    {
        uint id = threadID.x;
        if (id >= _VertexCount) return;
        float3 position = 0;
        float3 normal = 0;
        float2 texcoord = 0;
        GenerateSurface(id, _TessellationFactor, _ControlPoints, KnotVectors, position, normal, texcoord);
        _GraphicsBuffer.Store3((id * 8 + 0) << 2, asuint(position));
        _GraphicsBuffer.Store3((id * 8 + 3) << 2, asuint(normal));
        _GraphicsBuffer.Store2((id * 8 + 6) << 2, asuint(texcoord));
    }
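
    The Store3/Store2 addresses at the end of the kernel follow from the interleaved layout declared on the C# side: position (3 floats), normal (3 floats) and texcoord (2 floats) make 8 floats, i.e. 32 bytes, per vertex. Below is a minimal sketch of that arithmetic in plain C#. The RawVertexAddress helper is hypothetical (not a Unity API) and assumes exactly the three Float32 attributes listed above, all in stream 0.

    ```csharp
    using System;

    // Hypothetical helper, not part of Unity: mirrors the byte addressing used by the kernel.
    public static class RawVertexAddress
    {
        // Assumed layout: float3 position, float3 normal, float2 texcoord (all 4-byte floats).
        public const int FloatsPerVertex = 3 + 3 + 2;
        public const int PositionOffset = 0; // measured in floats
        public const int NormalOffset = 3;
        public const int TexcoordOffset = 6;

        // Byte address of an attribute in the raw buffer; same value as (id * 8 + offset) << 2 in HLSL.
        public static int Of(int vertexId, int floatOffset)
        {
            return (vertexId * FloatsPerVertex + floatOffset) * sizeof(float);
        }
    }
    ```

    For instance, RawVertexAddress.Of(1, RawVertexAddress.NormalOffset) gives 44, and vertex 2 starts at byte 64. If the attribute list changes, these constants have to change with it; Unity can also report the stride and attribute offsets at runtime through Mesh.GetVertexBufferStride and Mesh.GetVertexAttributeOffset, which avoids hard-coding them.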
     
    bb8_1, hugokostic, LooperVFX and 8 others like this.
  28. sabint

    sabint

    Joined:
    Apr 19, 2014
    Posts:
    26
  29. kenamis

    kenamis

    Joined:
    Feb 5, 2015
    Posts:
    387
    Is there any way to get the skinned mesh renderer's vertex buffer at the point after the mesh's object space vertices have been set up for the frame, but before the blendshape and skinning shaders have run? Something like the "inOutMeshVertices" buffer found in the blendshape compute shader. I don't want to have to modify the source mesh because I want to keep that resource shared amongst many renderers.
     
  30. nasos_333

    nasos_333

    Joined:
    Feb 13, 2013
    Posts:
    13,358
    Hi,

    Is there anything in the API that can be used like geometry shaders, e.g. to create vertices directly on the GPU side, which is extremely efficient, without using the CPU side at all?

    Thanks
     
  31. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    731
  32. nasos_333

    nasos_333

    Joined:
    Feb 13, 2013
    Posts:
    13,358
    I think this is overkill performance-wise, since you need to reallocate the data on the CPU side and send it to the GPU, while with a geometry shader it is all done on the GPU side. Sizes would also be fixed, I think.
     
  33. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    731
    Well you asked for efficient, and geometry shaders are not efficient. But ofc you can use geometry shaders if that fits what you want. There is no way to create arbitrarily sized data on the GPU afaik.
     
  34. Invertex

    Invertex

    Joined:
    Nov 7, 2013
    Posts:
    1,550
    How exactly do you think the GPU does it with Geometry shaders? GPUs don't dynamically allocate buffers. A Geo shader simply has a VERY LARGE append buffer, the count of which is used in render arguments.
    You can set up your own append buffer and add data to it from your compute shader. The data does not need to be transferred back to the CPU side. You simply use that same buffer pointer and its count to render the mesh with. It is very efficient, much more so than geometry shaders. And in fact geometry shaders have always been implemented as Compute Shaders in some APIs like Metal.
     
    bb8_1 likes this.
  35. nasos_333

    nasos_333

    Joined:
    Feb 13, 2013
    Posts:
    13,358
    Is there any practical sample of augmented runtime geometry that changes vertex count dynamically with this method?

    Or of something that does the same but is more efficient than geometry shaders? Any actual code for comparison would also help.

    Another aspect is that a geometry shader can operate directly on existing geometry, e.g. I use it for voxelization of the scene in a replacement shader, which means this is another use that can't be covered by compute shaders.
     
    Last edited: Oct 5, 2022
    Walter_Hulsebos likes this.
  36. Invertex

    Invertex

    Joined:
    Nov 7, 2013
    Posts:
    1,550
    Complete examples are pretty sparse due to the native API feature being somewhat recent. There is a Compute based grass shader example here but it came out a bit before this new API was announced, so it's not using that feature to skip the initial vertex array copy.



    But the core concept is the use of an "AppendStructuredBuffer". This allows you to add elements to this buffer from *within* the Compute Shader.
    After the Compute shader is done, this Buffer has a Count value (like a List in C#).
    This Count and Buffer can then be passed as the rendering arguments for draw calls or just as bound properties on materials. It does not need to be copied back to C# land, you simply tell the GPU to use the stuff that's already on the GPU.
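
    A minimal sketch of that flow on the C# side, under stated assumptions: the component name, the "CSMain" kernel, the "_Triangles" property and the one-triangle-per-element layout are all hypothetical, and the material is assumed to use a custom shader that reads the buffer itself. Drawing one instance per appended triangle is just one way to use the count; it is not taken from this thread.

    ```csharp
    using UnityEngine;

    // Hypothetical component: fills an append buffer on the GPU and draws it without any CPU readback.
    public class AppendIndirectSketch : MonoBehaviour
    {
        public ComputeShader Generator;  // assumed to declare AppendStructuredBuffer<Triangle> _Triangles
        public Material Material;        // assumed to read StructuredBuffer<Triangle> _Triangles
        public int MaxTriangles = 65536; // capacity; appends beyond this are lost
        public int SourceCount = 4096;   // number of work items the kernel should cover

        const int TriangleStride = 3 * 3 * sizeof(float); // three float3 positions per element

        ComputeBuffer _triangles, _args;

        void OnEnable()
        {
            _triangles = new ComputeBuffer(MaxTriangles, TriangleStride, ComputeBufferType.Append);
            _args = new ComputeBuffer(4, sizeof(uint), ComputeBufferType.IndirectArguments);
            // Vertices per instance, instance count (patched on the GPU), start vertex, start instance.
            _args.SetData(new uint[] { 3, 0, 0, 0 });
        }

        void Update()
        {
            _triangles.SetCounterValue(0); // reset the hidden append counter
            int kernel = Generator.FindKernel("CSMain");
            Generator.SetBuffer(kernel, "_Triangles", _triangles);
            Generator.GetKernelThreadGroupSizes(kernel, out uint x, out _, out _);
            Generator.Dispatch(kernel, (SourceCount + (int)x - 1) / (int)x, 1, 1);

            // GPU-side copy of the counter into the instance-count slot (byte offset 4).
            ComputeBuffer.CopyCount(_triangles, _args, sizeof(uint));

            Material.SetBuffer("_Triangles", _triangles);
            Graphics.DrawProceduralIndirect(Material, new Bounds(Vector3.zero, Vector3.one * 1e5f),
                                            MeshTopology.Triangles, _args);
        }

        void OnDisable()
        {
            _triangles?.Release();
            _args?.Release();
        }
    }
    ```

    In the assumed shader, SV_InstanceID would select the triangle and SV_VertexID (0 to 2) its corner. Note that this path still needs a shader written to read the buffer, which is the limitation discussed elsewhere in this thread.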

    The most important place where performance savings can come from this is reducing the amount of work the GPU has to do every frame. With a Geometry Shader, EVERY SHADER PASS has to do those geometry mesh construction calculations. A "Lit" shader has multiple passes, and even an Unlit one will have AT LEAST 2 passes if not more depending on pipeline/shader options.

    With a Compute Shader, you calculate that data ONCE, and now every shader pass already has the computed mesh data to work with. Which brings up a second point, you don't need to customize shaders to work with the Compute approach, because you're just updating buffers, which can be then rendered with any material.

    Using Compute also opens up the ability to do more processing on your output mesh to optimize it for rendering better, such as applying more advanced culling techniques because you can do a pass over the whole buffer. And for the example of Grass, which would apply to many other things, you don't have to generate the mesh *every frame*, instead you only regenerate it for major view changes, otherwise only applying things like wind and collision, further reducing GPU work. There are so many more possibilities for optimization with Compute that are not available to you with Geometry shaders.

    Yes, Geometry shaders make for a simpler setup process, but they are ultimately not the best option for performance, and are not even supported on many platforms.

    What makes you say that? The compute shader, using this new API, has direct access to the vertices of the mesh. So you run the compute with a Dispatch size equal to the triangle or vertex count of the mesh, and for each vertex or triangle you're able to append data to your Append buffer, just like you would in a Geometry shader. And then use that Append buffer to render with instead of letting that mesh render (or render both).

    Also, you are not really saving memory. There's a reason you have to define [maxvertexcount(#)] in your Geometry shader: it has to reserve a buffer of maxvertexcount * srcTriangleCount for your output geometry to go into. But at least with the Compute shader method, you can avoid computing that buffer every pass, and even avoid computing it every frame.

    Some other stuff to throw in as reference but slightly less related:
    https://github.com/keijiro/NoiseBall6
    https://github.com/keijiro/Skinner
    https://github.com/keijiro/ComputeMarchingCubes

    And some examples of nice uses of this API in this thread such as:
    https://forum.unity.com/threads/fee...ute-shader-access.1096531/page-2#post-7966668
     
    Last edited: Oct 6, 2022
    bb8_1 and nasos_333 like this.
  37. nasos_333

    nasos_333

    Joined:
    Feb 13, 2013
    Posts:
    13,358
    Thanks, will check them all; hopefully I can find some way to replace the geometry stage so it can be more globally compatible. Geometry shaders are extremely easy to use compared to anything else; I wish they were globally applicable. On the performance side they seem to work super fast so far, even on my ancient laptop.
     
    Walter_Hulsebos likes this.
  38. felbj694

    felbj694

    Joined:
    Oct 23, 2016
    Posts:
    35
    As far as I understand, please correct me if I am wrong.
    If you have a fixed amount of vertices/indices you can use the mesh api and modify the buffers in a compute shader and still have the convenience of using mesh renderer component and all materials work as expected.

    But if you have a dynamic amount of vertices there is no need/gain to use the mesh api. It's better to declare your own buffers (large enough to not overflow) and then populate them with your compute shaders. Then for rendering you write your own shaders that can read from these buffers, and rendering is done with e.g. Graphics.DrawProceduralIndirect.
     
  39. cecarlsen

    cecarlsen

    Joined:
    Jun 30, 2006
    Posts:
    864
    I still see this as a huge limitation in the implementation of compute access for the Mesh class. The number one reason why it rocks, is because it just works with the existing MeshRenderer, Materials, and ShaderGraph. But the moment you want to render something with a dynamic size you are screwed and have to DrawProceduralIndirect and roll your own shader. Particle systems and voxelscapes being examples, but anything dynamic really. With URP and HDRP shaders being significantly more complicated than in legacy RP, this is something I am struggling with on a regular basis.

    I have advocated earlier in this thread for adding support for a dynamic (Consume/Append) GraphicsBuffer as index buffer for a Mesh. I still hope this need will be taken seriously by UTECH one day.
     
    Last edited: Oct 14, 2022
    bb8_1 and felbj694 like this.
  40. nasos_333

    nasos_333

    Joined:
    Feb 13, 2013
    Posts:
    13,358
    Hm, I wonder if this could be used somehow in Unity to overcome the huge limitation of Metal:

    https://github.com/Kosalos/GeometryShader

    I have been trying to emulate the voxelization of a scene without geometry shaders, without any luck. It is either not possible or extremely hard and limited, and in all cases it would be vastly slower.

    The more I use geometry shaders and try to replicate similar effects, the more I wonder how it was possible to remove the most powerful and useful tool from any hardware or API.

    Geometry shaders are hands down the most impressive tech I have played with, by a million miles.
     
  41. cecarlsen

    cecarlsen

    Joined:
    Jun 30, 2006
    Posts:
    864
    I get an immediate crash when I enable HDRP+DXR raytraced reflections and have a GPU mesh (compute generated GraphicsBuffer) in the scene. Unity 2022.2. I also tried Keijiro's NoiseBall6 example; it crashes as soon as I enable raytraced reflections.

    I think this was working in earlier versions. Or is my memory failing me?

    EDIT: I have created a bug report with case number IN-27999. I can confirm that my memory is not failing me; it works flawlessly in 2021.3.
     
    Last edited: Jan 6, 2023
    chap-unity and Passeridae like this.
  42. Invertex

    Invertex

    Joined:
    Nov 7, 2013
    Posts:
    1,550
    Probably was. Worth filing a bug report; 2022.2 is still going to have some noticeable bugs, having only recently left beta.
     
    cecarlsen likes this.
  43. rile-s

    rile-s

    Joined:
    Sep 18, 2019
    Posts:
    26
    Hi,
    I've been trying to deform meshes via compute, and while it works on a regular mesh, it doesn't work on a skinned one. The buffers are set (to the right IDs, I just verified) and the shader is dispatched right before frame rendering starts, but with no result. RenderDoc shows that the graphics buffer containing the vertex data is missing. Is this a Unity bug?
    renderdoc.png
     
  44. rile-s

    rile-s

    Joined:
    Sep 18, 2019
    Posts:
    26
    Turns out GPU skinning was turned off, which I assumed was on. Turning it back on fixed the issue.
     
  45. cecarlsen

    cecarlsen

    Joined:
    Jun 30, 2006
    Posts:
    864
    The issue (IN-27999) persists in 2023.1.0 Alpha 25. Instant crash when running a scene with a GPU mesh and ray tracing. So I'm stuck on 2021.3.
     
  46. INedelcu

    INedelcu

    Unity Technologies

    Joined:
    Jul 14, 2015
    Posts:
    173
    Hi @cecarlsen,

    I fixed the issue locally and will backport it soon. As a workaround, you can replace some code in the MeshBuilder.cs file of that project:

    This code:

    Code (CSharp):
    // Submesh initialization
    DynamicMesh.SetSubMesh(0, new SubMeshDescriptor(0, vertexCount),
                           MeshUpdateFlags.DontRecalculateBounds);
    with this code:

    Code (CSharp):
    1. // Submesh initialization
    2. var smdesc = new SubMeshDescriptor(0, vertexCount);
    3. smdesc.vertexCount = vertexCount;
    4.      
    5. DynamicMesh.SetSubMesh(0, smdesc, MeshUpdateFlags.DontRecalculateBounds);
    It crashed because the vertexCount in SubMeshDescriptor was 0.

    Also you might want to use Dynamic Geometry as Ray Tracing Mode for the noisy sphere Renderer.
     
    Walter_Hulsebos and cecarlsen like this.
  47. chap-unity

    chap-unity

    Unity Technologies

    Joined:
    Nov 4, 2019
    Posts:
    766
    Hey, just wanted to share that the fix made by @INedelcu was merged this morning into the 2022.2 branch and will be included in the next release, 2022.2.7f1.

    As for 2023.1, it's included starting from Unity 2023.1.0b3 (which is why it's still failing on your alpha 25).

    Thanks again for the report !
     
    cecarlsen likes this.
  48. Przemyslaw_Zaworski

    Przemyslaw_Zaworski

    Joined:
    Jun 9, 2017
    Posts:
    328
    Efficient calculation of axis-aligned bounding box coordinates using Mesh Compute. It can be useful for generating a bounding box in real time for high-poly meshes. Unfortunately, it seems that InterlockedMin / InterlockedMax have a problem with negative floats cast to uint. So I use an offset to fix this issue, but maybe a more elegant solution exists?
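
    On the question of a more elegant solution: one commonly used alternative to an offset is to remap the float bits so that unsigned integer comparison matches float ordering, which works for any sign and magnitude. Below is a minimal sketch of the CPU-side encode/decode in plain C#; the SortableFloat name is hypothetical, NaN is not handled, and the compute shader would need to apply the same mapping (via asuint) before calling InterlockedMin / InterlockedMax.

    ```csharp
    using System;

    // Hypothetical helper: order-preserving mapping between float and uint.
    public static class SortableFloat
    {
        // Returns a key whose unsigned ordering matches the ordering of the input floats.
        public static uint Encode(float value)
        {
            uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
            // Negative values: flip every bit, which reverses their order and moves them below the positives.
            // Non-negative values: set the sign bit so they sort above all negatives.
            return (bits & 0x80000000u) != 0 ? ~bits : bits | 0x80000000u;
        }

        // Inverse of Encode.
        public static float Decode(uint key)
        {
            uint bits = (key & 0x80000000u) != 0 ? key & 0x7FFFFFFFu : ~key;
            return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
        }
    }
    ```

    With this mapping the min slots can still start at UInt32.MaxValue and the max slots at UInt32.MinValue, and the values read back with GetData would be passed through Decode instead of subtracting an offset.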

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;
    using System;

    public class BoundingBox : MonoBehaviour
    {
        [SerializeField] ComputeShader _ComputeShader;
        [SerializeField] GameObject _GameObject;

        // Added to every coordinate before the atomics so the values stay positive.
        // Assumes every world-space coordinate is greater than -_Offset.
        float _Offset = 10000f; // InterlockedMin, InterlockedMax work wrong for negative floats ?
        float[] _Box = new float[6] {0f,0f,0f,0f,0f,0f}; // minx, miny, minz, maxx, maxy, maxz
        int _Dimension, _VertexCount;
        ComputeBuffer _ComputeBuffer;
        GraphicsBuffer _GraphicsBuffer;
        Mesh _Mesh;
        MeshRenderer _MeshRenderer;

        void Start()
        {
            _Mesh = _GameObject.GetComponent<MeshFilter>().sharedMesh;
            // Allow the vertex buffer to be bound as a ByteAddressBuffer.
            _Mesh.vertexBufferTarget |= GraphicsBuffer.Target.Raw;
            _ComputeBuffer = new ComputeBuffer(6, sizeof(uint), ComputeBufferType.Structured);
            _GraphicsBuffer = _Mesh.GetVertexBuffer(0);
            // Number of 32-bit components per vertex. This assumes every attribute
            // lives in stream 0 and uses 4-byte components.
            VertexAttributeDescriptor[] attributes = _Mesh.GetVertexAttributes();
            for (int i = 0; i < attributes.Length; i++) _Dimension += attributes[i].dimension;
            _MeshRenderer = _GameObject.GetComponent<MeshRenderer>();
            _VertexCount = _Mesh.vertexCount;
        }

        void Update()
        {
            if (Input.GetKeyDown(KeyCode.Space))
            {
                // Minimum slots start at the largest uint, maximum slots at the smallest.
                uint min = UInt32.MinValue;
                uint max = UInt32.MaxValue;
                uint[] array = new uint[6] {max, max, max, min, min, min};
                _ComputeBuffer.SetData(array);
                _ComputeShader.SetInt("_Dimension", _Dimension);
                _ComputeShader.SetInt("_VertexCount", _VertexCount);
                _ComputeShader.SetFloat("_Offset", _Offset);
                _ComputeShader.SetMatrix("_LocalToWorldMatrix", _MeshRenderer.localToWorldMatrix);
                _ComputeShader.SetBuffer(0, "_ComputeBuffer", _ComputeBuffer);
                _ComputeShader.SetBuffer(0, "_GraphicsBuffer", _GraphicsBuffer);
                // One thread per vertex, 64 threads per group (ceiling division).
                _ComputeShader.Dispatch(0, (_VertexCount + 63) / 64, 1, 1);
                // Synchronous readback; this stalls until the GPU has finished.
                _ComputeBuffer.GetData(array);
                for (int i = 0; i < array.Length; i++)
                {
                    // Reinterpret the uint bits as a float and undo the offset.
                    byte[] bytes = BitConverter.GetBytes(array[i]);
                    _Box[i] = BitConverter.ToSingle(bytes, 0) - _Offset;
                    Debug.Log(_Box[i].ToString("0.000"));
                }
            }
        }

        void OnDrawGizmos()
        {
            // Draw the eight corners of the bounding box.
            Gizmos.color = Color.yellow;
            Gizmos.DrawSphere(new Vector3(_Box[0], _Box[1], _Box[2]), 0.2f);
            Gizmos.DrawSphere(new Vector3(_Box[3], _Box[1], _Box[2]), 0.2f);
            Gizmos.DrawSphere(new Vector3(_Box[0], _Box[1], _Box[5]), 0.2f);
            Gizmos.DrawSphere(new Vector3(_Box[3], _Box[1], _Box[5]), 0.2f);
            Gizmos.DrawSphere(new Vector3(_Box[0], _Box[4], _Box[2]), 0.2f);
            Gizmos.DrawSphere(new Vector3(_Box[3], _Box[4], _Box[2]), 0.2f);
            Gizmos.DrawSphere(new Vector3(_Box[0], _Box[4], _Box[5]), 0.2f);
            Gizmos.DrawSphere(new Vector3(_Box[3], _Box[4], _Box[5]), 0.2f);
        }

        void OnDestroy()
        {
            _ComputeBuffer?.Release();
            _GraphicsBuffer?.Release();
        }
    }
    Code (CSharp):
    #pragma kernel CSMain

    float _Offset;
    uint _Dimension, _VertexCount;
    float4x4 _LocalToWorldMatrix;
    RWStructuredBuffer<uint> _ComputeBuffer;
    ByteAddressBuffer _GraphicsBuffer;

    [numthreads(64, 1, 1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        // Skip the padding threads of the last group.
        if (id.x >= _VertexCount) return;
        // Vertex stride is _Dimension 32-bit components; position is assumed to be the first attribute.
        float3 localPos = asfloat(_GraphicsBuffer.Load3((id.x * _Dimension) << 2));
        float3 worldPos = mul(_LocalToWorldMatrix, float4(localPos, 1.0)).xyz;
        // The offset keeps the values positive so their bit patterns compare correctly as uint.
        InterlockedMin(_ComputeBuffer[0], asuint(worldPos.x + _Offset));
        InterlockedMin(_ComputeBuffer[1], asuint(worldPos.y + _Offset));
        InterlockedMin(_ComputeBuffer[2], asuint(worldPos.z + _Offset));
        InterlockedMax(_ComputeBuffer[3], asuint(worldPos.x + _Offset));
        InterlockedMax(_ComputeBuffer[4], asuint(worldPos.y + _Offset));
        InterlockedMax(_ComputeBuffer[5], asuint(worldPos.z + _Offset));
    }
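
    On the "more elegant solution" question: the atomics compare raw bit patterns as unsigned integers, and the bit patterns of negative floats sort above those of positive floats and in reverse order. One commonly used alternative to the offset is to remap the bits so that unsigned order matches float order. The sketch below shows the mapping on the CPU side in C#; the class and method names are made up for illustration and are not part of the project above. In the shader, the same two bit operations would be applied to asuint(worldPos.x) before the InterlockedMin / InterlockedMax calls, and the readback would decode with the inverse instead of subtracting _Offset.

```csharp
using System;

public static class FloatOrdering
{
    // Maps a float to a uint whose unsigned ordering matches the float ordering.
    // Non-negative floats get their sign bit set; negative floats get all bits inverted.
    public static uint ToSortableUint(float value)
    {
        uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
        uint mask = (bits & 0x80000000u) != 0u ? 0xFFFFFFFFu : 0x80000000u;
        return bits ^ mask;
    }

    // Inverse of ToSortableUint: a key with the top bit set came from a non-negative float.
    public static float FromSortableUint(uint key)
    {
        uint mask = (key & 0x80000000u) != 0u ? 0x80000000u : 0xFFFFFFFFu;
        return BitConverter.ToSingle(BitConverter.GetBytes(key ^ mask), 0);
    }
}
```

    With this mapping the buffer can be initialized exactly as before (UInt32.MaxValue for the minimum slots, UInt32.MinValue for the maximum slots), and no assumption about the coordinate range is needed. NaN inputs are not handled.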
     
  49. Life_Is_Good_

    Life_Is_Good_

    Joined:
    Mar 4, 2013
    Posts:
    43
    Since HLSL does not support uint16, does every mesh need to use uint32 as the index buffer format if one wants to access the index buffer in a compute shader?
    I read somewhere that uint16 should just map to uint32 when sent in a buffer, but that doesn't seem to be the case here, as my compute shader is not retrieving the correct indices from the index buffer.
     
  50. INedelcu

    INedelcu

    Unity Technologies

    Joined:
    Jul 14, 2015
    Posts:
    173
    Check UnityRayTracingFetchTriangleIndices. It handles both 16-bit and 32-bit index buffer reads. It comes from UnityRayTracingMeshUtils.cginc in the Editor installation folder and is used in ray tracing shader code.

    Code (CSharp):
    uint3 UnityRayTracingFetchTriangleIndices(uint primitiveIndex)
    {
        uint3 indices;

        MeshInfo meshInfo = unity_MeshInfo_RT[0];

        if (meshInfo.indexSize == 2)
        {
            const uint offsetInBytes = (meshInfo.indexStart + primitiveIndex * 3) << 1;
            const uint dwordAlignedOffset = offsetInBytes & ~3;
            const uint2 fourIndices = unity_MeshIndexBuffer_RT.Load2(dwordAlignedOffset);

            if (dwordAlignedOffset == offsetInBytes)
            {
                indices.x = fourIndices.x & 0xffff;
                indices.y = (fourIndices.x >> 16) & 0xffff;
                indices.z = fourIndices.y & 0xffff;
            }
            else
            {
                indices.x = (fourIndices.x >> 16) & 0xffff;
                indices.y = fourIndices.y & 0xffff;
                indices.z = (fourIndices.y >> 16) & 0xffff;
            }

            indices = indices + meshInfo.baseVertex.xxx;
        }
        else if (meshInfo.indexSize == 4)
        {
            const uint offsetInBytes = (meshInfo.indexStart + primitiveIndex * 3) << 2;
            indices = unity_MeshIndexBuffer_RT.Load3(offsetInBytes) + meshInfo.baseVertex.xxx;
        }
        else // meshInfo.indexSize == 0
        {
            const uint firstVertexIndex = primitiveIndex * 3 + meshInfo.vertexStart;
            indices = firstVertexIndex.xxx + uint3(0, 1, 2);
        }

        return indices;
    }
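
    To make the 16-bit branch easier to follow, here is a CPU-side sketch of the same unpacking in C#. It treats the index buffer as an array of 32-bit words, which is how a ByteAddressBuffer exposes it: each word holds two 16-bit indices, so a triangle either starts on a word boundary or in the upper half of a word. The class, method and parameter names are illustrative only, and the baseVertex addition from the function above is left out.

```csharp
using System;

public static class IndexBufferReader
{
    // Reads the three 16-bit indices of one triangle from an index buffer
    // viewed as 32-bit little-endian words. Hypothetical helper for illustration.
    public static (uint, uint, uint) FetchTriangleIndices16(uint[] words, uint indexStart, uint primitiveIndex)
    {
        uint offsetInBytes = (indexStart + primitiveIndex * 3u) << 1; // 2 bytes per index
        uint wordIndex = offsetInBytes >> 2;                           // 4-byte aligned word
        uint w0 = words[wordIndex];
        // The second word may not exist for the very last triangle in the unaligned-free case.
        uint w1 = wordIndex + 1u < (uint)words.Length ? words[wordIndex + 1u] : 0u;

        bool aligned = (offsetInBytes & 3u) == 0u;
        return aligned
            ? (w0 & 0xFFFFu, w0 >> 16, w1 & 0xFFFFu)   // starts in the low half of w0
            : (w0 >> 16, w1 & 0xFFFFu, w1 >> 16);      // starts in the high half of w0
    }
}
```

    For example, the indices 0..5 pack into the words 0x00010000, 0x00030002, 0x00050004; triangle 0 starts word-aligned and triangle 1 starts in the upper half of the second word.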
     
    customphase, bb8_1, cecarlsen and 2 others like this.