Feedback Decal,And Terrain Atlas for mobile.

Discussion in 'Shaders' started by Rahd, Dec 14, 2019.

  1. Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    So, I'm trying to increase the detail in my mobile game using decals.
    But projectors, or any of the decal techniques on GitHub or the Asset Store, run into the following issues:
    1- won't work on mobile
    2- too expensive to render
    3- involves baking, or the stamped meshes need a special shader or a script
    4- involves camera trickery
    5- won't support dynamic batching or GPU instancing
    6- most importantly, requires deferred rendering!! (I'm on mobile)
    7- I'm living under a rock and refuse to switch above Unity 2017.4
    ...and other issues related to mobile.


    So my solution was a screen space decal shader that uses an atlas texture and is batched with GPU instancing.
    To change the tiles, I used a color for the UVs and made that color the instanced parameter:
    Code (CSharp):
    // _Color_Instance.rg selects the tile, .b scales the decal inside it
    uv = float2(
        fmod(uv.x / _Color_Instance.b / _AtlasCount, _tileSize) + _tileSize * _Color_Instance.r,
        fmod(uv.y / _Color_Instance.b / _AtlasCount, _tileSize) + _tileSize * _Color_Instance.g);
    (I found this in the forum; someone was trying to use vertex colors to make a cube atlas terrain.)







    In the example video, 16 independent, different decals use 1 draw call: no mesh decals, no texture modifications, no projectors, nothing but a shader with one texture and the camera depth.
    The decals receive shadows, but have no lighting or normal mapping.
    Aside from lighting and normal mapping, what can I add or change for better performance and looks?
    Also, is there a better alternative to camera depth? And is there any way to exclude other objects from the decal? I was thinking of grabbing the vertex colors with a GrabPass and using them as a splat map,
    where the decal would change or get removed.

    Is there a way to GrabPass the vertex colors of the meshes on a separate camera?

    Or is the decal projector package in the new Unity 2019+ better than all of this mess?

    I never got the chance to thank @bgolus. This guy has answered like a thousand of my shader-related questions without my even asking him.

    Thank you, I hope you come by this thread. And thanks to all the active members.
     
  2. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    I hope you're not using an actual color property, just a Vector4/float4 value. Color values might be getting color corrected. Vertex colors are unique in that they are not ever color corrected so the Color32 values you set on the vertex are the ones you get in the shader. Basically you're just setting a per-instance scale and offset similar to the old Tiling and Offset settings you'd set on a texture property, but you're doing it through 3 or 4 different sets of values when you could be using a single float4 to do it all.
    Code (csharp):
    // _AtlasScaleOffset stored with zw as the scale, xy as the offset, ala the old _TexName_ST values
    float4 atlasScaleOffset = UNITY_ACCESS_INSTANCED_PROP(props, _AtlasScaleOffset);
    float2 uv = i.uv * atlasScaleOffset.zw + atlasScaleOffset.xy;
    This has the added benefit of being able to use decals of arbitrary sizes as you can pack them into the texture any way you want, similar to how sprite atlases work.
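
    As a rough illustration of the scale-and-offset idea (plain Python rather than shader code, so it can be run outside Unity), the sketch below computes the float4 for one tile of a square grid atlas and applies it to a 0..1 UV. The function names and the square-grid layout are assumptions made for this example; an arbitrary packing would simply store different numbers.

```python
def atlas_scale_offset(col, row, tiles_per_side):
    """Return (offset_x, offset_y, scale_x, scale_y): xy = offset, zw = scale."""
    scale = 1.0 / tiles_per_side
    return (col * scale, row * scale, scale, scale)


def apply_scale_offset(uv, scale_offset):
    """Mirror of `uv * scaleOffset.zw + scaleOffset.xy` from the shader snippet."""
    offset_x, offset_y, scale_x, scale_y = scale_offset
    return (uv[0] * scale_x + offset_x, uv[1] * scale_y + offset_y)


# Tile in column 1, row 2 of a hypothetical 4x4 atlas.
tile = atlas_scale_offset(col=1, row=2, tiles_per_side=4)
print(apply_scale_offset((0.0, 0.0), tile))  # one corner of the tile
print(apply_scale_offset((1.0, 1.0), tile))  # the opposite corner
```

    On the Unity side the same four numbers would go into the per-instance vector instead of a color.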

    Performance? Not much. This is already pretty simple. Anything you'd add would make these more expensive.

    The camera depth texture is generated by rendering the entire scene using each opaque or alpha tested objects' shadow caster pass. You can render out vertex colors using a replacement shader render, or even render out multiple "sets" of data with a multi render target replacement shader pass. All of this gets expensive on mobile though, so you want to do as little of it as possible. Basically if you start doing this you're essentially replicating much of the same work a deferred renderer does that makes it expensive for mobile, at which point using the deferred rendering path might end up faster.

    If you're looking to mask objects from receiving decals, you might be better off using stencils and have your dynamic / non projector receiving objects use a shader that writes to the stencil buffer and have your decal shader skip areas that have that stencil value. Most modern mobile devices should have stencil buffers available now. Alternatively you could use the render / material queue so that all decal receiving objects render first, then decals, then your non-decal receiving objects and not have to deal with any extra shader complexity.

    The decal projector is a feature exclusive to the HDRP, Unity's new renderer aimed at top end gaming PCs and consoles. It is "better" in a lot of ways, like being able to modify any surface property and working on transparent objects, but it is also much more complex. Also, obviously, it doesn't work in Unity 2017 or with the built in rendering paths.

    I'm not sure it yet supports masking objects to be excluded from decals, which is kind of a big missing feature for any real use case, but maybe that's been added by now.

    If you do want to do some amount of lighting you can generate passable geometry normals from the depth texture alone. If you can convert the depth into a world space position (there are examples on the forum) and get the screen space derivatives of that you'll get a kind of low res world space normal. It doesn't work great on the edges of geometry though, that requires a more expensive shader and still won't be perfect. And it won't get you the normal maps of the original objects, that requires another pass of rendering every opaque object in the scene using a replacement shader, or deferred rendering which by its nature produces a camera normal texture. But it will give you something to do lighting on, and even potentially support decals with normal maps.
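
    To make the derivative trick a bit more concrete, here is a small sketch in plain Python (not shader code). It assumes you already have the reconstructed world position at a pixel and at its right and upper neighbours, which is roughly what ddx/ddy would give you in a fragment shader. The helper names are made up for the example, and the sign of the cross product depends on the screen and handedness conventions, so it may need flipping.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate: neighbours coincide
    return (v[0] / length, v[1] / length, v[2] / length)


def normal_from_positions(p, p_right, p_up):
    """Approximate a geometry normal from three neighbouring world positions."""
    dx = sub(p_right, p)  # stands in for ddx(worldPos)
    dy = sub(p_up, p)     # stands in for ddy(worldPos)
    return normalize(cross(dy, dx))


# A flat floor: moving right on screen moves along +x, moving up moves along +z.
print(normal_from_positions((0, 0, 0), (1, 0, 0), (0, 0, 1)))
```

    At geometry edges the three samples land on different surfaces, which is where this simple version breaks down.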
     
  3. Rahd

    Thank you. I am using the colors :D I thought you could only change the colors with GPU instancing, but I will change it to a float4 now.
    Geometry normals from the depth texture is a really nice idea; I will try to find that example.
    And I will give the stencils a go as well.
    I will post results soon. Thanks a lot.
     
  4. Rahd

    Hey @bgolus,
    I added a stencil mask, replaced the colors with a float4, and added a command buffer (grab pass) to simulate a secondary light without breaking the GPU instancing.
    I removed all if statements and any discard, and switched to alpha blending and step().
    Now the shader runs very fast on Android.


    However, I know that GPU instancing only works on OpenGL ES 3.0+.

    So I tried to add dynamic batching as a plus for some older phones or GPUs like the Adreno 3xx.
    But the issues now are:
    How do I get different tiles without the Atlas_Instance? ... Maybe cubes, each one assigned to one of the tiles through its UVs.
    And how do I feed the object vertex position to UV2, since dynamic batching kills that?
    The object vertex position seems to be the number one issue right now. I saw you talk about it before, but without examples. I would really appreciate it if you could help.
    Thanks in advance
     
    Last edited: Apr 19, 2020
  5. bgolus

    Dynamic batching also shouldn't remove the extra UV sets. What it removes is the per-object transform information, which is the important part that's missing, not really the vertex positions.

    You need your atlas offset (which can easily be encoded into a UV set) and you need the full per object transform matrix, which would need another 3 more UV sets. That means at a minimum you need to use 4 Vector4 UV sets (a 4x3 matrix and the uv scale & offset), and you need to store the same set of data in all of the vertices of a decal "box". You can set that data on a basic box mesh with additionalVertexStreams. It might reduce the efficiency of dynamic batching. Realistically it's not really that much worse than a normal mesh with normals, tangents, and two Vector2 UV sets (which is a not uncommon amount of data for a mesh to have). Plus you're only rendering boxes.
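
    A minimal sketch of that data layout, in plain Python rather than Unity C#: the top three rows of the world-to-object matrix plus the atlas scale and offset are copied identically to every vertex of the box. The dictionary keys and the function name are placeholders for this example only, and 24 is the vertex count of Unity's built-in cube.

```python
def pack_decal_streams(world_to_object, scale_offset, vertex_count):
    """Build four per-vertex Vector4 lists; every vertex gets the same values.

    world_to_object: 4x4 matrix as nested lists, row-major.
    The bottom row (0, 0, 0, 1) is implied and not stored.
    """
    rows = [tuple(world_to_object[r]) for r in range(3)]
    return {
        "uv0": [tuple(scale_offset)] * vertex_count,  # atlas offset.xy, scale.zw
        "uv1": [rows[0]] * vertex_count,              # matrix row 0
        "uv2": [rows[1]] * vertex_count,              # matrix row 1
        "uv3": [rows[2]] * vertex_count,              # matrix row 2
    }


identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
streams = pack_decal_streams(identity, (0.25, 0.5, 0.25, 0.25), vertex_count=24)
print(len(streams["uv1"]), streams["uv2"][0])
```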
     
  6. Rahd

    Hi, thanks for the prompt response. When you said additionalVertexStreams, I took a look at https://github.com/slipster216/VertexPaint. The tool has baking options for pivot and rotation
    (transform.localPosition and transform.localRotation.eulerAngles),
    so I kind of have an idea of how to pass info to the UVs. The question is: what to bake?

    My vertex shader goes like this:

    o.pos = UnityObjectToClipPos(v.vertex);
    o.screenPos = ComputeScreenPos(o.pos);
    o.ray = UnityObjectToViewPos(v.vertex).xyz * float3(-1,-1,1);
    o.ray = lerp(o.ray, v.texcoord, v.texcoord.z != 0);
    o.grabPos = ComputeGrabScreenPos(o.pos);

    v.vertex seems to be the main thing breaking the shader ... I'm really clueless tbh ...
    What other information do I need to pass to uv2, uv3, etc., and how do I extract it in the vertex shader?
     
    Last edited: Apr 20, 2020
  7. bgolus

    All of that code is going to work perfectly regardless of if you’re using instancing or batching. That’s not the problem.* The problem is going to be what’s happening in the fragment shader.

    * The caveat being I don’t know what that v.texcoord.z thing is about. Where is this code originally coming from? That looks like something from a post processing shader that’s using a custom mesh with the view dir encoded into the vertex UVs.

    My expectation is if you were to change your shaders to just output the calculated world space position you'd see everything is working fine with batching vs instancing. The main issue should be the fact the decal projection takes the world space position and transforms it into local object space... something that doesn't exist anymore for batched meshes. Hence why you need to store that data on the mesh itself. That would have to be passed from the vertex shader to the fragment shader, but there's nothing super complicated about how to go about that. Just copy the data straight and move on.
    Code (csharp):
    o.scaleOffset = v.texcoord; // the atlas uv scale & offset
    o.worldToObject0 = v.texcoord1; // the first row of the world to object matrix
    o.worldToObject1 = v.texcoord2; // second row
    o.worldToObject2 = v.texcoord3; // third row
    In your fragment shader there should be code using the unity_WorldToObject matrix after sampling the depth texture and calculating the world position. That just needs to be updated to use the matrix being passed from the vertex, which can be extracted into a usable matrix like this:
    Code (csharp):
    float4x4 worldToObjectMatrix = float4x4(i.worldToObject0, i.worldToObject1, i.worldToObject2, 0, 0, 0, 1);
    (This may need a transpose on it, I honestly can never remember if the matrix constructor in hlsl is row or column based, plus Unity does some hidden transposes behind the scenes.)
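
    For anyone who wants to check the math outside a shader, here is a plain Python sketch of the same steps: rebuild the 4x4 from three stored rows, move a world position into the decal's object space, and test it against the unit box. It assumes the rows are stored row-major, as in the snippet above; the function names and the example decal (at x = 10, uniformly scaled by 2) are made up for illustration.

```python
def rebuild_world_to_object(row0, row1, row2):
    """Append the implied bottom row (0, 0, 0, 1) to the three stored rows."""
    return [list(row0), list(row1), list(row2), [0.0, 0.0, 0.0, 1.0]]


def transform_point(m, p):
    """Equivalent of mul(matrix, float4(p, 1)).xyz for a row-major matrix."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))


def inside_unit_box(obj_pos):
    """Same test as clip(0.5 - abs(objPos)): keep points within +-0.5."""
    return all(abs(c) <= 0.5 for c in obj_pos)


# World-to-object rows for a hypothetical decal at (10, 0, 0), scaled by 2.
m = rebuild_world_to_object((0.5, 0.0, 0.0, -5.0),
                            (0.0, 0.5, 0.0, 0.0),
                            (0.0, 0.0, 0.5, 0.0))
print(transform_point(m, (10.5, 0.0, 0.5)), inside_unit_box(transform_point(m, (10.5, 0.0, 0.5))))
print(transform_point(m, (12.0, 0.0, 0.0)), inside_unit_box(transform_point(m, (12.0, 0.0, 0.0))))
```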
     
  8. Rahd

    The v.texcoord.z is coming from this example: https://gist.github.com/bshishov/3eaa056d1560584ff82c1b96ee2f20fd
    I removed it and still get the same results :p



    I changed the shader the way you told me. It works fine with GPU instancing, but once dynamic batching is on it renders only one decal out of 10, the same issue as before.
    I know this is on my part as your code is perfect, so here is most of my shader (excluding the texture and lighting code).
    I don't use the UVs of the mesh at all for the texture, so I skipped that; I use the objPos instead.
    Maybe that's the issue?
    I changed the _Atlas_Instance to a static one for testing; it used to be
    float4 _Atlas_Instance = UNITY_ACCESS_INSTANCED_PROP(__AtlasTile_arr, __AtlasTile);


    struct v2f {
        float4 pos : SV_POSITION;
        float4 screenPos : TEXCOORD0;
        float4 grabPos : TEXCOORD1;
        float3 ray : TEXCOORD2;
        float4 worldToObject0 : TEXCOORD3;
        float4 worldToObject1 : TEXCOORD4;
        float4 worldToObject2 : TEXCOORD5;
    };

    v2f vert(appdata_full v)
    {
        v2f o;
        o.pos = UnityObjectToClipPos(v.vertex);
        o.screenPos = ComputeScreenPos(o.pos);
        o.ray = UnityObjectToViewPos(v.vertex).xyz * float3(-1,-1,1);
        o.worldToObject0 = v.texcoord1;
        o.worldToObject1 = v.texcoord2;
        o.worldToObject2 = v.texcoord3;
        o.grabPos = ComputeGrabScreenPos(o.pos);
        return o;
    }

    fixed4 frag(v2f i): Color
    {
        i.ray = i.ray * (_ProjectionParams.z / i.ray.z);
        half3 normaldepth;
        float depth;
        DecodeDepthNormal(tex2D(_CameraDepthNormalsTexture, i.screenPos.xy / i.screenPos.w), depth, normaldepth);
        float4 prjPos = float4(i.ray * depth, 1);
        float3 worldPos = mul(unity_CameraToWorld, prjPos).xyz;
        float4x4 worldToObjectMatrix = float4x4(i.worldToObject0, i.worldToObject1, i.worldToObject2, 0, 0, 0, 1);
        float4 objPos = mul(worldToObjectMatrix, float4(worldPos, 1));
        clip(float3(0.5, 0.5, 0.5) - abs(objPos.xyz));
        float4 _Atlas_Instance = float4(1,1,1,0);
        _AtlasCount = _AtlasCount + 0.05;
        half2 uvx = (objPos.xz + 0.5);
        half _tileSize = 1 / _AtlasCount*_AtlasCount;
        half2 uv = float2(
            fmod(uvx.x/_Atlas_Instance.z/_AtlasCount, _tileSize) + _tileSize*_Atlas_Instance.x,
            fmod(uvx.y/_Atlas_Instance.z/_AtlasCount, _tileSize) + _tileSize*_Atlas_Instance.y);

        float4 Tex = tex2D(_MainTex, uv);

    Really thank you, you are like a shader god on the unity forum :p.
     
    Last edited: Apr 21, 2020
  9. Rahd

    More testing revealed that it has something to do with the render queue.
    The shader is alpha blended, not alpha tested. Is that the issue? I read that batching messes things up with transparent objects.



    https://imgur.com/a/hSJO0Pe
     
  10. bgolus

    Please use [code][/code] tags.

    Yep, right there in the comment for that code it mentions it's for post processing.

    And yep. It'll just be an issue going forward as you'll suddenly need to be using the texcoords for storing data and that would have been getting triggered falsely as you wouldn't be storing the data it's expecting.

    Frak no, my code is frequently wrong. :D

    That said, are you sure you're getting proper matrices setup on each mesh? (Also, you should stick a nointerpolation in front of the worldToObject lines. It won't fix anything here, but might remove some minor noise / distortion in the decals once they do show up.) It'd be worthwhile to have your shader just return the value from one of the worldToObject interpolated values to see if each one is getting a unique value or not. The good news is at least one is (sometimes) showing up, so my code wasn't totally off. ;)
     
  11. bgolus

    The geometry, alpha test vs transparency queues are more about the order in which things should be rendered, not about what their actual blend mode is. Generally you do want all fully opaque (and z writing) objects first, then alpha test objects after that (they're still opaque and do zwrite, but they're slower to render), and then transparent objects sorted back to front afterwards. Decals are weird in that they're transparent, but generally get rendered as an "opaque" after alpha tested geometry, but before transparency as you don't want them to sort on top of transparent objects. It's not uncommon for decals to be right at the end of the opaque queue range (2500) so that all other opaques are rendered first. You might try that. Not really sure why 3000 and 3002 make a difference though as there's no functional difference for queues between 2501 and 4000.
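
    The queue ranges can be summed up in a few lines. This is only an illustrative Python sketch, not Unity code: the named values are the built-in render pipeline's standard queues, and the function name is made up for the example.

```python
# Named queues from Unity's built-in render pipeline.
RENDER_QUEUES = {
    "Background": 1000,
    "Geometry": 2000,
    "AlphaTest": 2450,
    "Transparent": 3000,
    "Overlay": 4000,
}
OPAQUE_RANGE_END = 2500  # the end of the opaque queue range mentioned above


def sort_mode(queue):
    """Queues up to 2500 are treated as opaque, anything higher as transparent."""
    if queue <= OPAQUE_RANGE_END:
        return "opaque"
    return "transparent (sorted back to front)"


for q in (2000, 2450, 2500, 3000, 3002):
    print(q, sort_mode(q))
```

    Seen this way, 3000 and 3002 land in the same bucket, which is why a difference between them is surprising.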
     
  12. Rahd

    worldToObject returned this from all 3 of them:

    Not sure what you mean by mesh setup?
     
    Last edited: Apr 21, 2020
  13. Rahd

    OK, I think it worked!!!!!!!!!!
    I was baking the localToWorldMatrix, not the worldToLocalMatrix.
    For anyone interested in this: use vertex painter pro,
    head to BakePivot.cs -------> DoBakePivot,
    and replace the first UV0 case with this:

    Code (CSharp):
    case PivotTarget.UV0:
    {
        // Rows 0-2 of worldToLocalMatrix go into UV0-UV2, one row per channel.
        InitBakeChannel(BakeChannel.UV0, jobs);
        foreach (PaintJob job in jobs)
        {
            Matrix4x4 Matrix = job.meshFilter.transform.worldToLocalMatrix;
            Vector4 lp = new Vector4(Matrix.m00, Matrix.m01, Matrix.m02, Matrix.m03);
            job.stream.SetUV0(lp, job.verts.Length);
            EditorUtility.SetDirty(job.stream);
            EditorUtility.SetDirty(job.stream.gameObject);
        }

        InitBakeChannel(BakeChannel.UV1, jobs);
        foreach (PaintJob job in jobs)
        {
            Matrix4x4 Matrix = job.meshFilter.transform.worldToLocalMatrix;
            Vector4 lp = new Vector4(Matrix.m10, Matrix.m11, Matrix.m12, Matrix.m13);
            job.stream.SetUV1(lp, job.verts.Length);
            EditorUtility.SetDirty(job.stream);
            EditorUtility.SetDirty(job.stream.gameObject);
        }

        InitBakeChannel(BakeChannel.UV2, jobs);
        foreach (PaintJob job in jobs)
        {
            Matrix4x4 Matrix = job.meshFilter.transform.worldToLocalMatrix;
            Vector4 lp = new Vector4(Matrix.m20, Matrix.m21, Matrix.m22, Matrix.m23);
            job.stream.SetUV2(lp, job.verts.Length);
            EditorUtility.SetDirty(job.stream);
            EditorUtility.SetDirty(job.stream.gameObject);
        }

        // Bottom row; it is (0, 0, 0, 1) for an affine transform, so it carries no extra data.
        InitBakeChannel(BakeChannel.UV3, jobs);
        foreach (PaintJob job in jobs)
        {
            Matrix4x4 Matrix = job.meshFilter.transform.worldToLocalMatrix;
            Vector4 lp = new Vector4(Matrix.m30, Matrix.m31, Matrix.m32, Matrix.m33);
            job.stream.SetUV3(lp, job.verts.Length);
            EditorUtility.SetDirty(job.stream);
            EditorUtility.SetDirty(job.stream.gameObject);
        }
        break;
    }
    It's a quick solution for now, but I will write a script for this later. Using a great editor tool like vertex painter makes the decals static: you can move them, but the projection won't follow. So a runtime UV baker would make this a good solution.
    I will remove the fourth Vector4 (m30-m33)
    and make more changes.
    I hope this helps someone.
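
    To show why the choice of matrix matters, here is a small plain Python sketch (not Unity code). It builds both matrices for a hypothetical decal at (10, 0, 0) with a uniform scale of 2, with no rotation to keep the inverse trivial, and pushes the same surface point through each. All names are invented for the example.

```python
def local_to_world(translation, scale):
    tx, ty, tz = translation
    return [[scale, 0.0, 0.0, tx],
            [0.0, scale, 0.0, ty],
            [0.0, 0.0, scale, tz],
            [0.0, 0.0, 0.0, 1.0]]


def world_to_local(translation, scale):
    """Inverse of local_to_world for a translate + uniform scale transform."""
    tx, ty, tz = translation
    inv = 1.0 / scale
    return [[inv, 0.0, 0.0, -tx * inv],
            [0.0, inv, 0.0, -ty * inv],
            [0.0, 0.0, inv, -tz * inv],
            [0.0, 0.0, 0.0, 1.0]]


def transform_point(m, p):
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))


decal_pos, decal_scale = (10.0, 0.0, 0.0), 2.0
surface_point = (10.5, 0.0, 0.5)  # a world position under the decal box

good = transform_point(world_to_local(decal_pos, decal_scale), surface_point)
bad = transform_point(local_to_world(decal_pos, decal_scale), surface_point)
print(good)  # lands inside the +-0.5 box, so the decal draws
print(bad)   # lands far outside the box, so the pixel gets clipped
```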


    @bgolus Thank you Thank you Thank you Thank you Thank you Thank you Thank you !!
    You saved me a lot of time and your help means the world to me now!
    2 days ago I had no idea what world to object was ... now I know more thanks to you.
    Cheers mate :)
     
    Last edited: Apr 21, 2020
  14. Rahd

    @bgolus OK, I made more modifications/optimizations to the shader and created a script for batching.
    Optimizing: looking at the shader's generated code, I noticed the following:
    clip(float3(0.5, 0.5, 0.5) - abs(objPos.xyz)); = 1 more texture and 1 more branch
    So to remove that I did this:
    Since the shader is transparent, I folded the test into the alpha channel like this:
    c.a = TextureTransparency - any(step(float3(0.5, 0.5, 0.5) - abs(objPos.xyz), float3(0.0, 0.0, 0.0)));
    This brought up some stencil issues, but they were fixed later.
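
    A plain Python sketch of the two variants, for comparing what they do per pixel. It is an illustration only: step() follows the HLSL definition (1 when x >= edge), and the final clamp assumes the render target clamps negative alpha to zero before blending.

```python
def step(edge, x):
    """HLSL-style step: 1.0 when x >= edge, otherwise 0.0."""
    return 1.0 if x >= edge else 0.0


def clip_discards(obj_pos):
    """clip(0.5 - abs(objPos)): discard when any component is negative."""
    return any((0.5 - abs(c)) < 0.0 for c in obj_pos)


def masked_alpha(texture_alpha, obj_pos):
    """Alpha variant: subtract 1 when any component is on or outside the box."""
    outside = any(step(0.5 - abs(c), 0.0) == 1.0 for c in obj_pos)
    alpha = texture_alpha - (1.0 if outside else 0.0)
    return max(alpha, 0.0)  # assumption: negative alpha is clamped to 0


for pos in ((0.1, 0.2, 0.3), (0.6, 0.0, 0.0), (0.5, 0.0, 0.0)):
    print(pos, clip_discards(pos), masked_alpha(0.8, pos))
```

    The two agree everywhere except exactly on the box boundary, where clip() keeps the pixel and the step() version masks it.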

    The next thing is the fmod for the UVs. Each fmod costs one branch, so looking at some of your old posts, @bgolus,
    I stole this:

    float Fmod_NoB(float x, float y)
    {
        return (x - y * floor(x/y));
    }

    But this can bring some tiling issues, like black lines at the edges of decals,
    so to fix it, I offset x and y like this:

    float Fmod_NoB(float x, float y)
    {
        x = x + 0.015;
        y = y + 0.015;
        return (x - y * floor(x/y));
    }

    Result:
    Before // Stats: 111 math, 4 textures, 4 branch

    After // Stats: 111 math, 3 textures

    Now the shader:
    It's 2 shaders now:
    one with GPU instancing, the other with dynamic batching.
    Using this script, I find out whether the phone supports GPU instancing or not and set the shader to one of them:

    script:

    Code (CSharp):
    public enum Batching
    {
        GPU_Instancing,
        Dymanic_Batching
    };
    public Batching BatchingType;

    private void Bake()
    {
        if (Bake_Now)
        {
            // 1f forces float division in case TileCount is declared as an int.
            float cellSize = 1f / TileCount;
            Vector3 m_Tile = new Vector3(Tilex * cellSize, Tiley * cellSize, 1);

            MeshRenderer MeshObj = GetComponent<MeshRenderer>();
            MeshFilter MeshFilter = GetComponent<MeshFilter>();
            if (SystemInfo.supportsInstancing && BatchingType == Batching.GPU_Instancing)
            {
                MeshFilter.sharedMesh = DecalMesh;
                MeshObj.sharedMaterial.shader = GPU_Instancing_Shader;

                // Create property block and set to the mesh.
                MaterialPropertyBlock propertyBlock = new MaterialPropertyBlock();
                propertyBlock.SetVector("__AtlasTile", m_Tile);
                MeshObj.SetPropertyBlock(propertyBlock);
            }
            else if (BatchingType == Batching.Dymanic_Batching)
            {
                MeshObj.sharedMaterial.shader = Dymanic_Batching_Shader;

                // Bake the tile and the top three rows of worldToLocalMatrix into every vertex.
                MeshFilter Meshf = GetComponent<MeshFilter>();
                Vector3[] verts = Meshf.sharedMesh.vertices;
                Matrix4x4 Matrix = Meshf.transform.worldToLocalMatrix;
                SetUV0(new Vector4(m_Tile.x, m_Tile.y, m_Tile.z, 1), verts.Length);
                SetUV1(new Vector4(Matrix.m00, Matrix.m01, Matrix.m02, Matrix.m03), verts.Length);
                SetUV2(new Vector4(Matrix.m10, Matrix.m11, Matrix.m12, Matrix.m13), verts.Length);
                SetUV3(new Vector4(Matrix.m20, Matrix.m21, Matrix.m22, Matrix.m23), verts.Length);
            }
            Bake_Now = false;
        }
    }
    SetUV can be found on https://github.com/slipster216/VertexPaint

    I will post some mobile build test results later, on the Samsung S5, Huawei Mate, Note Edge, and Moto X (2013).
     
  15. bgolus

    clip() isn't a branch. any() and step() can both be a branch (but usually aren't).
    Technically all 3 are conditionals, and it's up to the compiler to decide if it should be a branch or not ... and 99.99% of the time it's not going to be a branch.

    Also you may actually want to keep the clip() too, as it may have some performance benefits if you're near a large decal and looking at the fully transparent part compared to just relying on transparency alone. That'll be highly dependent on the hardware though (and shouldn't hurt on hardware that it doesn't benefit). At least as long as your shader is using ZWrite Off, which it should be.

    uh, what? fmod() is as much a "branch" as any() or step(). That snippet of code isn't a "branchless fmod()", it's a "have the results match GLSL mod", which is a different but just as valid way of calculating a modulo.
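
    The difference between the two only shows up for negative inputs. A quick way to see it outside a shader is this Python sketch, where math.fmod stands in for HLSL's fmod() (both keep the sign of x) and the function names are just labels for the example:

```python
import math


def hlsl_fmod(x, y):
    """Truncated modulo: the result keeps the sign of x, like HLSL fmod()."""
    return math.fmod(x, y)


def glsl_mod(x, y):
    """Floored modulo: x - y * floor(x / y), like GLSL mod()."""
    return x - y * math.floor(x / y)


for x in (1.25, -0.25):
    print(x, hlsl_fmod(x, 1.0), glsl_mod(x, 1.0))
```

    For positive x both return the same value; for negative x the truncated version stays negative while the floored version wraps back into the 0..y range.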

    Where are these stats coming from? They honestly don't make any sense to me. I don't understand why changing the code you presented would remove a texture sample, or why there were any branches to begin with. If anything I would expect it to be slower now.

    You might be better off with a single shader and swap behaviors with a #pragma multi_compile global keyword.
     
  16. Rahd

    Now I'm intrigued to learn more about this!! Confused, to say the least.

    The stats are from this:

    Here is how the shader was and what it became:








    I was really following this guide's advice:
    "Instead of using alpha testing or discard instructions to kill pixels, use alpha blending with alpha set to zero."
    https://developer.apple.com/library...ProgrammingGuide/Performance/Performance.html

    Since the game is for phones, I kept reading about how alpha testing is very slow on mobile, especially phones.

    To have a more in-depth look at the code, for example:
    Code (CSharp):
    clip(float3(0.5, 0.5, 0.5) - abs(objPos.xyz));

    becomes, in the compiled shader:

    Code (CSharp):
    x_26 = (vec3(0.5, 0.5, 0.5) - abs(tmpvar_25.xyz));
    bvec3 tmpvar_27;
    tmpvar_27 = lessThan (x_26, vec3(0.0, 0.0, 0.0));
    if (any(tmpvar_27)) {
      discard;
    };
    After changing it to:
    Code (CSharp):
    c.a = TextureTransparency - any(step (float3(0.5, 0.5, 0.5) - abs(objPos.xyz), float3(0.0, 0.0, 0.0)));
    it becomes, in the compiled shader:
    Code (CSharp):
    c_10.w = (Tex_11.w * tmpvar_37);
    bvec3 tmpvar_38;
    tmpvar_38 = bvec3(vec3(greaterThanEqual (vec3(0.0, 0.0, 0.0),
      (vec3(0.5, 0.5, 0.5) - abs(tmpvar_28.xyz))
    )));
    1 texture and 1 branch were removed from the stats of the compiled code. Likewise,
    removing the clip() without replacing it lowers the texture count to 3 and removes 1 branch.



    I know that I use 3 textures (atlas, depth, and command buffer grab pass),

    so when I saw 4 textures I started removing lines from the shader until the compiled shader dropped to 3 textures. And the line causing that was clip()!!

    Removing the if statements for the cutoff also removed 1 branch.
    Now it looks like this:
    Code (CSharp):
    c.a = Tex.a * step(_CutOff, Tex.a);
    in the compiled shader:

    Code (CSharp):
    highp float tmpvar_37;
    tmpvar_37 = float((Tex_11.w >= _CutOff));
    That was basically my process. The same goes for fmod ... each fmod adds 1 branch to the branches in the compiled shader ...
    The old shader code is:
    Code (CSharp):
    half2 uvx = (objPos.xz + 0.5);
    half _tileSize = 1 / _AtlasCount*_AtlasCount;
    half2 uv = float2(
        fmod(uvx.x/_Atlas_Instance.z/_AtlasCount, _tileSize) + _tileSize*_Atlas_Instance.x,
        fmod(uvx.y/_Atlas_Instance.z/_AtlasCount, _tileSize) + _tileSize*_Atlas_Instance.y);
    The code above compiles into:

    Code (CSharp):
    highp vec2 tmpvar_28;
    tmpvar_28 = (tmpvar_25.xz + 0.5);
    uvx_12 = tmpvar_28;
    highp float tmpvar_29;
    tmpvar_29 = ((1.0/(xlat_mutable_AtlasCount)) * xlat_mutable_AtlasCount);
    _tileSize_11 = tmpvar_29;
    highp float tmpvar_30;
    tmpvar_30 = (((uvx_12.x / __AtlasTile.z) / xlat_mutable_AtlasCount) / _tileSize_11);
    highp float tmpvar_31;
    tmpvar_31 = fract(abs(tmpvar_30));
    mediump float tmpvar_32;
    tmpvar_32 = (tmpvar_31 * _tileSize_11);
    mediump float tmpvar_33;
    if ((tmpvar_30 >= 0.0)) {
      tmpvar_33 = tmpvar_32;
    } else {
      tmpvar_33 = -(tmpvar_32);
    };
    highp float tmpvar_34;
    tmpvar_34 = (((uvx_12.y / __AtlasTile.z) / xlat_mutable_AtlasCount) / _tileSize_11);
    highp float tmpvar_35;
    tmpvar_35 = fract(abs(tmpvar_34));
    mediump float tmpvar_36;
    tmpvar_36 = (tmpvar_35 * _tileSize_11);
    mediump float tmpvar_37;

    if ((tmpvar_34 >= 0.0)) {
      tmpvar_37 = tmpvar_36;
    } else {
      tmpvar_37 = -(tmpvar_36);
    };
    highp vec2 tmpvar_38;
    tmpvar_38.x = (tmpvar_33 + (_tileSize_11 * __AtlasTile.x));
    tmpvar_38.y = (tmpvar_37 + (_tileSize_11 * __AtlasTile.y));
    uv_10 = tmpvar_38;

    And that's a lot of lines if you ask me :D

    Now changing it to:
    Code (CSharp):
    half2 uvx = (objPos.xz + 0.5);
    half _tileSize = 1 / _AtlasCount*_AtlasCount;
    half2 uv = float2(
        Fmod_NoB(uvx.x/_Atlas_Instance.z/_AtlasCount, _tileSize) + _tileSize*_Atlas_Instance.x,
        Fmod_NoB(uvx.y/_Atlas_Instance.z/_AtlasCount, _tileSize) + _tileSize*_Atlas_Instance.y);
    with Fmod_NoB being:
    Code (CSharp):
    float Fmod_NoB(float x, float y)
    {
        x = x + 0.015;
        y = y + 0.015;
        return (x - y * floor(x/y));
    }
    compiles into:
    Code (CSharp):
    highp vec2 tmpvar_28;
    tmpvar_28 = (tmpvar_27.xz + 0.5);
    uvx_14 = tmpvar_28;
    highp float tmpvar_29;
    tmpvar_29 = ((1.0/(xlat_mutable_AtlasCount)) * xlat_mutable_AtlasCount);
    _tileSize_13 = tmpvar_29;
    highp float x_30;
    x_30 = ((uvx_14.x / __AtlasTile.z) / xlat_mutable_AtlasCount);
    highp float y_31;
    y_31 = _tileSize_13;
    x_30 += 0.015;
    y_31 += 0.015;
    highp float x_32;
    x_32 = ((uvx_14.y / __AtlasTile.z) / xlat_mutable_AtlasCount);
    highp float y_33;
    y_33 = _tileSize_13;
    x_32 += 0.015;
    y_33 += 0.015;
    highp vec2 tmpvar_34;
    tmpvar_34.x = ((x_30 - (y_31 *
      floor((x_30 / y_31))
    )) + (_tileSize_13 * __AtlasTile.x));
    tmpvar_34.y = ((x_32 - (y_33 *
      floor((x_32 / y_33))
    )) + (_tileSize_13 * __AtlasTile.y));
    uv_12 = tmpvar_34;


    my question is, the Stas on the Compiled shader code Wrong? or what?

    the stats show a lot of math ," 111 " i know that a lot of math calculation in the shader can be reduced , is there any benefits in Reducing that?
    An example is the Fmod:
    replacing the
    Code (CSharp):
    1.     float  Fmod_NoB(float x,   float y)
    2. {
    3. x= x+0.015;
    4. y= y+0.015;
    5.   return (x - y * floor(x/y));
    6.  
    7. }
    to
    Code (CSharp):
    1.     float  Fmod_NoB(float2 x )
    2. {
    3. x= x+0.015;
    4.   return (x.x - x.y * floor(x.x/x.y));
    5.  
    and

    Code (CSharp):
    half2 uv = float2(
        Fmod_NoB(uvx.x / _Atlas_Instance.z / _AtlasCount, _tileSize) + _tileSize * _Atlas_Instance.x,
        Fmod_NoB(uvx.y / _Atlas_Instance.z / _AtlasCount, _tileSize) + _tileSize * _Atlas_Instance.y);

    to:

    Code (CSharp):
    half2 uv = float2(
        Fmod_NoB(float2(uvx.x / _Atlas_Instance.z / _AtlasCount, _tileSize)) + _tileSize * _Atlas_Instance.x,
        Fmod_NoB(float2(uvx.y / _Atlas_Instance.z / _AtlasCount, _tileSize)) + _tileSize * _Atlas_Instance.y);
    This reduces the math count on the compiled shader to 109! o_O
    I know that my shader is plagued with not-so-useful math... I will keep reducing it and see if there is any gain.
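For what it's worth, the two-argument and the packed float2 forms of Fmod_NoB do the same arithmetic, so any change in the stat comes from how the code is translated and counted, not from a different result. Below is a minimal CPU-side sketch in Python (not shader code) that checks this on a few sample values. The Python function names and the sample inputs are made up for illustration; only the 0.015 bias and the floor-based formula come from the post.

```python
import math

BIAS = 0.015  # same constant bias as in the posted Fmod_NoB


def fmod_nob_scalar(x: float, y: float) -> float:
    """Two-argument form: bias both operands, then floor-based modulo."""
    x += BIAS
    y += BIAS
    return x - y * math.floor(x / y)


def fmod_nob_packed(xy) -> float:
    """Packed form: both operands arrive as one 2-component value."""
    x, y = (c + BIAS for c in xy)
    return x - y * math.floor(x / y)


# Hypothetical sample inputs: (coordinate, tile size).
samples = [(0.1, 0.25), (0.7, 0.25), (3.2, 0.5), (-0.4, 0.25)]
results = [(fmod_nob_scalar(x, y), fmod_nob_packed((x, y))) for x, y in samples]

for scalar_value, packed_value in results:
    print(scalar_value, packed_value)
```

Note that the result wraps into the range [0, y + 0.015) rather than [0, y), because the bias is also added to the divisor.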




    Did I optimize the shader, or is it just the same, if not worse...?

    I promised a benchmark; I will post the results of the old shader and the new one soon.

    About the 2 shaders (swapping behaviors with a #pragma multi_compile): that's a better solution!!
    Working on it right now. Thank you.
     
    Last edited: Apr 22, 2020
  17. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    Now testing:
    Subject: Moto X 2013, GPU Adreno 320, 2GB RAM

    Test 1 is between:
    Old shader: depth only, no specular lighting, with the old clip and fmod, no dynamic batching.

    New shader: depth normals, specular lighting, without the clip or the fmod, with dynamic batching.

    Test 1: No shadows. Objects use the Unity mobile diffuse shader, with no glass shaders on the cars and no cutoff on the fence. Stencil mask enabled (objects are not affected by the decal since we are using the mobile diffuse shader).
    Test 2: No shadows. Objects use my alpha lit fake BRDF shader, with glass shaders on the cars and cutoff on the fence.
    Test 3: Soft shadows. Objects use my alpha lit fake BRDF shader, with glass shaders on the cars and cutoff on the fence.


    Each test is a new APK build: I let the phone cool for 5 minutes, then run the test after a fresh build.
    I did not run the tests just 6 times; I ran them more times than I can count, to make sure it's not a one-time thing.


    Results:
    Try to view the image in full to see the details.
     
    Last edited: Apr 22, 2020
    bgolus likes this.
  18. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    That documentation lacks some minor nuance to why clip() or discard is bad. (Note: The clip() function just calls discard if the value passed to it is less than zero.) Usually you’re using discard on an opaque material that is also doing ZWrite. This is super bad for mobile GPUs for a lot of reasons. If you dig deeper into documentation from actual GPU makers, most of them recommended using discard on fully transparent areas for any shader that doesn’t use depth writes (ZWrite Off).

    Technically yes. The shader code it shows is the HLSL translated into GLSL. Until it’s compiled (which for Android happens on the device itself, so you’d need to use the device specific profiling tools to see what the compiled shader looks like) you don’t know if an "if" is actually a branch or not. That said, it may still compile to faster code if they’re avoided on old GLES devices, like that Adreno 320 (ooph, that’s old).

    It also never occurred to me how bad fmod translates from HLSL to GLSL. It makes sense when I think about it, but yeah, your modifications there will likely be faster overall.

    I still don’t understand why you have one less texture read overall though. *shrug*

    I also can’t help but notice that the new shader is visibly darker than the old one, so it makes me think your changes did more than just optimize and there were some functional changes as well that may explain the perf increase.
     
    Rahd likes this.
  19. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    The funny thing is the new, darker shader is more expensive :D
    The old shader has very poor lighting code, won't be affected by secondary lights, and lacks a lot of functions.
    Have a look:


    And I just finished testing on the Huawei Mate 10 Lite (Mali-T830).
    Using GPU instancing or dynamic batching causes a 10 FPS drop,
    while non-batched decals run faster... I'm lost!! This is why I hate phones so much.
    I'm going to use the Adreno Profiler for a better look on the Moto X.
    The FPS counter is not that great...
     
    Last edited: Apr 22, 2020
    bgolus likes this.
  20. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    Yeah, packing & transferring data between the vertex and fragment has a cost. Draw calls are less evil than people think.
     
    Rahd likes this.
  21. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    OK, here is a very simple test:
    create an Unlit shader in Unity
    and add clip( 0.1-0.3); in the frag.

    Code (CSharp):
    Shader "Unlit/TESTshader"
    {
        Properties
        {
            _MainTex ("Texture", 2D) = "white" {}
        }
        SubShader
        {
            Tags { "RenderType"="Opaque" }
            LOD 100

            Pass
            {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                // make fog work
                #pragma multi_compile_fog

                #include "UnityCG.cginc"

                struct appdata
                {
                    float4 vertex : POSITION;
                    float2 uv : TEXCOORD0;
                };

                struct v2f
                {
                    float2 uv : TEXCOORD0;
                    UNITY_FOG_COORDS(1)
                    float4 vertex : SV_POSITION;
                };

                sampler2D _MainTex;
                float4 _MainTex_ST;

                v2f vert (appdata v)
                {
                    v2f o;
                    o.vertex = UnityObjectToClipPos(v.vertex);
                    o.uv = TRANSFORM_TEX(v.uv, _MainTex);
                    UNITY_TRANSFER_FOG(o,o.vertex);
                    return o;
                }

                fixed4 frag (v2f i) : SV_Target
                {
                    // sample the texture
                    fixed4 col = tex2D(_MainTex, i.uv);

                    clip( 0.1-0.3);
                    // apply fog
                    UNITY_APPLY_FOG(i.fogCoord, col);
                    return col;
                }
                ENDCG
            }
        }
    }
    on the compiled code it gives :

    Code (CSharp):
    // Stats for Vertex shader:
    //         gles: 0 math, 2 texture
    Pass {
      Tags { "RenderType"="Opaque" }
      //////////////////////////////////
      //                              //
      //      Compiled programs       //
      //                              //
      //////////////////////////////////
    //////////////////////////////////////////////////////
    No keywords set in this variant.
    -- Vertex shader for "gles":
    // Stats: 0 math, 2 textures
    Clip = 1 texture lol
    And if you use a shader property it will add a branch, I think.
     
  22. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    Bad test. You can't use hard coded numbers for both values there as the shader compiler will just optimize that entire line away. Because 0.1 - 0.3 is -0.2, that'll get converted to just:
    discard;


    Try:
    clip(col.a - 0.3);
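To make the difference between the two tests concrete, here is a small CPU-side Python sketch of the clip() rule (discard when the argument is below zero). It is only an illustration: the Python `clip` function and the list of alpha values are hypothetical stand-ins, not Unity or HLSL APIs.

```python
def clip(value: float) -> bool:
    """Return True when the fragment would be discarded (value < 0)."""
    return value < 0.0


# Constant argument: the outcome is known before any pixel is shaded,
# so a compiler is free to fold the whole call into an unconditional discard.
always_discarded = clip(0.1 - 0.3)

# Alpha-dependent argument: the outcome changes per pixel,
# so the test has to survive into the compiled shader.
alphas = [0.0, 0.29, 0.3, 0.31, 1.0]
discarded = [clip(a - 0.3) for a in alphas]

print(always_discarded)
print(discarded)
```

With the constant version every fragment is thrown away, which is why it says nothing about the cost of a real alpha test.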
     
  23. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    // Stats: 2 math, 2 textures, 1 branches
     
  24. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    Yeah ... no idea why that's 2 textures.
     
  25. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    here is a better one


    Code (CSharp):
    if (col.a > 0.3) {
        discard;
    }
    // Stats: 1 math, 2 textures, 1 branches
    it will be compiled into :

    if ((tmpvar_1.w > 0.3)) {
    discard;
    };
    So clip or discard adds one texture, and using shader properties like a texture or a float will add a branch.
    discard is the one adding a texture. I think it has to do with the build target? I really don't know.
     
  26. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    More accurately, a clip or discard will incorrectly increase the texture count stat by one and incorrectly lists all if statements as branches.
     
    Rahd likes this.
  27. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    Yeah, my terrain shader had like 12 texture samplers (triplanar, thanks for the examples on GitHub),
    and the compiled code shows no texture count at all.
    Change some lines in the shader and it will show 14 textures...
    The shader is a surface shader, and in all variants the compiler just bugs out in the stats line.
     
  28. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    Hi @bgolus
    I hope this is not too much to ask.
    So I had this not-so-great idea to use what I have learned from you in my terrain shader.
    The shader is a triplanar terrain shader based on:
    your GitHub examples
    Farfarer's terrain shader
    and Jason Booth's height map blend shaders.

    Here is the problem I'm facing:
    The terrain on mobile sucks, so I started converting my terrain into mesh terrain.
    However, it is still slow, so I switched to vertex colors to save 1 sampler.
    Then I found out about the triplanar shader and how great it looks!
    But using 4 tile textures on a triplanar shader costs 12 samplers.
    I tried to reduce the texture samplers by atlasing. It works, but it lowers the frame rate since we reuse the sampler again...

    So here is what I have come up with: why not assign the UVs like you do in Maya or Blender, but for each face/tri you paint on!!
    Computing the UVs directly on the mesh is a process my brain won't handle,
    so I tried doing it on the shader side.
    Here are the results and the last problem I'm facing.
    I use the same concept as the decals, but instead of flooding the whole mesh I pass the atlas info into the UV0 of the tris my mouse raycast is hitting.
    Code (CSharp):
    // hitcam is the raycast hit
    // m_Tile is the atlas info
    // Raduis is the affected area around the hit triangle index
    OnHitGiveUV(hitcam, Raduis, m_Tile);
    Then, inside the shader, I read that info from UV0
    and pass it into this:

    Code (CSharp):
    // _TileSize is the scaling of the texture
    fixed4 GetUvfromAtlas(float4 _Atlas_TexelSize, float4 _Atlas_ST, float2 uvs, float2 _TileSize, float4 Atlasinfo)
    {
        fixed4 finaluv;
        float2 TexelSizeFinal = float2(_Atlas_TexelSize.x, _Atlas_TexelSize.y);
        float2 AtlasScale = TexelSizeFinal * Atlasinfo.xy;
        float2 uv0_Atlas = uvs * _Atlas_ST.xy + _Atlas_ST.zw;
        float2 AUV = uv0_Atlas;
        float2 InvertedTilesSize = float2(1, 1) / _TileSize;
        float2 BUV = InvertedTilesSize;
        float2 Fracting = frac(AUV / BUV) * BUV;
        float2 AtlasScale2 = Atlasinfo.zw * TexelSizeFinal;
        float2 TilesResize = AtlasScale2 - AtlasScale;
        float2 Tiling = Fracting * TilesResize;
        float2 finalUV = AtlasScale + (Tiling * _TileSize);

        finaluv = float4(finalUV, 0, 0);
        return finaluv;
    }
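Read step by step, GetUvfromAtlas repeats the incoming UV with frac() and then remaps that 0..1 value into the rectangle of one atlas tile. The Python sketch below is a CPU-side restatement of that arithmetic, simplified algebraically, for checking values by hand. It assumes that Atlasinfo holds the tile rectangle as (min x, min y, max x, max y) in pixels, which the multiplication by the texel size suggests but the post does not state. The Python names and the example atlas layout are hypothetical.

```python
import math


def frac(v: float) -> float:
    """Fractional part, matching HLSL frac() (also for negative inputs)."""
    return v - math.floor(v)


def get_uv_from_atlas(texel_size, atlas_st, uv, tile_size, atlas_info):
    """Map a repeating UV into one tile of an atlas.

    texel_size: (1/width, 1/height) of the atlas texture
    atlas_st:   (scale_x, scale_y, offset_x, offset_y)
    tile_size:  how many times the tile repeats per UV unit
    atlas_info: (min_x, min_y, max_x, max_y), ASSUMED to be in pixels
    """
    out = []
    for i in range(2):
        tile_min = atlas_info[i] * texel_size[i]
        tile_max = atlas_info[i + 2] * texel_size[i]
        scaled = uv[i] * atlas_st[i] + atlas_st[i + 2]
        repeat = frac(scaled * tile_size[i])  # 0..1 position inside the tile
        out.append(tile_min + repeat * (tile_max - tile_min))
    return tuple(out)


# Hypothetical layout: 1024x1024 atlas, tile covering pixels (512, 0)-(1024, 512).
example = get_uv_from_atlas(
    texel_size=(1 / 1024, 1 / 1024),
    atlas_st=(1, 1, 0, 0),
    uv=(2.25, 0.5),
    tile_size=(1, 1),
    atlas_info=(512, 0, 1024, 512),
)
print(example)
```

Because frac() jumps from almost 1 back to 0 at every repeat, the sampled UV is discontinuous at tile borders, which is worth keeping in mind when a filtered atlas shows seams.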
    Here is the runtime painting:

    Now my shader uses only 3 texture samplers, one for each side of the triplanar projection, and I dropped the vertex color splatting entirely.


    Code (CSharp):
    uv = i.worldPos.zy;
    uv = GetUvfromAtlas(_Atlas_TexelSize, _Atlas_ST, uv, _TileSizeSplat0, i.Atlasinfo);
    albedoX = tex2D(_Atlas, uv);

    uv = i.worldPos.xz;
    uv = GetUvfromAtlas(_Atlas_TexelSize, _Atlas_ST, uv, _TileSizeSplat0, i.Atlasinfo);
    albedoY = tex2D(_Atlas, uv);

    uv = i.worldPos.xy;
    uv.x *= -1;
    uv = GetUvfromAtlas(_Atlas_TexelSize, _Atlas_ST, uv, _TileSizeSplat0, i.Atlasinfo);
    albedoZ = tex2D(_Atlas, uv);
    However, you can see how horribly the UVs stretch...
    and that's my problem :(
    Other than losing the height map blending, since you can't blend UVs (I restored it on the Y projection, but that takes 4 samplers),
    is there a way to remove the stretching?
    Is there any other way around this 12-sampler issue?
    Thank you.
    Update:
    The issue is not the shader, it's how I'm picking the triangles of the mesh.
    I changed how the radius of the hit works, using a lerp between the old UVs and the new ones:
    Code (CSharp):
    if (sqrMag < radius) {
        currentUVS[i] = UVinfos;
    }
    to this:

    Code (CSharp):
    if (sqrMag < radius) {
        currentUVS[i] = Vector4.Lerp(currentUVS[i], UVinfos, strength);
    }
    if (sqrMag < (radius / 5) * 4) {
        currentUVS[i] = UVinfos;
    }
    So the inner 4/5 of the circle around the ray hit gets the full UV info, and the outer 1/5 gets a blend; as I change the strength I can set the correct UVs without stretching.
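The two-zone brush can be sketched outside Unity like this. It is a minimal Python restatement of the two if statements for a single vertex, keeping the comparison exactly as posted (sqrMag against radius). The function names and the sample values are made up for illustration; `lerp` here is a plain stand-in for Vector4.Lerp without clamping of the strength.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def paint_uv(current, target, sqr_mag, radius, strength):
    """Return the new UV data for one vertex under the brush."""
    if sqr_mag < radius:
        # Anywhere inside the brush: blend toward the painted tile info.
        current = lerp(current, target, strength)
    if sqr_mag < (radius / 5) * 4:
        # Inner 4/5 of the brush: take the painted tile info outright.
        current = target
    return current


old = (0.0, 0.0, 0.0, 0.0)
new = (1.0, 1.0, 1.0, 1.0)
inner = paint_uv(old, new, sqr_mag=0.5, radius=1.0, strength=0.25)
rim = paint_uv(old, new, sqr_mag=0.9, radius=1.0, strength=0.25)
outside = paint_uv(old, new, sqr_mag=1.5, radius=1.0, strength=0.25)
print(inner, rim, outside)
```

Only the rim vertices ever hold blended values, so that is the only band where interpolated atlas coordinates can appear.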

    This works only when the tiles in the atlas texture are next to each other... so I will just height blend them with the old Y-axis textures and go back to 6 texture samplers.
     
    Last edited: Apr 28, 2020
    bgolus likes this.
  29. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    If you really want to go down this rabbit hole, check out MicroSplat. @JasonBooth has managed some amazing feats in limiting the sample count even with a lot of very complex texturing features enabled by careful picking of the samples to blend between.
     
    Rahd likes this.
  30. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    Yeah, I did check it out more than once. It uses texture arrays; that's its limitation.

    So far the 6-texture-sampler shader is doing OK.
    The old scene (decals and fake PBR shader) + 48k-tris terrain chunks runs at 18-20 fps on that old Moto X.


    I will recheck MicroSplat and see how it works. I will keep this updated if I manage to gain more fps, so that anyone can benefit from this post if there is anything good in it.
    Thank you Ben :)
     
    Last edited: Apr 28, 2020
  31. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    Added something like stochastic anti-tiling, but without any extra samplers; it takes more work, though:
    I create 4 variations of the same texture, then make them all tile inside one texture.


    Then I use this function in the shader to pick a random tile:

    Code (CSharp):
    float2 GetRandomUVs(float2 uvs, float2 _AtlasInfo)
    {
        float2 scaledUV = uvs / _AtlasInfo;
        float2 indexOffset = floor(hash22(uvs) * _AtlasInfo) / _AtlasInfo;
        return frac(scaledUV + indexOffset);
    }
    Texture fetching:
    fixed4 Tile0 = tex2Dlod ( _Tile0 , float4( GetRandomUVs (uv ,_AtlasInfo_Tile0 ) ,0,0) );
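As a rough way to check the offset arithmetic in GetRandomUVs, here is a CPU-side Python sketch of the same three steps. The hash22 function is not shown in the thread, so the sketch takes the random pair as an explicit `rnd` argument (two values in [0, 1)) instead of guessing at a hash; that parameter, the Python names and the sample numbers are all assumptions for illustration.

```python
import math


def frac(v: float) -> float:
    """Fractional part, matching HLSL frac()."""
    return v - math.floor(v)


def get_random_uvs(uv, atlas_info, rnd):
    """Pick one variation out of an atlas_info[0] x atlas_info[1] grid.

    rnd stands in for hash22(uv): two pseudo-random values in [0, 1).
    """
    out = []
    for i in range(2):
        # Shrink the UV so one repeat covers a single cell of the grid.
        scaled = uv[i] / atlas_info[i]
        # Snap the random value to a cell origin: 0, 1/n, 2/n, ...
        index_offset = math.floor(rnd[i] * atlas_info[i]) / atlas_info[i]
        out.append(frac(scaled + index_offset))
    return tuple(out)


# 2x2 grid of variations, hypothetical random pair.
picked = get_random_uvs(uv=(0.25, 0.5), atlas_info=(2, 2), rnd=(0.6, 0.1))
print(picked)
```

For a 2x2 grid the offset is either 0 or 0.5 on each axis, so the random pair only decides which of the four variations is read.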
    Example :

    Results:


    Bigger image : https://i.imgur.com/c5qCTL8.png
     
    Last edited: Apr 29, 2020
  32. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,342
    hippocoder and Rahd like this.
  33. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    I would have to agree! I will try to introduce the technique linked in the height blending with the other layers
     
  34. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    Gonna stick with the random UVs :p
    After texture calibration for the normal map and blending,
    here is how it looks under a point light:

    I changed how the UV tile painting works, and now I only use xy on UV0.
    That left zw doing nothing, so I implemented metalness and smoothness using zw on UV0 and the alpha channel of the textures, with smoothstep, like this:


    Code (CSharp):
    // _ColorAdd is a property
    // Smoothnessmap is a smoothstep of Albedo.a
    half Colorrange = smoothstep(Smoothnessmap, _ColorAdd, 1 - Albedo.a) * _Color.a;
    o.Albedo = lerp(Albedo, (Colorrange * _Color) + (Albedo * 1 - Colorrange), i.Atlasinfo.w);
     
    bgolus likes this.
  35. Rahd

    Rahd

    Joined:
    May 30, 2014
    Posts:
    324
    Hi @bgolus
    I took some time processing this UV stuff I learned, and I started experimenting with feeding PBR and color information into UV0, UV1 and UV2.
    (Feeding negative values into smoothness and metallic makes the shader glow in the dark with different colors than the daytime color...)
    Then I took a look at Jason Booth's vertex painter and I made this.


    Overall performance is OK on mobile. Thank you so much.