Cutout shader iOS performance

Discussion in 'Shaders' started by adslitw, Sep 27, 2018.

  1. adslitw

    adslitw

    Joined:
    Aug 23, 2012
    Posts:
    275
    I've just been having a chat with my artist and he suggested we just use a cutout shader for the whole canopy of this tree:


    Now, based upon years of scouring Unity threads for help, I just kind of assumed it was a terrible idea, "cutout on mobile is bad", "mobile GPUs aren't optimised for it", etc. etc. So I thought I would cut a little skirt around the bottom bit, use a mobile diffuse shader for the opaque bits and a cutout shader for just the bits with any transparency. Huge performance saving surely!?



    It seems not, and I just wanted to see if anyone could suggest why? There's obviously a lot of extra draw calls due to the extra material, but I suppose my naive assumption was that use of a simpler shader would more than make up for it. Is the cutout shader only expensive when actually clipping/discarding, but otherwise for the opaque bits it's fine?

    First is a cutout material for the whole canopy, second is the 'optimised' version using a diffuse shader for most of the canopy, tested on an iPhone 5S.

     
  2. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    You are probably sending more draw calls with the 'optimised' setup. Are the trees batched? I'm not familiar with iPhone, but cutout is generally bad on mobile because mobile GPUs rely on tile-based optimisations, and it trashes the local tile cache and early Z, generally speaking.
     
  3. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,339
    Cutout shaders are more expensive across the entire surface, including areas that have been clipped and aren't visible. The reason is related to how GPUs handle depth writing, sorting, and early rejection. When an entire polygon surface is opaque, to render to the depth buffer the GPU does not have to execute the fragment shader first, it can simply use the triangle itself to draw into the depth. For cutout shaders the fragment shader has to run before it knows where to write to the depth buffer and where not to. Any place a cutout shader covers you're still paying the cost of the full pixel shader being run, the results are simply being thrown away. Additionally, there are optimizations and compression the GPU can use on the depth buffer when only guaranteed fully opaque triangles are used. As soon as any cutout shader is rendered, these are disabled making every write and read to and from the depth buffer slower.

    However, you're also rendering a ton of triangles and objects. Your bottleneck may not be the pixel throughput, so any performance you've gained back in the fragment shader stage you may not be seeing because the bottleneck is in the vertex shader or CPU which you've increased.

    As a simplified example, you may have improved pixel shading by 2x, but if it only took 2ms to render the pixels and 36ms to process the vertices, and you increased the vertex count by 10%, you're now looking at 1ms for the pixels and about 39.6ms for the vertices. That is roughly 40.6ms in total instead of 38ms, so the frame got slower overall.
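    To make the arithmetic in that simplified example explicit, here is a small Python sketch. It is only an illustration: it assumes the pixel and vertex costs simply add up to the frame time (real GPUs overlap these stages), and the helper name `frame_time_ms` is made up for this example.

```python
def frame_time_ms(pixel_ms, vertex_ms):
    """Simplified model (an assumption): frame time is the sum of both costs."""
    return pixel_ms + vertex_ms


# Numbers taken from the simplified example in the post.
before = frame_time_ms(pixel_ms=2.0, vertex_ms=36.0)

# "Optimised" version: pixel cost halved, vertex cost up by 10%.
after = frame_time_ms(pixel_ms=2.0 / 2, vertex_ms=36.0 * 1.10)

print(f"before: {before:.1f} ms, after: {after:.1f} ms")
print("slower overall" if after > before else "faster overall")
```

    With these inputs the halved pixel cost is outweighed by the extra vertex work, which is the point of the example: speeding up a stage that is not the bottleneck can be wiped out by a small increase in the stage that is.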
     
  4. adslitw

    adslitw

    Joined:
    Aug 23, 2012
    Posts:
    275
    Great stuff @bgolus, thank you very much.
     
  5. astracat111

    astracat111

    Joined:
    Sep 21, 2016
    Posts:
    725
    Do these things and your game will run smoothly:

    More Performance
    1) Try GPU Instancing those trees if they all have exactly the same material/texture on them! It's far more performant than static batching!

    On your terrains, since Unity 2018.3, there is a Draw Instanced option under the terrain settings in its inspector. Make sure that checkbox is checked, and also make sure your materials have their GPU Instancing checkbox checked.

    With Terrain also make sure your pixel error is set to something like 60 or above, which will reduce draw calls dramatically.

    I know since you're using mobile this is already relevant, but use OpenGLES 3.1 for integrated graphics and mobile. For some reason, in my tests it has come out waaay faster than DirectX, which seemed to chug in every instance even with no shadows.

    Less Overdraw
    2) On mobile, it'd probably be best to draw less stuff! Always remember, the best best best way of gaining that frame rate you want is to reduce what's drawn on screen! This is best done by designing your levels cleverly.
    Do this with this amazing free asset store script:

    https://assetstore.unity.com/packages/tools/camera/per-layer-camera-culling-35100

    Just place it on your camera and create some layers for culling.

    I created the following layers:
    'Cull Short'
    'Cull Medium'
    'Cull Far'

    Cull Far distance I set to like 60, Cull Medium to 30 or so, and Cull Short to 10. Group your game objects and set their parent to these layers so you can quickly decide what groups of game objects to cull at what distance.

    Another thing to look into is an asset store script (that's also free) called 'Panorama 360'. Panoramas would work for arena-type levels where there's a clear center. You can look at older games to see how they used them. One I've been turning to is Sonic Adventure 1, which made use of panoramas in almost every scene:

    https://assetstore.unity.com/packages/tools/camera/360-panorama-capture-38755

    Remember to use fewer materials per scene. Shared materials are what you need. In the example of your tree, each tree uses the same shared material. When level editing a scene you could make about 8 materials and build an entire world out of them.

    Terrains can actually be culled too! This means you can use Terrain in conjunction with the cull layers that you might create above using the free per layer camera culling script.

    Less Vertices
    3) Make use of billboarding/imposters. Just take a screenshot of your tree from the front, make it into a texture using Photoshop or some other image editing software, put an LOD script (Unity's built-in one works) on it and put the shader below on your material for the tree. This will mainly reduce your vertices, but it won't necessarily reduce your overdraw. A tip: you do not need this texture/image to be big because it's going to be displayed from far away.

    Set Unity's LOD script to display the billboard of your trees prefab when it's about...I'd say try 95%/5% ratio to start.

    A tip: whatever you do, do NOT use "LookAt(myCamera.transform)" in a script. Running that on every billboard every frame will take up waaaay too much CPU time! Use a shader like this instead. I implemented the whole GPU Instancing checkbox so make sure to checkmark that.

    Code (CSharp):
    Shader "Astrah_Graphics/Billboard" {
        Properties {
            _Color("Main Color", Color) = (1,1,1,1)
            _MainTex("Base (RGB) Trans (A)", 2D) = "white" {}
        }

        SubShader {
            Tags {
                "Queue" = "Transparent"
                "IgnoreProjector" = "True"
                "RenderType" = "Transparent"
                "DisableBatching" = "True" // the billboard math below needs per-object local space
            }
            LOD 100

            CGPROGRAM
            // In a surface shader the vertex modifier is registered with vertex:vert
            // on the surface pragma, not with a separate #pragma vertex.
            #pragma surface surf Lambert alpha:fade vertex:vert
            #pragma multi_compile_instancing

            struct appdata_t {
                float4 vertex : POSITION;
                float3 normal : NORMAL;
                half4 color : COLOR0;
                float2 texcoord : TEXCOORD0;
                float2 texcoord1 : TEXCOORD1;
                float2 texcoord2 : TEXCOORD2;
                float4 tangent : TANGENT;
                UNITY_VERTEX_INPUT_INSTANCE_ID
            };

            sampler2D _MainTex;
            /*sampler2D _BumpMap;*/
            fixed4 _Color;

            struct Input {
                float2 uv_MainTex;
            };

            // Rebuilds each quad vertex so the quad turns towards the camera
            // while keeping the object's Y axis as its up direction.
            void Billboard(inout appdata_t v) {
                UNITY_SETUP_INSTANCE_ID(v);

                const float3 local = float3(v.vertex.x, v.vertex.y, 0); // this is the quad verts as generated by MakeMesh.cs in the localPos list.
                const float3 offset = v.vertex.xyz - local;

                const float3 upVector = half3(0, 1, 0);
                const float3 forwardVector = UNITY_MATRIX_IT_MV[2].xyz; // camera forward
                const float3 rightVector = normalize(cross(forwardVector, upVector));

                float3 position = 0;
                position += local.x * rightVector;
                position += local.y * upVector;
                position += local.z * forwardVector;

                v.vertex = float4(offset + position, 1);
                v.normal = forwardVector;
            }

            // Vertex modifier called by the generated surface shader vertex function.
            void vert(inout appdata_t v, out Input o) {
                UNITY_INITIALIZE_OUTPUT(Input, o);
                Billboard(v);
            }

            void surf(Input IN, inout SurfaceOutput o) {
                fixed4 c = tex2D(_MainTex, IN.uv_MainTex) * _Color;
                o.Albedo = c.rgb;
                o.Alpha = c.a;
            }
            ENDCG
        }
    }
    With all of these tips used, I got my game running flawlessly with post processing effects, including a lot of bloom, depth of field and color grading, with realtime lighting and shadows and everything, on an Intel UHD 620.

    This is due to getting those draw calls under 100 consistently. With baked lighting and transparent shaders, I could also get my draw calls down to 30 in a big scene (from about 80-90)! Vertices will drop with billboarding/imposters and panoramas as well.
     
    Last edited: Feb 6, 2019
  6. adslitw

    adslitw

    Joined:
    Aug 23, 2012
    Posts:
    275
    This was 4 months ago. :)
     
  7. astracat111

    astracat111

    Joined:
    Sep 21, 2016
    Posts:
    725
    Four months ago but there'll be more people that are having trouble with optimization and I've just spent the last month cranking out all this info, thought a post would be good.
     
  8. adslitw

    adslitw

    Joined:
    Aug 23, 2012
    Posts:
    275
    Would probably have started a new thread personally, but whatever floats your boat! :)
     
  9. astracat111

    astracat111

    Joined:
    Sep 21, 2016
    Posts:
    725
    My thought is that people look for answers more from the question threads than the answer ones.
     
  10. ashwinFEC

    ashwinFEC

    Joined:
    May 19, 2013
    Posts:
    48
    Have you ever heard of Google? When I search for "cutout shader slow iphone" this is the first post that shows up for me. If astracat111 had posted this in a new thread it might not show up as the top search hit.
     
  11. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Back on ancient mobile hardware we were able to do a lot of vegetation by simply modelling in the cuts and using a cheap opaque shader. Don't use cutout, use opaque or alpha transparency, if you care about speed on mobile.

    Mobiles process verts like nobody's business, they're not typically a bottleneck, because mobile rendering is typically just the one forward pass.

    Classic transparent rendering is not that expensive if the shader is quite slim and texture access minimal. The only reason one doesn't want it on mobile is usually going to be sorting problems.
     
  12. astracat111

    astracat111

    Joined:
    Sep 21, 2016
    Posts:
    725
    I'm hoping that some day, even though mobile GPUs cut down on power by using tile rendering, the cost will be negligible. I think we're coming to that time, because the iPhone 11 is fast enough that it can just blast right through it. In 3 or 4 years, you have to think that the iPhone 11 will be considered slow, and when that day comes I'm thinking it'll be more like developing a game for a GeForce 1050 or something.
     
  13. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,339
    Tiled rendering isn’t the problem. The last several generations of Nvidia GPUs have been tile based too. The big bottleneck with mobile is memory bandwidth, which is also constrained for power usage reasons.

    Modern mobile phones are already way faster than something like a PS3 or Xbox 360. According to Apple the iPhone 11 is as fast as an Xbox One S, though no one is really sure how they come to that conclusion, but it is very very fast. They’re certainly getting close to a 1050 already. The hardest problem for mobile GPUs is the screen resolutions of modern phones.
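    For a rough sense of why screen resolution matters so much, the Python sketch below counts the pixels in one frame and the raw colour writes per second for the two phones mentioned in this thread. Everything in it is an assumption for illustration: the native resolutions used (1136x640 for the iPhone 5S, 1792x828 for the iPhone 11), the 4 bytes per pixel, the 60 fps target, and the helper name `color_buffer_mb_per_second`. It ignores overdraw, depth traffic and texture reads, so treat the result as a floor at best.

```python
def color_buffer_mb_per_second(width, height, bytes_per_pixel=4, fps=60):
    """Megabytes written per second if every pixel is written once per frame."""
    return width * height * bytes_per_pixel * fps / 1_000_000


# Assumed native resolutions; not taken from the thread itself.
screens = {
    "iPhone 5S": (1136, 640),
    "iPhone 11": (1792, 828),
}

for name, (w, h) in screens.items():
    mb_per_s = color_buffer_mb_per_second(w, h)
    print(f"{name}: {w * h:,} pixels per frame, {mb_per_s:.1f} MB/s of colour writes")
```

    Under these assumptions the newer screen has about twice as many pixels to shade and write every frame, before any overdraw from transparent or cutout foliage is counted.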
     
  14. ClocktimerGamer

    ClocktimerGamer

    Joined:
    May 25, 2017
    Posts:
    5
    Who the hell are you ? :) An actual wizard? This was an amazing find. Thank you so much. Trying to squeeze every bit of FPS out of my game has been a 6 month journey. Just when I thought I found every optimization technique possible, I came across this. Thanks again.

     
  15. astracat111

    astracat111

    Joined:
    Sep 21, 2016
    Posts:
    725
    There's also this other script on the asset store called X-Frame that doesn't seem to be too popular and it really helps out.
     
  16. ClocktimerGamer

    ClocktimerGamer

    Joined:
    May 25, 2017
    Posts:
    5
    Thank you, I will check it out.

    At the moment I am comparing 2 things (still in early early stages of testing, but maybe you could share your insights):

    For example, in a Forest Scene for Mobile :

    1. LOD's for All Objects, including loads of grass meshes. (Most objects will just be set to LOD0 and Cull at around 3-4% depending on the object.)

    2. Or Separating everything into layers and using the Per Layer Camera Culling. Grass in "near" layer. Some trees and bushes in "medium" layer, background objects and terrain in "far" layer.

    -----OR----

    3. Or a Combination of the 2 above ?

    --- ALSO---

    4. How have you implemented Mesh Baking with your scenes ? (this was the previous method I used to drop draw calls big time, but then I had some memory issues).

    As I understand it, LODs impact the CPU. Does Per Layer Camera Culling impact the GPU?

    Thanks again. I was very lucky to find your post. It opened up some new pathways of thinking about how I put together my scenes. I am just not sure if technically I am getting the same result (LODs vs Per Layer Culling) or if there are actual benefits to one or the other, in particular for mobile.
     
  17. astracat111

    astracat111

    Joined:
    Sep 21, 2016
    Posts:
    725
    @ClocktimerGamer The camera culling (drawing less to the screen) is gonna work better, but using 3D geometry instead of cutout objects, or using a transparency shader, won't make it 1/4th the speed on mobile.

    The problem is that once you've ruined the tile rendering optimization, from my understanding, there's no going back. Once there's like one cutout shader/material in your scene, you chug the mobile GPU.

    I think though we're coming to an age where mobile CPUs are gonna get so speedy that it won't matter for less graphically intense games, but I'm not sure...

    So there are AlphaTest (cutout shaders) and Alpha, and Alpha works better on mobile because it doesn't kill the tile rendering optimization.
     
  18. JohnnyFactor

    JohnnyFactor

    Joined:
    May 18, 2018
    Posts:
    343
    Does alpha test affect battery life in any significant way? I don't have any fps issues but battery life is always a concern.

    Also, does an alpha test shader affect optimization only in the tiles where it's visible, or in the whole frame?

    I would really prefer to use alpha test because alpha/fade requires splitting my models into about 30 pieces to defeat sorting issues.