
Should I expect higher fps with low resolution output from my camera?

Discussion in 'Shaders' started by SupriyaRaul, Nov 9, 2019.

  1. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    I have created a test Unity XR scene with some spheres in it. I am trying to compare the frame rendering time (in ms) I get with the default camera resolution against a low-resolution output. I was expecting a lower frame time, but that didn't happen. Can anyone tell me if I am doing something wrong here?
    To get the low-resolution output, I created a render texture with a lower resolution, and then I update my camera output like this:

    Code (CSharp):
    void OnRenderImage(RenderTexture source, RenderTexture destination)
    {
        // Downsample the camera image into the low-resolution render texture.
        Graphics.Blit(source, _LowResRenderTexture, LowResMaterial);
        // Expose the low-res result to every shader as _LowResolutionTex.
        Shader.SetGlobalTexture("_LowResolutionTex", _LowResRenderTexture);
        // Composite to the screen using the final material.
        Graphics.Blit(source, destination, FinalMaterial);
    }
    The FinalMaterial uses the shader below:

    Code (CSharp):
    Shader "Unlit/FinalMatTest"
    {
        Properties
        {
            _MainTex ("Texture", 2D) = "white" {}
        }
        SubShader
        {
            Tags { "RenderType"="Opaque" }
            LOD 100

            Pass
            {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"

                struct appdata
                {
                    float4 vertex : POSITION;
                    float2 uv : TEXCOORD0;
                };

                struct v2f
                {
                    float4 vertex : SV_POSITION;
                    float2 uv : TEXCOORD0;
                };

                sampler2D _MainTex;
                float4 _MainTex_ST;

                // Set globally from the script via Shader.SetGlobalTexture.
                sampler2D _LowResolutionTex;
                float4 _LowResolutionTex_ST;

                v2f vert (appdata v)
                {
                    v2f o;
                    o.vertex = UnityObjectToClipPos(v.vertex);
                    o.uv = TRANSFORM_TEX(v.uv, _MainTex);
                    return o;
                }

                fixed4 frag (v2f i) : SV_Target
                {
                    // Sample the low-res texture and output it at full resolution.
                    return tex2D(_LowResolutionTex, i.uv);
                }
                ENDCG
            }
        }
    }
     
  2. jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    390
    There are many factors affecting fps. Usually I find I am CPU limited, not GPU limited. Use the Unity profiler to find out where the time is going. Note that Unity has a global setting for maximum fps, and with vsync enabled it may be waiting for monitor sync. This seems to be a very simple shader, so it is unlikely to be very taxing on the GPU; you are probably not fill rate limited, which is where reducing resolution would help. Good luck!
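
    For reference, a minimal sketch of checking those two settings from script (the script name is just illustrative; QualitySettings.vSyncCount and Application.targetFrameRate are the relevant Unity APIs):

    Code (CSharp):
    using UnityEngine;

    public class UncapFrameRate : MonoBehaviour
    {
        void Start()
        {
            // 0 = don't wait for vertical sync, so frame time isn't
            // capped to the monitor refresh rate.
            QualitySettings.vSyncCount = 0;
            // -1 = no frame rate cap (only honoured while vsync is off).
            Application.targetFrameRate = -1;
        }
    }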
     
    SupriyaRaul likes this.
  3. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    Thank you for your quick reply James. I have already disabled vsync. I found that in the Unity profiler, Camera.Render, and specifically RenderForward.RenderLoopJob and Shadows.RenderShadowMap, take most of the GPU time in ms. I have 14,400 spheres in my test scene, so I would expect the fill-rate load to be high enough. All the frames take between 12.5 and 20 ms to render in both cases.
     
    Last edited: Nov 9, 2019
  4. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,238
    If you’re rendering for XR, vsync will likely be forced on no matter what, and you do want it on. Disabling it won’t really increase the FPS if your frame time is too high; it’s more that a game with a very low frame time can run faster than the refresh rate of the display, or at rates that don’t match the refresh rate, at the cost of tearing.

    If you have 14,400 spheres, and they’re all their own game object & renderer component, you’re almost certainly CPU limited rather than GPU limited. And if they’re all opaque, then it’s not a fill rate problem either. It’s purely a “how long does it take the CPU to tell the GPU to render each object” problem.

    You probably want to look into instancing via DrawMeshInstanced script calls.
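
    As a rough sketch of what that can look like (the field names and placement logic here are illustrative, not from the thread; note that Graphics.DrawMeshInstanced accepts at most 1023 matrices per call):

    Code (CSharp):
    using UnityEngine;

    public class InstancedSpheres : MonoBehaviour
    {
        public Mesh sphereMesh;       // any low-poly sphere mesh
        public Material instancedMat; // material with "Enable GPU Instancing" ticked

        // Graphics.DrawMeshInstanced caps out at 1023 matrices per call.
        Matrix4x4[] matrices = new Matrix4x4[1023];

        void Start()
        {
            for (int i = 0; i < matrices.Length; i++)
            {
                Vector3 pos = Random.insideUnitSphere * 50f;
                matrices[i] = Matrix4x4.TRS(pos, Quaternion.identity, Vector3.one);
            }
        }

        void Update()
        {
            // A single call submits all instances; no GameObjects
            // or MeshRenderer components are needed.
            Graphics.DrawMeshInstanced(sphereMesh, 0, instancedMat, matrices);
        }
    }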
     
    SupriyaRaul likes this.
  5. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    Thank you so much for your suggestion @bgolus ! :) Whenever I have doubts and search for answers on this forum, I find your answers to similar queries really helpful.
    I didn't know vsync is forced on. Sure, I will try using instancing via DrawMeshInstanced script calls.
    I am trying to increase the complexity of my scene by adding thousands of spheres, because I want to see a significant performance difference (in terms of GPU render time) between the high- and low-resolution versions. Do you think I should try Unity's Scriptable Render Pipeline for that, or should the Built-in pipeline be fine?
     
  6. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,238
    If you’re CPU limited by the number of spheres you’re rendering, it won’t really matter what resolution you’re rendering at. You could render to a single 1x1 pixel and it’ll take about the same time as a “normal” resolution, since you’re not GPU constrained at that point.

    What you want for a scene that’ll improve with lower resolutions is a very high amount of overdraw (from overlapping transparent objects) and / or high shader complexity. Basically, things that take a lot of effort for the GPU without it being purely memory bandwidth constrained or CPU limited.
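
    For example, a hypothetical test setup along those lines (transparentMat is assumed to be an alpha-blended material; the counts and sizes are just for illustration):

    Code (CSharp):
    using UnityEngine;

    public class OverdrawTest : MonoBehaviour
    {
        public Material transparentMat; // assumed: alpha-blended (transparent) material

        void Start()
        {
            // Stack large transparent spheres almost on top of each other so they
            // all overlap on screen; every covered pixel gets shaded once per sphere.
            for (int i = 0; i < 500; i++)
            {
                var go = GameObject.CreatePrimitive(PrimitiveType.Sphere);
                go.transform.position = new Vector3(0f, 0f, 10f + i * 0.01f);
                go.transform.localScale = Vector3.one * 8f;
                go.GetComponent<MeshRenderer>().sharedMaterial = transparentMat;
            }
        }
    }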
     
    SupriyaRaul likes this.
  7. jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    390
    I agree with bgolus. One quick check: how many draw calls are there for your 14,400 spheres? If they are not batching, that may hit the CPU pretty hard. The stats window and frame debugger are useful for this.
     
    SupriyaRaul likes this.
  8. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    @bgolus sure, I will try to apply transparency to my game objects using an additional shader pass. I have already tried making my shader more complex by adding shadow caster/receiver passes and spot light calculations (up to 5 spot lights with diffuse, specular and ambient color components) in my fragment shader for each and every game object, but as you said, it's most probably CPU limited before I can put any load on the GPU. Today I will try using DrawMeshInstanced and also making some of the spheres transparent.
    @jamespaterson I have attached the stats window screenshots for both versions here.
    statsComparison.png
     
  9. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    And in the frame debugger, this is the reason the draw calls aren't batched: "A submesh we are trying to dynamic-batch has more than 300 vertices."

    drawCallNotBatched.png
     
  10. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,238
    Most of the cost of shadows is in the extra draw calls (CPU) and the rendering of the shadow maps (the resolution of which doesn’t scale with the screen resolution), so neither of those is going to be affected much by the render resolution.

    Adding an extra pass to an already opaque shader does add some extra cost, but far, far less than if you used a transparent base pass. Basically anything that’s opaque (specifically that writes to the depth buffer) helps the GPU ignore large swaths of overdraw that it would otherwise not be able to skip.

    Use something like the Standard shader, set to Fade, without any shadows, with static batching, all of the spheres overlapping, and maybe two very large point lights. You’ll bring most GPUs to their knees with that with only maybe 2,000 spheres, but dropping the resolution will help a bunch.

    Note that if you go too low, the frame time will actually start going back up as the GPU’s ability to do a lot of work in parallel goes away.
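
    If you ever need to switch a Standard shader material to the Fade mode mentioned above from script, here's a sketch of the usual recipe (it mirrors the blend state and keywords the Standard shader inspector sets; doing it in the inspector is simpler):

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;

    public static class StandardShaderFade
    {
        // Configure a Standard shader material for Fade (alpha blended) rendering.
        public static void SetFadeMode(Material mat)
        {
            mat.SetOverrideTag("RenderType", "Transparent");
            mat.SetInt("_SrcBlend", (int)BlendMode.SrcAlpha);
            mat.SetInt("_DstBlend", (int)BlendMode.OneMinusSrcAlpha);
            mat.SetInt("_ZWrite", 0); // transparent objects don't write depth
            mat.DisableKeyword("_ALPHATEST_ON");
            mat.EnableKeyword("_ALPHABLEND_ON");
            mat.DisableKeyword("_ALPHAPREMULTIPLY_ON");
            mat.renderQueue = (int)RenderQueue.Transparent;
        }
    }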
     
    SupriyaRaul likes this.
  11. jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    390
    So, looking at the stats window screenshots, you have a lot of geometry (25M vertices) and a lot of draw calls (33k). Actually, both your CPU and GPU frame times seem about the same. To render a scene of this complexity at fast frame rates, I would expect you will need to use GPU instancing techniques, irrespective of the target camera resolution. You will then see batching take place and a significant reduction in draw calls. This means less overhead between the CPU and GPU and hence much faster rendering, but there are restrictions, e.g. it needs lots of identical geometry.
    Here is an old but still relevant nvidia presentation on batching:

    http://www.nvidia.com/docs/io/8230/batchbatchbatch.ppt

    There are quite a few ways to go with that depending on which rendering pipeline you are using. I personally favour GPU instancing shaders, but I am using the older Built-in pipeline. Good luck!
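
    One quick illustrative sketch: GPU instancing also has to be enabled on each material for the batcher to use it, which you can do from script (this is the same as ticking "Enable GPU Instancing" in the material inspector):

    Code (CSharp):
    using UnityEngine;

    public class EnableInstancingOnAll : MonoBehaviour
    {
        void Start()
        {
            // Instancing is skipped for any material that doesn't allow it,
            // so flip the flag on every renderer's shared material.
            foreach (var renderer in FindObjectsOfType<MeshRenderer>())
            {
                renderer.sharedMaterial.enableInstancing = true;
            }
        }
    }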
     
    SupriyaRaul likes this.
  12. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    Thank you @bgolus and @jamespaterson for these clarifications, I really appreciate your time. I will get back to you with my results based on your suggestions.
     
    jamespaterson likes this.
  13. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    Hello again. So I tried using GPU instancing. I first created a sphere in Blender with almost 32k vertices, and used that to spawn 1000 opaque and 1000 transparent spheres (hopefully that's a lot of overdraw, see attachment). Then I compared my GPU performance at normal and low resolution. I am still not getting good results; please see the attachments showing the comparison. If you look at the bar chart, low-resolution frames take more time on average. Do you have any comments @bgolus and @jamespaterson?
     

    Attached Files:

  14. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,238
    Vertex count doesn't scale with screen resolution. You should be using spheres closer to 32 vertices. Not 32k, 32.

    You've got a handful of spheres with maybe at most 3 or 4 overlapping on a single pixel. I was thinking more on the order of a few hundred or a few thousand overlapping on every single pixel.

    Change your setup to 1000 sub 50 vertex spheres, all transparent using the standard shader, put multiple point lights in the scene, and have them all big enough to cover the entire screen so they're all overlapping each other rather than spaced out.
     
    SupriyaRaul likes this.
  15. jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    390
    So, looking at the stats, batching is working (far fewer draw calls), which is good, and your frame rate seems much better; however, the resolution change doesn't have much effect. As bgolus says, at 24M vertices you may be geometry limited, not fill rate limited. Try fewer vertices in your spheres!
     
    SupriyaRaul likes this.
  16. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    Thank you for your prompt replies @bgolus and @jamespaterson.
    Now I have a string of 1000 transparent spheres (42 vertices each) in front of the camera, and I have 9 point lights. I had to create my own transparent shader for them because the Standard shader was giving me weird artifacts with the lights. I will share the results with both of you soon.

    Meanwhile, I have another question. I just learned that to use instancing in combination with multiple lights, we need to switch to the deferred rendering path. But do you think transparent spheres would work with deferred rendering? And since more point lights put more load on the CPU, would you recommend implementing custom point lights that are themselves instanced (if yes, I need to figure out how to do that)? Thanks in advance.
     
    Last edited: Nov 29, 2019
  17. jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    390
    Deferred rendering is quite different in terms of how it works compared to conventional forward rendering. Generally, transparency is more difficult to achieve with deferred. My understanding is that Unity can mix and match deferred and forward rendering to achieve this, but I might be wrong. If your criteria are transparency and nine lights (I think Unity supports four by default, but it depends on per-vertex vs per-pixel lighting), then writing your own forward shader might be the easiest way to go.

    This guy's tutorials are the best on the web for in-depth info on this kind of thing. Unity should hire him:

    https://catlikecoding.com/unity/tutorials/scriptable-render-pipeline/lights/


    Good luck!
     
    SupriyaRaul likes this.
  18. SupriyaRaul

    Joined:
    Jun 20, 2018
    Posts:
    28
    @bgolus Yes, you were right. I noticed a considerable performance gain in the low-res scene up to 2000 spheres, but at 2500 spheres the GPU's performance goes down again. Could you please explain why the GPU's ability to work in parallel goes away after that?
     
  19. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,238
    It's not going away; it's just that the thing taking the most time changes, and there are different bottlenecks on performance.

    The more vertices you're drawing, the more time the GPU has to spend calculating their positions. Those are done in parallel, but the cost is unrelated to the render resolution: 10,000 vertices take the same time to calculate on the GPU regardless of whether you're rendering at 4K or to a single pixel. And instancing is mainly there to reduce the CPU time, not the GPU time, so 10,000 vertices from a single mesh vs 10,000 vertices from a hundred 100-vertex meshes doesn't change the cost for the GPU that much. In fact, the single mesh is usually faster.

    The one big bottleneck that does get in the way of the GPU's parallelism is when lots of transparent things overlap. This is the same problem any kind of multi-threaded process has: if multiple things need to write to the same set of data in a certain order, they have to be done serially, in that order, one after the other. So if you've got 100 transparent triangles that cover the same screen pixel, the GPU can't render all 100 triangles at once; it has to render each triangle one at a time for that pixel. 100 triangles that don't overlap and are distributed across a larger area can all be rendered in parallel, because there's no output data dependence between them.