Render textures vs image effects on mobile: why?

Discussion in 'Image Effects' started by Lex-DRL, Oct 31, 2017.

  1. Lex-DRL

    Lex-DRL

    Joined:
    Oct 10, 2011
    Posts:
    140
    TL;DR: Why are image effects on mobile so much more expensive than using a material with a render texture passed to it?



    I'm a bit confused.
    If I use a render texture in any material on Android/iOS, it costs almost the same as using a regular texture.
    But if I attach any image effect (even the simplest one), performance drops to almost zero fps (on low-end phones).

    Why is this happening? I thought an image effect script passes the screen image as a render texture, so there would be no difference at all.

    I also thought that a RenderTexture, once rendered, stays in VRAM and never leaves the GPU (unless told to be released). But if so, why are image effects so expensive on mobile? Basically, an image effect is just a fullscreen quad with a render texture passed to the shader. The texture is in VRAM, and passing a shader with its properties is just a single extra draw call. Where does all the performance hit come from?

    The only guess I can make so far is that an image effect somehow forces this screen texture to go back and forth between CPU and GPU memory. Which, in turn, causes extra delays and performance drops.
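
    By "even the simplest one" I mean literally something like this (a sketch; the material is a placeholder for any trivial single-pass shader):

    Code (CSharp):

        using UnityEngine;

        // A minimal image effect: it just re-draws the screen through a material.
        [RequireComponent(typeof(Camera))]
        public class SimplestImageEffect : MonoBehaviour
        {
            public Material someSimpleMaterial; // placeholder material

            private void OnRenderImage(RenderTexture source, RenderTexture destination)
            {
                // Unity hands us the screen as "source" and expects the result in "destination".
                Graphics.Blit(source, destination, someSimpleMaterial);
            }
        }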
     
  2. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    Image effects are costly mostly because of fillrate and bandwidth. Your 3D objects take up only so much screen space, and it's possible to render a lot of them while keeping the total count of rendered pixels low. Mobile GPUs usually have a lot of optimization techniques, like early rejection, that work as an additional "occlusion culling", so a lot of object pixels are not rendered at all (fewer pixel shader calls).

    If you stack up three or four full-screen semi-transparent textures, it will slow the rendering down to a halt, much like a three- or four-pass full-screen image effect. You can try it right away using UI textures (they are alpha-blended only).
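
    If you want a quick way to reproduce that, here's a sketch that deliberately stacks N alpha-blended full-screen UI images (all names are made up):

    Code (CSharp):

        using UnityEngine;
        using UnityEngine.UI;

        // Deliberately burns fillrate by stacking semi-transparent full-screen images.
        public class FillrateTest : MonoBehaviour
        {
            public int layers = 4;

            private void Start()
            {
                var canvas = new GameObject("Canvas").AddComponent<Canvas>();
                canvas.renderMode = RenderMode.ScreenSpaceOverlay;

                for (int i = 0; i < layers; i++)
                {
                    var image = new GameObject("Layer" + i).AddComponent<Image>();
                    image.transform.SetParent(canvas.transform, false);
                    image.color = new Color(1f, 1f, 1f, 0.5f); // semi-transparent -> alpha blending

                    var rect = image.rectTransform; // stretch over the whole screen
                    rect.anchorMin = Vector2.zero;
                    rect.anchorMax = Vector2.one;
                    rect.offsetMin = Vector2.zero;
                    rect.offsetMax = Vector2.zero;
                }
            }
        }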

    Moreover, the built-in OnRenderImage and Graphics.Blit take up a lot of CPU time too, and they don't always allow downsampling (which is a common technique in real-world post-process effects).

    So if you render a scene to a full-sized render texture and then render this texture on a full-screen quad, you're basically rendering twice the pixel count (close to rendering the same scene twice). Add additional shader logic to that and you get a lot of wasted fillrate.

    Another problem on mobile is dependent texture reads, which are used for things like AO, per-pixel ripple effects etc. This is due to the fact that mobile GPUs are usually tile-based, and random texture read operations are costly because they require a full VRAM fetch.

    I'm making a mobile-oriented post-processing stack that doesn't use OnRenderImage or the built-in Graphics.Blit. Right now I have implemented a fast bloom that barely slows devices down - it renders at 60 fps on most of them, even with a full-screen final compose pass.
    https://forum.unity.com/threads/sleek-render-lightning-fast-mobile-bloom-effect.502181/
    I've already submitted it to the Asset Store and it's awaiting review. You will be able to grab it while it's free and look at how things are done there (rendering pipeline and shaders).
     
    Last edited: Oct 31, 2017
  3. Lex-DRL

    Lex-DRL

    Joined:
    Oct 10, 2011
    Posts:
    140
    Thanks @Kumo-Kairo.

    But maybe you could also clarify the RTs-and-VRAM part? Do RTs reside in GPU memory only, or are they transferred back to the CPU side at some point?
     
  4. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    On mobile there's no such thing as separate RAM for the CPU and VRAM for the GPU - it's all integrated on one chip. Or at least that's how I remember it. My knowledge in this field is limited, but it's interesting to look at native profilers like Tegra Graphics Debugger or Adreno Profiler to see those fetch timings and where they usually go. Basically, I've described what I know for sure and what I've measured in the previous answer.

    What devices do you usually test your builds on?
     
    Last edited: Nov 3, 2017
  5. Lex-DRL

    Lex-DRL

    Joined:
    Oct 10, 2011
    Posts:
    140
    Android. Specifically, a Samsung Galaxy S3. I intentionally use it for testing, as lowest-tier hardware.
    Thanks for your advice, I'll investigate this with a native profiler.
     
  6. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    I've checked a few different devices with native profilers, and all memory is referred to as "system memory" or "DRAM", so on mobile it really is all in the same global memory.
     
  7. Lex-DRL

    Lex-DRL

    Joined:
    Oct 10, 2011
    Posts:
    140
    That's true from the hardware point of view. But on the software side, all the resources go to the GPU through the graphics API, don't they? So even if the memory is physically the same, Unity may re-send something (e.g., textures) from "RAM" to "VRAM" at the GPU state-change stage.

    That's what I'm trying to figure out: does Unity re-send any render textures "to the GPU side" or get them back "from the GPU side"? Does it do anything that causes the GPU to stall because of that preparation step?
    And if it doesn't (which, I believe, is the case), then how is an image effect's input texture any different?

    In short, a render texture may be passed to the GPU in two possible ways (option 1 is sketched in Unity terms after this list).
    1. The CPU (kinda) tells the GPU: "render this, store the resulting image in your memory, and keep it there under this texture pointer. I don't need the texture itself, just keep it". Then: "remember that texture you rendered? Now use it from your memory as an input property for this new draw call, and store the result in the regular frame buffer".
    2. "Render this, return me the texture itself. I'll wait till you finish". Then: "Here's a new draw call. This is the texture for one of its input properties. I'm sending it."
     
    Last edited: Nov 1, 2017
  8. nat42

    nat42

    Joined:
    Jun 10, 2017
    Posts:
    353
    Actually, I think you were trying to figure out whether rendering to a texture and then rendering the result with pixel/fragment image-effect/post-processing shaders is equivalent to Unity's post-processing/image effects.

    Your guess here is wrong. Kumo-Kairo seems to have given an exceedingly good answer as to the real reason for the performance.

    The "I'll wait till you [to] finish" would be a "stall", Unity does not do post processing effects like that.

    Ultimately, it seems the elephant in the room is that you want to do post-processing effects on mobile, yet aren't asking about that?
     
  9. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    In the OpenGL world it's called binding. It's the process of telling the graphics driver to use specific textures, VBOs or shader programs for subsequent rendering. It doesn't incur any fetches by itself. And the dreaded draw calls that everyone fears cost almost nothing by themselves - it's all the preparation and binding steps that make draw calls slow. If you render several different objects using the same shaders, render targets, VBOs or textures, it will mostly be fast as hell. By the way, post-processing usually requires quite a few render target switches (render texture targets), and that surely slows things down too.
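
    For example, something as simple as sharing one material across many renderers keeps the bindings constant between consecutive draw calls (a sketch; it assumes the built-in Mobile/Diffuse shader is available):

    Code (CSharp):

        using UnityEngine;

        public class SharedMaterialExample : MonoBehaviour
        {
            private void Start()
            {
                // One material -> the same shader/texture bindings for every draw call,
                // so the driver's per-draw preparation work stays minimal.
                var shared = new Material(Shader.Find("Mobile/Diffuse"));
                foreach (var meshRenderer in FindObjectsOfType<MeshRenderer>())
                {
                    meshRenderer.sharedMaterial = shared;
                }
            }
        }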

    The best thing you can do for yourself is to go and try building something with pure OpenGL ES / C++. It will give you enormous experience in computer graphics and rendering in general.
     
    zhuhaiyia1 likes this.
  10. Lex-DRL

    Lex-DRL

    Joined:
    Oct 10, 2011
    Posts:
    140
    Sure. But I know the answer is: "With the regular image-effect approach, you can't". So I'm trying to find the exact reason why this happens, to then find an alternative way to do (hopefully) the same things. Or to see what I need to fake/approximate to get at least a similar result.

    Right now, the project I'm working on has quite a complex UI. Which (due to the nature of UI) is mostly transparent. And it sometimes produces something like 5x overdraw of the entire screen. With different shaders, sprite atlases etc. So, essentially, it is the same as if I had five simple image effects. But somehow, it's not the same from the performance point of view.
    That's the mystery I'm trying to solve. Which, of course, would let me use something that at least looks like image effects.

    Thanks for the advice. I already had plans to do it, but it seems I'll do it sooner now.
    And I'm looking forward to the release of your great Sleek Render asset ;)
     
  11. daxiongmao

    daxiongmao

    Joined:
    Feb 2, 2016
    Posts:
    412
    I didn't read everything, but I didn't see any mention of what image effects do differently from just using a render texture.

    Just drawing to a render texture and then drawing that to the screen is not really that much more expensive.

    But most image effects are completely different. Take a blur, for instance.
    It copies every pixel from the screen into another buffer. Then, depending on the downsampling, it may do that two or three more times, smaller and smaller each time. Then it will read back from that downsampled texture, possibly 4 to 16 times for every pixel on the screen, blending the samples together to produce a blurred image.

    So with some image effects you could be looking at 4-16x the work for every pixel.
    Next to just rendering some of the pixels on the screen - where even 5x overdraw isn't that much - an image effect always touches 100% of the pixels.
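
    In Unity terms such a blur chain could look like this (a sketch; blurMaterial and composeMaterial are placeholders for the actual shaders):

    Code (CSharp):

        using UnityEngine;

        public class NaiveBlurEffect : MonoBehaviour
        {
            public Material blurMaterial;    // placeholder: multi-tap blur shader
            public Material composeMaterial; // placeholder: final compose shader

            private void OnRenderImage(RenderTexture source, RenderTexture destination)
            {
                int w = source.width / 4, h = source.height / 4;
                RenderTexture small = RenderTexture.GetTemporary(w, h);
                RenderTexture blurred = RenderTexture.GetTemporary(w, h);

                Graphics.Blit(source, small);                // downsample: touches every screen pixel once
                Graphics.Blit(small, blurred, blurMaterial); // N texture taps per pixel, at 1/16 the area
                Graphics.Blit(blurred, destination, composeMaterial); // compose: every screen pixel again

                RenderTexture.ReleaseTemporary(small);
                RenderTexture.ReleaseTemporary(blurred);
            }
        }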

    Now add two or three image effects - blur, bloom, color grading - and you have just blown a ton of performance.

    I've oversimplified things a lot, but you can see where the performance goes so quickly. Then combine that with the generally slower hardware on mobile.
     
  12. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    You didn't see it mentioned here because it's actually false. OpenGL and DirectX don't differentiate between "regular" 3D scene rendering and image effect rendering - from the GPU's point of view it's the same thing. And this is where Unity plays a bad role: it makes users think that image effects are somehow "different" from the usual material setup and rendering. This similarity becomes apparent when you start profiling your game with native profilers or looking at native OpenGL code (most of the best tutorials and examples on rendering are written in GLSL and C++ OpenGL API calls).
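
    Roughly speaking, Graphics.Blit boils down to "bind a render target, bind a material, draw a full-screen quad" - something like this sketch using Unity's low-level GL class (an approximation, not Unity's actual code):

    Code (CSharp):

        // A rough approximation of what a blit does.
        void ManualBlit(RenderTexture source, RenderTexture destination, Material material)
        {
            Graphics.SetRenderTarget(destination);
            material.SetTexture("_MainTex", source);
            material.SetPass(0);

            GL.PushMatrix();
            GL.LoadOrtho();     // map coordinates to [0..1] screen space
            GL.Begin(GL.QUADS); // the whole "image effect" is this one quad
            GL.TexCoord2(0, 0); GL.Vertex3(0, 0, 0);
            GL.TexCoord2(1, 0); GL.Vertex3(1, 0, 0);
            GL.TexCoord2(1, 1); GL.Vertex3(1, 1, 0);
            GL.TexCoord2(0, 1); GL.Vertex3(0, 1, 0);
            GL.End();
            GL.PopMatrix();
        }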

    Regarding the performance difference between the regular OnRenderImage + Graphics.Blit and custom post-processing code - I became curious too, so I'll conduct a few experiments and post the results here once I have some data.
     
  13. Lex-DRL

    Lex-DRL

    Joined:
    Oct 10, 2011
    Posts:
    140
    @Kumo-Kairo, @daxiongmao thanks guys. You're very supportive.
    Exactly! That's what I'm trying to figure out. Technically, it's just a fullscreen quad with a material attached to it and the current screen image passed as an input texture.

    I tried even the simplest distortion image effect. It's just two texture reads per pixel and nothing more. One of those texture reads is dependent, but the shader is still extremely simple.
    Or a simple color grading with a LUT. Two dependent texture reads now, but still not much to do.
    And even those two effects (each tested separately) killed the performance.

    But anyway, I shouldn't bother you by repeating my question over and over again. You've already helped me a lot.
    So thanks again. I'll reply as soon as I have something to add, too.
     
    Last edited: Nov 1, 2017
  14. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    There you have your answer: a few LUTs here, a few DTRs there, all at fullscreen resolution. That is what causes the problem in your case. DTRs cause severe GPU stalls because each one incurs a full system-memory fetch instead of the fast on-chip fetch.
    http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-TileBasedArchitectures.pdf

    So in your case it was not the fillrate that caused the problem - it was bandwidth. Try adding that distortion or LUT color-correction shader to a full-screen UI texture (just a random photo or piece of art) and see how it goes. I bet it will be as slow as rendering it as an image effect.

    Meanwhile, I've done my experiments. I ran a simple OnRenderImage + Graphics.Blit test with a simple luma-based color grading shader and a standard invert-colors shader. Neither of them has any DTRs or major GPU-stall moments, and guess what: they render at a smooth 60 fps even on my oldest ZTE V811.

    So the main reason the built-in OnRenderImage + Graphics.Blit is slow is rendering complex post effects at full resolution. OnRenderImage doesn't allow downscaling out of the box: the two render textures it passes you are both at full res. And this is where a custom solution comes into play - downscaling the intermediate (I call them "transient") textures by 4 on both sides reduces the overall processing work 16-fold. After that, even DTRs don't seem too bad (they are bad nonetheless, but their impact becomes less apparent).
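
    Even inside OnRenderImage you can keep the intermediate texture small; something in this spirit (a sketch, with effectMaterial as a placeholder for the heavy shader):

    Code (CSharp):

        using UnityEngine;

        public class TransientTextureEffect : MonoBehaviour
        {
            public Material effectMaterial; // placeholder: the expensive effect shader

            private void OnRenderImage(RenderTexture source, RenderTexture destination)
            {
                // Quarter resolution on both axes -> 1/16 of the pixels to process.
                RenderTexture transient = RenderTexture.GetTemporary(
                    source.width / 4, source.height / 4, 0);

                Graphics.Blit(source, transient, effectMaterial); // heavy pass at low res
                Graphics.Blit(transient, destination);            // cheap upscale back to full res

                RenderTexture.ReleaseTemporary(transient);
            }
        }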

    So the conclusion is this: there's nothing wrong or inherently slow in OnRenderImage and Graphics.Blit - it's just not flexible enough for mobile.
     
    senkal_, chadiik and Lex-DRL like this.