MSAA performance on mobile

Discussion in 'General Graphics' started by exltus, Feb 10, 2020.

  1. exltus

    Joined:
    Oct 10, 2015
    Posts:
    58
    I've heard in many talks and resources that 4x MSAA is basically free on mobile devices, but I tested it on several devices (2-3 years old) and the performance drop is drastic. What are your experiences with it? Is there some special setup that must be done? I just enabled MSAA on the camera and in the quality settings, and my FPS immediately dropped by half.
     
  2. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    2,948
    Hi!
    It can be almost free if it's done correctly, specifically when the MSAA render target contents are not written back to main memory.
    What is your camera setup? Which version of Unity and which rendering pipeline are you using?
     
  3. andyz

    Joined:
    Jan 5, 2010
    Posts:
    2,223
    Is this true for 2018 LTS, forward rendering?
     
  4. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    2,948
    @andyz it's possible to set this up both correctly and incorrectly; it really depends on your setup :)
     
  5. toto2003

    Joined:
    Sep 22, 2010
    Posts:
    528
    I'm also interested in optimising performance when using MSAA. Could you elaborate on the setup required to fully optimise it?
     
  6. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    2,948
  7. Dragnipurake97

    Joined:
    Sep 28, 2019
    Posts:
    40
    IIRC, MSAA being free is a misconception. It appears to be free because it's done on dedicated hardware, so it doesn't show up when profiling, though I'm not 100% sure that's the case. It's still cheap, just not quite free. Older phones may not have the dedicated hardware for it, so performance tanks. There is an analysis page about it for Quest, which uses mobile hardware (Adreno 540), here: https://developer.oculus.com/documentation/native/android/mobile-msaa-analysis/

    PowerVR does it on chip: https://docs.imgtec.com/PerfRec/topics/c_PerfRec_msaa_performance.html, which has the note: "2x MSAA is virtually free on most PowerVR graphics cores (Rogue onwards), while 4x MSAA+ will noticeably impact performance."
     
    Last edited: Feb 18, 2020
  8. neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,492
    I think the way it works is by exploiting the tile system of mobile GPUs, i.e. rasterizing the z-buffer in the fast tile-local memory at a higher resolution than what is resolved to the framebuffer. It's free relative to the framebuffer write bandwidth. You likely still pay the fill-rate cost on that local memory, but ideally only the z-buffer is rasterized at the higher resolution.
     
  9. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    2,948
    All modern (and most old) mobile GPUs do this on chip. Yes, it has some cost, but it's comparatively small.
    Only older mobile GPUs (ones that don't have either EXT or IMG version of multisampled_render_to_texture) will suffer a performance penalty.
     
  10. Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Sorry to necro. Does this mean that using post FX, or anything else that causes a ResolveAA pass to appear in the frame debugger, breaks this optimisation? (Since I guess it requires the render target to be stored and resolved?)

    And is this optimisation specific to 4x MSAA? No advantage in 2x? Or does the optimisation not even apply for 2x so you pay the full cost?

    Trying to find more information on this topic.
     
  11. joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    721
    From my experience, yes: if you need to write to some intermediate buffer before writing to the screen framebuffer, MSAA becomes very expensive on mobile. I would be interested to hear other thoughts though; it's possible I was doing something incorrectly.
     
  12. dnach

    Unity Technologies

    Joined:
    Mar 9, 2022
    Posts:
    89
    MSAA gives quality antialiasing, while being relatively cheap for forward rendering.
    With additional samples per pixel, rasterization results in smoother edges.
    The additional samples increase memory usage (larger buffer to contain the additional samples).
    4x requires more subsamples per pixel than 2x, and thus more memory as well as a more expensive resolve.

    This is also the reason why MSAA is less suited for Deferred Rendering. (Due to multiple GBuffer attachments / increased memory usage)
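
    To put rough numbers on the memory point, here is a back-of-envelope sketch. The resolution, the 4 bytes per pixel, and the four G-buffer attachments are illustrative assumptions, not measurements from any device, and real GPUs may compress these buffers.

```python
# Back-of-envelope size of a multisampled render target.
# Assumption: every sample stores a full value; real GPUs may compress.

def msaa_target_bytes(width, height, bytes_per_pixel, samples, attachments=1):
    """Bytes needed to hold all samples of all attachments."""
    return width * height * bytes_per_pixel * samples * attachments

MIB = 1024 * 1024
width, height = 1920, 1080  # hypothetical render resolution

forward_1x = msaa_target_bytes(width, height, 4, samples=1)
forward_4x = msaa_target_bytes(width, height, 4, samples=4)
# Hypothetical deferred setup with four G-buffer attachments.
deferred_4x = msaa_target_bytes(width, height, 4, samples=4, attachments=4)

for label, size in [("forward, no MSAA", forward_1x),
                    ("forward, 4x MSAA", forward_4x),
                    ("deferred, 4x MSAA", deferred_4x)]:
    print(f"{label}: {size / MIB:.1f} MiB")
```

    The sample count multiplies every attachment, so a multi-attachment G-buffer grows much faster than a single forward colour target.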

    MSAA has to be resolved (downsampled) regardless of postprocessing, before display.
    "Resolving" simply means averaging the subsamples of each pixel.
    This is done either with a blit/copy operation, or fullscreen shader pass.
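
    As a minimal illustration of that averaging, assuming a plain box filter (actual hardware or shader resolves may weight samples differently), with `resolve_pixel` being a hypothetical helper name:

```python
def resolve_pixel(samples):
    """Box-filter resolve: average each colour channel over one pixel's subsamples."""
    count = len(samples)
    return tuple(sum(channel) / count for channel in zip(*samples))

# A 4x MSAA pixel on a triangle edge: three subsamples are covered by a
# red triangle, one still shows the black background.
edge_pixel = [
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 0.0),
]

resolved = resolve_pixel(edge_pixel)
print(resolved)  # (0.75, 0.0, 0.0)
```

    The partially covered pixel ends up at 75% red, which is what produces the smoother edge.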

    When and how resolve happens can vary, and can impact performance.
    If post processing effects need to read/write multiple samples per fragment - this can be quite expensive.
    Thus resolve is often performed before post-processing.

    In the case of URP, a "standard" resolve is applied using a blit/copy operation to a single-sample buffer.
    This operation corresponds to the 'RenderTexture.ResolveAA' sample seen in the frame debugger.
     
    Last edited: Mar 9, 2022
  13. Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Great info, thanks! Makes sense the resolve happens before post FX.

    Does this affect the optimisation on tiled GPUs that makes MSAA "free" (performance-wise)?

    I'm having trouble finding any concrete information about this - under what conditions (which devices, rendering setups, etc) msaa is "free" (performance wise) for mobile devices and what breaks this optimisation.
     
  14. dnach

    Unity Technologies

    Joined:
    Mar 9, 2022
    Posts:
    89
    On modern APIs (Vulkan, Metal), we can specify a resolve attachment in the render pass description, and resolve is handled accordingly. The attachment contents are cleared after the render pass, or stored in memory based on the store operations. (Store, Resolve, StoreAndResolve, DontCare)

    To take advantage of render-target optimizations for tile-based GPUs, as Aleksander mentioned above, set the attachment store operation as DontCare when possible.

    Edit: To clarify, MSAA is definitely not free! As Aleksander mentioned in the first post, the cost can be significant if we store the multisampled attachment in main memory, so avoid using MSAA attachments with Store or StoreAndResolve.

    One notable example is enabling URP's 'forced depth prepass' with MSAA, which has a performance impact because the multisampled depth attachment is stored, and should thus be avoided.
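
    A rough way to see why the store action matters is to model how many bytes each one writes back to main memory at the end of a render pass. This is a simplified sketch, not how any particular driver accounts for it: it assumes the samples stay in on-chip tile memory during the pass, ignores compression, and uses a hypothetical 1920x1080 target with 4 bytes per pixel.

```python
# Simplified model of main-memory writes at the end of a render pass
# on a tile-based GPU. Assumption: samples live in on-chip tile memory
# until the pass ends; real drivers and compression will change the numbers.

def bytes_written(store_action, width, height, bytes_per_pixel, samples):
    single = width * height * bytes_per_pixel  # one single-sample image
    writes = {
        "DontCare": 0,                              # samples discarded on chip
        "Resolve": single,                          # only the resolved image is written
        "Store": single * samples,                  # every sample is written
        "StoreAndResolve": single * (samples + 1),  # all samples plus the resolved image
    }
    return writes[store_action]

for action in ("DontCare", "Resolve", "Store", "StoreAndResolve"):
    mib = bytes_written(action, 1920, 1080, 4, samples=4) / (1024 * 1024)
    print(f"{action}: {mib:.1f} MiB per pass")
```

    Under these assumptions, Store at 4x writes four times as much as Resolve, and that extra traffic is what makes MSAA expensive on tiled hardware.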
     
    Last edited: Aug 9, 2022
  15. CChong_at_Infinity

    Joined:
    Apr 7, 2020
    Posts:
    27
    With respect to MSAA and the resolution scale setting with the URP, would MSAA still be relatively cheap? I have no idea if frame buffers are being copied around internally and whether that'll break the optimisation.
     
  16. drallcom3

    Joined:
    Feb 12, 2017
    Posts:
    162
    MSAA free? My project definitely takes a small hit from 2x and a much larger one at 4x+.
    It's just standard URP + PP on an Adreno device.

    Since you won't know your user's device, you can't assume 2x is "free".
     
  17. JSmithIR

    Joined:
    Apr 13, 2023
    Posts:
    111
    How do we prevent this? I need to read the color buffer before it is resolved. However, passing renderingData.cameraData.renderer.cameraColorTargetHandle directly to my compute shader in a scriptable renderer pass only gives 1 sample value per pixel, even when MSAA is set to 4x. In fact, if I try to copy the cameraColorTargetHandle to another RTHandle (which has bindMS = true and msaaSamples = 4), I get an error: "CopyTexture called with mismatching sample count per pixel (src 1 dst 4)." Has Unity really made it impossible for the user to access an unresolved color buffer? So frustrating, please help!
     
  18. Timtaotao

    Joined:
    Sep 13, 2019
    Posts:
    12