Additive vs standard alpha blending performance

Discussion in 'General Graphics' started by bleu, Jun 23, 2016.

  1. bleu

    bleu

    Joined:
    Apr 6, 2013
    Posts:
    41
    It is my understanding that additive blending is generally more efficient than normal (src alpha, one minus src alpha) blending. Is there a particular reason why?

    Is it because the additive equation is simpler and thus involves fewer calculations overall? The additive parameters are typically (One, One), which means that the final pixel color is:

    New Fragment = Src*1.0 + Destination*1.0;

    I assume that the "1.0" could be trivially excluded when the shader is compiled.

    As opposed to the more complicated normal blending equation:

    New Fragment = Src*Src_Alpha + Destination*(1.0 - Src_Alpha)

    Which would require consulting a Src_Alpha component and computing 1.0 minus that. Or maybe it's because additive requires no sorting of objects?
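
    In Unity ShaderLab terms I'm thinking of something like the following (just a sketch of the render state lines inside a Pass, not a complete shader):

    Pass
    {
        // Additive: New Fragment = Src*1.0 + Destination*1.0
        Blend One One

        // Traditional alpha blending: New Fragment = Src*Src_Alpha + Destination*(1.0 - Src_Alpha)
        // Blend SrcAlpha OneMinusSrcAlpha

        ZWrite Off   // both kinds of transparency usually skip depth writes

        // ... CGPROGRAM block as usual ...
    }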

    I'm thinking of normal game objects as opposed to particles here. I know that particles can be left unsorted (depth-wise) and use additive blending, and that type of blending is appropriate anyway because it is order-independent (in terms of Z).
     
  2. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,348
    On modern PCs there's no real performance difference. It's not that the blend operation itself can't be faster; it might be on some hardware, but generally the difference, if it exists at all, isn't going to be measurable for various reasons.

    On most non-Apple mobile devices, specifically any devices not using PowerVR GPUs, additive is actually noticeably faster for the reason you guessed. On PowerVR, however, any blend operation takes exactly the same amount of time, as a quirk of the hardware.
     
  3. bleu

    bleu

    Joined:
    Apr 6, 2013
    Posts:
    41
    Ok, so basically the real cost of alpha blending is rendering the same fragments multiple times (overdraw), and on some lower-end platforms a more complicated blending equation is probably more costly on top of that. As far as sorting is concerned, I don't know if Unity manually sorts any transparent objects by default -- without knowing the implementation details, I wouldn't expect it to (right? the person arranging the scene should know how to sort transparent objects relative to a camera anyway). Only particles have a sorting option.

    As far as mobile hardware is concerned, I'm more interested in GearVR. If you consider the Samsung Galaxy S6 as the baseline, that's basically a Mali-T760 MP8 GPU with 1 GB of VRAM (I think it's 1 GB; it's difficult to find the full specs). I'm not sure if that would be on par with an iPhone 6's GPU when it comes to normal alpha blending. I do know that the GearVR documents recommend against using a lot of transparency because of fill rate limitations.
     
  4. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,348
    Unity sorts all transparency by default, but like most real-time engines it only sorts the meshes. Particles have sorting options for enabling per-particle sorting; otherwise Unity handles each particle system as a single whole mesh, with the particles rendered in the order they're generated from a looped buffer.

    For GearVR, I found those devices to be surprisingly powerful and threw a lot of localized transparency at them. Additive will be slightly faster on those devices, especially for effects that cover a lot of the view or overlap, but don't feel you need to avoid alpha blended effects. It is recommended that you avoid alpha test on those devices, though. Alpha test is much more expensive than transparency, and it looks bad in VR. If you need alpha test for some reason, you probably want to use AlphaToMask instead. It is just as expensive as alpha test, but it can be anti-aliased.
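
    A minimal sketch of an unlit, cutout-style shader using it (the shader name and properties here are just for illustration, and MSAA needs to be enabled to actually get the anti-aliased edges):

    Shader "Hypothetical/AlphaToMaskExample"
    {
        Properties
        {
            _MainTex ("Texture", 2D) = "white" {}
        }
        SubShader
        {
            Tags { "Queue"="AlphaTest" "RenderType"="TransparentCutout" }
            Pass
            {
                AlphaToMask On   // alpha-to-coverage instead of clip()/alpha test
                ZWrite On        // unlike blended transparency, this can still write depth

                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"

                sampler2D _MainTex;
                float4 _MainTex_ST;

                struct v2f
                {
                    float4 pos : SV_POSITION;
                    float2 uv  : TEXCOORD0;
                };

                v2f vert (appdata_base v)
                {
                    v2f o;
                    o.pos = UnityObjectToClipPos(v.vertex);
                    o.uv  = TRANSFORM_TEX(v.texcoord, _MainTex);
                    return o;
                }

                fixed4 frag (v2f i) : SV_Target
                {
                    // the alpha channel of this result drives the MSAA coverage mask
                    return tex2D(_MainTex, i.uv);
                }
                ENDCG
            }
        }
    }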
     
    theANMATOR2b likes this.
  5. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    Something else to consider... additive can be faster than standard alpha blending for a software-related reason: additive blending is commutative and requires no app-side primitive sorting, whereas normal alpha blending requires a back-to-front sort before draw call submission to give the expected results.
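
    To make the commutativity point concrete, here is a simplified worked example blending two fragments A and B over a destination Dst (same notation as the equations earlier in the thread):

    Additive, A then B:  (Dst + A) + B
    Additive, B then A:  (Dst + B) + A          -> identical, so draw order doesn't matter

    Normal, A then B:    (Dst*(1.0 - A_Alpha) + A*A_Alpha)*(1.0 - B_Alpha) + B*B_Alpha
    Normal, B then A:    (Dst*(1.0 - B_Alpha) + B*B_Alpha)*(1.0 - A_Alpha) + A*A_Alpha
                         -> generally different, hence the back-to-front sort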
     
    theANMATOR2b likes this.
  6. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,348
    Is Unity smart enough to automatically not sort mesh objects that use additive shaders? I was under the assumption it was not.

    For particle systems it's understandable there would be some perf savings, as you don't have to sort the individual particles, hence there being a manual switch. If you mix additive and alpha blended objects, however, you still have to sort everything, and I wouldn't expect the savings from knowing not to sort additive sub-meshes to outweigh the cost of the added complexity in the sort delegate. Best case, I could see Unity not sorting entire sorting groups & queues if all the objects within them use only additive shaders.
     
  7. bleu

    bleu

    Joined:
    Apr 6, 2013
    Posts:
    41
    When I wrote that earlier message I totally forgot that normal (non-additive) alpha blending disables writing to the z-buffer anyway (since you have to see through objects), and that the sorting has to be done by the engine, back to front (so the colors of things in front "overlap" the things behind them).

    bgolus: I don't typically use alpha testing. I either use additive or normal alpha blending, the latter when I want a result that is order-dependent. I guess I'm a bit hesitant to use transparency in general because of fill rate limitations. However, when I think about it, it seems like the Samsung S6 might have a decent fill rate compared to its predecessors...

    Let's see. If I remember correctly, Unity by default renders to a 1024x1024 texture per eye (out of two eyes), which means we render into a 2048x1024 buffer -- about 2 million pixels. If we want to render at 60 fps, that's about 125 million pixels per second. The theoretical peak of the Mali-T760 GPU in the S6 is about 9.6 gigapixels/second (clock rate * shader count), which is quite a bit higher. I suppose one could reach that limit by using lots of multipass shaders or transparency.
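
    Spelling that arithmetic out with the numbers above (treating the 9.6 gigapixels/second figure as a purely theoretical peak):

    2048 * 1024 pixels * 60 fps ≈ 125.8 million pixels/second
    9.6 gigapixels/second / 125.8 million pixels/second ≈ 76

    So, in theory, roughly 76 full-screen "layers" of fill per frame before hitting that ceiling.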

    I realize that I am oversimplifying the situation here, since the phone's GPU will not always run at full clock rate because of thermal issues, and it might be tough to know how a theoretical limit translates to a running application. There might be components of the phone's GPU that relate to fill rate which I am not taking into account here. And I'm not sure I am relating rendered screen pixels to fill rate properly.
     
    Last edited: Jun 28, 2016