Gamma vs Linear Space

Discussion in 'Shaders' started by MoistDumpling14, Jan 16, 2019.

  1. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
  2. DonPaol0 · Joined: Sep 30, 2019 · Posts: 11
    Just to clarify my setup:

    - Unity 2020.1.0b6
    - URP 8.0.1 / Gamma
    - Very Lightweight Custom Foliage Shader doing all the work per vertex
    - Simple Terrain w/o details & trees (as they are pathetically slow)
    - AlphaToMask without Clipping

    @hippocoder: Nothing more, and I'm quite aware of my assets and imports :)

    @atomicjoe: A frame rate of 52 fps seems unacceptable, as Oculus will deny any app that's permanently below the 72 fps target. In that regard I've been quite happy with LWRP / URP performance, because I instantly managed to get full framerate; with the built-in pipeline I never did.

    I can just say that I get a full 72 fps, with even some Early Frames, with the whole screen covered by 10+ stacked grass billboards. And I can confirm: with clip(), performance is unbearable, as the whole tile is discarded and has to be re-rendered.


    The interesting question is indeed: why does linear URP blit while built-in doesn't? Is this a bug in linear URP?
    If anybody has a suggestion for what I may be doing wrong, I'd be more than grateful :)
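
    For reference, the "AlphaToMask without Clipping" setup above boils down to a pass like this. A minimal sketch with placeholder names, not the actual foliage shader:

    Code (CSharp):
    // Minimal cutout-style pass using MSAA alpha-to-coverage instead of clip().
    // Illustrative only; property names and structure are assumptions.
    Shader "Sketch/A2CFoliage"
    {
        Properties { _MainTex ("Texture", 2D) = "white" {} }
        SubShader
        {
            Tags { "Queue"="AlphaTest" "RenderType"="TransparentCutout" }
            Pass
            {
                AlphaToMask On // the alpha we output drives the coverage mask
                ZWrite On
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"
                sampler2D _MainTex;
                struct v2f { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };
                v2f vert (appdata_img v)
                {
                    v2f o;
                    o.pos = UnityObjectToClipPos(v.vertex);
                    o.uv = v.texcoord;
                    return o;
                }
                fixed4 frag (v2f i) : SV_Target
                {
                    // No clip() call: alpha alone decides coverage under MSAA.
                    return tex2D(_MainTex, i.uv);
                }
                ENDCG
            }
        }
    }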
     
    glenneroo likes this.
  3. bgolus · Joined: Dec 7, 2012 · Posts: 12,348
    I would have expected A2C to always be slower, or at least the same as alpha testing. Super fascinating that discard is slower on Quest. One of the supposed “benefits” of discard is that it can allow GPUs to skip the “rest” of the shader, and a lot of older optimization guides suggest putting the discard as early in the shader as possible. I’ve always been curious about this since, a) compiled shaders rarely look anything like the original written shader, with operations often shuffled around and done in different orders anyway, and b) the era of GPUs that kind of suggestion came from couldn’t do any real branching, so there’s presumably no way it could have skipped the rest. Modern GPUs, including mobile GLES 3.0, can do branching, but it might come with a significant cost. So I wonder if the discard is now actually branching and forcibly skipping the rest of the shader code, but this ends up slower than running the code anyway, while A2C doesn’t cause the branch. That’s just a wild guess though; my mobile GPU optimization knowledge is still very much stuck in the GLES 2.0 days, primarily PowerVR or early GearVR related. Back then discard and A2C were both way too slow to use for much of anything, plus A2C didn’t even always work on those devices. Me complaining to Unity got it implemented on several platforms it hadn’t been enabled for. :p

    On PC, I actually do both clip() and A2C and saw an improvement in some cases, so the current Quest port I’m working on does the same. But I guess I should re-evaluate that. A lot of low level GPU documentation actually suggests using clip() on alpha blended shaders! The usual point being that discard is slow when used with depth writing shaders, but fast when not. I guess I should test that and see if there’s any difference in that case, or if discard is actually always slow on Quest.


    BTW, on PC & consoles, alpha testing is faster than alpha to coverage for one very specific reason: with alpha testing, each pixel is either entirely covered or not. With alpha to coverage, multiple surfaces can be visible and still need to be evaluated & resolved. In the areas outside of the transition (fully opaque or fully transparent) I never saw any perf difference between the two.
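
    For clarity, "doing both" looks something like this in the fragment shader. A sketch only: _MainTex and _Cutoff are assumed names, and the pass would still declare AlphaToMask On as in the sketch earlier in the thread:

    Code (CSharp):
    // clip() combined with alpha-to-coverage: fully transparent fragments are
    // discarded outright, and the surviving alpha still feeds the coverage mask.
    fixed4 frag (v2f i) : SV_Target
    {
        fixed4 col = tex2D(_MainTex, i.uv);
        clip(col.a - _Cutoff); // discard below the cutoff
        return col;            // alpha drives MSAA coverage via AlphaToMask On
    }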
     
    glenneroo and hippocoder like this.
  4. bgolus · Joined: Dec 7, 2012 · Posts: 12,348
    So, I did a bunch of benchmarks on the Quest with a variety of different shader setups, using 100 or 500 overlapping 1 unit scale particles spawned at once at about 1 unit away, sorted near to far for best case depth discarding. I also had them slowly fade between full opacity and fully invisible, staying at min or max alpha for a few seconds, to see if there was any difference between fully opaque and fully transparent. Max Particle Size was set to 10000 to prevent them being scaled by distance (though setting it to 1 and making them huge would technically be a more consistent test).
    • Alpha Blend
      100: 33~34 fps, GPU 4 @ 99. Drops down to <20s if I lean forward into them. No difference between opaque and transparent.
      500: 3~17 fps, GPU 4/5 @ 99. Quest started freaking out and losing tracking, locking up occasionally for a second or two, visual artifacts ... this was bad. Eventually it full on crashed back to the home environment.
    • Alpha Blend w/ Alpha Test
      100: 33~34 fps, GPU 4 @ 99. Drops down to <20s if I lean forward into them. No difference between opaque and transparent.
      500: 3~17 fps, GPU 4/5 @ 99. Same issues as above.
    • Alpha Test
      100: 72 fps, GPU 4 @ 75. If I leaned forward it'd drop to ~68 fps while they were opaque, ~64 when invisible with GPU 4 @ 99!
      500: 24 fps, GPU 4 @ 99. Leaning forward dropped that into the high teens. Leaning to the side so the first particle wasn't occluding everything fully dropped down to 22 or so.
    • Alpha Test w/o ZWrite
      100: 72 fps, GPU 4 @ 99. Leaning forward dropped it down to ~64 no matter the opacity.
      500: 22 fps, GPU 4 @ 99. Leaning forward dropped that into the low teens.
    • Alpha to Coverage
      100: 72 fps, GPU 4 @ 72. Leaning forward drops down to 66 fps GPU 4 @ 99. Cheaper than alpha testing by a tiny bit, but probably just within the margin for error for how loosely I was doing this test. No difference depending on opacity.
      500: 26 fps, GPU 4 @ 99. Leaning forward dropped to similar high teens as Alpha Test.
    • Alpha to Coverage w/o Zwrite
      The same as Alpha to Coverage?! Literally no discernible difference.
    • Alpha to Coverage w/ Alpha Test
      Same?
    • Alpha to Coverage w/ Alpha Test w/o ZWrite
      Nothing different again.
    One big thing I noticed: most of these tests that were running below 72 dropped into the 40s or lower if the Guardian system was triggered. It's pretty expensive! This is also a super simple scene with a flat unlit ground plane and the default skybox (which is actually somewhat expensive).

    So, alpha blending is more expensive than alpha testing‽ ZWrite is potentially even more helpful for performance if it occludes something that's more expensive, but that probably depends heavily on draw order since the Adreno isn't tiled deferred (need to test more there). Alpha to Coverage does come out slightly faster in the fully transparent case vs Alpha Test, just because it doesn't seem to vary with opacity like Alpha Test does. But this was also using the default particle texture, so the A2C version was blocking far less in the center than the alpha test version was. Adjusting the shader to sharpen the alpha, so the base Alpha to Coverage shader had similar visual coverage to Alpha Test, did show a difference between full and min opacity, with it maintaining 72 fps when leaning in & fully opaque!
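
    For anyone trying to reproduce these variants, the differences are all pass-level render state plus an optional clip() line. A sketch of the mapping, assuming a basic particle shader:

    Code (CSharp):
    // Pass-level state for the variants above (everything else held equal):
    //   Alpha Blend:               Blend SrcAlpha OneMinusSrcAlpha, ZWrite Off
    //   Alpha Blend w/ Alpha Test: same, plus clip() in the fragment
    //   Alpha Test:                no Blend, ZWrite On, clip() in the fragment
    //   Alpha Test w/o ZWrite:     no Blend, ZWrite Off, clip() in the fragment
    //   Alpha to Coverage:         AlphaToMask On, ZWrite On, no clip()
    //   A2C w/ Alpha Test:         AlphaToMask On, plus clip() in the fragment
    // The clip() line itself is just a cutoff test:
    fixed4 frag (v2f i) : SV_Target
    {
        fixed4 col = tex2D(_MainTex, i.uv) * i.color;
        clip(col.a - 0.5); // kill fragments below the cutoff
        return col;
    }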
     
    st-VALVe, MaxEden, glenneroo and 3 others like this.
  5. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Thanks for sharing the tests. This goes against most conventional mobile wisdom, which is great news for my project, as you can imagine.
     
    AcidArrow likes this.
  6. bgolus · Joined: Dec 7, 2012 · Posts: 12,348
    I need to do another pass at these, testing in a more real world scene than the “empty world” I had to see if the numbers hold up.
     
  7. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    So, I benchmarked again to settle this once and for all. (for the Quest, at least)
    GPU level always 4.
    Overdraw benchmark over a regular game scene (with other things in it)
    Results:

    Alpha Test using Clip on a fully opaque zone: 68 fps / 99% GPU use
    Alpha Test using Clip + Alpha to Coverage on a fully opaque zone: 68 fps / 99% GPU use
    Alpha to Coverage only without Clip on a fully opaque zone: 71 fps / 99% GPU use
    Alpha Blending + Clip on a fully opaque zone: 54 fps / 99% GPU use
    Alpha Blending (without Clip) on a fully opaque zone: 48 fps / 99% GPU use

    Alpha Test using Clip on a fully transparent zone: 72 fps / 95% GPU use
    Alpha Test using Clip + Alpha to Coverage on a fully transparent zone: 72 fps / 97% GPU use
    Alpha to Coverage (without Clip) on a fully transparent zone: 50 fps / 99% GPU use
    Alpha Blending + Clip on a fully transparent zone: 47 fps / 99% GPU use
    Alpha Blending (without Clip) on a fully transparent zone: 48 fps / 99% GPU use

    Conclusions:

    Both Alpha Test and Alpha to Coverage are always faster than Alpha Blending
    Alpha Test is faster than Alpha to Coverage on fully transparent zones
    Alpha to Coverage is faster than Alpha Test on fully opaque zones


    Alpha to Coverage and Alpha Test do work differently at the hardware level.
    Alpha to Coverage is faster when the surface is mostly opaque but is very slow when the surface is mostly transparent.
    Using both at the same time should give the best results at the whole-scene level, since the performance gains on fully transparent areas are way higher than the losses on the opaque areas... BUT you'll have to test it on your particular scene to see if it's worth it overall:
    On a test scene full of trees, Alpha to Coverage alone on the foliage gave me significantly better performance than Alpha to Coverage + Clip. I guess it depends on the amount of fully transparent area: with few fully transparent areas -> only Alpha to Coverage. With lots of fully transparent areas -> Alpha to Coverage + Clip.
    In the case of foliage I found Alpha to Coverage alone to be faster.
    EDIT: As a result of these tests, I have now put a conditional compile feature in my cutout shaders to always use Alpha To Coverage but enable and disable Clip() on a per-material basis. This way I can enable it if I see a particular object is rendering too many completely transparent pixels, and it boosts performance A LOT.

    EDIT: updated alpha blending values with proper z-write off and transparent queue :p
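
    The per-material Clip() toggle described in the edit can be done with a shader_feature. A sketch of the idea; the property and keyword names here are made up:

    Code (CSharp):
    // Always-on alpha-to-coverage with an optional, per-material clip().
    // The [Toggle] drawer sets the keyword from a material checkbox.
    Properties
    {
        [Toggle(_USE_CLIP)] _UseClip ("Clip fully transparent pixels", Float) = 0
    }
    // ...inside the pass:
    AlphaToMask On
    CGPROGRAM
    #pragma shader_feature _USE_CLIP
    // ...
    fixed4 frag (v2f i) : SV_Target
    {
        fixed4 col = tex2D(_MainTex, i.uv);
        #if defined(_USE_CLIP)
        clip(col.a - 0.01); // cull fully transparent pixels, keep the A2C edges
        #endif
        return col;
    }
    ENDCG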
     
    Last edited: May 10, 2020
    koirat, hippocoder, glenneroo and 2 others like this.
  8. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    Adreno 5xx and newer have a form of hidden surface removal called “low-resolution z buffer” (LRZ).
     
    AcidArrow and hippocoder like this.
  9. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Is it automatic? LRZ I mean.
     
  10. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    Oh, that's very cool, thanks for the info.
    Indeed it seems to be working for opaque geometry, although front to back render is still faster and works with alpha to coverage too.
    On my test scene, forcing a back-to-front rendering order of opaque geometry writing to the z-buffer gives 72 fps and 95% GPU usage, which is slower than forcing a front-to-back rendering order: 72 fps and 70% GPU usage (although much, much faster than it would be if there weren't any hidden surface removal, so it's actually working!).
    Sadly, Alpha to Coverage seems to disable this optimization per tile, since Alpha to Coverage on a fully opaque surface gives 52 fps and 99% GPU usage, which is much slower than forcing a front-to-back rendering order: 71 fps and 99% GPU usage.

    @florianpenzkofer Unity should force a front to back rendering order for every render queue the same way it forces a back to front render order in the transparent queue. This way we would have much better performance by default. Right now I'm doing it myself using raycasts and it works flawlessly.

    @hippocoder thanks for completely ignoring my benchmarks! :) I guess we'll have to wait for someone with a higher pedigree than me to say the same for people to actually believe it.
     
  11. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Not sure how you have this impression... the investigation is still ongoing since it's not clear if using A2C will cause a resolve on the relevant tile.

    Not dismissing your efforts, you're obviously helping all of us.
     
  12. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    Well, we have the benchmarks; I don't know what more we need.
    Any amount of additional technical insight will be welcomed, but it will not change the frames per second :p
    Sorry if I overreacted :oops:
     
    hippocoder likes this.
  13. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    @bgolus Thanks again for the alpha to coverage article and the
    Code (CSharp):
    o.Alpha = ((tex.a - 0.5h) / fwidth(tex.a)) + 0.5h;
    trick to have crisp antialiased borders!

    Everybody doing VR should read it!
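
    For context, here's roughly where that line sits, recast as a plain fragment shader with a divide-by-zero guard (the original is from a surface shader; names here are illustrative):

    Code (CSharp):
    // bgolus's alpha sharpening for alpha-to-coverage: rescale alpha so the
    // 0.5 crossing becomes a roughly one-pixel-wide gradient, which MSAA then
    // dithers into a smooth antialiased edge. The pass declares AlphaToMask On.
    fixed4 frag (v2f i) : SV_Target
    {
        fixed4 tex = tex2D(_MainTex, i.uv);
        tex.a = saturate((tex.a - 0.5) / max(fwidth(tex.a), 1e-4) + 0.5);
        return tex;
    }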
     
    MaxEden and hippocoder like this.
  14. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    Yes
     
  15. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    Front to back is still faster and works with Alpha to Coverage, though.
     
    hippocoder likes this.
  16. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    https://docs.unity3d.com/ScriptReference/Rendering.OpaqueSortMode.html
    Afaik OpaqueSortMode.Default is currently NoDistanceSort on Adreno GPUs that have LRZ and on PowerVR/Apple GPUs. It’s FrontToBack on others (including Mali GPUs that have “forward pixel kill”).
    Are you suggesting that the current default is not ideal for Adreno?
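
    (For reference, this is set per camera; a minimal sketch:)

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;

    // Override the platform default and force distance-based front-to-back
    // sorting of opaques on this camera.
    [RequireComponent(typeof(Camera))]
    public class ForceFrontToBack : MonoBehaviour
    {
        void Start()
        {
            GetComponent<Camera>().opaqueSortMode = OpaqueSortMode.FrontToBack;
        }
    }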
     
  17. bgolus · Joined: Dec 7, 2012 · Posts: 12,348
    My benchmarks are an extreme case: stupidly high overdraw in an otherwise overly simple scene with a very basic particle shader (it just outputs tex color * vertex color, with optional clip). @atomicjoe’s benchmarks are probably closer to the real world case most people will see, including me. It’s highly plausible that my test case is hitting some other bottleneck that’s skewing the results I see vs @atomicjoe’s (i.e. we can both have accurate benchmarks even if they don’t agree).

    I just did the simple scene because it builds, uploads, installs, and loads on the Quest faster than my real project takes just to install after building & uploading.

    @atomicjoe Are your shaders all using a transparent queue, or are the alpha test ones using AlphaTest (2450)? I realize my test was with all of them using Transparent (3000).
    @florianpenzkofer Thanks for responding to this thread. It’s useful to have someone who can actually see into the guts of Unity. Is the LRZ flag set conditionally by Unity? Like only for opaque queue objects with ZWrite? Or is it just left enabled and up to the device to decide what to do? Do you know if transparent queue surfaces can contribute to the LRZ depth?
     
  18. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    Yes, but even more: I'm suggesting even Opaque sort mode FrontToBack is not enough, since it only sorts "roughly" front to back. We need actual, accurate front-to-back sorting for mobile in general, just like the transparent queue but inverted.
    I actually came to realize this when developing for the Nvidia Shield (and thus Nintendo Switch): forcing a manual front-to-back render using raycasts and an explicit sorting order in the mesh renderer gave me a HUGE performance boost.
    This same method works wonders on the Oculus Quest too.
    And I'm talking about real-life, hands-on experience here.

    AlphaTest queue + manually forcing the render order front to back.
    Putting them on the Transparent queue has some overhead even when forcing the render order front to back (I don't know why).
     
  19. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    Built-in forward sucks when you use realtime lights: even adding a SINGLE point light will add an extra pass to your surface (only a single directional light is supported in the base shader pass).
    But using URP is not the solution, since it's more limited than built-in forward, is incompatible with nearly a decade of assets on the Asset Store, and is incompatible with surface shaders.
    So the actual solution is to complement the built-in forward renderer with custom shaders that support realtime lights in a single pass, which is exactly what Valve did when they had to use Unity for The Lab demo for the HTC Vive.
    And that's what I did myself when I started with the Quest some weeks ago, completely disappointed with the way Unity is handling the new scriptable render pipelines.
    So now I have all the realtime spot and point lights I want in a single shader pass in built-in forward. With blackjack. And hookers.

    Edit: If you want multiple realtime lights in a single pass without implementing it yourself, check out Shader One on the Asset Store.
    The guy that develops it is f***ing good.
    In my case I just wanted to do it myself and learn from it.
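
    For reference, the stock built-in pipeline way to fold extra point lights into the base pass is per-vertex lighting via UnityCG.cginc. A minimal sketch of that direction, placed in the ForwardBase vertex shader (the per-pixel single-pass approach described above is more involved than this):

    Code (CSharp):
    // Up to four per-vertex point lights evaluated in the ForwardBase pass,
    // using the built-in Shade4PointLights helper from UnityCG.cginc.
    // Needs #pragma multi_compile_fwdbase so the VERTEXLIGHT_ON variant exists.
    #if defined(VERTEXLIGHT_ON)
    o.vertexLighting = Shade4PointLights(
        unity_4LightPosX0, unity_4LightPosY0, unity_4LightPosZ0,
        unity_LightColor[0].rgb, unity_LightColor[1].rgb,
        unity_LightColor[2].rgb, unity_LightColor[3].rgb,
        unity_4LightAtten0,
        worldPos, worldNormal); // world-space position/normal computed above
    #endif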
     
    Last edited: May 10, 2020
    a436t4ataf and bluescrn like this.
  20. bgolus · Joined: Dec 7, 2012 · Posts: 12,348
    Checked Alpha to Coverage using Queue 2450 instead of Queue 3000 like above ... and now I can't get the frame rate to drop at all, even leaning forward with 100 particles! So there's definitely a difference between the queues.
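
    For reference, the queue difference is just the Tags line in the ShaderLab SubShader (2450 is the named AlphaTest queue, 3000 is Transparent):

    Code (CSharp):
    // Cutout-style queue: drawn with the opaques, so depth writes stay useful.
    Tags { "Queue"="AlphaTest" "RenderType"="TransparentCutout" } // queue 2450
    // vs. the transparent queue, sorted back to front with other transparents:
    Tags { "Queue"="Transparent" "RenderType"="Transparent" }     // queue 3000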
     
    glenneroo, GridWanderer and AcidArrow like this.
  21. aleksandrk (Unity Technologies) · Joined: Jul 3, 2017 · Posts: 3,023
    Nope, this is set automagically by the driver. I don't think there's a hook exposed anywhere to control this.
    When we see an Adreno GPU with LRZ support, we just modify the default Opaque sort mode to not do front-to-back sorting, as @florianpenzkofer said.

    I think this is highly dependent on the content.
     
  22. DonPaol0 · Joined: Sep 30, 2019 · Posts: 11
    Nice, this thread alone gave me more insight than the countless ones I've crawled through over the past weeks. Kudos again to @bgolus for the great article on AlphaToMask!

    @atomicjoe: interesting point, I think I should give it another try. As I'm already doing custom RT & LM lighting, extra passes are not the issue.

    But I'm still struggling with the Linear vs. Gamma issue regarding the additional sRGB blit in URP... any suggestions on that?
     
    Last edited: May 10, 2020
  23. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    All the tests I have done show Linear is at least as fast as Gamma, if not faster. Blit or no blit doesn't make a difference for that.
    But then again, I don't use URP but built-in forward, so I don't know.
    Maybe I didn't understand you well, but extra passes are the major issue on mobile hardware. More than anything else. More than draw calls.
    Mobile hardware has extremely limited bandwidth, and rendering several passes one over the other will make you hit the limit very, very fast. So the most important optimization is to avoid rendering a single pixel more than once.
    Every pixel can have a very complex pixel shader and still perform great, but as soon as you render the same pixel several times, even with very fast pixel shaders, your FPS will hit rock bottom.
     
    Last edited: May 10, 2020
  24. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    For opaque rendering on Adreno 540, in what scenario does front to back not become a default choice? I mean wouldn't you want this for all opaque rendering?
     
  25. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Also we need Alpha To Coverage added to Shadergraph - it is not available there yet.
     
    FaberVi likes this.
  26. AcidArrow · Joined: May 20, 2010 · Posts: 11,791
    I'm guessing the scenario is when you're CPU bound and not GPU bound (since front-to-back sorting weighs on the CPU, AFAIK).

    Although on mobile hardware, I don't know which game isn't really GPU bound. I'm willing to bet that for real game scenarios, front to back will net better overall performance. So @florianpenzkofer, please take another look at Adreno 5xx on a real project, and re-evaluate what the default should be.
     
    atomicjoe and hippocoder like this.
  27. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Ah, I never considered CPU sorting as being slow enough to matter. I mean, even 500 objects sorted in optimised C++ code on a Quest should basically be no problem. You don't even need real distance, just relative. That's extremely fast.

    I've learned something today though and @atomicjoe 's raycasting doesn't seem as wild as I thought! I could improve some more on it with some Culling Groups fun.

    I guess Unity's customers do really crazy things on mobile that can add up to it being slower... like you said Acid. So it makes sense.

    It's been a great 24 hours of digging for a lot of people on the forums regarding perf intricacies and a reminder to me you can never assume with those mobile GPUs :D

    Thanks Atomic, bgolus and everyone else. Very helpful to my planning and project!
     
    atomicjoe and AcidArrow like this.
  28. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    @AcidArrow @atomicjoe Changing the defaults is always very tricky because there is almost no way to do it without making some users mad :).
    However, it sounds like a reasonable request for Unity to support more opaque sort modes, e.g. one that sorts purely by distance with no “rough”/“bucketed” sorting.

    It’s not just the cost of the sorting itself. Not sorting front-to-back allows you to sort by material etc., so you may end up with fewer state changes.
     
  29. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    Re: LRZ. Yeah, afaik alpha-to-coverage and clip/discard all disable any kind of early depth testing (including hidden surface removal), because the final depth/coverage is only known after fragment shading (except when forcing early depth testing).
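
    For the curious, "forcing early depth testing" is a shader-side attribute in HLSL (Shader Model 5.0). A sketch; whether the attribute survives Unity's cross-compilation to the Quest's GLES/Vulkan backends is something to verify, and it comes with a real gotcha:

    Code (CSharp):
    // With forced early depth/stencil, the depth test AND write happen before
    // the fragment shader runs, so a later clip()/discard no longer prevents
    // the depth write; it only suppresses the color output.
    [earlydepthstencil]
    fixed4 frag (v2f i) : SV_Target
    {
        fixed4 col = tex2D(_MainTex, i.uv);
        clip(col.a - 0.5); // culls color, but depth was already written
        return col;
    }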
     
  30. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Would be a nice touch! I'd probably use it; most of my stuff will be opaque and use the same shader (for SRP batching). Some will be A2C.
     
    atomicjoe and GridWanderer like this.
  31. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    Well, this is how they do it in Unreal Engine for mobile:
    First they group draws per shader to reduce state changes, but then they still finely sort inside those groups to render front to back.
    "Roughly" is NOT enough.
    Check the GDC video starting at minute 32:
    https://gdcvault.com/play/1020860/Next-Generation-Mobile-GPUs-and

    I have no experience with PowerVR, but on Adreno and Tegra X1, I can attest that fine front-to-back sorting increases performance a lot. So much so, in fact, that I can throw rays and disable batching and still have no sweat on the CPU while being faster on the GPU.
    Maybe PowerVR is using some patented technology that Qualcomm had to circumvent to implement their own hidden surface removal, and as a result it's less performant. Or maybe PowerVR would benefit from it too, idk.
    But the FPS don't lie: front to back is faster. Always.
    (OK, maybe if you have thousands of objects it won't be. But what mobile game has thousands of objects to render?? It would choke on that anyway.)

    Also, I usually have to disable static and dynamic batching precisely because it's LESS performant than more batches with correct front-to-back rendering.
    I don't know why everybody is so obsessed with draw calls on mobile, while the biggest culprit of bad performance is overdraw.
    In my experience, batching and even GPU instancing have always given me less performance than accurately sorting from front to back and avoiding overdraw at all costs.
    On mobile, the bottleneck is usually in the GPU, not the CPU.

    (GPU instancing has ALWAYS given me LESS performance than a regular render on mobile. Maybe it would be useful for, say, an asteroid field with thousands of asteroids, where the overhead of GPU instancing is small compared to the number of objects to render. But on regular scenes, GPU instancing is always slower.)
     
    Last edited: May 10, 2020
  32. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    I believe every depth bucket is also sorted front-to-back but other criteria have priority.
     
  33. DonPaol0 · Joined: Sep 30, 2019 · Posts: 11
    Sorry for not being clear; I was trying to say: since I'm doing custom lighting, I don't need extra lighting passes.

    Regarding the blit issue on URP I'll run another test with an empty scene and standard shaders.
     
  34. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    If this is the case then for just Opaque + A2C I should be fine leaving it on default...?

    Or what advice would you give?
     
  35. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    This was about general sorting of opaque objects (OpaqueSortMode.FrontToBack).
    Unity does coarse sorting front to back into buckets. Per bucket Unity sorts by other criteria (such as light index, shader, material, pass, mesh). In case all these criteria are equal then you should also get front-to-back sorting per bucket.

    As I understand it, @atomicjoe argues that there are many cases where full front-to-back sorting would be better than Unity's current behavior. Unity currently doesn't have an easy way to do that.

    On PowerVR and Adreno since 5xx Unity skips the coarse sorting completely because it is assumed that the driver/GPU does its hidden surface removal and sorting is not needed.

    All of this is mostly for built-in pipeline.
    SRP is more configurable, but at first glance it also does not allow pure front-to-back sorting.
    It also does *not* have the special case for PowerVR and Adreno and does *not* sort by distance within a bucket.

    I'll follow up on this.
     
    AcidArrow and hippocoder like this.
  36. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Thanks this is great info. Looking forward to your findings.
     
    AcidArrow likes this.
  37. florianpenzkofer (Unity Technologies) · Joined: Sep 2, 2014 · Posts: 479
    NotaNaN, liam_unity628 and hippocoder like this.
  38. hippocoder (Digital Ape) · Joined: Apr 11, 2010 · Posts: 29,723
    Thanks, very interesting.
     
  39. a436t4ataf · Joined: May 19, 2013 · Posts: 1,933
    Going back to the linear-vs-gamma topic ;) ... after a day of scratching my head about why WebGL builds were flickering like mad ...

    TL;DR: don't use Gamma for anything unless you absolutely have to, because it has significantly more bugs in Unity than Linear does ... e.g. I logged one last year about texture importing that doesn't work correctly in Gamma ... and today I discovered that Gamma + WebGL = random lighting errors (which vanish if you switch to Linear - but that requires WebGL2, which causes new problems, LOL). My guess at why Gamma is less well supported than Linear is simply that Unity aren't doing much testing in Gamma these days, and so they don't notice the bugs.

    But also remember: all old projects that you bring into a recent copy of Unity will keep whatever mode they started with; if they're old enough, that will be Gamma, and the editor won't warn you that you're in this messed-up lighting mode ;). Could have saved myself hours if I'd remembered to check that sooner.
     
    atomicjoe likes this.
  40. a436t4ataf · Joined: May 19, 2013 · Posts: 1,933
    You can also argue (sadly, while weeping in frustration) that in 2020 Apple and Microsoft still don't support Linear in their flagship web browsers (Safari and Edge both run WebGL 1.0 - which is now >3 years out of date, so they're being pretty impressively slow...) ... WebGL 1.0 can't do Linear (according to the Unity editor - I assume this is a limitation of the WebGL 1.0 spec).

    So, sadly, we're stuck with a large number of people - approx 15% of all web users just for Safari alone (about 7-8% of all desktop users) - who can't see content if you use Linear. Even though their hardware can happily render it, their browser won't.
     
  41. DonCornholio · Joined: Feb 27, 2017 · Posts: 92
    You can do the colorspace conversion in the shader yourself and get linear lighting in a Gamma color space project: convert inputs from sRGB to linear, do the lighting, then convert the output back from linear to sRGB. You can even do tonemapping before the output conversion. That's what I did on a project where we needed PBR + linear but had to use WebGL 1.0. I never understood why Unity didn't bother to offer that option with a simple toggle.
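
    A sketch of that approach in shader code, using the cheap 2.2-power approximation rather than the exact piecewise sRGB curve (the lighting function here is a hypothetical stand-in):

    Code (CSharp):
    // Manual colorspace handling for a Gamma color space project:
    // decode sRGB inputs to linear, light in linear, re-encode on output.
    float3 SRGBToLinearApprox(float3 c) { return pow(c, 2.2); }
    float3 LinearToSRGBApprox(float3 c) { return pow(c, 1.0 / 2.2); }

    fixed4 frag (v2f i) : SV_Target
    {
        float3 albedo = SRGBToLinearApprox(tex2D(_MainTex, i.uv).rgb);
        float3 lit = albedo * ComputeLighting(i); // hypothetical lighting term
        // (optional) tonemap here, while still in linear
        return fixed4(LinearToSRGBApprox(lit), 1.0);
    }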
     
  42. ctedin187 · Joined: Aug 20, 2018 · Posts: 11
    WebGL on a Mac (using Safari) still doesn't support the latest version, WebGL 2.0. Therefore, to be backwards (emphasis on backwards!) compatible, one needs to export the whole thing as WebGL 1.0 and force Gamma. Ugh. Does anyone know a workaround for these seemingly endless Mac issues?
     
  43. creat327 · Joined: Mar 19, 2009 · Posts: 1,756
    How do you do this? You create a collider on every object and cast a Physics.Raycast at them from the screen?
     
  44. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    On every gameobject, yes.
    Sort of: I actually cast a single very wide and very thin box from the camera position, in the camera's forward direction, using Physics.BoxCastNonAlloc, and then set the render order of each Renderer component it hits, from first to last, front to back, via Renderer.sortingOrder, using the order of the array I get back from the box cast.

    I should note this brute-force approach is not always a good idea, since you are effectively disabling static batching. So if you already have lots of draw calls, this will only worsen your case. However, if draw calls are very low but you are still GPU bandwidth or ALU bound, then this will improve things A LOT.
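
    A hypothetical sketch of that approach: sweep a wide, thin box forward from the camera and hand out Renderer.sortingOrder front to back. (Note the NonAlloc casts don't guarantee hit ordering, hence the explicit distance sort; the box dimensions are placeholder values.)

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    [RequireComponent(typeof(Camera))]
    public class FrontToBackSorter : MonoBehaviour
    {
        readonly RaycastHit[] hits = new RaycastHit[256];
        static readonly IComparer<RaycastHit> byDistance =
            Comparer<RaycastHit>.Create((a, b) => a.distance.CompareTo(b.distance));

        void LateUpdate()
        {
            Transform t = transform;
            // Very wide and tall, very thin in depth, swept along camera forward.
            Vector3 halfExtents = new Vector3(100f, 100f, 0.01f);
            int count = Physics.BoxCastNonAlloc(
                t.position, halfExtents, t.forward, hits, t.rotation, 500f);

            System.Array.Sort(hits, 0, count, byDistance);

            for (int i = 0; i < count; i++)
            {
                var r = hits[i].collider.GetComponent<Renderer>();
                if (r != null)
                    r.sortingOrder = i; // lower values draw first: front to back
            }
        }
    }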
     
    st-VALVe, MaxEden and creat327 like this.
  45. Gulliver · Joined: Jan 8, 2013 · Posts: 101
    I can't understand one thing. Imagine a tiger in the woods. The scene "stores colors" in a linear way. When the signal from the tiger reaches the eye, the eye applies a gamma correction and the brain receives the tiger's colors more brightly than in reality, so the brain can spot the tiger and command the body to run away. OK, now let's make a "snapshot" of the real world with the woods and the tiger: a photo. The eye looks at the snapshot, applies gamma correction, and sends the signal to the brain. So, since the brain receives the same picture, the photo must store colors in the same linear way as the real world; in both cases the gamma correction was done by the eye. Here is the question: for what purpose do we apply gamma correction to photos, textures, monitors, etc., if the eye always receives the signal linearly and then transforms it into gamma space for the brain? In other words, why for God's sake are all textures gamma-corrected?!
     
  46. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    The eyes don't make any gamma correction.
    It's a screen display thing that depends on the color space you're using, which is usually sRGB.
    If a TV screen could be as bright as the sun, we wouldn't need any gamma correction: we would just feed it linear lighting.
    But since displays have a very limited brightness range, and the signal is also bandwidth limited, a gamma correction was adopted to put more bits in the zone the eyes are more sensitive to.
    In fact, newer HDR standards record the image in a more linear fashion because they have much higher bit depth; the TV then applies its own lighting transform depending on its HDR capabilities.

    It's actually quite a complex subject (as is everything technical with a long history).
    The thing is, the sRGB color standard has its gamma shifted, and this must be compensated for to calculate light correctly (linearly); otherwise the lighting is wrong. But that's only because we use the sRGB standard: if you used the REC2020 (HDR) standard instead, you wouldn't need to convert anything.

    Also, if your textures are not HDR, they are in sRGB format, which means their gamma is shifted too, so you have to correct it before applying it to a 3D model so it behaves as linear light.

    If everything was in the REC2020 color space (HDR), we wouldn't need to gamma correct anything.

    You can learn more about all of this by reading this Wikipedia entry:
    https://en.wikipedia.org/wiki/Rec._2020
     
  47. atomicjoe · Joined: Apr 10, 2013 · Posts: 1,869
    Also, I think you are mixing up gamma correction with tone mapping, which are two different things.
    Tone mapping is the process of taking a final image with a brightness range higher than sRGB can show and compressing it so that it fits into sRGB's limited range.
    It is an artificial effect that can be achieved in different ways (hence the different tonemapping algorithms).
    Even analog photography does it, via the sensitivity curve of the photographic film to light, but it's still artificial, and every photographic film behaves differently in that respect.
     
  48. bgolus · Joined: Dec 7, 2012 · Posts: 12,348
    So… color spaces, gamma correction, sRGB, and high brightness / gamut ranges are a complex topic, one I don't have a complete understanding of, especially as we get into HDR and things like rec2020 or DCI-P3, etc. But here's the thing about sRGB and similar gamma spaces: they're primarily an image compression technique.

    Also, biological eyes do not react to light linearly. Nor does our human brain perceive the data from our eyes linearly. Which is why sRGB exists at all.

    We’ve known for a long, long time that humans do not perceive light linearly. Instead we perceive light roughly logarithmically. Our ability to perceive the difference between two shades of brightness also isn’t all that good, especially between brighter lights.

    sRGB exists to exploit those facts to allow images to be stored with much less data than would otherwise be needed. If we stored images in a linear format that recorded pixel brightness in real world photon energy levels, we'd need to store images with at least 100 bits per color to match an sRGB image that uses 8 bits per color, the latter of which is the standard for most image formats. Why? Because we're good at perceiving the difference in brightness between darker values, and bad at seeing the difference between brighter values. So sRGB applies a curve to the brightness so that there's less precision given to the bright parts and more to the dark.

    Basically sRGB means you can store or send a decent looking image using significantly less data. Which is something that mattered in the early days of computing.

    As for why we can now render in linear space: computers today are fast enough to handle floating point numbers and math with ease, and floating point numbers happen to be effectively logarithmic in their precision. In fact the term “floating point” refers to the fact that the total number of digits that can be accurately represented is always the same, while the decimal point can move (aka “float”) around. In a 32 bit floating point number, for example, only the first 7 digits are guaranteed to be accurate, and anything beyond that may not be. For example, 123456789 cannot be accurately stored in a 32 bit float; it'd end up as 123456792. Similarly 0.000123456789 cannot be accurately stored; it'll end up as something like 0.00012345670256. But you'll notice the first 7 digits (ignoring the leading 0s) are accurate! So we can calculate light using linear intensity values without losing precision in the places where it's important.

    So why don’t we use floating point for stored images now? Well, because it still takes up more space, and 8 bit sRGB is good enough most of the time.
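
    For the curious, the sRGB curve bgolus describes is a fixed piecewise function (a linear toe near black plus a 2.4-power segment, per IEC 61966-2-1). In shader terms:

    Code (CSharp):
    // Exact sRGB transfer functions, applied per channel.
    float3 LinearToSRGB(float3 c)
    {
        float3 lo = c * 12.92;                         // linear segment near black
        float3 hi = 1.055 * pow(c, 1.0 / 2.4) - 0.055; // gamma segment
        return lerp(lo, hi, step(0.0031308, c));
    }
    float3 SRGBToLinear(float3 c)
    {
        float3 lo = c / 12.92;
        float3 hi = pow((c + 0.055) / 1.055, 2.4);
        return lerp(lo, hi, step(0.04045, c));
    }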
     
  49. Gulliver · Joined: Jan 8, 2013 · Posts: 101
    Thanks for the replies. My project was set to Gamma. Recently I wanted to turn on post-processing, Bloom for example, and I found that it works correctly only in Linear space. After switching to Linear, about half of the locations began to look better than before, but the other half look much worse and require reworking of lighting and textures. Now I'm not sure what to do: does it make sense to spend the time transitioning to Linear space, and will it be worth it in the future for better graphics quality?
     
  50. AcidArrow · Joined: May 20, 2010 · Posts: 11,791
    I mean, only you can answer that.