
Huge performance hit with HDRP vs Built-in pipeline

Discussion in 'Graphics Experimental Previews' started by manutoo, Aug 17, 2019.

  1. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    Hello,

    I'm working on the new version of my tennis game using Unity 2019.1.14f1 + HDRP 5.16.1 .

    I just compared its performance with my previous tennis game (made with Unity 5.6.7), in 2 nearly identical views.

    The top right of each screenshot shows that the GPU usage is ~100% ; the bottom right shows the CPU usage.

    My rig : Windows 10 + Intel i7 @ 4.6 GHz + 32GB + NVidia GTX 1060 6GB at 1920x1200, with the latest NVidia drivers.

    The only post-processing effects in both games for the test were exposure & bloom.

    In build, simple scene, no shadow ; 165 vs 590 fps, i.e. Built-in is ~250% faster (HDRP reaches only ~28% of its frame rate) :

    2019-08 - Benchmark TE4 - Simple [No Shadow].jpg
    2019-08 - Benchmark TEM2 - Simple [No Shadow].jpg

    In build, complex scene, even the crowd has shadows ; 125 vs 315 fps, i.e. Built-in is ~150% faster (HDRP reaches ~40% of its frame rate) :

    2019-08 - Benchmark TE4 - Complex [All Shadows].jpg
    2019-08 - Benchmark TEM2 - Complex [All Shadows].jpg

    Note : each crowd mesh includes dozens of people, not only one.

    The legacy game uses the deferred path, because the forward path was much slower with the stadium shadows.
    The new game uses the forward path, because the doc states it's faster ; although I tried the deferred path and the performance seemed similar.

    Both games use realtime GI + linear color space.

    Both games use reflection probes. The new one uses 5 instead of 1, but after checking, it doesn't noticeably change the performance.

    Here are the stats from the editor for each screenshot ; top left is HDRP Simple, bottom right is Legacy Complex :

    2019-08 - Benchmark TE4+TEM2 - Stats.jpg

    The new game uses materials for the stadium instead of composed textures, and thus there are more draw calls, although nothing crazy (any good PC rig can handle thousands of these).

    Note : the stats in Unity 2019.1 seem bugged, as the tris & verts counters count only the skinned meshes ; disabling anything else doesn't change them. Even the shadow casters counter wrongly says 0 for the complex scene, even though the shadows were on.

    So would anyone have any idea what's going on ? Are there obvious mistakes that could create such a performance hit ?

    Thanks in advance for any tip ! :)
     
    Last edited: Aug 17, 2019
  2. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    6,130
    AFAIK, HDRP is not done so there's that, and also it's supposed to be really good and fast when you take advantage of its strong points, like using a lot of lights.

    It looks to me like your game may be better served by LWRP or Built-In.
     
  3. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    Note : I had mistakenly tested with my 2019.1.12f1 build.
    With the 2019.1.14f1 build, the complex scene fps are up to 125 from 100. It's a bit better, but it should be at least at 200 fps to be reasonable considering the "complex" scene isn't that heavy.

    I guess now I have to test with 2019.2 ... :p

    However, between the 2 Unity versions, I also had turned off the Contact Shadows support (although I didn't use them), so maybe the fps boost came from there.

    @AcidArrow ,
    I'm already a fan of the HDRP in a general way ; it's hard to set up & dig into at 1st, but overall, it's a major step up over the legacy built-in path. And the LWRP has some serious quality limitations compared to the HDRP (I'm in the process of changing the legacy assets seen in the screenshots above ;) ).

    I found some HDRP optimization advice by a Unity dev, I'm going to dig into it and see if anything has a meaningful impact.
     
    Last edited: Aug 17, 2019
  4. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    The fps boost wasn't from the Unity version change, nor the Contact Shadows support, but from switching to the Deferred rendering. So it's actually faster than the Forward rendering... :cool:

    Afterwards, I turned off everything unneeded in the HDRP asset & the Frame Settings, and it didn't seem to change anything at all in terms of performance.

    I turned off the SRP Batcher and got back my tris & verts stats. The complex scene gives 5M tris + 6M verts, vs 1.6M tris + 10M verts with the Built-in version. In the simple scene, it's 64k tris + 70k verts vs 31k tris + 55k verts, so it's much closer. I'm not sure what to think about all these numbers. They could argue for a 50% slower rendering, I guess.

    I'm going to try to display the old assets in the HDRP game, it might ease the testing.
     
  5. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    Ok, I put back the old stadium & most of the props. The complex scene now gives 1.9M tris + 8M verts vs 1.6M tris + 10M verts with the Built-in version, and 170 fps vs 315 ; Built-in is still 85% faster than the HDRP.

    I'd settle for a 20% loss, so hopefully the future HDRP versions will be a bit more optimized, because right now I don't see what else I could do to speed things up.
     
  6. rizu

    rizu

    Joined:
    Oct 8, 2013
    Posts:
    1,229
    These benchmarks are kinda pointless ; measure the GPU cost in ms if you want to get some real figures (don't use FPS, as it doesn't really tell that much). I would also suggest testing on a weaker GPU to get some use cases where the perf difference actually matters : 170 fps is probably fine for 100% of your player base.

    As an additional note : do you really need realtime GI for this ? :)
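To make the "measure in ms" suggestion concrete, here is a minimal sketch that converts the fps pairs reported earlier in this thread into frame times. The function names and the result dictionary are illustrative only; the fps values are the ones quoted in the posts above, and frame time is simply taken as 1000 / fps, which includes everything in the frame, not just GPU work.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def compare(hdrp_fps: float, builtin_fps: float) -> dict:
    """Express one HDRP vs Built-in measurement in several ways."""
    hdrp_ms = frame_time_ms(hdrp_fps)
    builtin_ms = frame_time_ms(builtin_fps)
    return {
        "hdrp_ms": hdrp_ms,
        "builtin_ms": builtin_ms,
        # Absolute extra time per frame spent by HDRP.
        "extra_ms": hdrp_ms - builtin_ms,
        # "Built-in is X% faster" (relative to the HDRP frame rate).
        "builtin_faster_pct": (builtin_fps / hdrp_fps - 1.0) * 100.0,
        # "HDRP loses X% of the fps" (relative to the Built-in frame rate).
        "hdrp_fps_drop_pct": (1.0 - hdrp_fps / builtin_fps) * 100.0,
    }


# (label, HDRP fps, Built-in fps) as reported earlier in the thread.
RESULTS = [
    ("simple scene, no shadows", 165, 590),
    ("complex scene, all shadows", 125, 315),
    ("complex scene, legacy stadium", 170, 315),
]

if __name__ == "__main__":
    for label, hdrp_fps, builtin_fps in RESULTS:
        r = compare(hdrp_fps, builtin_fps)
        print(
            f"{label}: HDRP {r['hdrp_ms']:.2f} ms, "
            f"Built-in {r['builtin_ms']:.2f} ms, "
            f"extra {r['extra_ms']:.2f} ms, "
            f"Built-in +{r['builtin_faster_pct']:.0f}%, "
            f"HDRP -{r['hdrp_fps_drop_pct']:.0f}% fps"
        )
```

Seen this way, the first two comparisons differ a lot in percentage terms but work out to a similar absolute gap of roughly 4.4 ms and 4.8 ms per frame, which is the kind of information that raw fps figures hide.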
     
  7. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    @rizu ,
    fps is what actually matters for the end user, it's the only metric that counts in the end. Plus it's super easy to test, and here I need to check ballpark figures, not small gains. :)

    A few users expect to play on their 200 Hz gaming monitors, and a lot more expect to play on their 120/144 Hz gaming monitors. Others will expect to play in 4K. And others will be playing with less powerful GPUs, as you have pointed out.

    Plus, it's not 170 fps with my current assets, but with the old ones ; moreover I expect the final fps to drop more & more as I add stuff along the way, so now is a good time to get a general idea about where I stand.

    Lastly, a GTX 1060 is pretty mid-range now, so it's a good reference.

    So any result under 200 fps in my little test is problematic. :(

    BTW, what would you use for measuring the GPU cost in ms ?
     
  8. rizu

    rizu

    Joined:
    Oct 8, 2013
    Posts:
    1,229
    For end users, yes, but you are now posting on a developer forum. Using FPS for perf measurements is bad because it is a sum of many things, so you don't really see where you actually sink the perf. Use Unity's profilers to see what is expensive and what is not, and you won't have to guess or use super rough ballpark methods like measuring FPS.

    Those few with 240Hz 1080p monitors most likely got a decent GPU already :)

    All this being said, for a project with such a light use case as yours, you can't beat the built-in renderer in perf with HDRP, or even match it. HDRP has more initial overhead.
     
  9. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    My initial goal was to see how the performance compared between the 2 engines. Thus the fps was the best tool for that.

    Considering the huge performance hit, my 1st idea is that I made some newbie mistakes (I had read the HDRP docs, but they are really sparse).

    Diving into the Unity Profiler gives very little info, except "waiting commands" without further detail for nearly 75% of the rendering thread ; and anyway, the rendering thread is CPU time, not GPU time, so it's nearly useless. So that won't tell me whether I made a mistake or not, nor what could be optimized.

    So if the Built-in is nearly 100% faster than the HDRP in a normal case, I think it's really a problem.

    In another topic about these performance issues, I read a Unity dev bragging that the HDRP would shine when using 200 lights on screen. Ok, great, but how many games actually need to show 200 lights most of the time ?

    A base scene like mine shouldn't be that much slower. A 20% hit would be understandable due to the extra stuff, but losing nearly half the frame rate isn't.

    So I still hope I missed something important, or that they optimize the hell out of it for the official release... :)
     
  10. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    6,130
    No it's not. HDRP is supposed to scale really well. Built-in doesn't scale that well. It's supposed to be used when you need a ton of lights and other high end features and it's intended for high end platforms. It's supposed to have a lot of overhead, but then scale really well.

    If that's not suited for you, use LWRP or Built-In.
     
  11. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
  12. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    6,130
    You did, but if you are a fan of HDRP, you need to accept that it has higher overhead than the other options, that's how it was designed to be.
     
  13. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    I'm more after understanding what's going on, so that afterwards I can make a more educated choice, whether it's tuning some of my assets, working on the rendering side, or, like you said, just accepting it's slower. Right now, I'm in the dark, and it would be stupid not to look more into it and not to fix the things I could fix... :)

    I've read all the blog posts about HDRP, all the HDRP doc, and a few posts on the forum by Unity devs, and right now, I feel it's really hard to know what the deal is with HDRP (just check that other guy's post to which I answered as well, he's in the same boat as I am) ; I guess there's a higher overhead, but it should be explained & documented in detail, so we could understand what we can do to limit it and to take advantage of it (I hope it's not only to render 200 lights, coz that would seriously limit the utility of the HDRP).

    Right now HDRP is the obvious choice of rendering quality, but the performance cost shouldn't be that high.

    I'll quote 2 things from that blog's post : https://blogs.unity3d.com/2018/03/16/the-high-definition-render-pipeline-focused-on-visual-quality/
    Both of these things made me think that HDRP was not the slow hog I got with my test scene. (except if we consider that a GTX 1060 is too old to harvest the HDRP awesomeness)

    In that post, they also never talk about overhead, nor lower performance. They only state you need recent hardware.

    And the 1st quote could mean I didn't configure something correctly, although I checked everything I could think of, but as I don't master all this, I may have had some oversight, thus my reaching for help in this forum... :)
     
    Last edited: Aug 19, 2019
  14. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    I just did another test, on the complex scene, but without the shadows on the crowd, and using the legacy stadium in the new game, in exclusive fullscreen mode (before I had tested in windowed mode ; exclusive mode is ~5% faster with Unity 2019, and ~10% faster with 5.6).

    I wanted to see the fps loss when using a bigger resolution, so I tested 1920x1080 vs 2560x1440 : this is ~78% more pixels ; here are the results :
    - Built-in : 373 vs 235 fps ; 1080p is 59% faster
    - HDRP : 203 vs 134 fps ; 1080p is 51% faster

    This is quite similar. So if there's an overhead, it's mostly in the pixel shaders, which means there's a real performance issue right there, as quality-wise the shaders look pretty similar when using 1 directional light, GI, normal & mask maps.
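One way to sanity-check the "it's mostly in the pixel shaders" reasoning is to fit a very crude model, frame time = fixed cost + per-pixel cost x pixel count, through the two resolutions of each pipeline. This is only a sketch under strong assumptions: the game is GPU-bound, cost grows linearly with pixel count, and two data points are enough. The function name and constants are illustrative; the fps values are the ones from the post above.

```python
PX_1080P = 1920 * 1080
PX_1440P = 2560 * 1440


def fit_fixed_and_per_pixel(fps_low_res, fps_high_res, px_low, px_high):
    """Fit frame_ms = fixed_ms + per_pixel_ms * pixels through two points.

    Returns (fixed_ms, per_pixel_ms).
    """
    t_low = 1000.0 / fps_low_res
    t_high = 1000.0 / fps_high_res
    per_pixel_ms = (t_high - t_low) / (px_high - px_low)
    fixed_ms = t_low - per_pixel_ms * px_low
    return fixed_ms, per_pixel_ms


if __name__ == "__main__":
    print(f"pixel ratio 1440p / 1080p: {PX_1440P / PX_1080P:.3f}")
    for name, fps_1080, fps_1440 in (("Built-in", 373, 235), ("HDRP", 203, 134)):
        fixed_ms, per_pixel_ms = fit_fixed_and_per_pixel(
            fps_1080, fps_1440, PX_1080P, PX_1440P
        )
        print(
            f"{name}: fixed ~{fixed_ms:.2f} ms, "
            f"~{per_pixel_ms * 1e6:.2f} ms per megapixel"
        )
```

With these numbers, the fit attributes most of the frame time to the per-pixel term in both pipelines, and HDRP's per-megapixel cost comes out at roughly 1.6 times that of Built-in. Given the assumptions, this is a hint rather than a measurement; a GPU profiler capture would be needed to confirm it.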
     
  15. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,938
    You could try 6.9 HDRP, which seems a bit newer.
     
  16. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    New test with Unity 2019.2.2f1 + HDRP 6.9.1, in the complex scene, with the new stadium, no SSAO nor FSAA, 1080p vs 1440p :

    - 2019.1 : 156 vs 103
    - 2019.2 : 154 vs 107

    So the new version is a tiny bit slower in 1080p and a bit faster in 1440p. I'm not sure what to think about that.

    Bonus note : the SSAO in HDRP 6.9.1 is completely broken (I opened a new topic about that issue :p )

    Double bonus : the fps in the 2019.2 editor are super low (around 60 fps instead of ~85) ; if it gets any lower, I won't be able to easily test my gameplay in the editor anymore... :confused:
     
    Last edited: Aug 25, 2019
  17. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    As one of the particularities of the SRP is that it's scriptable in C#, I tried to build using IL2CPP, thinking it might help it run faster.

    - still 2019.2, 1080p vs 1440p : 147 vs 108

    So it got a bit slower in 1080p and a tiny mini bit faster in 1440p. Once again, I don't know what I should think about that... :D
     
  18. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    So I downgraded my game to use Unity 2019.1.14f1 Built-in pipeline.

    Under the same conditions as in the previous 3 messages, I got :
    - 1080p vs 1440p : 260 vs 174
    - compared to the best HDRP, it gives : +~70% vs +~60%

    I guess the lower boost on 1440p means my GPU is closer to its limits.

    Side note :
    Test with Light Probes, 1080p vs 1440p : 243 vs 171
    Test with Light Probes + SSAO, 1080p vs 1440p : 218 vs 145

    PPSv2 SSAO @1080p takes about 0.5ms, which is on par with the one from HDRP 5.16.1.

    So I'll stick to the Built-in pipeline, at least till the HDRP is stabilized & optimized, and works on Intel Iris. HDRP looks better for me, though... :( (but apparently, not to my users nor the Unity users :p )

    EDIT:
    the poly count is slightly lower in the latest version of my stadium, but it gives less than a 1% boost. (I just moved a bit the camera in the HDRP test to get more or less the equivalent of the new stadium :D )
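The ~0.5 ms SSAO figure can be cross-checked against the fps pairs in this post (243 vs 218 at 1080p, 171 vs 145 at 1440p). The sketch below is illustrative only, with made-up helper names, and assumes the fps difference between the two runs is caused by SSAO alone. It also shows how the same cost in ms would translate into a very different fps loss at a lower baseline frame rate.

```python
def effect_cost_ms(fps_without: float, fps_with: float) -> float:
    """Extra frame time (ms) attributable to an effect, from two fps readings."""
    return 1000.0 / fps_with - 1000.0 / fps_without


def fps_after_cost(base_fps: float, cost_ms: float) -> float:
    """Frame rate obtained after adding a fixed per-frame cost in ms."""
    return 1000.0 / (1000.0 / base_fps + cost_ms)


if __name__ == "__main__":
    ssao_1080 = effect_cost_ms(243, 218)
    ssao_1440 = effect_cost_ms(171, 145)
    print(f"SSAO cost at 1080p: {ssao_1080:.2f} ms")
    print(f"SSAO cost at 1440p: {ssao_1440:.2f} ms")

    # The same 1080p cost applied to different baseline frame rates.
    for base in (243, 120, 60):
        after = fps_after_cost(base, ssao_1080)
        print(f"{base} fps -> {after:.1f} fps (loss of {base - after:.1f} fps)")
```

The 1080p pair works out to a little under 0.5 ms, consistent with the figure quoted above, and the 1440p pair to about 1 ms. The last loop is why a drop of 25 fps at 243 fps and a drop of under 2 fps at 60 fps can describe the same GPU cost.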
     
    Last edited: Sep 4, 2019
  19. alexandre-fiset

    alexandre-fiset

    Joined:
    Mar 19, 2012
    Posts:
    405
    Well, I wouldn't create a game on HDRP to target such devices ;p

    What you could try is adding volumetric fog, GPU particles, subsurface scattering and deferred decals to the built-in and then compare. Or replicate this setup and run it at 30 FPS, 1080p on PS4. To this I'd say: Good luck :D
     
  20. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    Last edited: Sep 8, 2019
  21. rizu

    rizu

    Joined:
    Oct 8, 2013
    Posts:
    1,229
    You say you need realtime gi, SSS for character skin etc, but none of your screenshots really show either of these clearly - which just makes one wonder why even bother requiring them. LWRP (/URP on 2019.3) would give you these same visuals with less overhead and would be more future proof.
     
  22. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
  23. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,938
    You will need to be changing pretty much everything from the probes, the lights, the shadow settings, the HDRP volume asset (global) and the HDRP config asset for engine.

    There's a lot to go through to get optimal performance, but if you can share your full light settings, your full global volume settings and your full HDRP asset settings we may be able to spot a problem or two.

    HDRP's design will cap your framerate lower but also stay above say, 30fps much longer than built-in will assuming built-in was tasked with having a similar feature set rendered. That's the hallmark of modern AAA graphics engine design - it is not about the highest FPS you can achieve, but allowing the engine to maintain a framerate under some pretty heavy loads by managing bandwidth really well.

    By contrast, built-in is much much closer to Universal pipeline, which is direct (except for the lights and SRP batching), so perhaps that pipeline is actually more suitable? You can only get Universal Pipeline on 2019.3 (beta 2 and above) though.

    I do not think the Intel Iris is a good fit for HDRP. HDRP is designed for consoles and up, basically, but there is still plenty to tweak. And you will have to tweak.
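The "higher overhead, better scaling" argument made in this post can be illustrated with a toy model in which frame time is a fixed overhead plus a cost per light. Every number below is invented purely for illustration and is not measured from Unity, HDRP, or Built-in; real light cost is not linear and depends heavily on the scene. The class and function names are hypothetical too.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineModel:
    """Toy cost model: frame_ms = overhead_ms + per_light_ms * lights."""
    name: str
    overhead_ms: float
    per_light_ms: float

    def frame_ms(self, lights: int) -> float:
        return self.overhead_ms + self.per_light_ms * lights


def crossover_lights(low: PipelineModel, high: PipelineModel) -> float:
    """Light count above which the high-overhead pipeline becomes faster."""
    return (high.overhead_ms - low.overhead_ms) / (
        low.per_light_ms - high.per_light_ms
    )


def max_lights_within(model: PipelineModel, budget_ms: float) -> int:
    """Largest light count that still fits in the frame budget."""
    return math.floor((budget_ms - model.overhead_ms) / model.per_light_ms)


# Hypothetical parameters, NOT measurements of any real pipeline.
LOW_OVERHEAD = PipelineModel("low overhead, poor scaling", 2.0, 0.5)
HIGH_OVERHEAD = PipelineModel("high overhead, good scaling", 5.0, 0.05)

if __name__ == "__main__":
    print(f"crossover at ~{crossover_lights(LOW_OVERHEAD, HIGH_OVERHEAD):.1f} lights")
    budget_30fps = 1000.0 / 30.0
    for model in (LOW_OVERHEAD, HIGH_OVERHEAD):
        print(f"{model.name}: up to {max_lights_within(model, budget_30fps)} "
              f"lights within a 30 fps budget")
```

In such a model, the low-overhead pipeline wins in a light scene like a single-sun tennis court, while the high-overhead one holds a 30 fps budget far longer as the load grows. Whether real pipelines behave like this for a given project is exactly what profiling in ms is meant to answer.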
     
  24. rizu

    rizu

    Joined:
    Oct 8, 2013
    Posts:
    1,229
  25. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    @rizu ,
    I'll do it in short then..! :p

    Unity introduced the HDRP as looking better, using less VRAM, and being faster than the Built-in engine, without indicating any condition for the extra performance except having a modern compute-shader GPU.

    So I found out it's true, except for the extra performance, thus this topic.

    I also found out it's really nice & easy to use once the initial learning is done, as it's very fast to make everything look right.

    If you can't notice the difference between the HDRP & the LWRP (or the Built-in), it's good for you. And from my polls, it seems to turn out that most people are like you. Me, I can spot the differences even on my screenshots. Especially on my screenshots.

    Thus my big disappointment at being forced back to the built-in pipeline.

    Side note : improving my assets isn't incompatible with getting the more modern lighting system of the HDRP.

    @hippocoder ,
    what I said to rizu, plus :

    I already redid everything, we are forced to, as almost nothing works the same... ;)

    I already checked everything I could (as exposed in this topic). From all my fps tests above, I think we can guess the issues are mostly with the shaders, and at this point there's nothing we can do about it except hoping they'll be optimized in the future.

    I don't know if you're a gamer or not, but I have a 144hz monitor, and once I started to play games at 100 fps, there was no turning back, and for a tennis game, 120 fps is really neat. Just reading "30 fps" makes me shiver... :confused: ... :D

    As a side note, the loss of fps on my Intel HD 3000 was less bad than on my GTX 1060 (using my special IGP render mode on the IGP, and the normal one on the GTX).
     
  26. rizu

    rizu

    Joined:
    Oct 8, 2013
    Posts:
    1,229
    I didn't say there isn't a possible visual difference between the renderers themselves, but the advantages can be subtle if you don't need the extra feature set from HDRP (which is one of the main strengths of HDRP - it does a lot out of the box).

    What I did try to say a few times already, however, was that looking at the in-game screenshots you've posted, it doesn't look like you are getting notable visual gains from using feats like realtime GI or SSS, both of which you brought up as arguments for not using LWRP. Your poll images didn't indicate you utilize HDRP feats extensively either, which is just another reason to recommend LWRP (or built-in) instead.
     
  27. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    326
    I optimized my stadium a bit more to lower the poly count, and now I get, still for built-in, under the same conditions as in the last test :

    - 1080p vs 1440p : 270 vs 175
    - +10 fps ( = ~4%) vs +1fps

    I guess it means at 1440p, the bottleneck is the pixel shader, while at 1080p, the poly count still counts (shameless pun intended :D ).

    Note : I've just upgraded to 2019.2.4f1, but I don't think it should change anything to the Built-in engine.
     
  28. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,938
    Built-in will never change apart from being removed one day or fixed if bugs occur. But you are comparing FPS, not millisecond differences.

    Do you know why that is a terrible mistake to make?