
Bug Very weird performance behavior of High Quality Line Renderer

Discussion in 'High Definition Render Pipeline' started by Qleenie, Jul 30, 2023.

  1. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    734
    Hi,

    it seems the High Quality Line Renderer needs more and more compute time the less screen space the hair cluster covers. This is completely counter-intuitive: it should take less compute time when there is not much on screen. The impact is quite severe, going from 2-3 ms of GPU time up to 30-40 ms!

    This probably also explains the hiccup at the beginning of the Enemies demo as soon as the character shows up (performance on my 3080 drops heavily).

    I tested with XR (Multi Instance, as Single Instance does not work with the feature; the fix provided in the 2023.2 beta is also way too slow to be useful) and DLSS on. It also happens without XR. With DLSS off, the effect is still there and drastic, but a bit less (going from 2 ms to about 10 ms).
    Changing the parameters of High Quality Lines does not have much impact.

    There seems to be something very wrong.

    For the feature itself, which looks really, really fantastic when it works (!!!!), we now have a huge list of bugs lining up:
    - does not work in builds
    - does not work with re-instancing
    - does not work in XR (the provided fix is not really usable)
    - this one here about weird performance drop

    As I now have something like 7 unresolved bugs lined up since the beginning of the year, I am hesitant to file another bug report, especially as this one would also require some coding work (to demonstrate the issue by moving the object with the Line Renderer farther away).
     
  2. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    734
    I filed a bug report anyway: IN-49607.
     
  3. schema_unity

    schema_unity

    Joined:
    Jun 13, 2018
    Posts:
    109
    I've been playing around a bit with builds and profiler.

    I also get very weird behavior; maybe I'm not understanding it correctly, but it would kind of match what @Qleenie has been experiencing.

    So basically, I've been watching ExecuteRenderGraph->LineRendering in the GPU usage profiler on a test (dev)build.

    Making my window fullscreen on a 4K monitor and looking at the character, LineRendering takes ~2 ms.

    Now, when I reduce the size of the window to around 1080p, I expected that time to go down, but it actually goes up. And not by a small amount: it almost doubles, to around ~3.5 ms.

    Geometry processing is the same and sorting (workQueue) takes half as long at 1080p, but "FineStage" almost triples in time at 1080p (~1.7 ms) compared to 4K (~0.6 ms).

    I suspect that switching from 4K to 1080p has the same effect as what @Qleenie described with increasing the camera distance to the hair, though maybe not as severe. It might be a good way to test for this, as it removes the camera as a factor.
     
  4. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    734
    This sounds like the same issue. I got the following response from QA:

    "After some resolution with developers, we have decided that this issue is expected behaviour: as the camera moves further, the area, covered by the hair in the screen space, diminishes. This causes the hair rendering to lose much of the parallelization and causes a couple of long-living workgroups, while the GPU is mostly idle otherwise."

    I also got some recommendations on how to deal with this. As none of them helped, I reopened the ticket. I don't find the explanation reasonable, and if it is accurate, there needs to be an LOD system in the HD Lines, else it's simply not usable in any interactive context with a larger space.
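
    To put some very rough numbers on that explanation (assuming the fine stage works on something like 8x8 pixel tiles, which I have not verified): at 4K, hair covering a quarter of the screen is about 3840 * 2160 / 4 ≈ 2,000,000 pixels, i.e. roughly 32,000 tiles to spread across the GPU. Move the camera back until the hair covers only ~100x100 pixels and you are down to ~156 tiles, fewer than the 1440 groups the fine stage dispatches, while the same number of strands now funnels through those few tiles. So fewer workgroups get any work at all, and each of them runs a much longer serial loop, which would match the "couple of long-living workgroups, while the GPU is mostly idle otherwise".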
     
  5. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    734
    Also found this in the code:
    // Launch for every wave on the device.
    // TODO: Querying the correct number of thread groups to launch for device saturation.
    cmd.DispatchCompute(resources.systemResources.stageRasterFineCS, fineStageKernel, 4 * 360, 1, 1);

    I've never seen that high a thread group count be useful in a compute shader, but I did not dig deep into it. The fact that there is a TODO above it does not really build confidence in the answer I got from QA ;)
     
  6. schema_unity

    schema_unity

    Joined:
    Jun 13, 2018
    Posts:
    109
    I took a look at it earlier and also found that to be rather odd. I don't really understand the compute shader itself; it seems to run like a bunch of consumers waiting for input data and stopping once there is nothing more queued up (if I understood it correctly at a quick glance).
    I haven't looked too much into it. It might be a bit of wasted GPU time when there is not enough to do; I'm not sure it would really be noticeable, but maybe it is?

    From what I gathered, the hair is rasterized in quads in screen space, so when the covered area gets smaller, there is more hair depth-wise to process in a very small number of quads, so it can't be processed in parallel (at least I think so).
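
    Based on that reading, here is a rough sketch of what such a persistent consumer loop might look like. This is NOT the actual HDRP shader, just the general pattern as I understand it, with invented buffer names:

    // Sketch only: persistent-threads consumers pulling tiles from a global queue.
    RWStructuredBuffer<uint> _WorkQueueHead;  // shared cursor into the tile queue
    StructuredBuffer<uint> _TileWorkQueue;    // tile indices produced by the binning stage
    uint _QueueLength;                        // number of tiles with work this frame

    groupshared uint gs_QueueSlot;

    [numthreads(64, 1, 1)]
    void FineRasterMain(uint gidx : SV_GroupIndex)
    {
        for (;;)
        {
            // One lane per group pops the next queue slot atomically.
            if (gidx == 0)
                InterlockedAdd(_WorkQueueHead[0], 1, gs_QueueSlot);
            GroupMemoryBarrierWithGroupSync();

            if (gs_QueueSlot >= _QueueLength)
                break;  // queue drained: the whole group exits

            uint tile = _TileWorkQueue[gs_QueueSlot];
            // ... rasterize every line segment binned to 'tile' ...

            GroupMemoryBarrierWithGroupSync();  // before lane 0 reuses gs_QueueSlot
        }
    }

    If it really works like this, it would also explain the hard-coded 4 * 360 dispatch on the C# side: you launch enough groups to cover every wave the GPU can keep resident and let them compete for the queue. With only a handful of tiles on screen, most of those ~1440 groups exit immediately while a few loop over very deep per-tile segment lists, which is exactly the "long-living workgroups" behavior from the QA answer.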

    I don't know if the rasterizer could maybe work with dynamic/subdivided quads, or something along those lines, to handle high density over a small screen area better.

    The only other way would be a manual LOD, like turning off HQ Lines for hair at a certain distance, which I'm not sure is smoothly doable without lag spikes, and it might look bad. Unfortunately that's also not possible for a lower resolution. I guess adapting the hair line thickness plus a less densely populated hairstyle would be an option, but no matter what, it's going to be somewhat messy, and I still don't think it's something we should have to do manually.

    I get that Unity has a lot of things on their plate, but I can't help but be a bit disappointed about the lack of feedback we are getting on all these issues.
     
  7. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    734
    I have some experience in compute shader optimization, though I have not looked into the specific HLSL code yet. Maybe as an idea: let's create a public GitHub repo with just the LineRenderer directory (this should be allowed, right?) and work together on fixing the bugs? Then we'd be able to easily copy the HDRP package to local packages, still update with new Unity releases, and copy the modified LineRenderer over it after each update.
     
  8. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    734
    After a brief look into the HLSL, I guess the main performance issue is an InterlockedAdd inside a variable-length "for" loop; this seems to be needed for the queuing / binning system. The whole system is obviously pretty complicated, and I guess it would take quite a while to fully understand what's going on. I have experience in binning, but not in queuing in compute shaders, and the question would be whether the latter can be avoided at all with some brute-force approach that would run more often but parallelize better. But this is all speculation; I'm not sure if I'll find time to dive into it, especially since it's completely unknown whether I'll fully understand the code and whether this is fixable at all.
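
    To illustrate what I mean (purely speculative, with made-up buffer names, not the actual HDRP code): the usual mitigation for an atomic inside a variable-length loop is to reserve a whole range with a single InterlockedAdd and then fill it without further atomics. Whether the queuing system here even allows that is exactly the part I don't understand yet:

    // Sketch only, not the HDRP shader.
    RWStructuredBuffer<uint> _SegmentCounter;  // global allocation cursor
    RWStructuredBuffer<uint> _SegmentList;     // output list of binned segments

    void EmitSegments(uint firstSegment, uint segmentCount)
    {
        // Naive pattern: one global atomic per iteration of the variable loop.
        // for (uint i = 0; i < segmentCount; ++i)
        // {
        //     uint slot;
        //     InterlockedAdd(_SegmentCounter[0], 1, slot);
        //     _SegmentList[slot] = firstSegment + i;
        // }

        // Batched pattern: a single atomic reserves segmentCount contiguous slots,
        // then the loop writes without touching the shared counter again.
        uint baseSlot;
        InterlockedAdd(_SegmentCounter[0], segmentCount, baseSlot);
        for (uint i = 0; i < segmentCount; ++i)
            _SegmentList[baseSlot + i] = firstSegment + i;
    }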
    I still think having a GitHub repo to remove / change the obvious stuff in the C# part would be good.
     
  9. schema_unity

    schema_unity

    Joined:
    Jun 13, 2018
    Posts:
    109
    Sounds like a good idea. I've done a fair bit of work in compute shaders now, so maybe if I can find the time, I can help dissect this a bit.
     
  10. Kabinet13

    Kabinet13

    Joined:
    Jun 13, 2019
    Posts:
    57

    The repo sounds like a good idea; I bet with enough people we could figure this out. Some kind of dynamic tile size would probably help, though we might have to lower the workgroup size a lot, which comes with its own problems.