Question Is "CreateGPUProgram" only for "uploading" the shader content from CPU to GPU?

Discussion in 'General Graphics' started by bitinn, Jan 11, 2022.

  1. bitinn

    Joined:
    Aug 20, 2016
    Posts:
    961
    Hi,

    There is a part of the shader loading description in the docs that I would like a clearer answer on:

    The doc says: "The profiler marker for uploading the shader program to the GPU, and waiting for the GPU to perform any required work, is CreateGPUProgram."

    Screenshot 2022-01-11 104831.jpg

    Say I have a worker thread behaving as in the screenshot above. Can we be certain that the timing (which is measured on the CPU side) only includes the shader upload? Or does it also include waiting for the GPU to run the shader?

    To put it another way: does CreateGPUProgram immediately trigger GPU draw calls, or is it just submitting commands?

    Screenshot 2022-01-11 104850.jpg

    This is important to us because we see a certain VFX Graph trigger a long sequence of "CreateGPUProgram" markers for its shaders on a single worker thread, causing stalls (about 2 ms on an RTX 3080; on a GTX 1060 it stalls for 20-30 ms).

    My search led me to believe that a ShaderVariantCollection should help eliminate "CreateGPUProgram", but in my test it does not, even though (1) I verified via the shader compilation log that the correct shader variants are included, and (2) the shader variant collection is either warmed up manually a frame earlier or preloaded on launch.

    I tried Nsight and RenderDoc, but was unable to find a metric that isolates CPU -> GPU command timing.

    In short, I am having a hard time understanding whether I should optimize the VFX Graph (if it is GPU-bound) or preload these shaders (to reduce CPU -> GPU latency). Maybe both?

    EDIT: RenderDoc shows this VFX Graph taking around 8 ms, but the Unity Profiler shows the total CreateGPUProgram cost is around 20 ms.

    Another question: I can clearly see that, out of about 15 shaders in this VFX Graph, only 2-3 are submitted by different workers, while the rest are submitted sequentially on a single worker. (I also note that the order of submissions is non-deterministic from session to session.)

    Does this suggest we should break the VFX Asset into multiple assets?

    Any pointers would be helpful.

    Thx!
     
    Last edited: Jan 11, 2022
  2. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,014
    This is only the upload part. Draw calls are a separate thing.

    The programs are uploaded by whichever thread requests them first. If it's a single worker thread that does that, it will be serial.

    A ShaderVariantCollection won't eliminate CreateGPUProgram, as the upload still happens. It only gives you a way to control when it takes place.
     
  3. bitinn

    Joined:
    Aug 20, 2016
    Posts:
    961
    Thx, I eventually came to the same conclusion. Using the ShaderVariantCollection warm-up function, I was able to move these CreateGPUProgram calls to scene loading time, so that they take place on the main thread and not right before the VFX plays.
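
For readers who want the idea in concrete form, here is a toy model in Python of what the thread describes: a program is uploaded the first time anything requests it, and warming up does not remove the upload, it only moves it to an earlier phase. This is a sketch under stated assumptions, not Unity's implementation. `ProgramCache`, `request`, `warm_up`, `uploads_in_phase`, the phase labels, and the variant names are all hypothetical; in Unity the corresponding warm-up call is `ShaderVariantCollection.WarmUp`.

```python
class ProgramCache:
    """Toy stand-in for a lazily populated GPU program cache (hypothetical)."""

    def __init__(self):
        self._uploaded = set()
        # One (phase, variant) entry per simulated upload, i.e. per
        # occurrence of the "CreateGPUProgram" marker in this model.
        self.upload_log = []

    def request(self, variant, phase):
        # The first request for a variant pays the upload; later ones do not.
        if variant not in self._uploaded:
            self._uploaded.add(variant)
            self.upload_log.append((phase, variant))

    def warm_up(self, variants):
        # Same uploads as on first use, just performed during loading.
        for variant in variants:
            self.request(variant, "load")


def uploads_in_phase(cache, phase):
    """Count the simulated uploads that happened during the given phase."""
    return sum(1 for logged_phase, _ in cache.upload_log if logged_phase == phase)


# About 15 shaders, as in the VFX Graph discussed above (names are made up).
variants = [f"vfx_variant_{i}" for i in range(15)]

# Without warm-up: every upload happens when the effect first plays.
cold = ProgramCache()
for v in variants:
    cold.request(v, "play")

# With warm-up: the same uploads happen at load time, none at play time.
warm = ProgramCache()
warm.warm_up(variants)
for v in variants:
    warm.request(v, "play")

print(uploads_in_phase(cold, "play"), uploads_in_phase(warm, "load"),
      uploads_in_phase(warm, "play"))
```

The total number of uploads is identical in both runs; only the phase in which they are paid changes, which matches the observation that warm-up relocates the CreateGPUProgram cost rather than eliminating it.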
     