Resolved Compute Dispatches in CommandBuffer execute half the kernels on older HW

Discussion in 'General Graphics' started by npatch, Feb 22, 2021.

  1. npatch

    npatch

    Joined:
    Jun 26, 2015
    Posts:
    180
    I have 7 kernels running back to back (with a couple of CopyTexture and AsyncGPUReadback calls sprinkled here and there), per frame, using a CommandBuffer.
    On a desktop machine with a GTX 1060, the whole chain runs in about 2.7 ms in total.
    On two work machines, one with a GTX 645 and another with a GTX 680, only the first 3 kernels execute, plus the AsyncGPUReadbacks (one of which is issued after the 4 kernels that don't execute).

    Apart from 1 kernel, all the others work on the same amount of data, so it's not the load (i.e. how many threads they need), at least not for the 680, which is quite beefy considering it's 7 generations old.

    Another weird thing: RenderDoc captures (1 or 2 consecutive frames) only contain the 3 kernels and two AsyncGPUReadbacks, while the Frame Debugger inside Unity shows all 7. The last executed kernel's output appears to finish properly, yet the same resource, bound under the same id as the input of the next kernel (the first non-executing one), comes out black. All textures are preallocated; I was using temporary RTs at some point, but they flickered in the Frame Debugger between black and the correct state, so I replaced them with preallocated textures.

    Looking at a profiler capture in-editor without vsync, and at the spikes on both CPU and GPU for the GTX 680 machine: the CPU is at 7.2 ms and the GPU at 3.9 ms (while showing all 7 dispatch calls under GPU Usage, with a total of 0.778 ms, which I gather is just the first 3 kernels actually executing). But I don't see any kind of bottleneck that would explain dropped dispatches.


    Any ideas on what it might be? Or how I could further investigate?

    UPDATE: Obviously the machines do support compute shaders. I have also compared the SystemInfo values, and apart from VRAM, everything else was the same between the 645 and the 1060. The compute work uses 7-8 textures (color only) at 1280x720; apart from two which are ARGB32, the rest are R or RG (int or float).
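    For reference, a minimal sketch of the kind of setup described above (kernel and texture names are made up for illustration, and the chain is cut down to two kernels plus one readback):

    ```csharp
    using UnityEngine;
    using UnityEngine.Rendering;

    public class ComputeChainExample : MonoBehaviour
    {
        public ComputeShader shader;   // hypothetical asset with kernels "StageA", "StageB"
        RenderTexture rtA, rtB;        // preallocated, as in the post above
        CommandBuffer cmd;

        void Start()
        {
            rtA = new RenderTexture(1280, 720, 0, RenderTextureFormat.RFloat)
                  { enableRandomWrite = true };
            rtA.Create();
            rtB = new RenderTexture(1280, 720, 0, RenderTextureFormat.RFloat)
                  { enableRandomWrite = true };
            rtB.Create();

            cmd = new CommandBuffer { name = "Compute chain" };
            int kA = shader.FindKernel("StageA");
            int kB = shader.FindKernel("StageB");

            cmd.SetComputeTextureParam(shader, kA, "_Output", rtA);
            cmd.DispatchCompute(shader, kA, 1280 / 8, 720 / 8, 1);

            // The output of one kernel feeds the next.
            cmd.SetComputeTextureParam(shader, kB, "_Input", rtA);
            cmd.SetComputeTextureParam(shader, kB, "_Output", rtB);
            cmd.DispatchCompute(shader, kB, 1280 / 8, 720 / 8, 1);

            // Readback queued inside the same CommandBuffer.
            cmd.RequestAsyncReadback(rtB, req => { /* consume req.GetData<float>() */ });
        }

        void Update() => Graphics.ExecuteCommandBuffer(cmd);
    }
    ```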
     
    Last edited: Feb 22, 2021
  2. npatch

    npatch

    Joined:
    Jun 26, 2015
    Posts:
    180
    UPDATE: As always, you forget some of the most important ways to check issues like this...
    I was advised to run a build or the editor with -force-d3d11-debug, which enables the DX11 debug layer. Honestly, I thought it was enabled by default in the editor, as I remember errors from previous engine versions that had to do with the DX11 debug layer.
    Anyhow, this is the error from the graphics debugging in VS:

    D3D11 ERROR: ID3D11Device::CreateComputeShader: Shader uses new Typed UAV Load formats, but the device does not support this.
    To check for support, check device caps via the CheckFeatureSupport() API [ STATE_CREATION ERROR #2097322: CREATECOMPUTESHADER_INVALIDSHADERBYTECODE]


    I had previously checked SystemInfo, but this error is very specific. I wrote some C# code to display all info about the RT formats I used and their respective GraphicsFormat (i.e. whether they are supported, whether blending is supported, support for every FormatUsage option, etc.). Running this on the machine with the GTX 680, everything comes back Supported. I also checked GetCompatibleFormat in case there was a different suggestion, but no dice.
    [Screenshot: upload_2021-2-25_13-7-49.png]

    It just so happens that someone else hit the same issue with capability checking. Their solution, based on graphicsDeviceVersion, is what drives the HelpBox in the image above. It at least works for alerting us when the machine should take a different approach, but I still need more info.
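    A hedged sketch of that kind of check, assuming the D3D11 feature level reported in SystemInfo.graphicsDeviceVersion can stand in as a proxy for typed-UAV-load support (the exact string is driver- and platform-dependent, so this is a heuristic, not a hard capability query):

    ```csharp
    using UnityEngine;

    public static class TypedUavLoadCheck
    {
        // graphicsDeviceVersion looks like e.g. "Direct3D 11.0 [level 11.0]"
        // on older GPUs such as the GTX 680. Typed UAV loads for
        // multi-component formats are not available at feature level 11.0.
        public static bool LikelySupportsTypedUavLoads()
        {
            string v = SystemInfo.graphicsDeviceVersion;
            return !v.Contains("level 11.0");   // heuristic only
        }
    }
    ```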
     
    Last edited: Feb 25, 2021
  3. npatch

    npatch

    Joined:
    Jun 26, 2015
    Posts:
    180
    UPDATE 2: I ran the graphics debugging session in VS again, but this time I unmuted the Info messages, hoping for more detail about which compute kernel's creation is failing.
    Sadly, they all look like this:

    D3D11 INFO: Create ID3D11ComputeShader: Name="unnamed", Addr=0x00000235EE69FBF0, ExtRef=1, IntRef=0 [ STATE_CREATION INFO #2097298: CREATE_COMPUTESHADER]


    Even though the compute shader has #pragma enable_d3d11_debug_symbols, no proper names are shown.
     
  4. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    703
    npatch likes this.
  5. npatch

    npatch

    Joined:
    Jun 26, 2015
    Posts:
    180
    Can't post the code unfortunately, but in the second post, the spoiler button shows a screenshot of the RenderTexture types used in my kernels, along with their GraphicsFormat.
     
  6. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    703
    The problem isn't the render texture, it's how you declare it for access in your kernel, e.g. RWTexture2D<float4> or something like that.

    Check this for more info:
    https://stackoverflow.com/questions...how-to-read-from-a-rwtexture2dfloat4#57846130
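    To illustrate the distinction (sketch only; variable names are made up): on feature-level 11.0 hardware such as the GTX 680, typed UAV *loads* are only guaranteed for single-component 32-bit formats, so reading through a multi-component RWTexture2D fails shader creation there, while write-only use is fine.

    ```hlsl
    // Fails CreateComputeShader on FL 11.0 hardware if the kernel READS it:
    RWTexture2D<float4> _Packed;   // typed UAV load of a 4-component format

    // Guaranteed everywhere: single-component 32-bit loads and stores.
    RWTexture2D<float>  _Single;   // R32_FLOAT
    RWTexture2D<uint>   _Mask;     // R32_UINT
    RWTexture2D<int>    _Ids;      // R32_SINT
    ```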
     
    npatch likes this.
  7. npatch

    npatch

    Joined:
    Jun 26, 2015
    Posts:
    180
    Yeah, that was actually it. I had compacted things into fewer textures, using all channels where possible with whatever type made sense. That was fine on my 1060, but the 680, as you said, only supports loads of single-channel int, uint, and float.
    I've converted most of them back to separate textures with supported types, and the kernels started showing up.
    Thanks!
     
    Last edited: Feb 25, 2021
  8. npatch

    npatch

    Joined:
    Jun 26, 2015
    Posts:
    180
    Now my only issue is that most of the textures are flickering (kind of like a refresh-rate issue), even though, looking at a profile of a build, all kernels take 1.1 ms. And supposedly each kernel has to finish before the next one executes, per frame.
     
  9. npatch

    npatch

    Joined:
    Jun 26, 2015
    Posts:
    180
    Scratch that. One of those texture type changes was incorrect, and it was used very early in the chain, which messed up everything.

    That said, I might do a variant for high-end hardware, just to keep the more compact version (fewer textures), and dispatch the appropriate variant based on the feature set.

    Thanks again @KokkuHub!
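    That variant selection could be sketched roughly like this (kernel names are hypothetical; the feature-level check reuses the graphicsDeviceVersion heuristic mentioned earlier in the thread):

    ```csharp
    using UnityEngine;

    public class KernelVariantSelector : MonoBehaviour
    {
        public ComputeShader shader;   // hypothetical asset with both kernel variants
        int kernel;

        void Start()
        {
            // Feature-level 11.0 GPUs lack typed UAV loads for
            // multi-component formats, so fall back to the unpacked kernels.
            bool highEnd = !SystemInfo.graphicsDeviceVersion.Contains("level 11.0");
            kernel = shader.FindKernel(highEnd ? "StagePacked" : "StageUnpacked");
        }

        void Update()
        {
            shader.Dispatch(kernel, 1280 / 8, 720 / 8, 1);
        }
    }
    ```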
     