Help Wanted Feasibility of in-pipeline compute shader

Discussion in 'Barracuda' started by JustASquid, Jun 14, 2021.

  1. JustASquid

    Joined: Aug 22, 2015
    Posts: 2

    From reading the documentation and blog posts (e.g. "Real-time style transfer in Unity"), it seems that Barracuda neural networks must be called and evaluated separately from the render pipeline.
    For example, in the demo code for that project, as far as I can see, the rendered output of the game is copied (presumably at great expense) back into CPU memory [EDIT: this is not true, see reply below] so that it can be converted into a Tensor and then submitted to the Barracuda model... which runs on the GPU.

    The blog post even mentions CNNs in the rendering loop, but it's unclear what the actual technical limitations are, or whether any work is being done to address them.

    It's clear that there could be a lot of potential applications for evaluating models in-pipeline. My question is, what sort of technical limitations (if any) exist to prevent us from being able to run the generated compute shaders in the same way as we'd run any other compute shader as part of the rendering pipeline?
     
    Last edited: Jun 15, 2021
  2. fguinier

    Unity Technologies
    Joined: Sep 14, 2015
    Posts: 136

    Hi @JustASquid

    At the moment, the scheduling and execution of Barracuda are indeed separate from the rendering loop. However, this does not mean data needs to be copied back to the CPU! In the demo above, the color buffer is fed from the rendering loop (a render target) into a tensor (a compute buffer), so yes, there is a copy, but it happens entirely on the GPU. (Strictly speaking, it is not a plain copy but a small compute shader, since color space conversion and memory shuffling happen there.)

    In fact, at the moment we don't even support a Tensor created from a texture being used on a CPU Barracuda backend (this is something we want to support soon, but it will require a download in that case).

    So, at the moment, I would say the limitations for integrating ML/CNNs into the rendering loop are:
    - Barracuda scheduling uses "dispatch", which makes integration with command-buffer-style rendering code less clean. --> This is on the roadmap.
    - In general, deep learning models are quite expensive in terms of both inference/execution time and memory usage. --> We are continuously pushing for more speed and lower memory usage, but this is a shared responsibility with the model being run.

    Make sense?

    Florent
     
  3. JustASquid

    Joined: Aug 22, 2015
    Posts: 2

    Thanks @fguinier!

    Good to hear that my cursory read on the GPU/CPU copy was wrong.
    It's also good to hear that command-buffer-style compatibility is in the works! Excited to see development here!
    Your point about deep models being very expensive, especially when applied at full-screen resolution, is absolutely true. My planned use cases are cheaper: shallower networks, often not applied full screen (for example, neural BRDFs, NeRF, neural image compression, etc.).
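
To put rough numbers on the full-screen cost point, here is a back-of-the-envelope sketch in Python (it is not Barracuda code). It assumes a single layer output with 32 channels stored as 32-bit floats; both figures are illustrative assumptions rather than values from this thread or from Barracuda, and the helper name `activation_bytes` is hypothetical.

```python
def activation_bytes(height, width, channels, bytes_per_value=4):
    """Size in bytes of one dense activation tensor (height x width x channels)."""
    return height * width * channels * bytes_per_value


MIB = 1024 * 1024

# Assumed example: one layer output with 32 float32 channels.
full_screen = activation_bytes(1080, 1920, 32)  # 1920x1080 frame
small_patch = activation_bytes(256, 256, 32)    # 256x256 region

print(f"full screen:   {full_screen / MIB:.1f} MiB")
print(f"256x256 patch: {small_patch / MIB:.1f} MiB")
print(f"ratio:         {full_screen / small_patch:.1f}x")
```

Under these assumptions, a single intermediate tensor is about 253.1 MiB at 1920x1080 versus 8.0 MiB for a 256x256 region, roughly a 31.6x difference per layer. That is one reason shallow networks applied to sub-regions are much easier to fit into a frame budget.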
     
    fguinier likes this.