
Feature Request: Barracuda GC zero allocation

Discussion in 'Barracuda' started by laurentlavigne, May 5, 2021.

  1. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,335
    I see in the ML blog that there is an allocation per episode and that Barracuda is the cause. Is this something you'll eventually bring down to 0 alloc, if that's even possible?
     
    apkdev likes this.
  2. fguinier

    fguinier

    Unity Technologies

    Joined:
    Sep 14, 2015
    Posts:
    146
    Hi @laurentlavigne,

    My understanding is that the metrics in the blog post https://blogs.unity3d.com/2021/05/0...ports-training-complex-cooperative-behaviors/ exclude allocations from Barracuda.

    Performance improvements
    These metrics exclude the memory used by Barracuda (the Unity Inference Engine that ML-Agents relies on for cross-platform inference):


    However, at inference Barracuda is indeed allocating some memory: in most cases 88 bytes per layer, i.e. the output Tensors. We definitely want to get rid of that; 0 alloc is the goal!
     
    Last edited: May 6, 2021
    laurentlavigne likes this.
  3. fguinier

    fguinier

    Unity Technologies

    Joined:
    Sep 14, 2015
    Posts:
    146
    FYI @laurentlavigne, 0 alloc at inference has been added to our backlog (no ETA for now, however).
     
    laurentlavigne likes this.
  4. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,335
    Thanks for the heads-up @fguinier. You're right, I misquoted. Is it 88 bytes/layer/frame? I'm sure this isn't stopping anyone; it's just good to know ahead of time what to expect when I test performance.
     
    fguinier likes this.
  5. fguinier

    fguinier

    Unity Technologies

    Joined:
    Sep 14, 2015
    Posts:
    146
    Yes, that is the current expectation: 88 B per layer executed. So if you execute the model every frame, that's 88 B/layer/frame :).

    As a note: you might see some layers allocating more for some networks; those are usually a quick fix. Feel free to report them. We did a pass in recent releases (1.3/1.4), but I'm sure some remain.
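    To put the 88 B/layer/frame figure in perspective, here is a rough back-of-the-envelope sketch (the 50-layer model and 60 fps are made-up numbers for illustration, not from this thread):

```csharp
using System;

static class AllocEstimate
{
    // 88 B of garbage per executed layer, per the figure quoted above.
    public static long BytesPerFrame(int layers, int bytesPerLayer = 88)
        => (long)layers * bytesPerLayer;

    static void Main()
    {
        long perFrame = BytesPerFrame(50);   // hypothetical 50-layer model: 4,400 B/frame
        long perSecond = perFrame * 60;      // at a hypothetical 60 fps: 264,000 B/s
        Console.WriteLine($"{perFrame} B/frame, {perSecond / 1024.0:F0} KB/s of garbage");
    }
}
```

    Small per frame, but it adds up to a steady stream the GC has to clean up every second.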
     
  6. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,335
    Will do.
    Are there any published projects using Barracuda? I'd like to get a sense of what's possible.
     
  7. alexandreribard_unity

    alexandreribard_unity

    Unity Technologies

    Joined:
    Sep 18, 2019
    Posts:
    53
  8. apkdev

    apkdev

    Joined:
    Dec 12, 2015
    Posts:
    283
    Hey guys, any chance to get some eyes on the GC allocs? I went through the code and there's tons of low-hanging fruit.

    I hacked at some allocations and went from 12.7 KB to 1.1 KB per frame (executing Keijiro's BlazeFaceBarracuda). And I don't even know a thing about Barracuda; I just followed the deep profiler.

    (two deep profiler screenshots attached)
     
  9. airoll

    airoll

    Joined:
    Jan 12, 2021
    Posts:
    37
    @apkdev how were you able to reduce the allocations per frame by yourself? I'm still encountering crazy GC allocations like you were.
     
  10. apkdev

    apkdev

    Joined:
    Dec 12, 2015
    Posts:
    283
    I profiled the code, looked for the allocations and got rid of them, case by case. I didn't post the code because these were low-quality workarounds. They seemed to work well on BlazeFaceBarracuda but I could have introduced some bugs, didn't have the time to do extensive testing.

    If you're gonna do this, make sure you test each change; it's easy to break something and then you have to backtrack a lot. AFAIR it was mostly classic GC fixes: don't create temporary objects just to call one method, cache arrays for method arguments, basic stuff.
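    To give one concrete (simplified, invented) example of the "cache arrays for method arguments" pattern, since my actual diffs were Barracuda-specific:

```csharp
// Illustrative names only; this is not actual Barracuda code.
static class ShapeHelper
{
    // Allocates a fresh array on every call: per-frame garbage if called each frame.
    public static int[] BuildShapeNaive(int n, int h, int w, int c)
        => new[] { n, h, w, c };

    // Reuses one buffer allocated once; zero steady-state allocations.
    static readonly int[] s_ShapeBuffer = new int[4];
    public static int[] BuildShapeCached(int n, int h, int w, int c)
    {
        s_ShapeBuffer[0] = n;
        s_ShapeBuffer[1] = h;
        s_ShapeBuffer[2] = w;
        s_ShapeBuffer[3] = c;
        return s_ShapeBuffer;
    }
}
```

    The catch with the cached version is that callers must not hold on to the returned array across calls, which is exactly the kind of assumption that breaks silently, hence the warning about testing each change.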

    If you ask me, Unity should have a rule that per-frame GC allocations are considered bugs that need to be fixed before a new package version is published. Maybe even have tests to catch regressions - it seems doable with the performance testing package.
     
  11. airoll

    airoll

    Joined:
    Jan 12, 2021
    Posts:
    37
    Thanks! Yeah, I'm looking through the code now. It looks like for my use case (v3.0.0) the garbage comes from three places: SharedArrayTensorData.Download(), where they initialize a new array every time; TensorCachingAllocator.Reset(), where m_BusyTensor.Keys.ToList() allocates a new list; and TensorCachingAllocator.AllocTensorInternal(), where a new Tensor is created.
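    For the Reset() case, a cached scratch list looks like it would avoid the Keys.ToList() allocation. A sketch with simplified stand-in types (my workaround idea, not the actual Barracuda code):

```csharp
using System.Collections.Generic;

// Simplified stand-in for the allocator; names loosely mirror the real ones.
class CachingAllocatorSketch
{
    readonly Dictionary<object, bool> m_BusyTensors = new Dictionary<object, bool>();
    // Reused between calls, so Reset() stops allocating a fresh List each time.
    readonly List<object> m_KeysScratch = new List<object>();

    public int BusyCount => m_BusyTensors.Count;
    public void MarkBusy(object tensor) => m_BusyTensors[tensor] = true;

    public void Reset()
    {
        // Copy keys into the cached list first (the Dictionary key enumerator
        // is a struct, so this loop doesn't allocate), because we can't remove
        // entries while enumerating the dictionary directly.
        m_KeysScratch.Clear();
        foreach (var key in m_BusyTensors.Keys)
            m_KeysScratch.Add(key);
        foreach (var key in m_KeysScratch)
            m_BusyTensors.Remove(key);
    }
}
```

    After the scratch list grows to its steady-state capacity once, Clear() keeps the backing array, so subsequent Reset() calls should be allocation-free.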

    I totally agree 100% with your comment about per-frame GC allocations. This is the last major source of GC allocations in my project.

    @fguinier do you happen to know if these GC allocations are fixed in the upcoming v4 release?