

Feature Request: Barracuda GC zero allocation

Discussion in 'Barracuda' started by laurentlavigne, May 5, 2021.

  1. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,003
    I see in the ML-Agents blog that there is an allocation per episode and that Barracuda is the cause. Is this something you'll eventually turn into 0 alloc, if that's even possible?
     
    apkdev likes this.
  2. fguinier

    fguinier

    Unity Technologies

    Joined:
    Sep 14, 2015
    Posts:
    146
    Hi @laurentlavigne,

    My understanding is that the metrics in the blog post https://blogs.unity3d.com/2021/05/0...ports-training-complex-cooperative-behaviors/ exclude allocations from Barracuda.

    Performance improvements
    These metrics exclude the memory used by Barracuda (the Unity Inference Engine that ML-Agents relies on for cross-platform inference):


    However, at inference Barracuda is indeed allocating some memory: in most cases 88 bytes per layer, i.e. the output Tensors. We definitely want to get rid of that; 0 alloc is the goal!
     
    Last edited: May 6, 2021
    laurentlavigne likes this.
  3. fguinier

    fguinier

    Unity Technologies

    Joined:
    Sep 14, 2015
    Posts:
    146
    FYI @laurentlavigne, 0 alloc at inference has been added to our backlog (no ETA for now, however).
     
    laurentlavigne likes this.
  4. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,003
    Thanks for the heads up @fguinier. You're right, I misquoted. Is it 88 bytes/layer/frame? I'm sure this isn't stopping anyone; it's just good to know ahead of time what to expect when I test performance.
     
    fguinier likes this.
  5. fguinier

    fguinier

    Unity Technologies

    Joined:
    Sep 14, 2015
    Posts:
    146
    Yes, that is the current expectation: 88 B per layer executed. So if you execute the model every frame, 88 B/layer/frame :).

    As a note: you might see some layers allocating more for some networks; those are usually a quick fix. Feel free to report them. We did a pass on recent releases (1.3/1.4), but I'm sure some remain.
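    To put that figure in perspective, here is a back-of-the-envelope sketch of the resulting garbage. The 88 B/layer number is from this thread; the 50-layer model and 60 fps are hypothetical example values, not measurements.

```csharp
using System;

class GarbageEstimate
{
    static void Main()
    {
        const int bytesPerLayer = 88; // per-layer output Tensor alloc quoted in this thread
        const int layerCount = 50;    // hypothetical network depth
        const int fps = 60;           // model executed every frame

        int perFrame = bytesPerLayer * layerCount; // garbage generated each frame
        int perSecond = perFrame * fps;            // garbage generated each second
        Console.WriteLine($"{perFrame} B/frame, {perSecond / 1024f:F0} KB/s");
    }
}
```

    A few KB/s of garbage won't stall most games by itself, but it does mean the GC is guaranteed to run periodically, which is why 0 alloc matters for latency-sensitive projects.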
     
  6. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,003
    Will do.
    Are there any published projects using Barracuda? I'd like to get a sense of what's possible.
     
  7. alexandreribard_unity

    alexandreribard_unity

    Unity Technologies

    Joined:
    Sep 18, 2019
    Posts:
    53
  8. apkdev

    apkdev

    Joined:
    Dec 12, 2015
    Posts:
    264
    Hey guys, any chance to get some eyes on the GC allocs? I went through the code and there's tons of low-hanging fruit.

    I hacked at some allocations and went from 12.7 KB to 1.1 KB per frame (executing Keijiro's BlazeFaceBarracuda). And I don't even know a thing about Barracuda, I just followed the deep profiler.

    (attached: two deep profiler screenshots showing the allocations before and after)
     
  9. airoll

    airoll

    Joined:
    Jan 12, 2021
    Posts:
    37
    @apkdev how were you able to reduce the allocation per frame by yourself? I'm still encountering crazy GC allocations like you were.
     
  10. apkdev

    apkdev

    Joined:
    Dec 12, 2015
    Posts:
    264
    I profiled the code, looked for the allocations and got rid of them, case by case. I didn't post the code because these were low-quality workarounds. They seemed to work well on BlazeFaceBarracuda but I could have introduced some bugs, didn't have the time to do extensive testing.

    If you're gonna do this, make sure you test each change; it's easy to break something and then you have to backtrack a lot. AFAIR it was mostly classic GC fixes: don't create temporary objects just to call one method, cache arrays for method arguments, basic stuff.
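    As a sketch of the kind of fix meant here, the "cache arrays for method arguments" pattern looks like this. The class and method names are made up for illustration, not actual Barracuda code:

```csharp
using UnityEngine;

// Hypothetical example of removing a per-frame array allocation.
class AllocatingExample : MonoBehaviour
{
    void Update()
    {
        // BAD: a new int[] becomes garbage on every single frame.
        DoSomething(new int[] { 1, 1, 256, 256 });
    }

    void DoSomething(int[] shape) { /* ... */ }
}

class NonAllocatingExample : MonoBehaviour
{
    // GOOD: allocate once, reuse the same array every frame.
    readonly int[] m_Shape = { 1, 1, 256, 256 };

    void Update()
    {
        DoSomething(m_Shape);
    }

    void DoSomething(int[] shape) { /* ... */ }
}
```

    The same idea applies to temporary objects created just to call one method: hoist them into a field and reuse them.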

    If you ask me, Unity should have a rule that per-frame GC allocations are considered bugs that need to be fixed before a new package version is published. Maybe even have tests to catch regressions - it seems doable with the performance testing package.
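    For catching regressions, the Unity Test Framework already ships a constraint for exactly this. A sketch of such a test, where `RunOneInference` is a hypothetical wrapper around the code under test (e.g. a warmed-up worker execution), not a real API:

```csharp
using NUnit.Framework;
using UnityEngine.TestTools.Constraints;
using Is = UnityEngine.TestTools.Constraints.Is;

class GcRegressionTests
{
    [Test]
    public void Inference_DoesNotAllocateGCMemory()
    {
        // Warm up first so one-time lazy initialization doesn't count,
        // then assert the steady-state path is allocation-free. The
        // constraint fails the test if the delegate allocates GC memory.
        RunOneInference();
        Assert.That(() => RunOneInference(), Is.Not.AllocatingGCMemory());
    }

    void RunOneInference() { /* hypothetical: execute a warmed-up model */ }
}
```

    Running a test like this in CI for each package release would flag any newly introduced per-frame allocation before it ships.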
     
  11. airoll

    airoll

    Joined:
    Jan 12, 2021
    Posts:
    37
    Thanks! Yeah I'm looking through the code now. It looks like for my use case (v3.0.0) the garbage comes from 3 places: SharedArrayTensorData.Download(), where they initialize a new array every time; TensorCachingAllocator.Reset(), where m_BusyTensor.Keys.ToList() allocates a new list; and TensorCachingAllocator.AllocTensorInternal(), where a new Tensor is created.
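    For the Reset() case, the usual fix is to reuse a scratch list instead of calling ToList() each time. A simplified sketch, not the actual TensorCachingAllocator code (the dictionary value type and Release method are placeholders):

```csharp
using System.Collections.Generic;
using System.Linq;

class CachingAllocatorSketch
{
    Dictionary<object, bool> m_BusyTensor = new Dictionary<object, bool>();

    // BAD: ToList() allocates a fresh List<T> on every Reset() call.
    public void ResetAllocating()
    {
        foreach (var key in m_BusyTensor.Keys.ToList())
            Release(key);
    }

    // GOOD: copy keys into a reused scratch list; Clear() keeps the
    // backing array, so after warm-up this path allocates nothing.
    readonly List<object> m_Scratch = new List<object>();

    public void ResetNonAllocating()
    {
        m_Scratch.Clear();
        foreach (var key in m_BusyTensor.Keys) // enumerating doesn't mutate, so no copy via LINQ needed
            m_Scratch.Add(key);
        foreach (var key in m_Scratch)         // safe to mutate the dictionary now
            Release(key);
    }

    void Release(object key) => m_BusyTensor.Remove(key);
}
```

    The copy is still needed because you can't remove entries from a dictionary while enumerating its Keys, but the scratch list makes the copy allocation-free in steady state.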

    I totally agree 100% with your comment about per-frame GC allocations. This is the last major source of GC allocations in my project.

    @fguinier do you happen to know if these GC allocations are fixed in the upcoming v4 release?