
Question: Barracuda uses a lot of memory.

Discussion in 'Barracuda' started by mutsuyuki, Jun 25, 2021.

  1. mutsuyuki

    Joined:
    Apr 16, 2015
    Posts:
    19
    I'm developing an application that runs a relatively small model (about 25 MB) on Barracuda.

    The memory usage is reasonable in the Unity Editor.
    However, if you look at the memory usage of the built iOS application, it seems that Barracuda uses more than 1 GB of memory once the app starts using it.

    The next picture shows the memory usage in the Unity Editor.
    I think the System.Byte[] and System.Single[] entries are the Tensor storage for the loaded model; they roughly match the model size.



    The next video shows the memory transition when the app runs on iOS.
    It seems that the moment the app starts using Barracuda, memory usage jumps by about 2.5 GB, and after that it settles at more than 1 GB above the previous level.



    Is this a problem that can be solved by configuring Barracuda?

    I have experienced a similar problem when using TensorFlow.
    TensorFlow's default behavior is to allocate as much GPU memory as possible.
    That can be avoided by setting options such as allow_growth (TF1) or set_memory_growth (TF2) to True.
    In other words, it can be made to allocate only the memory that is actually needed.
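
    For reference, this is the kind of setting I mean in TensorFlow (a Python sketch showing both the TF1 and TF2 variants):

    ```python
    import tensorflow as tf

    # TF2: grow GPU memory on demand instead of reserving it all up front.
    for gpu in tf.config.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)

    # TF1 equivalent: pass allow_growth through the session config.
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.compat.v1.Session(config=config)
    ```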

    I would like to know if there is a setting similar to this.
    Or is there currently no way to avoid this increase in memory usage?
     
  2. Mantas-Puida

    Joined:
    Nov 13, 2008
    Posts:
    1,864
    Typically neural net memory allocations are dominated by the storage for intermediate layer outputs, which can be many times bigger than the model weights. We have improved this temporary memory handling in Barracuda `2.1.0-preview`. Is that the version you are using?
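
    As a rough illustration with made-up layer shapes (not your model), a single convolution can have tiny weights but a huge output tensor:

    ```python
    BYTES_PER_FLOAT = 4

    # Hypothetical layer: a 3x3 convolution, 3 -> 64 channels, on a 1920x1080 input.
    weights     = 3 * 3 * 3 * 64 * BYTES_PER_FLOAT    # ~6.9 KB of weights
    activations = 1920 * 1080 * 64 * BYTES_PER_FLOAT  # ~531 MB of output

    print(f"weights: {weights / 1e3:.1f} KB, activations: {activations / 1e6:.1f} MB")
    ```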
     
  3. mutsuyuki

    Joined:
    Apr 16, 2015
    Posts:
    19
    Thank you very much for your answer.

    I forgot to include some important information:
    I'm using version 1.4.0.

    I see: you mean that the memory for the input, intermediate, and output layers of the entire graph is reserved in advance.
    That helped me understand the inference system.

    I am wondering how you reduced the memory usage in 2.1.0.
    Does it reuse the memory already allocated for previous layers?

    Either way, I'll have to give 2.1.0 a try first!
     
  4. alexandreribard_unity

    Unity Technologies

    Joined:
    Sep 18, 2019
    Posts:
    53
    The way we optimize memory allocation is that we first do a pass to figure out the largest amount of intermediate layer outputs that needs to be held in memory at any given time.
    That gives us roughly the amount of memory needed, which we allocate once.
    When a layer's output is no longer needed, it is dumped, and later layers can reuse that memory space.

    In 2.1.0 the big improvement was avoiding unnecessary GC allocations (mostly linked to zero-initialization) of those intermediate buffers.
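
    Sketched in code, the liveness pass looks roughly like this (a minimal sketch of the idea, not Barracuda's actual implementation; the sizes and `last_reader` bookkeeping are made up for illustration):

    ```python
    def peak_intermediate_bytes(layers):
        """layers: list of (output_bytes, last_reader), where last_reader is
        the index of the last layer that reads this output."""
        live = {}  # layer index -> bytes of its output still needed
        peak = 0
        for i, (size, last_reader) in enumerate(layers):
            live[i] = size                        # this layer's output comes alive
            peak = max(peak, sum(live.values()))
            # Outputs whose last reader has now run are dumped; their space
            # can be reused by later layers (the arena is allocated once).
            live = {j: s for j, s in live.items() if layers[j][1] > i}
        return peak

    # Example: a 4-layer chain where each output only feeds the next layer.
    chain = [(100, 1), (200, 2), (150, 3), (50, 3)]
    print(peak_intermediate_bytes(chain))  # 200 + 150 = 350 at the layer-2 step
    ```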
     
  5. mutsuyuki

    Joined:
    Apr 16, 2015
    Posts:
    19
    Thank you Mantas-Puida and alexandreribard_unity.
    Sorry for the late reply.

    I tried 2.1.0 and I could confirm that memory usage has been reduced.
    And now I see how the reduction in memory usage is achieved.