Question Academy Fixed Update Stepper Generating Garbage

Discussion in 'ML-Agents' started by WaxyMcRivers, Jun 22, 2021.

  1. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
    Hello,

I am working on running my agents in a virtual reality environment on the Quest 2. While doing some profiling, I noticed that the Academy Fixed Update Stepper is generating garbage. After going into the Hierarchy view of the profiler in Unity, I found that a call to "GenerateTensors" within the Fixed Update Stepper is calling "LogStringToConsole". This log accounts for the majority of the function's cost: 18 ms total for the Fixed Update Stepper, and 10 ms for that log alone.

I have made sure that I am not making any Debug.Log calls in my scripts, and I am at a loss as to why this log string is being generated. I was hoping I could get some guidance on how to track it down and/or turn it off. I have attached a screenshot of my profiler result.
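As a sketch of one way to track down stray console writes (assuming a standard Unity setup; `LogTracer` is a hypothetical name, not part of ML-Agents), you can hook Unity's log callback and force full stack traces on warnings, so any message emitted by a package such as ML-Agents or Barracuda reveals its call site:

```csharp
using UnityEngine;

// Hypothetical helper component: attach to any GameObject in the scene.
// Captures every message routed through Unity's logger, including ones
// emitted by packages, together with a stack trace identifying the caller.
public class LogTracer : MonoBehaviour
{
    void OnEnable()
    {
        // Warnings often carry no stack trace by default; request a full one.
        Application.SetStackTraceLogType(LogType.Warning, StackTraceLogType.Full);
        Application.logMessageReceived += OnLog;
    }

    void OnDisable()
    {
        Application.logMessageReceived -= OnLog;
    }

    static void OnLog(string message, string stackTrace, LogType type)
    {
        // Write via System.Console rather than Debug.Log to avoid
        // re-entering the Unity logger from inside its own callback.
        if (type != LogType.Log)
            System.Console.WriteLine($"[{type}] {message}\n{stackTrace}");
    }
}
```

Conversely, to suppress the cost outright you could raise the filter with `Debug.unityLogger.filterLogType = LogType.Error;`, though that hides the underlying warning rather than fixing it.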
     

    Attached Files:

  2. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
    Here's another picture of the timeline.
     

    Attached Files:

  3. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    By any chance are you seeing anything in your console? It's possible that Barracuda is writing a warning or info to console and that's eating up a lot of time.
     
  4. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
Nothing in the logs about Barracuda, but we found more GC allocation in the Size Allocator in Barracuda (see attached images). More than willing to provide whatever information you need. Appreciate the help and the quick response.
     

    Attached Files:

  5. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
OK, I double-checked the console and saw that I had warnings turned off. A size mismatch was generating the console logs in GenerateTensors. I'm still seeing garbage being generated in the Size Allocator, though.

    We are also curious if the ML-Agents team has plans to start leveraging the tensor hardware in XL2 chips?
     
  6. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
Here is the Size Allocator garbage.
     

    Attached Files:

  7. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    I don't know about XL2 chips - that would depend on Unity compute shaders taking advantage of the hardware.

    As for the errors - are you still seeing the Size Allocator garbage after fixing the size mismatch, and does it affect the runtime as much as before? If it's still proving to be expensive, I'll ping the Barracuda team. Which version of Barracuda are you on?
     
  8. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
Interesting, I should learn more about how the compute shaders work. Will they still be useful if there are no visual observations and only vectors for input (buffer sensor, vector sensor, raycasts, etc.)?

I am still seeing the garbage in the Size Allocator; the latest screenshot was taken after fixing the size mismatch. I am on Barracuda 2.0.0-pre.4.

    Thank you.
     
  9. alexandreribard_unity

    alexandreribard_unity

    Unity Technologies

    Joined:
    Sep 18, 2019
    Posts:
    53
Hey @WaxyMcRivers, I think the allocation you are referring to is a red herring.
Internally, when executing a layer, we need to allocate new memory for the layer output, so it is natural to see allocations here and there.
What is the troubling part? That Barracuda is generating too much GC alloc?
Can you test on 2.1.0? We did a pass to remove some useless allocations; I am not sure whether it made it into 2.0.0.

As far as neural chips go, we are investigating them. So far we have found mixed results: the performance gains really depend on the driver/model.

For compute work, it's pretty simple actually. The Barracuda engine leverages the GPU for inference via compute shaders.
However, data might need to be uploaded to and read back from the GPU depending on your use case, so this overhead might eat into the performance gains you get with a GPU worker.
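The trade-off above can be sketched with the standard Barracuda worker API (a minimal illustration, assuming a serialized NNModel asset; `modelAsset` and the tensor shape are placeholders, not values from this thread):

```csharp
using Unity.Barracuda;
using UnityEngine;

public class InferenceExample : MonoBehaviour
{
    public NNModel modelAsset; // placeholder: assign a serialized model in the Inspector

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);

        // GPU worker: layers run as compute shaders, but each Execute()
        // pays for uploading inputs to the GPU and reading outputs back.
        var gpuWorker = WorkerFactory.CreateWorker(WorkerFactory.Type.ComputePrecompiled, model);

        // CPU worker: Burst-compiled C# with no transfer overhead; often the
        // better fit for small vector-only observations, as discussed above.
        var cpuWorker = WorkerFactory.CreateWorker(WorkerFactory.Type.CSharpBurst, model);

        using (var input = new Tensor(1, 8)) // placeholder shape: batch of 1, 8 floats
        {
            cpuWorker.Execute(input);
            // PeekOutput returns a tensor still owned by the worker,
            // so it must not be disposed by the caller.
            Tensor output = cpuWorker.PeekOutput();
            Debug.Log(output.length);
        }

        gpuWorker.Dispose();
        cpuWorker.Dispose();
    }
}
```

Profiling both worker types with your actual model is the only reliable way to see which side of the transfer-overhead trade-off your use case lands on.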
     
  10. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
    Thanks for the reply - I upgraded to 2.1.0-pre and took a look at the profiler. It looks great and I don't see any significant garbage being generated (attached image). I think at the time, we thought that the garbage was a primary source of milliseconds being accumulated in the Barracuda.Execute() call, but we've since ruled that out.

Appreciate the update on the neural chips, and agree that since our use case requires CPU-only inference, the GPU overhead won't be worth it.
     

    Attached Files:

    ervteng_unity likes this.