Question How do I fix performance issues with ML Agents?

Discussion in 'ML-Agents' started by LiterallyCoder, Mar 23, 2023.

  1. LiterallyCoder


    Jul 25, 2020
    Hello everyone,

    Recently I've been playing around with Unity's ML Agents, and I've set up a basic scene where the agent is meant to walk up to a piece of "food" and touch it. The cube is the food.


    It's a very simple setup, but I'm seeing FPS drops that I can't explain. Just for testing purposes, I set the Behavior Type to Heuristic Only on the agent. When I pressed the play button to check that the movement was working correctly, I got around 700-800 fps consistently:


    Then, after stopping the test and running it again, I noticed a slight FPS drop. Nothing in my scene had changed and I hadn't changed any of the settings on the Behavior Parameters script. To check whether this was a coincidence, I pressed the play and stop buttons a few more times. After five runs, I got this FPS:


    It nearly halved!! Then, on the 10th run, the FPS dropped to this:

    I have no idea what is going on. Everything in the scene seems to be exactly the same as it was in the beginning, but the FPS just keeps getting lower with every test. It goes back up to 800 when I close the project and relaunch it from the Unity Hub. For some context, here is what the agent looks like, as well as the script on it:


    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;
    using UnityEditor;

    public class AgentController : Agent
    {
        [SerializeField] Transform agentSpawn;
        [SerializeField] Transform targetTransform;
        [SerializeField] float moveSpeed;
        Rigidbody rb;

        [SerializeField] Material winmat;
        [SerializeField] Material losemat;
        [SerializeField] MeshRenderer floorMeshRenderer;

        public override void Initialize()
        {
            rb = GetComponent<Rigidbody>();
        }

        public override void OnEpisodeBegin()
        {
            transform.localPosition = agentSpawn.position;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(transform.localPosition);
            sensor.AddObservation(targetTransform.localPosition);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            float moveX = actions.ContinuousActions[0];
            float moveZ = actions.ContinuousActions[1];

            rb.velocity = new Vector3(moveX, 0, moveZ).normalized * moveSpeed;
        }

        private void OnTriggerEnter(Collider other)
        {
            if (other.CompareTag("Food"))
            {
                SetReward(+1f);
                floorMeshRenderer.material = winmat;
                EndEpisode();
            }
            else if (other.CompareTag("Wall"))
            {
                SetReward(-1f);
                floorMeshRenderer.material = losemat;
                EndEpisode();
            }
        }

        public override void Heuristic(in ActionBuffers actionsOut)
        {
            ActionSegment<float> continuousActions = actionsOut.ContinuousActions;
            continuousActions[0] = Input.GetAxisRaw("Horizontal");
            continuousActions[1] = Input.GetAxisRaw("Vertical");
        }
    }
    I also get this message in the console:
    "Couldn't connect to trainer on port 5004 using API version 1.5.0. Will perform inference instead."
    But, once again, the Behavior Type is set to Heuristic Only and there is no brain model on the agent so it cannot run inference. I'm not sure if that is relevant information, but there it is.

    I'm a complete noob with ML Agents and machine learning in general, but could you guys help me understand why the FPS keeps dropping every time I press the play button?
  2. spiney199


    Feb 11, 2021
    Perhaps consult the profiler to determine where the overhead is actually coming from.
  3. LiterallyCoder


    Jul 25, 2020
    I tried using the profiler and found that the memory does more frequently with each new run. I also trained the agent and did an inference run with the trained neural network model. The same exact thing happened. With each inference run in the editor, the FPS decreased by about 100 FPS. When I restart the project, it stays at 800 FPS for a while. Could it be a bug with ML Agents? It's most likely a memory leak, but I don't know what to do with it.

    I also tried forcing the garbage collector to run after each run in the editor, but it had no effect:
    Code (CSharp):
    #if UNITY_EDITOR
    using UnityEditor;
    using UnityEngine;

    [InitializeOnLoad]
    public class EditorGarbageCollector
    {
        static EditorGarbageCollector()
        {
            EditorApplication.playModeStateChanged += OnPlayModeStateChanged;
        }

        private static void OnPlayModeStateChanged(PlayModeStateChange state)
        {
            if (state == PlayModeStateChange.EnteredEditMode)
            {
                System.GC.Collect();
                Debug.Log("Forced garbage collection after exiting play mode.");
            }
        }
    }
    #endif
    If it's not possible to fix this, I don't think ML Agents is usable at all. Losing 100 fps every time you want to run the trained model in the editor is ridiculous. I'll try remaking the entire project in an earlier Unity version and see if that does anything, since I'm using 2023.1.0a26 right now.
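
    A minimal sketch for narrowing this down, assuming the slowdown comes from memory or objects that survive from one play session to the next (the thread does not confirm this). The `PlayModeLeakLogger` class and its `SessionState` key are hypothetical names, not part of ML-Agents. It logs allocated memory and the number of loaded objects each time the editor returns to edit mode, so growth across runs shows up in the console:

    ```csharp
    #if UNITY_EDITOR
    using UnityEditor;
    using UnityEngine;
    using UnityEngine.Profiling;

    // Hypothetical diagnostic helper: reports memory and loaded-object counts
    // every time the editor leaves play mode and returns to edit mode.
    [InitializeOnLoad]
    public static class PlayModeLeakLogger
    {
        // Hypothetical key; SessionState survives domain reloads within one editor session.
        const string RunCountKey = "PlayModeLeakLogger.RunCount";

        static PlayModeLeakLogger()
        {
            EditorApplication.playModeStateChanged += OnPlayModeStateChanged;
        }

        static void OnPlayModeStateChanged(PlayModeStateChange state)
        {
            if (state != PlayModeStateChange.EnteredEditMode)
            {
                return;
            }

            int runCount = SessionState.GetInt(RunCountKey, 0) + 1;
            SessionState.SetInt(RunCountKey, runCount);

            // Memory reserved by Unity's native allocators and by the managed heap.
            long nativeMb = Profiler.GetTotalAllocatedMemoryLong() / (1024 * 1024);
            long managedMb = Profiler.GetMonoUsedSizeLong() / (1024 * 1024);

            // Every UnityEngine.Object currently loaded, including hidden and editor objects.
            int loadedObjects = Resources.FindObjectsOfTypeAll<UnityEngine.Object>().Length;

            Debug.Log($"Run {runCount}: native {nativeMb} MB, managed {managedMb} MB, loaded objects {loadedObjects}");
        }
    }
    #endif
    ```

    If the loaded-object count climbs by a similar amount after every run, something created in play mode is probably not being destroyed on exit. If all three numbers stay flat while the FPS still drops, the cause is more likely elsewhere and the profiler's CPU timeline is the better place to look.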
  4. spiney199


    Feb 11, 2021
    Does more what?

    And I don't know about you, but I only care about hitting 60fps, which is the maximum of what 99% of monitors can achieve anyway.
  5. LiterallyCoder


    Jul 25, 2020
    Sorry, that was a typo. I meant to say that the memory does spike more frequently with each run in the editor. I agree that hitting at least 60 fps is really the only thing that matters, but that doesn't fix my problem. If I press the play button enough times in the editor, the fps will eventually decrease to 20 or even 10 fps. It's completely unusable...

    Here's what the FPS looks like after pressing the play button 20 or so times:
    From 800 fps, it goes down to a whopping 47 fps.
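
    For comparing runs, frame time is easier to reason about than FPS: 800 fps is 1.25 ms per frame, while 47 fps is about 21.3 ms per frame. A minimal sketch, assuming a hypothetical `AverageFrameTimeLogger` component attached to any GameObject in the scene, that logs an averaged figure instead of relying on the number shown in the Game view:

    ```csharp
    using UnityEngine;

    // Hypothetical helper: logs the average frame time over a fixed interval
    // so that successive play sessions can be compared with the same measure.
    public class AverageFrameTimeLogger : MonoBehaviour
    {
        // Seconds between log entries (assumed default, adjust as needed).
        [SerializeField] float reportInterval = 5f;

        float elapsed;
        int frames;

        void Update()
        {
            // Unscaled time keeps the measurement independent of Time.timeScale.
            elapsed += Time.unscaledDeltaTime;
            frames++;

            if (elapsed >= reportInterval)
            {
                float averageMs = elapsed / frames * 1000f;
                float averageFps = frames / elapsed;
                Debug.Log($"Average frame time: {averageMs:F2} ms ({averageFps:F0} fps)");

                elapsed = 0f;
                frames = 0;
            }
        }
    }
    ```

    Logging this once per run gives a per-run cost in milliseconds, which makes it clearer whether each press of the play button adds a roughly constant amount of work per frame.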
    Last edited: Mar 23, 2023
  6. spiney199


    Feb 11, 2021
    I mean, isn't that the idea with this machine learning stuff? The more runs you do, the more complex the algorithm gets, so the more overhead it will incur.

    The idea being that you train one that strikes a balance between fidelity and performance.

    This is well outside my expertise, but just making an educated guess here.
  7. LiterallyCoder


    Jul 25, 2020
    No, logically that should not happen. I am not training the agent. I was simply running the simulation using the Heuristic Only behavior type. In my case, it allowed me to control the agent manually and move around the scene. The agent had no neural network at that point and was not being trained at all. In the second case, I did train the agent, but I only tested this weird memory leak after it had been trained successfully. After training an agent, you get a neural network model you can use to run the simulation with the trained agent, so that also wouldn't have affected the FPS. My scene is also very very simple, so it wouldn't make sense for the FPS to go from 800 to 47 by simply pressing the play button a few times.

    Fortunately, I finally solved the issue by downgrading to ML Agents version 19 and Unity version 2021.3.20f1. I still have no idea why the memory leak was occurring in the first place, but I finally managed to dodge it.