
MLAgents training progress suddenly declines completely

Discussion in 'ML-Agents' started by -MeMa-, Feb 6, 2021.

  1. -MeMa- (Joined: Aug 20, 2020 · Posts: 3)
    Hi. I am currently trying to teach an agent to throw a basketball into a hoop. For this I made a very simple setup: the basketball always spawns in exactly the same location, and the hoop/target also stays in exactly the same location, so there is no randomness at all (see the linked GitHub repo for full details). Something very strange keeps happening during training: at the beginning the agent seems to improve just fine (the mean reward keeps going up), but at some point it starts to completely forget what it has learned so far, and the performance/mean reward just keeps going down (see the attached screenshot for details). I am not sure what is happening here or why. The task the agent is supposed to learn seems pretty simple, so I hope someone can reproduce this problem with the linked Unity project and point out what I am doing wrong and how to fix it.

    For the training I used RollerBallConfig.yaml (from the ML-Agents GitHub; it is also included in the linked GitHub repo).
    Linked Github Repo: [Link Removed | Problem was resolved]
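    For reference, the relevant parts of that config look roughly like this (reproduced from memory of the docs' RollerBall tutorial, with the behavior name swapped for mine, so exact values may differ slightly):
    Code (YAML):
    behaviors:
      BasketBallAgent:              # behavior name from the agent's Behavior Parameters
        trainer_type: ppo
        hyperparameters:
          batch_size: 10
          buffer_size: 100
          learning_rate: 3.0e-4
          beta: 5.0e-4              # entropy regularization strength
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 500000
        time_horizon: 64
        summary_freq: 10000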

    Unity Version: 2021.1.0b5
    MLAgents Unity Package Version: 1.0.6

    So far I have tried:
    - using a different Unity version
    - using a different ML-Agents Unity package version
    - playing with the hyperparameters a little (no success so far)

    Thank you in advance and please let me know if you need more information.

    BasketBallAgent script (I am not sure if everything in here is correct):
    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;

    public class BasketBallAgent : Agent
    {
        [SerializeField] private Transform Target;
        [SerializeField] private Material winMaterial;
        [SerializeField] private Material loseMaterial;
        [SerializeField] private MeshRenderer floorMeshRenderer;

        private Rigidbody BallBody;

        void Start()
        {
            BallBody = GetComponent<Rigidbody>();
        }

        public override void OnEpisodeBegin()
        {
            base.OnEpisodeBegin();

            // Freeze the ball until the next action arrives.
            // Note: position and velocity are NOT reset here.
            BallBody.constraints = RigidbodyConstraints.FreezeAll;

            /*//Move agent to a new random position
            float random_z_value = Random.Range(3.0f, -8.0f);
            transform.localPosition = new Vector3(0f, 5.0f, random_z_value);*/

            // One decision per episode: the agent picks the throw force once.
            RequestDecision();
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Target and agent positions
            sensor.AddObservation(transform.localPosition);
            sensor.AddObservation(Target.localPosition);
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            Debug.Log($"vectorAction.Length: {vectorAction.Length}");

            // Actions, size = 1: a single continuous value controlling throw force
            Vector3 controlSignal = Vector3.zero;
            controlSignal.y = 1;
            controlSignal.z = 1;
            float force = Mathf.Abs(vectorAction[0]);

            // Unfreeze the ball so the applied force can move it
            BallBody.constraints = RigidbodyConstraints.None;

            Debug.Log($"Force {force}");

            float implicitForce = 500f;
            BallBody.AddForce(controlSignal * force * implicitForce);
        }

        public override void Heuristic(float[] actionsOut)
        {
            actionsOut[0] = Input.GetAxis("Vertical");
            //actionsOut[1] = Input.GetAxis("Horizontal");
        }

        private void OnTriggerEnter(Collider other)
        {
            Debug.Log("Collision!");

            if (other.TryGetComponent<Target>(out Target target))
            {
                SetReward(1f);
                floorMeshRenderer.material = winMaterial;
                EndEpisode();
            }
            if (other.TryGetComponent<Wall>(out Wall wall))
            {
                SetReward(-1f);
                floorMeshRenderer.material = loseMaterial;
                EndEpisode();
            }
        }
    }

    Attachments: tensorboard_basketball.PNG, basketball_pip_list.PNG
     
    Last edited: Feb 11, 2021
  2. awjuliani (Unity Technologies · Joined: Mar 1, 2017 · Posts: 69)
    Hello. Thank you for sharing your TensorBoard plots. The entropy term quickly collapses toward what looks like a deterministic point. I would recommend increasing the `beta` hyperparameter in the YAML configuration file. This will encourage a more stochastic policy and may prevent the collapse to determinism you are seeing in your environment.
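    Concretely, that means raising the entropy bonus in the hyperparameters block of your config, for example (illustrative value only, tune as needed):
    Code (YAML):
    behaviors:
      BasketBallAgent:
        trainer_type: ppo
        hyperparameters:
          # beta scales the entropy bonus; a larger value keeps the
          # policy stochastic for longer during training
          beta: 0.01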
     
  3. -MeMa- (Joined: Aug 20, 2020 · Posts: 3)
    Attachment: increased_beta_basketball.PNG
    Thank you very much for your answer. I tried your suggestion of increasing the `beta` hyperparameter, but the result was pretty much the same (see attached screenshot).
    The values I used for the `beta` hyperparameter were as follows:
    - blue line: beta: 0.05
    - pink line: beta: 0.5
    - green line: beta: 5.0
    - gray line: beta: 0.0001 (a decreased beta value)

    So I am not sure if these values were what you had in mind.
    Please let me know if you have any further ideas or possible solutions, and thanks again for your help.
     
  4. awjuliani (Unity Technologies · Joined: Mar 1, 2017 · Posts: 69)
    Thank you for trying these experiments. I see that you also used a modified version of the RollerBall config, which is unfortunately not very general-purpose. In particular, it seems you are using a batch size and buffer size of 10 and 100, respectively. This can lead to unstable training. I would recommend adapting one of the configs such as that of PushBlock: https://github.com/Unity-Technologies/ml-agents/blob/master/config/ppo/PushBlock.yaml
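    The key difference from your current config is the much larger batch and buffer size; sketched against your values (I'm only showing the fields that matter here, the rest can be taken from the linked file):
    Code (YAML):
    behaviors:
      BasketBallAgent:            # rename to match your agent's behavior name
        trainer_type: ppo
        hyperparameters:
          batch_size: 128         # vs. 10 in the RollerBall config
          buffer_size: 2048       # vs. 100 in the RollerBall config
          # remaining hyperparameters as in the linked PushBlock.yaml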
     
  5. -MeMa- (Joined: Aug 20, 2020 · Posts: 3)
    Thanks again for your help.
    I gave the PushBlock config a few tries and also played around with the hyperparameters a little. Overall I noticed that with the PushBlock config the learning process took quite a bit longer than with the RollerBall config, but it also reached a (much) higher cumulative reward before declining and falling back to a steady -1 reward. (See attached screenshot for details.)

    In all three of the plots below I used the PushBlock config or slight variations of it.
    Config parameters used:
    - green line: batch_size: 128, buffer_size: 2048
    - gray line: batch_size: 256, buffer_size: 4096

    As you can see in the green/gray plots, the results look pretty much the same as when using the RollerBall config (see previous screenshots), the small differences being that training took longer and the cumulative reward was higher before declining.

    Config parameters used:
    - blue line: batch_size: 128, buffer_size: 2048

    For the blue line I made a minor change to my environment: instead of calling RequestDecision() manually in the BasketBallAgent C# script, I added a DecisionRequester component to the agent and set its DecisionPeriod parameter to 20 (the maximum); a sketch of this setup follows below. This of course leads to different behaviour, because the agent no longer learns to 'throw' the ball into the hoop/target; it sort of learns to 'guide' it in, since it can now correct its action multiple times before the ball reaches the target.

    Even though this is not really what I want the agent to do, the change led to an interesting 'discovery': with the DecisionRequester component, the learning progress does not decline and does not fall back to a steady -1 reward; instead it essentially remains at peak performance/the maximum positive reward of +1. Also, the 'Episode Length' (see plots) is no longer constantly at 0, as opposed to when calling RequestDecision() manually.

    This raises some questions: does the 'Episode Length' have something to do with the sharp decline in performance I see when calling RequestDecision() manually? Am I even calling RequestDecision() correctly/at the correct time in my code? Or should further hyperparameter tweaks be made?
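    For reference, the component setup I mean looks roughly like this (DecisionRequester is normally added via Add Component > ML Agents > Decision Requester in the Inspector; the code form below is only to illustrate the parameters):
    Code (CSharp):
    using Unity.MLAgents;
    using UnityEngine;

    // Illustrative only: attach this to the agent GameObject (which already
    // has the Agent component). Adding a DecisionRequester replaces the
    // manual RequestDecision() call in OnEpisodeBegin().
    public class AttachDecisionRequester : MonoBehaviour
    {
        void Awake()
        {
            var requester = gameObject.AddComponent<DecisionRequester>();
            requester.DecisionPeriod = 20;                 // maximum value: decide every 20 Academy steps
            requester.TakeActionsBetweenDecisions = true;  // repeat the last action between decisions
        }
    }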

    Thank you again very much for your help, and please let me know if you have any further ideas.

    Attachment: tensorboard_basketball_pushblock.PNG
     
  6. awjuliani (Unity Technologies · Joined: Mar 1, 2017 · Posts: 69)
    I am glad you have been able to arrive at a model which can solve the task. The model breaking during training is still a strange phenomenon that should not be happening. There may be an issue in the environment itself that provides incorrect information to the agent at some point once the agent becomes successful.
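    One place worth auditing along these lines (an observation about the posted script, not a confirmed cause): OnEpisodeBegin() freezes the Rigidbody but never resets its position or velocity, so physics state could leak between episodes once throws start landing. A reset sketch, assuming a hypothetical spawnPosition field that holds the ball's fixed start location:
    Code (CSharp):
    public override void OnEpisodeBegin()
    {
        base.OnEpisodeBegin();

        // Clear any motion left over from the previous episode and move
        // the ball back to its spawn point before freezing it again.
        // "spawnPosition" is an assumed field, not part of the posted script.
        BallBody.velocity = Vector3.zero;
        BallBody.angularVelocity = Vector3.zero;
        transform.localPosition = spawnPosition;

        BallBody.constraints = RigidbodyConstraints.FreezeAll;
        RequestDecision();
    }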
     
  7. matheusverissimo (Joined: Jan 10, 2022 · Posts: 1)
    I guess I'm fighting the same problem. I'm training a very basic car agent in a 2D top-down-view environment. Basically, after the agent learns how to successfully race without hitting the walls or driving in the wrong direction, it just gets stuck at the very beginning of the track, running endlessly against the wall and restarting the episode over and over.

    Attachment: upload_2022-1-10_0-23-21.png

    I am requesting a decision on every step, and these are my hyperparameters:
    Code (YAML):
    default_settings: null
    behaviors:
      CarAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: null
          goal_conditioning_type: hyper
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
            network_settings:
              normalize: false
              hidden_units: 128
              num_layers: 2
              vis_encode_type: simple
              memory: null
              goal_conditioning_type: hyper
        init_path: null
        keep_checkpoints: 5
        checkpoint_interval: 500000
        max_steps: 5000000
        time_horizon: 64
        summary_freq: 50000
        threaded: false
        self_play: null
        behavioral_cloning: null
    env_settings:
      env_path: null
      env_args: null
      base_port: 5005
      num_envs: 1
      seed: -1
    engine_settings:
      width: 84
      height: 84
      quality_level: 5
      time_scale: 20
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    environment_parameters: null
    checkpoint_settings:
      run_id: ppo
      initialize_from: null
      load_model: false
      resume: false
      force: true
      train_model: false
      inference: false
      results_dir: results
    torch_settings:
      device: null
    debug: false
    Do you guys have any idea what is happening?
     