
Always getting same continuous action value

Discussion in 'ML-Agents' started by amoebe, May 4, 2021.

  1. amoebe

    amoebe

    Joined:
    Jun 19, 2014
    Posts:
    9
    UPDATE: the issue with the Continuous actions is fixed when the target transform has a 0 value that is non-negative. This does not fix the Discrete numbers repeating, though.

    Hi,
    I've followed below tutorial to get started with MLAgents.

    It works fine when I reproduce it in a new project similar to his, but when I take the same steps within my own (not super elaborate) project, the continuous action values I get are consistently the same: 1, or sometimes -1. Sometimes after a few retries I do get some in-between float values, but then only for one of the two continuous actions.
    Everything else seems to function as it should (the trainer connects fine); it's just that the continuous actions keep returning the same value. For an example of the script, see this timestamp: https://youtu.be/zPFU30tbyKs?t=1533

    Unity version is 2020.3.3f1
    MLAgents version 1.9.0
    Still very new to this framework, am I missing something?

    Example debug log:
    Debug.Log(actions.ContinuousActions[0]);
    Debug.Log(actions.ContinuousActions[1]);
    (Screenshot of the resulting console output attached: upload_2021-5-4_16-43-33.png)

    I start the training with --force every time.

    Once it gets stuck on a fixed value, it's always either -1 or 1, sometimes a mix of the two.

    Same issue with Discrete values, which get stuck mostly on 1 specific integer.
     
    Last edited: May 5, 2021
  2. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi, could you please provide more information about your environment such as:
    1. Your agent's Action/Observation space
    2. Your reward functions
    3. The OS you are on
    4. The versions of Unity / ML-Agents (C# and Python) / OS
     
  3. amoebe

    amoebe

    Joined:
    Jun 19, 2014
    Posts:
    9
    It's a simple 2D environment with ships on a sea that can move along the X and Y axes.
    Rewards are just placeholders for now: hit a nearby ship for success, or hit another type of ship that is placed all around as a 'wall', as in the linked tutorial. The problem persists even if I remove the rewards from the OnCollisionEnter2D function.
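    Roughly, the collision handler looks like this (a simplified sketch; the tag names and reward values are placeholders, the real code is in the attached script):
    Code (CSharp):
    private void OnCollisionEnter2D(Collision2D collision)
    {
        // Placeholder rewards: reaching a nearby target ship is success,
        // touching one of the 'wall' ships ends the episode with a penalty
        if (collision.gameObject.CompareTag("TargetShip"))
        {
            SetReward(1f);
            EndEpisode();
        }
        else if (collision.gameObject.CompareTag("WallShip"))
        {
            SetReward(-1f);
            EndEpisode();
        }
    }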

    Working on Windows 10.
    Python 3.8.5, Unity 2020.3.3f1, ML-Agents package both 1.9.0 and 2.0.0-exp.1.

    It seems somehow linked to the positioning of both the GameObject that holds the agent script (setting its position in OnEpisodeBegin) and the reward-target-transform (as in the script I'll attach). Sometimes changing the target objects 'fixes' it consistently, so that random values keep coming in, but it seems to like to go back to getting stuck on always returning -1 or 1.
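    For reference, the episode reset currently does something like this (sketch; startPosition is the value cached in Awake(), the exact code is in the attached script):
    Code (CSharp):
    private Vector3 startPosition;

    private void Awake()
    {
        // Remember the ship's initial placement in the scene
        startPosition = transform.position;
    }

    public override void OnEpisodeBegin()
    {
        // Resetting to the cached position (instead of Vector3.zero) is when the values start getting stuck
        transform.position = startPosition;
    }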

    Tried with 1 or 2 continuous actions and 2 Discrete branches.

    Is it correct to assume that these values that are coming in through OnActionReceived should always be entirely random? Or is there something that influences them?

    Thanks!
     

    Attached Files:

  4. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi,
    When you first start training, the values (in the continuous action space) will look random. As training progresses the combination of the observations/actions/reward function will help the neural network determine what the next action to take should be. Let me know if that makes sense.

    Taking a look at your observations:
    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.position);
        sensor.AddObservation(targetTransform.position);
    }
    It's really hard for a neural network to understand non-normalized values. It might be better to observe the normalized distance from an object.

    For example, if you know the maximum distance you can be from a target, it might be best to observe the direction to that target as a normalized vector, and the normalized distance like so:
    Code (CSharp):
    var direction = (targetTransform.position - transform.position).normalized;

    // k_MaxDistance is a hard-coded value based on what you think the max distance a ship can be away from its target.
    var normalizedDistance = Vector3.Distance(transform.position, targetTransform.position) / k_MaxDistance;

    sensor.AddObservation(direction);
    sensor.AddObservation(normalizedDistance);
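    If you switch to that, remember to update the Vector Observation Space Size in your Behavior Parameters: the direction Vector3 contributes 3 floats and the normalized distance 1, so 4 in total instead of the 6 you get from observing two positions.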
     
  5. amoebe

    amoebe

    Joined:
    Jun 19, 2014
    Posts:
    9
    Thanks Christopher! That is useful information going forward.

    Just to clarify: when I see the delivered action getting 'stuck' on a specific number (-1 or 1 for Continuous, random specific ints for Discrete), is that due to that combination of observations/actions/reward?

    I was a little baffled why changing the Agent's position from Vector3.zero to its position as recorded in Awake() caused this 'bug', but that probably has to do with the observations as you describe?
     
  6. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Yes. Your reward function and your observations help the neural network decide which actions to take.