Question Agent Stopped learning and stuck at 0

Discussion in 'ML-Agents' started by yasinmasmoudi, Dec 27, 2022.

  1. yasinmasmoudi

    Joined: Nov 15, 2021
    Posts: 6
    Hello,
    I am trying to make an agent that can control a machine:
    [Image: upload_2022-12-27_15-25-6.png]
    The aim is for the agent to place the segment on the yellow one as quickly as possible.
    I made a server build and ran the following command:
    mlagents-learn config/placeSegment.yaml --env=Build/servercollision/ML-Agent --num-envs=20 --run-id=pycollision5 --resume

    My code is as follows:
    if (epStarted)
    {
        // Rotation reward: penalize the absolute angular difference, scaled by 360
        segRotation = segment.transform.rotation.eulerAngles.z;
        if (segRotation < angRef)
            segRotation = segRotation + 360;
        angDistance = target.transform.rotation.eulerAngles.z - segRotation;
        if (allowRewards)
            AddReward(-Mathf.Abs(angDistance) / 360);

        // Distance reward: penalize the absolute offset along z, scaled by 20
        linearDistanceZ = segment.transform.position.z - target.transform.position.z;
        if (allowRewards)
            AddReward(-Mathf.Abs(linearDistanceZ) / 20);
    }




    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(linearDistanceZ);
        sensor.AddObservation(angDistance);
        sensor.AddObservation(linearDistanceX);
        sensor.AddObservation(segment.transform.position.z);
        sensor.AddObservation(segment.transform.position.x);
        sensor.AddObservation(segRotation);
        sensor.AddObservation(target.transform.localPosition);
        sensor.AddObservation(segment.transform.localPosition);
        sensor.AddObservation(Collision);
    }



    public override void OnActionReceived(ActionBuffers actions)
    {
        if (allowAction)
        {
            Rotation = actions.ContinuousActions[0];
            TranslatePart2 = actions.ContinuousActions[1];
            TranlationPart4 = 0;
            debug_action_rot = Rotation;
            debug_action_x = TranlationPart4;
            debug_action_z = TranslatePart2;
            BluePart.GetComponent<Rotating>().GetActionFromMLAgent(Rotation);
            GreenPart.GetComponent<Translation_Part2>().GetActionFromMLAgent(TranslatePart2);
            YellowPart.GetComponent<Translation_Part4>().GetActionFromMLAgent(TranlationPart4);
        }
    }



    Since the machine is connected through joints, I can't just reset the env instantly when the episode ends. So when a new episode begins, I set allowAction and allowRewards to false, move the machine back to its initial position, and then set them back to true.
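
    For readers following along, the flag gating described above can be sketched as a small state holder. This is only a minimal sketch under assumptions, not the poster's actual code: `ResetGate`, `tolerance`, `maxResetSteps`, and `Tick` are hypothetical names, and the timeout is an added assumption so that the flags cannot stay false indefinitely if the machine never quite reaches its initial pose.

```csharp
using System;

// Hypothetical helper (not from the original post): holds the two flags
// and re-enables them once the machine is back at its initial pose.
public class ResetGate
{
    public bool AllowAction { get; private set; } = true;
    public bool AllowRewards { get; private set; } = true;

    private readonly float tolerance;     // assumed "close enough" distance
    private readonly int maxResetSteps;   // assumed safety timeout, in steps
    private int stepsInReset;

    public ResetGate(float tolerance, int maxResetSteps)
    {
        this.tolerance = tolerance;
        this.maxResetSteps = maxResetSteps;
    }

    // Call when a new episode begins: block actions and rewards.
    public void BeginReset()
    {
        AllowAction = false;
        AllowRewards = false;
        stepsInReset = 0;
    }

    // Call once per physics step while resetting.
    // Returns true on the step where the flags are switched back on.
    public bool Tick(float distanceToInitialPose)
    {
        if (AllowAction) return false; // not resetting

        stepsInReset++;
        bool arrived = distanceToInitialPose <= tolerance;
        bool timedOut = stepsInReset >= maxResetSteps;
        if (arrived || timedOut)
        {
            AllowAction = true;
            AllowRewards = true;
            return true;
        }
        return false;
    }
}

public static class ResetGateDemo
{
    public static void Main()
    {
        var gate = new ResetGate(tolerance: 0.01f, maxResetSteps: 100);
        gate.BeginReset();
        foreach (float distance in new[] { 1.0f, 0.5f, 0.0f })
        {
            gate.Tick(distance);
            Console.WriteLine($"distance={distance} allowRewards={gate.AllowRewards}");
        }
    }
}
```

    In a setup like this, logging each `BeginReset` call and each step where `Tick` returns true would show whether an environment ever leaves the reset state.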

    I log the accumulated rewards every 6 seconds, and here is how it should look:
    -87.76775
    -262.5469
    -473.724
    -685.0331
    -896.2274
    -1107.533

    The problem is that, out of the blue, some players start getting stuck at 0:

    0
    0
    0
    0
    0
    0
    0
    0

    This should not happen.
    Does anyone have an idea what might be the cause?
     
  2. yasinmasmoudi

    These are the parameters I am using:
    behaviors:
      PlaceSegment:
        trainer_type: ppo
        hyperparameters:
          batch_size: 5120
          buffer_size: 409600
          learning_rate: 1e-5
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 10
          learning_rate_schedule: linear
          beta_schedule: constant
          epsilon_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 500000000
        time_horizon: 64
        summary_freq: 5000
     
  3. hughperkins

    Joined: Dec 3, 2022
    Posts: 191
    I assume you've solved this by now, but if you haven't, I reckon you want to simplify this down to the simplest possible scenario that reproduces the problem. This will have two advantages:
    1. it's reasonably likely this will make the problem obvious to yourself
    2. it will make the problem easier for us to see too
     
  4. hughperkins

    (also note that if you use the [ code ] tags, your code would be much easier to read :)
     
  5. hughperkins

    this bit of code:

    Code (csharp):
    if (epStarted)
    {
        // Rotation Reward
        segRotation = segment.transform.rotation.eulerAngles.z;
        if (segRotation < angRef)
            segRotation = segRotation + 360;
        angDistance = target.transform.rotation.eulerAngles.z - segRotation;
        if (allowRewards)
            AddReward(-Mathf.Abs(angDistance) / 360);

        // Distance Reward
        linearDistanceZ = segment.transform.position.z - target.transform.position.z;
        if (allowRewards)
            AddReward(-Mathf.Abs(linearDistanceZ) / 20);
    }

    presumably is part of a function? But you don't say which one.
     
  6. yasinmasmoudi

    Hey, thank you for the answer. No, I haven't solved it yet. Sadly, I still couldn't pinpoint the problem, so I can't simplify it further.

    This is part of the FixedUpdate function.
     
  7. hughperkins

    to simplify it:
    1. remove something from your system
    2. does the problem still exist?
       - yes: you have created a simpler system that reproduces the problem! Go back to step 1, using this new system
       - no: maybe the problem is caused by the thing you removed? Go back to step 1, and remove something else instead this time
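
    The loop above can be sketched in code. This is a rough illustration under assumptions, not something from the thread: `Simplifier`, `Minimize`, and the `reproduces` check are hypothetical names, the parts are placeholder labels, and in practice `reproduces` stands for "run the reduced project and see whether the rewards still get stuck at 0".

```csharp
using System;
using System.Collections.Generic;

public static class Simplifier
{
    // Removes one part at a time for as long as the problem still reproduces.
    public static List<string> Minimize(
        List<string> parts, Func<List<string>, bool> reproduces)
    {
        var current = new List<string>(parts);
        bool removedSomething = true;
        while (removedSomething)
        {
            removedSomething = false;
            for (int i = 0; i < current.Count; i++)
            {
                // Step 1: remove something from the system.
                var candidate = new List<string>(current);
                candidate.RemoveAt(i);

                if (reproduces(candidate))
                {
                    // Yes: a simpler system that still shows the problem.
                    current = candidate;
                    removedSomething = true;
                    break; // go back to step 1 with the new system
                }
                // No: keep the part and try removing something else.
            }
        }
        return current;
    }
}

public static class SimplifierDemo
{
    public static void Main()
    {
        // Placeholder part names; "part C" is arbitrarily chosen as the culprit.
        var parts = new List<string> { "part A", "part B", "part C", "part D" };
        Func<List<string>, bool> reproduces = system => system.Contains("part C");

        var minimal = Simplifier.Minimize(parts, reproduces);
        Console.WriteLine(string.Join(", ", minimal)); // prints "part C"
    }
}
```

    The result is only minimal with respect to removing one part at a time, which is usually enough to make the cause visible.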
     
  8. hughperkins
