Resolved Ml-agent stuck on the same behaviour

Discussion in 'ML-Agents' started by meldeg, Feb 10, 2023.

  1. meldeg


    Jan 22, 2021
    I have made a simple agent using ML-Agents, but I have come to a standstill. The task it should learn is basically to pick up a treasure in the opponent's base, return it to its own base, and repeat this 3 times. After some time, it decides to move toward either the opponent's base or its own base, and then it just does the same thing over and over again (see attached pictures).
    Screenshot 2023-02-10 101557.jpg

    I have also tried to increase the max step of an agent from 5k to 10k without any change in behavior.

    When I play the game myself, it takes around 50 sec (2500 steps) to complete.

    I have also tried giving it a small negative reward while it is in the opponent's base area with a treasure, but that didn't change the behavior. How can I make the agent want to go in the other direction and get a more stable training process?

    See PlayerAgent.cs in the GitHub repo for the observations it makes, etc.

    Github repo:

    You need to import the ML-Agents 1.0.8 package manually into the project.

    Reward values:
    Code (CSharp):
    public float rewardInsideTreasureChamber = 0.5f;
    public float rewardtakingTreasureFromTreasureChamber = 2f;
    public float rewardtakingTreasureToOwnTreasureChamber = 4f;
    public float penaltyRunningIntoBoundary = -0.4f;
    public float penaltyTreasureInEnemyAreaWithTreasure = -0.0001f;
    ML-Agents config:
    Code (CSharp):
    default:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-3
        buffer_size: 10240
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        learning_rate_schedule: linear
        max_steps: 5.0e5
        memory_size: 128
        normalize: false
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 20000
        use_recurrent: false
        vis_encode_type: simple
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99
            curiosity:
                strength: 0.02
                gamma: 0.99
                encoding_size: 64
                learning_rate: 3.0e-3

    PlayerAgent:
        time_horizon: 256
        batch_size: 4096
        buffer_size: 40960
        hidden_units: 512
        max_steps: 5.0e6
        beta: 7.5e-3
    Last edited: Feb 15, 2023
  2. meldeg


    Jan 22, 2021
    Can someone please let me know if there is anything I need to explain more to make my question easier to answer.
  3. lycettthomas94


    Jun 13, 2020
    One issue I see is that you're only observing the direction of the chambers relative to the forward vector. The agent won't be able to tell whether a chamber is to its left or its right, since the dot product will be identical either way. You need to add an observation of the dot product with transform.right, in the same way you've done it with transform.forward.
  4. lycettthomas94


    Jun 13, 2020
    In this example you can see that 90 degrees left and 90 degrees right both give a dot product of 0:
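The left/right ambiguity can be checked numerically. The following is a minimal sketch rather than code from the repo: it uses System.Numerics.Vector3 instead of UnityEngine.Vector3 so that it runs outside Unity, and the class name DirectionObservationDemo and the helper Observe are hypothetical names chosen for illustration. It assumes an agent at the origin facing +Z with +X on its right, which matches Unity's axis convention.

```csharp
using System;
using System.Numerics;

public static class DirectionObservationDemo
{
    // Hypothetical helper: returns the dot products of the agent's forward and
    // right vectors with the normalized direction from the agent to the target.
    public static (float forwardDot, float rightDot) Observe(
        Vector3 agentForward, Vector3 agentRight, Vector3 agentPos, Vector3 targetPos)
    {
        Vector3 toTarget = Vector3.Normalize(targetPos - agentPos);
        return (Vector3.Dot(agentForward, toTarget), Vector3.Dot(agentRight, toTarget));
    }

    public static void Main()
    {
        // Assumed setup: agent at the origin, facing +Z, with +X on its right.
        Vector3 forward = Vector3.UnitZ;
        Vector3 right = Vector3.UnitX;
        Vector3 pos = Vector3.Zero;

        // One target 90 degrees to the left, one 90 degrees to the right.
        var left90 = Observe(forward, right, pos, new Vector3(-5f, 0f, 0f));
        var right90 = Observe(forward, right, pos, new Vector3(5f, 0f, 0f));

        // The forward dot product is the same for both targets;
        // only the right dot product changes sign.
        Console.WriteLine($"left:  forward={left90.forwardDot}, right={left90.rightDot}");
        Console.WriteLine($"right: forward={right90.forwardDot}, right={right90.rightDot}");
    }
}
```

With only the forward dot product, both targets look the same to the agent. The right dot product is negative for the target on the left and positive for the one on the right, so observing both values removes the ambiguity. In the Unity project, the equivalent would presumably be Vector3.Dot(transform.right, direction) added as a second observation next to the existing transform.forward one.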
  5. meldeg


    Jan 22, 2021
    I solved it by increasing the length of the Raycast.