Question: Tips for getting agent to learn to push ball into goal

Discussion in 'ML-Agents' started by Digineaux, Aug 4, 2023.

  1. Digineaux

    Digineaux

    Joined:
    Jul 19, 2015
    Posts:
    18
I'm having trouble getting the agent to make any progress, and even more trouble figuring out exactly where I'm going wrong. Are my rewards too sparse? Is my training config full of bad values? Have I messed up the logic in my code somewhere? Does it just need more time? Etc.

    The environment is four rectangles with colliders forming a large box. At the end of an episode this arena resets: the player and ball are moved to random locations within it and one of the walls is picked to be the goal. The arena resets after 90 seconds or when the player scores.

    If the player scores, they get a reward of 1. Every second they receive a reward of -0.001. I also tried without the negative reward, without success.
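    For context, the +1 on scoring is applied outside the agent script shown below; a rough sketch of that part (hypothetical names, not the actual GameMaster code):
    Code (CSharp):
    using UnityEngine;

    // Hypothetical sketch only: the real scoring logic lives in GameMaster and isn't shown here.
    // "GoalWall", the "Ball" tag, and the serialized references are placeholder names.
    public class GoalWall : MonoBehaviour
    {
        [SerializeField] MLAPlayerBasic playerAgent;
        [SerializeField] GameMaster gm;

        private void OnTriggerEnter2D(Collider2D other)
        {
            if (other.CompareTag("Ball"))
            {
                playerAgent.AddReward(1f);   // +1 for scoring
                playerAgent.EndEpisode();
                gm.ResetLevel();             // arena reset described above
            }
        }
    }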

    The player observes its position (Vector2) and rotation (float), and those of the ball. It also observes the current goal wall's position.

    It has only 2 continuous actions: move vertically and move horizontally.
    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    public class MLAPlayerBasic : Agent
    {
        Rigidbody2D rb;
        [SerializeField] GameMaster gm;
        float resetTime = 90;
        [SerializeField] float currentTimer = 0;
        Vector2 nextForce;

        protected override void OnEnable()
        {
            base.OnEnable();
            rb = GetComponent<Rigidbody2D>();
        }

        private void Start()
        {
            gm = GetComponentInParent<GameMaster>();
        }

        public override void OnEpisodeBegin()
        {
            base.OnEpisodeBegin();
            currentTimer = 0;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            base.CollectObservations(sensor);
            // Comments are the running total of observation floats.
            sensor.AddObservation(rb.position);                                          // 2
            sensor.AddObservation(rb.rotation);                                          // 3
            sensor.AddObservation(gm.currentBall.GetComponent<Rigidbody2D>().position);  // 5
            sensor.AddObservation(gm.currentBall.GetComponent<Rigidbody2D>().rotation);  // 6
            sensor.AddObservation(gm.currentGoal.transform.position);                    // 9
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            base.OnActionReceived(actions);

            nextForce = Vector2.zero;
            nextForce.y = actions.ContinuousActions[0];
            nextForce.x = actions.ContinuousActions[1];
        }

        private void FixedUpdate()
        {
            rb.AddForce((nextForce * 50) * Time.fixedDeltaTime);
        }

        private void Update()
        {
            currentTimer += Time.deltaTime;
            if (currentTimer > resetTime)
            {
                gm.ResetLevel();
                currentTimer = 0;
            }
            AddReward(-0.001f * Time.deltaTime);
        }
    }
    The training config:
    Code (YAML):
    default_settings: null
    behaviors:
      MLAPlayerBasic:
        trainer_type: ppo
        hyperparameters:
          batch_size: 5120
          buffer_size: 409600
          learning_rate: 0.0003
          beta: 0.004
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          shared_critic: false
          learning_rate_schedule: linear
          beta_schedule: linear
          epsilon_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: null
          goal_conditioning_type: hyper
          deterministic: false
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
            network_settings:
              normalize: false
              hidden_units: 256
              num_layers: 2
              vis_encode_type: simple
              memory: null
              goal_conditioning_type: hyper
              deterministic: false
        init_path: null
        keep_checkpoints: 5
        checkpoint_interval: 5000000
        max_steps: 5000000000
        time_horizon: 64
        summary_freq: 50000
        threaded: false
        self_play: null
        behavioral_cloning: null
    env_settings:
      env_path: null
      env_args: null
      base_port: 5005
      num_envs: 1
      num_areas: 1
      seed: -1
      max_lifetime_restarts: 10
      restarts_rate_limit_n: 1
      restarts_rate_limit_period_s: 60
    engine_settings:
      width: 2460
      height: 1300
      quality_level: 5
      time_scale: 20.0
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    environment_parameters: null
    checkpoint_settings:
      run_id: test4
      initialize_from: null
      load_model: false
      resume: false
      force: true
      train_model: false
      inference: false
      results_dir: results
    torch_settings:
      device: null
    debug: false
    Here is a short unlisted video of the scene next to the Python/Anaconda terminal. It may take some time to finish uploading.

     
  2. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    130
    Rather than giving positions (world positions are especially unhelpful for AI learning), give the direction to the ball and goal (i.e. position2 - position1).
    Also, if your agent can't rotate, why give it observations for the rotation?

    Personally I would pass it (something like the sketch after this list):
    1. direction from agent to the ball
    2. direction from agent to the goal
    3. direction from the ball to the goal (this one might not technically be needed, as it could be worked out from the other two, but it is likely to make things easier)
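    A minimal sketch of what that could look like as a drop-in CollectObservations for your MLAPlayerBasic class, reusing the gm / currentBall / currentGoal references from your script (normalizing the directions is optional):
    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        Vector2 agentPos = rb.position;
        Vector2 ballPos = gm.currentBall.GetComponent<Rigidbody2D>().position;
        Vector2 goalPos = gm.currentGoal.transform.position;

        // Relative directions instead of world positions.
        sensor.AddObservation(ballPos - agentPos);   // agent -> ball  (2 floats)
        sensor.AddObservation(goalPos - agentPos);   // agent -> goal  (2 floats)
        sensor.AddObservation(goalPos - ballPos);    // ball  -> goal  (2 floats)
    }
    If you switch to this, the vector observation Space Size in Behavior Parameters needs to match (6 here instead of 9).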
     
  3. Digineaux

    Digineaux

    Joined:
    Jul 19, 2015
    Posts:
    18
    Whoops, the rotation is a typo, leftover from a previous system. Working with rigidbodies makes it difficult to use local positions; the physics engine rarely gives or wants local positions, meaning you have to convert them yourself every time. It also increases overhead when training hundreds of agents. But I'll give it a go.
    The concern I had is that passing just the position might not be helpful, given it only points to the center of the object and the AI has no idea it spans the width of the field. So I'd hoped the raycast sensor would take care of this; admittedly I don't fully understand exactly what properties it observes.
    Passing the directions between the ball, goal, and agent is a great idea though!