
Question: Help with ML-Agents config file

Discussion in 'ML-Agents' started by dmitryponv, Dec 9, 2023.

dmitryponv
Joined: Mar 9, 2021
Posts: 19
I made a simple Unity 3D scene with a quadcopter object that needs to fly to a target.

To simplify it, I turned off gravity on the agent.

To simplify it even further, I did not use force vectors on the 4 motors; instead I apply a single force vector at the agent's center, whose magnitude is a continuous action.

To simplify it even further, I did not angle the force based on the agent's rotation; instead I used a plain Vector3, with each component being its own continuous action.

The reward is also simple: it's (50 - distance) to the target, and the target starts 50 units away, so the per-step reward is 0 at the spawn point and approaches 50 as the agent closes in.

The problem is, when I train, all the copies of the agent accelerate in one direction, nowhere near the target, and don't stop.

It's like they don't care about the reward at all. When I print the continuous actions (sketched below), they gravitate toward -1 or 1.
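
For reference, the printout is just a Debug.Log dropped into OnActionReceived; a minimal sketch of what I mean (the formatting is mine, nothing special):

Code (CSharp):
// Hypothetical debug logging to inspect the raw continuous actions
// on each decision step.
Debug.Log($"power={actions.ContinuousActions[0]:F2} " +
          $"dir=({actions.ContinuousActions[1]:F2}, " +
          $"{actions.ContinuousActions[2]:F2}, " +
          $"{actions.ContinuousActions[3]:F2})");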

I'm thinking the problem may be in the config file.

What am I doing wrong here, in the code and in the config?


Any help would be appreciated.

This is what it looks like, with the agents flying off in every direction:

[screenshot: upload_2023-12-8_17-5-54.png]


The code is below:

Code (CSharp):
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public class DroneAgent : Agent
{
    [SerializeField] private GameObject target;

    float power_c = 0.0f;

    Vector3 startPosition = new Vector3(0, 0, 0);
    Quaternion startRotation = new Quaternion();

    public override void CollectObservations(VectorSensor sensor)
    {
        // Only observation: the agent's own local position (3 floats).
        sensor.AddObservation(transform.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Four continuous actions: a thrust magnitude plus an x/y/z direction.
        power_c = actions.ContinuousActions[0];
        float x = actions.ContinuousActions[1];
        float y = actions.ContinuousActions[2];
        float z = actions.ContinuousActions[3];

        // Single force applied at the agent's center (no per-motor forces).
        gameObject.GetComponent<Rigidbody>().AddForceAtPosition(new Vector3(x, y, z) * power_c, transform.position);

        // Per-step reward: 50 minus the current distance to the target.
        float dist = Vector3.Distance(target.transform.position, transform.position);
        float dist_reward = 50f - dist;

        AddReward(dist_reward);

        // Reset the episode if the agent strays too far away.
        if (dist > 100f)
        {
            EndEpisode();
        }
    }

    public override void OnEpisodeBegin()
    {
        // Put the agent back at its starting pose.
        transform.position = startPosition;
        transform.rotation = startRotation;
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Intentionally empty; no manual control wired up yet.
    }

    void Start()
    {
        startPosition = transform.position;
        startRotation = transform.rotation;
    }
}
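
Side note: I left Heuristic() empty because I haven't wired up manual control. If I wanted to fly it by hand for debugging, a minimal sketch along these lines (assuming the default "Horizontal"/"Vertical" axes of the legacy Input Manager) would probably do:

Code (CSharp):
// Hypothetical manual-control sketch for the empty Heuristic above.
// Assumes the default "Horizontal"/"Vertical" legacy Input Manager axes.
public override void Heuristic(in ActionBuffers actionsOut)
{
    var ca = actionsOut.ContinuousActions;
    ca[0] = 1f;                          // constant thrust magnitude
    ca[1] = Input.GetAxis("Horizontal"); // x direction
    ca[2] = Input.GetAxis("Vertical");   // y direction
    ca[3] = 0f;                          // no z input in this sketch
}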

And the config file is here:

Although I've also tried training without it, with whatever default parameters mlagents-learn uses.


behaviors:
  My Behavior:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 5000000
    time_horizon: 128
    summary_freq: 10000
    threaded: true
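
For completeness, this is roughly how I launch training (assuming the YAML above is saved as drone_config.yaml; the file name and run id are just placeholders I use):

Code (Bash):
mlagents-learn drone_config.yaml --run-id=drone_test --force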