Question Problem with CollectObservations() for agents learning UAV formation flying.

Discussion in 'ML-Agents' started by mehdi_1234, Jun 16, 2023.

  1. mehdi_1234

    mehdi_1234

    Joined:
    Jun 16, 2023
    Posts:
    4
    I don't know which information from the agents should be used as observations.
    I use 6 Ray Perception Sensor 3D components per agent: 3 facing forward and 3 facing backward, each with different directions.


    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        // Six scalar distance observations
        sensor.AddObservation(Distance1);
        sensor.AddObservation(Distance2);
        sensor.AddObservation(Distance3);
        sensor.AddObservation(Distance4);
        sensor.AddObservation(Distance5);
        sensor.AddObservation(Distance6);

        // Positions of the agents (currently disabled)
        //sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[0].Rb.position));
        //sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[1].Rb.position));
        //sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[2].Rb.position));
        //sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[3].Rb.position));

        // Velocities of the agents, expressed in this agent's local frame
        sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[0].Rb.velocity));
        sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[1].Rb.velocity));
        sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[2].Rb.velocity));
        sensor.AddObservation(transform.InverseTransformDirection(area.AgentsList[3].Rb.velocity));
    }
    When I uncomment the lines related to position, the learning results get worse instead of better.
    I have two scripts: one controls the environment, and the other defines the agent's behavior.
    My training configuration is as follows:

    Code (YAML):
    behaviors:
      FormationFly1:
        trainer_type: poca
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 512
          num_layers: 2
          vis_encode_type: simple
          memory:
            sequence_length: 64
            memory_size: 256
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 15000000
        time_horizon: 128
        summary_freq: 50000
    I use the ml-agents v2.3.0-exp.3 package.
    Last edited: Jun 16, 2023
  2. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    You're passing the position through InverseTransformDirection, which treats it as a direction vector: it applies only the inverse of the transform's rotation and ignores the transform's position, so the result does not convey a local-relative position. You probably meant to use
    sensor.AddObservation(transform.InverseTransformPoint(area.AgentsList[0].Rb.position));
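    To illustrate the difference, here is a minimal sketch; the class name and the otherPosition value are placeholders, not from the original project:

    Code (CSharp):
    using UnityEngine;

    public class LocalFrameExample : MonoBehaviour
    {
        void Start()
        {
            // Hypothetical world-space position of another agent.
            Vector3 otherPosition = new Vector3(10f, 0f, 5f);

            // InverseTransformPoint applies the full inverse transform
            // (position, rotation, scale): "where is that point relative to me?"
            Vector3 localPoint = transform.InverseTransformPoint(otherPosition);

            // InverseTransformDirection applies only the inverse rotation:
            // "which way does that vector point in my local frame?" It ignores
            // where this transform actually sits in the world.
            Vector3 localDirection = transform.InverseTransformDirection(otherPosition);

            Debug.Log($"localPoint: {localPoint}, localDirection: {localDirection}");
        }
    }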
  3. mehdi_1234

    mehdi_1234

    Joined:
    Jun 16, 2023
    Posts:
    4
    I used

    Code (CSharp):
    sensor.AddObservation(transform.InverseTransformPoint(area.AgentsList[0].Rb.position));
    sensor.AddObservation(transform.InverseTransformPoint(area.AgentsList[1].Rb.position));
    sensor.AddObservation(transform.InverseTransformPoint(area.AgentsList[2].Rb.position));
    sensor.AddObservation(transform.InverseTransformPoint(area.AgentsList[3].Rb.position));

    but unfortunately, the problem was not solved.
  4. Energymover

    Energymover

    Joined:
    Mar 28, 2023
    Posts:
    33
    The other piece missing here is the reward function. For all we know, you may be rewarding them for not being near each other. What kind of sticks/carrots are you using for the model?
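    As an illustration only (not from this thread): a minimal sketch of a dense, distance-based formation reward, where leader, targetLocalOffset, and maxFormationError are hypothetical names, not from the original project:

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class FormationAgent : Agent
    {
        public Transform leader;            // hypothetical formation leader
        public Vector3 targetLocalOffset;   // desired slot relative to the leader
        public float maxFormationError = 10f;

        void FixedUpdate()
        {
            // How far the agent is from its assigned slot in the formation.
            Vector3 targetWorldPos = leader.TransformPoint(targetLocalOffset);
            float error = Vector3.Distance(transform.position, targetWorldPos);

            // Dense "carrot": the closer the agent is to its slot, the larger
            // the per-step reward (scaled small so returns stay bounded).
            AddReward(0.01f * (1f - Mathf.Clamp01(error / maxFormationError)));
        }
    }

    A sparse "stick" (for example, a negative reward plus EndEpisode() on collision) could complement this, but the right shaping depends on the task.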