
Question Agent Struggling to Learn Multi-Step Task

Discussion in 'ML-Agents' started by bahaagh7, May 23, 2023.

  1. bahaagh7


    Aug 29, 2019
    I have been training an agent to stack blocks in order to reach a target positioned on a higher surface. For example, the agent needs to stack three blocks in the shape of stairs and then jump on them to reach the target. However, despite trying various methods such as imitation learning and curriculum learning, the agent does not seem to perform the task successfully.

    The rewards in this task are sparse: the agent only receives a reward when it collides with the target, which first requires stacking the three blocks correctly. To address this, I have added shaping rewards for correctly placing each block in its designated location.
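    A shaping reward like the one described above might be wired up roughly as follows. This is only a sketch: `OnBlockPlaced`, the reward magnitudes, and the tag check are assumptions for illustration, not taken from the original post.

```csharp
using Unity.MLAgents;
using UnityEngine;

public class BlockBuilderAgent : Agent
{
    // Hypothetical callback the environment invokes when a block settles.
    public void OnBlockPlaced(bool placedCorrectly)
    {
        if (placedCorrectly)
        {
            AddReward(0.1f); // small shaping reward per correctly placed block (assumed value)
        }
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("target"))
        {
            AddReward(1.0f); // sparse terminal reward for reaching the target
            EndEpisode();
        }
    }
}
```

    Keeping the shaping rewards small relative to the terminal reward reduces the risk of the agent farming block placements instead of finishing the task.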

    I have trained the agents multiple times, with some of them reaching up to 200 million steps during the training process. Despite the extensive training, the agents still face challenges in successfully completing the task.

    Observations: (all observations are in the agent's local space)
    Agent: Vector3 velocity (the agent's own velocity)
    bool isGrounded (whether the agent is on the ground or mid-air)
    bool isHoldingObject (whether the agent is holding an object)
    Code (CSharp):
    transform.InverseTransformDirection(agentRb.velocity);
    Target: Vector3 position
    Code (CSharp):
    transform.InverseTransformDirection(target.position - transform.position);
    Spawner: Vector3 position (The spawner is a GameObject that generates a new block when an existing block is removed or taken)
    Code (CSharp):
    transform.InverseTransformDirection(spawner.position - transform.position);

    I later added observations for the block positions to a BufferSensor.
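    Collecting those observations might look like the sketch below, combining the fixed-size vector observations with the variable-length block observations on a `BufferSensorComponent`. The field names (`agentRb`, `isGrounded`, `isHoldingObject`) mirror the post; the tag lookup and wiring are assumptions.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class BlockBuilderAgent : Agent
{
    Rigidbody agentRb;              // assumed to be set up elsewhere
    bool isGrounded;
    bool isHoldingObject;
    BufferSensorComponent blockSensor;

    public override void Initialize()
    {
        agentRb = GetComponent<Rigidbody>();
        blockSensor = GetComponent<BufferSensorComponent>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Fixed-size observations, all expressed in the agent's local space.
        sensor.AddObservation(transform.InverseTransformDirection(agentRb.velocity));
        sensor.AddObservation(isGrounded);
        sensor.AddObservation(isHoldingObject);

        // Variable-length block observations go to the BufferSensor,
        // one entry per block, as a relative position in local space.
        foreach (var block in GameObject.FindGameObjectsWithTag("block"))
        {
            Vector3 local = transform.InverseTransformDirection(
                block.transform.position - transform.position);
            blockSensor.AppendObservation(new float[] { local.x, local.y, local.z });
        }
    }
}
```

    The BufferSensor's attention mechanism handles the varying number of blocks, but note that entries are unordered, so any ordering information (e.g. which block to stack next) has to be encoded in the per-entry features.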

    Note: The environment I used for training is relatively small, has no obstacles, and only contains the objects mentioned earlier. So the agent does not need to search for any objects within the environment.

    Tags: "target", "block", "spawner"

    Discrete Actions:
    Move: 0 to stop, 1 to move forward
    Rotate: 0 for no rotation, 1 to rotate right, 2 rotate left
    Jump: 0 for not jumping, 1 for jumping
    Pickup: 0 to drop if holding an object, 1 to pickup if not holding an object
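    The four discrete branches above could be consumed roughly like this in `OnActionReceived`. The speeds, forces, and the `TryPickup`/`Drop` helpers are made-up placeholders; only the branch layout comes from the post.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class BlockBuilderAgent : Agent
{
    public float moveSpeed = 2f;    // assumed values, not from the post
    public float turnSpeed = 180f;
    public float jumpForce = 5f;
    Rigidbody agentRb;
    bool isGrounded;
    bool isHoldingObject;

    public override void OnActionReceived(ActionBuffers actions)
    {
        var d = actions.DiscreteActions;

        // Branch 0 - Move: 0 = stop, 1 = forward
        if (d[0] == 1)
            agentRb.MovePosition(agentRb.position +
                transform.forward * moveSpeed * Time.fixedDeltaTime);

        // Branch 1 - Rotate: 0 = none, 1 = right, 2 = left
        if (d[1] == 1) transform.Rotate(0f, turnSpeed * Time.fixedDeltaTime, 0f);
        else if (d[1] == 2) transform.Rotate(0f, -turnSpeed * Time.fixedDeltaTime, 0f);

        // Branch 2 - Jump: only when grounded
        if (d[2] == 1 && isGrounded)
            agentRb.AddForce(Vector3.up * jumpForce, ForceMode.Impulse);

        // Branch 3 - Pickup/drop, gated on the holding state
        if (d[3] == 1 && !isHoldingObject) TryPickup();
        else if (d[3] == 0 && isHoldingObject) Drop();
    }

    void TryPickup() { /* hypothetical pickup logic */ }
    void Drop() { /* hypothetical drop logic */ }
}
```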

    Config File:
    Code (YAML):
    behaviors:
      BlockBuilder:
        trainer_type: ppo
        hyperparameters:
          batch_size: 128
          buffer_size: 2048
          learning_rate: 0.0003
          beta: 0.01
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 512 # tried 256 with bigger layers
          num_layers: 4 # tried 2, 3, and 8
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          rnd:
            gamma: 0.99
            strength: 0.01
            network_settings:
              hidden_units: 64
              num_layers: 3
            learning_rate: 0.0001
        keep_checkpoints: 5
        max_steps: 200000000
        time_horizon: 128 # I also tried 500
  2. GamerLordMat


    Oct 10, 2019
    As a first thing, try changing normalize to true. Also, your problem seems very hard; I don't know that it can work like this.
  3. ice_creamer


    Jul 28, 2022
    Hi, I recently learned about curriculum learning. If there are two behaviors and one is trained with CL but the other is not, how can I achieve that? I found that the CL settings in trainer.yaml are global.
  4. GamerLordMat


    Oct 10, 2019

    idk, sorry. The question is whether it's problematic for your case to just read out the values in Unity when one of them reaches a certain threshold.
  5. smallg2023


    Sep 2, 2018
    edit: posted in your thread instead
  6. ice_creamer


    Jul 28, 2022
    Hmmm... I mean there are two behaviors (brains) to be trained. I want to train one brain with CL and the other without. Usually we configure this in a YAML file, right? With my current knowledge I don't know of any other configuration method. I tried a lot, as follows:
    CL is defined under environment parameters. In the training file, an environment parameter applies to both behaviors equally. I initially thought that the behavior name in the YAML determines which behavior uses CL, but it turned out that was not the case; it affects both behaviors. After that, I tried to nest the environment parameter under the behavior I wanted to train with curriculum learning, but ML-Agents reported an error, and the documentation specifically emphasizes that curriculum learning is only defined under environment parameters, i.e., it cannot be scoped to a particular behavior.
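    For reference, in ML-Agents a curriculum is declared under environment_parameters, and while the parameter itself is environment-wide, each lesson's completion_criteria does name a single behavior whose statistics gate the lesson transitions. A sketch of the shape (the parameter name `num_blocks` and all values are made up):

```yaml
environment_parameters:
  num_blocks:                        # hypothetical parameter name
    curriculum:
      - name: OneBlock
        completion_criteria:
          measure: reward
          behavior: BlockBuilder     # stats from this behavior gate the lesson
          signal_smoothing: true
          min_lesson_length: 100
          threshold: 0.8
        value: 1.0
      - name: ThreeBlocks
        value: 3.0
```

    So the parameter change is visible to every behavior in the environment, but only the named behavior's progress drives it; an environment that only reads the parameter for one behavior effectively gets per-behavior CL.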
  7. Energymover


    Mar 28, 2023
    In case you hadn't seen it in the showcase sticky, someone did something a bit similar.