Move two boxes environment

Discussion in 'ML-Agents' started by YorYYi, Aug 2, 2022.

  1. YorYYi (Joined: May 2, 2022; Posts: 7)

    Hello, I am developing an agent that learns to move two boxes next to a platform where a coin is placed. The agent must jump onto the smaller box first, then onto the bigger box, and then onto the platform to collect the coin. I am using visual observations. I give a reward of +1 when the agent gets onto the first box and -1 if it falls off. The same applies to the second box, but with a reward of -2 if it falls to the ground. The last reward is given when the agent collects the coin. If the agent does nothing, it receives a reward of -1/MaxStep. I am using the PPO algorithm with curriculum learning, progressively moving the boxes farther away from the platform. The problem is that the agent does not learn. Any suggestions?

    I am working with a 3D environment.
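
    To make the reward scheme described above easier to reason about, here is a minimal Python sketch of the bookkeeping only (not the Unity agent code). The class and method names are hypothetical. The coin reward and MaxStep are not stated in the post, so the values used for them are placeholder assumptions, and the sketch assumes the -1/MaxStep penalty is applied on every step.

```python
from dataclasses import dataclass


@dataclass
class RewardScheme:
    # Values stated in the post.
    on_small_box: float = 1.0
    fall_from_small_box: float = -1.0
    on_big_box: float = 1.0
    fall_to_ground_from_big_box: float = -2.0
    # ASSUMPTION: the post does not give the coin reward; 1.0 is a placeholder.
    coin: float = 1.0
    # ASSUMPTION: MaxStep is not given in the post; 5000 is a placeholder.
    max_step: int = 5000

    def step_penalty(self) -> float:
        """Per-step penalty of -1/MaxStep (assumed to apply on every step)."""
        return -1.0 / self.max_step

    def idle_return(self) -> float:
        """Total return of an episode in which the agent never earns a reward."""
        return self.max_step * self.step_penalty()

    def best_case_return(self, steps_used: int) -> float:
        """Return when the agent mounts each box once, never falls, and gets the coin."""
        gains = self.on_small_box + self.on_big_box + self.coin
        return gains + steps_used * self.step_penalty()

    def climb_and_fall_return(self) -> float:
        """Net reward for mounting the small box and then falling off it."""
        return self.on_small_box + self.fall_from_small_box


scheme = RewardScheme()
print("idle episode:", scheme.idle_return())
print("perfect episode using half the steps:", scheme.best_case_return(2500))
print("climb then fall:", scheme.climb_and_fall_return())
```

    Under these assumed values, a perfect episode returns between 2 and 3 depending on how many steps it takes, which is worth comparing against the curriculum threshold of 2.1 in the config. With a different coin reward the comparison changes accordingly.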

    This is the config file:

    behaviors:
      SimpleCollector:
        trainer_type: ppo
        hyperparameters:
          batch_size: 128
          buffer_size: 2048
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 256
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:
            strength: 0.02
            gamma: 0.99
            encoding_size: 256
            learning_rate: 3.0e-4
        keep_checkpoints: 5
        max_steps: 20000000
        time_horizon: 128
        summary_freq: 20000
        threaded: true
    environment_parameters:
      level:
        curriculum:
          - name: Lesson0 # This is the start of the first lesson
            completion_criteria:
              measure: reward
              behavior: SimpleCollector
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 2.1
            value: 0.0
          - name: Lesson1
            completion_criteria:
              measure: reward
              behavior: SimpleCollector
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 2.1
            value: 1.0
          - name: Lesson2
            completion_criteria:
              measure: reward
              behavior: SimpleCollector
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 2.1
            value: 2.0
          - name: Lesson3
            completion_criteria:
              measure: reward
              behavior: SimpleCollector
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 2.1
            value: 3.0
          - name: Lesson4
            completion_criteria:
              measure: reward
              behavior: SimpleCollector
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 2.1
            value: 4.0
          - name: Lesson5
            completion_criteria:
              measure: reward
              behavior: SimpleCollector
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 2.1
            value: 5.0
          - name: Lesson6
            completion_criteria:
              measure: reward
              behavior: SimpleCollector
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 2.1
            value: 6.0
          - name: Lesson7
            value: 7.0
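
    For reference, the completion_criteria in the config above (measure: reward, threshold: 2.1, min_lesson_length: 100, signal_smoothing: true) decide when the curriculum moves to the next lesson. The sketch below is a rough Python approximation of that check as I understand it, not the actual ML-Agents source. The function name is hypothetical, and the 0.25/0.75 smoothing weights are an assumption that should be verified against the ML-Agents documentation for the installed version.

```python
from statistics import mean


def should_advance(episode_rewards, smoothed,
                   threshold=2.1, min_lesson_length=100, signal_smoothing=True):
    """Return (advance, new_smoothed) for one lesson-completion check.

    episode_rewards: cumulative rewards of recently completed episodes.
    smoothed: smoothed measure carried over from the previous check.
    """
    # Not enough finished episodes yet: stay in the current lesson.
    if len(episode_rewards) < min_lesson_length:
        return False, smoothed

    measure = mean(episode_rewards)
    if signal_smoothing:
        # ASSUMPTION: weights of 0.25 (old) and 0.75 (new); check the docs.
        measure = 0.25 * smoothed + 0.75 * measure
        smoothed = measure

    # The lesson ends only when the measure exceeds the threshold.
    return measure > threshold, smoothed


# Example: 100 episodes averaging 2.5 do not pass on the first check
# (smoothed starts at 0.0), but pass on the next one.
advance, smoothed = should_advance([2.5] * 100, smoothed=0.0)
print(advance, smoothed)
advance, smoothed = should_advance([2.5] * 100, smoothed=smoothed)
print(advance, smoothed)
```

    If the mean episode reward never rises above 2.1, the agent stays in Lesson0 indefinitely, so it is worth checking in TensorBoard whether the cumulative reward in the first lesson can realistically reach that threshold.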