Question: Behavior Name DOES match the one in the trainer configuration file but default config is still used?

Discussion in 'ML-Agents' started by theriser777, Nov 1, 2022.

  1. theriser777

     Joined: Feb 2, 2020
     Posts: 11
    Sorry for the weird phrasing in the title; I had to keep it short.

    I want to change the max_steps parameter in my trainer configuration, and despite the Behavior Name in Unity matching the one specified in configuration.yaml (that name being "WarshAgent"), ML-Agents still insists on hitting me with this:

    [WARNING] Behavior name WarshAgent does not match any behaviors specified in the trainer configuration file. A default configuration will be used.

    It then uses a default configuration, so every time I change max_steps, it gets reset to the default value.
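
    As far as I understand it, the trainer should pick up my settings as long as the behavior name appears as a key nested directly under behaviors: in the YAML, roughly like this stripped-down sketch (values taken from my own config, everything else omitted):

    behaviors:
      WarshAgent:           # has to match the Behavior Name on the Behavior Parameters component
        trainer_type: ppo
        max_steps: 900000000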

    I did try running the command "python -m mlagents.trainers.upgrade_config configuration.yaml WarshAgent.yaml", but all that did was copy the contents of configuration.yaml into WarshAgent.yaml. Even when I deleted configuration.yaml, it still created a new one and completely ignored WarshAgent.yaml.

    Here are more details:

    Behavior Parameters: [screenshot of the component in the Inspector]


    configuration.yaml:
    default_settings: null
    behaviors:
      WarshAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 10240
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
          beta_schedule: linear
          epsilon_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: null
          goal_conditioning_type: hyper
          deterministic: false
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
            network_settings:
              normalize: false
              hidden_units: 128
              num_layers: 2
              vis_encode_type: simple
              memory: null
              goal_conditioning_type: hyper
              deterministic: false
        init_path: null
        keep_checkpoints: 5
        checkpoint_interval: 500000
        max_steps: 500000
        time_horizon: 64
        summary_freq: 50000
        threaded: false
        self_play: null
        behavioral_cloning: null
    env_settings:
      env_path: null
      env_args: null
      base_port: 5005
      num_envs: 1
      num_areas: 1
      seed: -1
      max_lifetime_restarts: 10
      restarts_rate_limit_n: 1
      restarts_rate_limit_period_s: 60
    engine_settings:
      width: 84
      height: 84
      quality_level: 5
      time_scale: 20
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    environment_parameters: null
    checkpoint_settings:
      run_id: ppo
      initialize_from: null
      load_model: false
      resume: true
      force: false
      train_model: false
      inference: false
      results_dir: results
    torch_settings:
      device: null
    debug: false

    WarshAgent.yaml (same settings, except max_steps is raised to 900000000):
    behaviors:
      WarshAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 10240
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
          beta_schedule: linear
          epsilon_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          goal_conditioning_type: hyper
          deterministic: false
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
            network_settings:
              normalize: false
              hidden_units: 128
              num_layers: 2
              vis_encode_type: simple
              goal_conditioning_type: hyper
              deterministic: false
        keep_checkpoints: 5
        checkpoint_interval: 500000
        max_steps: 900000000
        time_horizon: 64
        summary_freq: 50000
        threaded: false
    env_settings:
      base_port: 5005
      num_envs: 1
      num_areas: 1
      seed: -1
      max_lifetime_restarts: 10
      restarts_rate_limit_n: 1
      restarts_rate_limit_period_s: 60
    engine_settings:
      width: 84
      height: 84
      quality_level: 5
      time_scale: 20
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    checkpoint_settings:
      run_id: ppo
      load_model: false
      resume: true
      force: false
      train_model: false
      inference: false
      results_dir: results
    torch_settings: {}
    debug: false

    What am I doing wrong? Or is my understanding of how this works just wrong?