
Question Training does not resume correctly with SAC

Discussion in 'ML-Agents' started by MBZajac, Apr 8, 2022.

  1. MBZajac

    MBZajac

    Joined:
    Jul 22, 2021
    Posts:
    2
    Hello, I trained a model with SAC for 6 million steps. When I continue the training using --initialize-from, it works properly for about 2 seconds (the agent behaves as in the previous run, with a high reward of around +0.5), and after that it is as if training were starting from the beginning (the reward drops below -0.2). The same thing happens with --resume. I also trained with PPO in exactly the same environment, with the same rewards, and it resumes correctly, but SAC starts from scratch.
    The blue chart is the initialize-from training and the pink chart is the previous training. As you can see, there is a big peak at the beginning of the blue curve and then a huge drop down to the level where the pink model started.
    [Chart: reward over steps for the initialize-from run (blue) and the original run (pink)]

    ml-agents: 0.28.0,
    ml-agents-envs: 0.28.0,
    Communicator API: 1.5.0,
    PyTorch: 1.7.1+cu110
    Unity ML-Agents Version 2.1.0-exp.1 - June 09, 2021

    My YAML file:
    default_settings: null
    behaviors:
      CarAgent:
        trainer_type: sac
        hyperparameters:
          learning_rate: 0.0003
          learning_rate_schedule: constant
          batch_size: 256
          buffer_size: 500000
          buffer_init_steps: 1000
          tau: 0.005
          steps_per_update: 18.0
          save_replay_buffer: true
          init_entcoef: 1.0
          reward_signal_steps_per_update: 18.0
        network_settings:
          normalize: false
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple
          memory: null
          goal_conditioning_type: hyper
          deterministic: false
        reward_signals:
          extrinsic:
            gamma: 0.97
            strength: 1.0
            network_settings:
              normalize: false
              hidden_units: 128
              num_layers: 2
              vis_encode_type: simple
              memory: null
              goal_conditioning_type: hyper
              deterministic: false
        init_path: null
        keep_checkpoints: 5
        checkpoint_interval: 6000000
        max_steps: 6000000
        time_horizon: 64
        summary_freq: 10000
        threaded: false
        self_play: null
        behavioral_cloning: null
    env_settings:
      env_path: null
      env_args: null
      base_port: 5005
      num_envs: 1
      num_areas: 1
      seed: -1
      max_lifetime_restarts: 10
      restarts_rate_limit_n: 1
      restarts_rate_limit_period_s: 60
    engine_settings:
      width: 84
      height: 84
      quality_level: 2
      time_scale: 20.0
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    environment_parameters: null
    checkpoint_settings:
      run_id: null
      initialize_from: null
      load_model: false
      resume: false
      force: false
      train_model: false
      inference: false
      results_dir: results
    torch_settings:
      device: null
    debug: false
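
    Since SAC is off-policy, how well a run resumes depends not only on the network weights but also on whether the saved replay buffer is actually restored. Before resuming, it may be worth checking that a buffer file really exists next to the checkpoints in the results folder. A minimal sketch in Python; the run id "CarSAC" is made up, and the file-name pattern is an assumption based on what save_replay_buffer: true typically writes, so adjust both to whatever appears in your own results directory:

    from pathlib import Path

    # Assumed layout: results/<run-id>/<behavior-name>/ holds the checkpoints
    # and (assumption) a saved replay buffer when save_replay_buffer is true.
    run_dir = Path("results") / "CarSAC" / "CarAgent"  # "CarSAC" is a hypothetical run id

    checkpoints = sorted(run_dir.glob("*.pt"))
    buffer_files = list(run_dir.glob("*replay_buffer*"))

    print("checkpoints:", [p.name for p in checkpoints])
    if buffer_files:
        for p in buffer_files:
            print(f"replay buffer file: {p.name} ({p.stat().st_size / 1e6:.1f} MB)")
    else:
        print("no saved replay buffer found - SAC would have to refill it after resuming")

    If the buffer is missing, or is not carried over into the new run id, SAC restarts with an empty buffer and only buffer_init_steps of fresh experience, which could produce a brief stretch of good behaviour followed by a drop once updates resume.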
     
    Last edited: Apr 8, 2022
  2. MBZajac

    MBZajac

    Joined:
    Jul 22, 2021
    Posts:
    2
    Here is a training run resumed with --resume. As you can see, the first mean reward is high and then it drops.
    [Charts: mean reward over steps for the resumed run, high at first and then dropping]
     
  3. arrghhhh

    arrghhhh

    Joined:
    Jul 28, 2022
    Posts:
    2
    Hey, have you figured it out? I'm having the same issue.
    In a resumed training run, after buffer_size steps have passed, my model basically resets and the reward plunges. It looks kind of like this:
    [Chart: reward over steps for the resumed run, collapsing once the buffer fills]
    Around step 2.53M here, the buffer fills up and the model loses its mind.
    Also, at the same time this warning was logged in the console:

    c:\programs\virtualenv\python-envs\ml-agents\lib\site-packages\mlagents\trainers\torch\utils.py:309: UserWarning: This overload of nonzero is deprecated:
    nonzero()
    Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at ..\torch\csrc\utils\python_arg_parser.cpp:882.)
    res += [data[(partitions == i).nonzero().squeeze(1)]]
    It's unlikely that a deprecated overload could cause this, but who knows.
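
    For what it's worth, that message is just PyTorch's deprecation warning for the argument-less nonzero() overload, not an error, and it comes from inside ml-agents rather than your own code. Passing as_tuple explicitly should avoid the warning without changing the result. A small standalone illustration (plain PyTorch, not the ml-agents source):

    import torch

    partitions = torch.tensor([0, 1, 0, 2, 1])
    i = 1

    # Deprecated overload that triggers the UserWarning in torch 1.7:
    idx_old = (partitions == i).nonzero().squeeze(1)

    # Same result with the explicit keyword, which does not warn:
    idx_new = (partitions == i).nonzero(as_tuple=False).squeeze(1)

    assert torch.equal(idx_old, idx_new)  # both are tensor([1, 4])
    print(idx_new)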