
Question: Changes in config file

Discussion in 'ML-Agents' started by TulioMMo, Apr 30, 2021.

  1. TulioMMo

    TulioMMo

    Joined:
    Dec 30, 2020
    Posts:
    29
    Dear all,

    I have updated my version of ml-agents to 0.26.0 and noticed some changes to the configuration file. I have a couple of questions regarding the differences between the first network (see image) and the extra network that appears under "reward signals".

    For the case of PPO, does the first network represent the "actor" or policy network?

    Does the network for reward signals represent the "critic" network? If so, does the critic network parametrize the value function V(s) or the state-action value function Q(s,a)? Are there any links available for the PyTorch implementation? :)

    Should both of these networks have a similar design, i.e. the same number of layers and hidden units?

    From my understanding, "goal_conditioning_type: hyper" generates yet another neural network. In the image below, I understand that two hyper-networks are being generated (one for each network). Is that correct? If so, what is the configuration (number of layers/hidden units) of these hyper-networks?


    Sorry for the long post and many thanks!

    [Attached image: upload_2021-4-30_17-0-2.png - screenshot of the trainer configuration showing the #1 and #2 network settings]

  2. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    Hi,
    The #1 network settings are used for both the actor and the critic. #2 is unused in the case of the extrinsic reward, because the extrinsic reward is given by the environment. Other reward signals, such as GAIL or RND, use a neural network, and the #2 settings are used for those networks.
    You can (and should) remove the whole #2 network settings block, as it is not used at all in your case.
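    To make this concrete, here is a rough sketch of the two cases (the values and the GAIL demo path below are placeholders, not taken from your project): with only the extrinsic reward, the reward_signals block needs no network_settings of its own, whereas a learned signal such as GAIL is what the #2 settings are actually for.

    Code (YAML):
    reward_signals:
      # Extrinsic reward only: the environment provides the reward,
      # so no extra network (and no #2 network_settings) is needed.
      extrinsic:
        strength: 1.0
        gamma: 0.97

      # Hypothetical: only if you add a learned reward signal such as GAIL
      # is a second network built, configured by its own network_settings.
      # gail:
      #   strength: 0.5
      #   gamma: 0.99
      #   demo_path: Demos/Expert.demo   # placeholder path
      #   network_settings:
      #     hidden_units: 64
      #     num_layers: 2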
    I hope this clarifies it and sorry for the confusion.
     
  3. TulioMMo

    TulioMMo

    Joined:
    Dec 30, 2020
    Posts:
    29
    Thank you for the reply! I have tried removing the second network from the trainer_config.yaml file; however, when I start running Unity ML-Agents, the second network still appears...

    trainer_config.yaml:

    Code (YAML):
    behaviors:
      BehaviorPPO:
        trainer_type: ppo

        hyperparameters:
          # Hyperparameters common to PPO and SAC
          batch_size: 256
          buffer_size: 4056
          learning_rate: 1.5e-4
          learning_rate_schedule: linear

          # PPO-specific hyperparameters
          # Replaces the "PPO-specific hyperparameters" section above
          beta: 4.6e-2
          epsilon: 0.2
          lambd: 0.98
          num_epoch: 6

        # Configuration of the neural network (common to PPO/SAC)
        network_settings:
          # vis_encode_type: simple
          normalize: true
          hidden_units: 48
          num_layers: 4
          goal_conditioning_type: hyper
          # memory:
          #   sequence_length: 16
          #   memory_size: 48

        # Trainer configurations common to all trainers
        max_steps: 4.0e7
        time_horizon: 54
        summary_freq: 40000
        # keep_checkpoints: 5
        # checkpoint_interval: 50000
        # threaded: true
        # init_path: null

        reward_signals:
          # environment reward (default)
          extrinsic:
            strength: 1.0
            gamma: 0.97
            # network_settings:
            #   normalize: false
            #   hidden_units: 128
            #   num_layers: 2
            #   vis_encode_type: simple
            #   memory: None
            #   goal_conditioning_type: hyper
    Unity ML-Agents output:

    [Attached image: upload_2021-5-1_17-42-11.png - Unity ML-Agents console output at the start of training]
     
  4. TulioMMo

    TulioMMo

    Joined:
    Dec 30, 2020
    Posts:
    29
    I reinstalled everything (now with the Unity environment package version 2.0), but the issue persists... Not only that, but changing the number of hidden_units/num_layers for the #2 network changes my training results, which doesn't make sense since I am only using extrinsic rewards...
     
  5. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    The #2 network settings will appear in the terminal output at the start of training, but they are not used. Changes to these settings should not impact training performance. Are you sure that what you are seeing is not just stochasticity in the training?
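    If you want to rule stochasticity out, one way (assuming your mlagents-learn version accepts the --seed option, which recent releases do) is to fix the seed and run the same configuration twice, once with and once without the #2 settings; apart from minor numerical noise the curves should match.

    Code (Bash):
    # Baseline run with a fixed seed
    mlagents-learn trainer_config.yaml --run-id=seed_check_a --seed=12345
    # Same seed after editing/removing the #2 network settings in the config;
    # the training curves should match apart from minor numerical noise
    mlagents-learn trainer_config.yaml --run-id=seed_check_b --seed=12345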
     
  6. TulioMMo

    TulioMMo

    Joined:
    Dec 30, 2020
    Posts:
    29
    My mistake! I just ran the model again, and indeed changing the settings for the #2 network does not affect training performance.