Help Wanted Meta Learning and RL-Trainer Implementation

Discussion in 'ML-Agents' started by seboz123, Jul 27, 2020.

  1. seboz123


    Mar 7, 2020

    I am currently using ML-Agents for my thesis and am trying to implement the MAML and Reptile meta-learning algorithms.
    To do that, I want to read the weights of the neural networks (and possibly the computed gradients) and update them with a meta-learning rule. I am using the release-3 version.
    Where would be the best place to implement such a meta-learning algorithm? Do you have any experience with it?
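
    As a rough sketch of the kind of update meant here, the Reptile outer step moves the initial weights a fraction of the way toward the weights obtained after training on a task. The function name, the `meta_step_size` parameter, and the representation of weights as one flat list of floats per layer are assumptions for illustration only; none of them come from ML-Agents.

```python
from typing import List

# Illustrative representation: one flat list of floats per layer.
Weights = List[List[float]]


def reptile_outer_update(
    before: Weights, after: Weights, meta_step_size: float
) -> Weights:
    """Return before + meta_step_size * (after - before), element by element."""
    return [
        [w0 + meta_step_size * (w1 - w0) for w0, w1 in zip(layer0, layer1)]
        for layer0, layer1 in zip(before, after)
    ]


# Toy usage: weights before and after an inner-loop update on one task.
before = [[0.0, 0.0], [1.0]]
after = [[2.0, 4.0], [3.0]]
new_weights = reptile_outer_update(before, after, meta_step_size=0.5)
```

    With a step size of 1.0 this reduces to ordinary training on the task, and with 0.0 the initial weights never change, so the step size controls how strongly each task pulls the shared initialization.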

    For now, I am trying to access the weights in the class.
    There I am calling
    Code (Python):
    from copy import deepcopy  # at module level
    weights_before_update = deepcopy(self.get_policy(self.brain_name).get_weights())
    Code (Python):
    with hierarchical_timer("_update_policy"):
        if self._update_policy():
            for q in self.policy_queues:
                # Get policies that correspond to the policy queue in question
                q.put(self.get_policy(q.behavior_id))
    and after the update I want to reset the weights to their initial, pre-update values with:
    Code (Python):
    self.get_policy(self.brain_name).init_load_weights()
    self.get_policy(self.brain_name).load_weights(weights_before_update)
    Unfortunately I am getting the following error when trying to set the weights to the old weights:
    Code (Python):
    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\", line 1365, in _do_call
        return fn(*args)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\", line 1350, in _run_fn
        target_list, run_metadata)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
      (0) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[{{node Placeholder_61}}]]
        [[Assign_90/_677]]
      (1) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[{{node Placeholder_61}}]]
    0 successful operations.
    0 derived errors ignored.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\", line 926, in _bootstrap_inner
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\", line 340, in trainer_update_func
        trainer.advance()
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\trainer\", line 211, in advance
        self.get_policy(self.brain_name).load_weights(weights_before_update)
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\policy\", line 235, in load_weights
        , feed_dict=feed_dict)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\", line 956, in run
        run_metadata_ptr)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\", line 1359, in _do_run
        run_metadata)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
      (0) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[node Placeholder_61 (defined at C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\ ]]
        [[Assign_90/_677]]
      (1) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[node Placeholder_61 (defined at C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\ ]]
    0 successful operations.
    0 derived errors ignored.

    Original stack trace for 'Placeholder_61':
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\", line 890, in _bootstrap
        self._bootstrap_inner()
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\", line 926, in _bootstrap_inner
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\", line 340, in trainer_update_func
        trainer.advance()
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\trainer\", line 210, in advance
        self.get_policy(self.brain_name).init_load_weights()
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\policy\", line 222, in init_load_weights
        assign_ph = tf.placeholder(var.dtype, shape=value.shape)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\ops\", line 2630, in placeholder
        return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\ops\", line 8041, in placeholder
        "Placeholder", dtype=dtype, shape=shape, name=name)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\", line 793, in _apply_op_helper
        op_def=op_def)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\util\", line 507, in new_func
        return func(*args, **kwargs)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\", line 3360, in create_op
        attrs, op_def, compute_device)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\", line 3429, in _create_op_internal
        op_def=op_def)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\", line 1751, in __init__
        self._traceback = tf_stack.extract_stack()
    Why does updating the policy change the weights in a way that prevents me from setting them afterwards? Do you have any ideas?
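
    One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the unfed 'Placeholder_61' was created inside init_load_weights, which is called on every advance(). If each call registers a fresh placeholder per variable on top of the existing ones (an assumption that should be checked against the policy source), then load_weights would only feed as many placeholders as there are weight values and leave the newer ones unfed. The pure-Python model below reproduces that mismatch; ToyPolicy and its attributes are hypothetical stand-ins, not the ML-Agents classes.

```python
class ToyPolicy:
    """Pure-Python stand-in for a policy; not the ML-Agents class."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.assign_phs = []  # one "placeholder" name per registered assign op

    def init_load_weights(self):
        # Assumed behaviour: every call appends a new placeholder per variable.
        start = len(self.assign_phs)
        for i in range(self.num_vars):
            self.assign_phs.append(f"Placeholder_{start + i}")

    def load_weights(self, values):
        # zip() stops at the shorter sequence, so surplus placeholders get no value.
        feed_dict = dict(zip(self.assign_phs, values))
        unfed = [ph for ph in self.assign_phs if ph not in feed_dict]
        if unfed:
            raise ValueError(f"You must feed a value for {unfed[0]}")
        return feed_dict


policy = ToyPolicy(num_vars=3)
weights = [0.1, 0.2, 0.3]

policy.init_load_weights()
policy.load_weights(weights)  # first update: every placeholder is fed

policy.init_load_weights()  # second update: three more placeholders appear
try:
    policy.load_weights(weights)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

    If this is what happens in the real trainer, calling init_load_weights once (for example when the policy is created) and only load_weights after each update would avoid the mismatch.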
  2. awjuliani


    Unity Technologies

    Mar 1, 2017
    Hello. Unfortunately we do not provide help with modifying the trainer in this way. I would actually recommend trying out a pre-existing implementation of MAML, using our gym wrapper. This is likely the most straightforward way of getting this working with a Unity environment. Is there a reason you wanted to work directly with the ML-Agents trainers codebase?
  3. seboz123


    Mar 7, 2020
    I was trying to use the ML-Agents trainers because of their ability to parallelize: I can run multiple agents and multiple instances of the executable, which the gym wrapper does not allow. The ML-Agents trainers also support multiple agent behaviours.
    I already have a working example of DQN-Rainbow with the low-level Python API, so I will probably implement the other algorithms with the Python API as well.
    Thank you for your ideas!