Help Wanted: Meta Learning and RL-Trainer Implementation

Discussion in 'ML-Agents' started by seboz123, Jul 27, 2020.

  1. seboz123

    Joined:
    Mar 7, 2020
    Posts:
    18
    Hi,

    so I am currently using ml-agents for my thesis and trying to implement the MAML and Reptile meta-learning algorithms. See: https://arxiv.org/pdf/1703.03400.pdf and https://openai.com/blog/reptile/
    For that I want to get the weights of the NNs (and possibly the calculated gradients) and update them in a meta-learning way. I am using the release-3 version.
    Where would be the best place to implement such a meta-learning algorithm? Do you have any experience with this?
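    Conceptually, the Reptile outer update I want to apply is just an interpolation between the policy weights before and after the inner-loop training. A minimal sketch of that update on a list of weight arrays (the format get_weights() returns); meta_step_size and the toy weights are purely illustrative, not anything from ml-agents:
    Code (Python):
    import numpy as np

    def reptile_update(weights_before, weights_after, meta_step_size=0.1):
        # theta <- theta + eps * (theta_after - theta), applied per weight tensor
        return [
            before + meta_step_size * (after - before)
            for before, after in zip(weights_before, weights_after)
        ]

    # toy usage with two fake "layers"
    weights_before = [np.zeros((4, 4)), np.zeros(4)]
    weights_after = [np.ones((4, 4)), np.ones(4)]
    meta_weights = reptile_update(weights_before, weights_after)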

    For now, I am trying to access the weights in the RLTrainer class (rl_trainer.py).
    There I am calling
    Code (Python):
    weights_before_update = deepcopy(self.get_policy(self.brain_name).get_weights())
    before:
    Code (Python):
    with hierarchical_timer("_update_policy"):
        if self._update_policy():
            for q in self.policy_queues:
                # Get policies that correspond to the policy queue in question
                q.put(self.get_policy(q.behavior_id))
    and after the update I want to reset the weights to the old initial weights with:
    Code (Python):
    self.get_policy(self.brain_name).init_load_weights()
    self.get_policy(self.brain_name).load_weights(weights_before_update)
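    Eventually, the plain restore above would be replaced by the Reptile interpolation between the saved and the updated weights. A rough sketch of that step (the 0.1 meta step size is just an illustrative value, not something taken from ml-agents):
    Code (Python):
    # sketch of the intended meta-update in place of the plain restore
    policy = self.get_policy(self.brain_name)
    weights_after_update = policy.get_weights()
    meta_weights = [
        before + 0.1 * (after - before)  # illustrative Reptile step size
        for before, after in zip(weights_before_update, weights_after_update)
    ]
    policy.init_load_weights()
    policy.load_weights(meta_weights)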
    Unfortunately, I am getting the following error when trying to restore the old weights:
    Code (Python):
    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
        return fn(*args)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
      (0) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[{{node Placeholder_61}}]]
        [[Assign_90/_677]]
      (1) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[{{node Placeholder_61}}]]
    0 successful operations.
    0 derived errors ignored.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\threading.py", line 926, in _bootstrap_inner
        self.run()
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\trainer_controller.py", line 340, in trainer_update_func
        trainer.advance()
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\trainer\rl_trainer.py", line 211, in advance
        self.get_policy(self.brain_name).load_weights(weights_before_update)
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\policy\tf_policy.py", line 235, in load_weights
        self.sess.run(self.assign_ops, feed_dict=feed_dict)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
        run_metadata_ptr)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
        run_metadata)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
      (0) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[node Placeholder_61 (defined at C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]]
        [[Assign_90/_677]]
      (1) Invalid argument: You must feed a value for placeholder tensor 'Placeholder_61' with dtype float and shape [128]
        [[node Placeholder_61 (defined at C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]]
    0 successful operations.
    0 derived errors ignored.

    Original stack trace for 'Placeholder_61':
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\threading.py", line 926, in _bootstrap_inner
        self.run()
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\trainer_controller.py", line 340, in trainer_update_func
        trainer.advance()
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\trainer\rl_trainer.py", line 210, in advance
        self.get_policy(self.brain_name).init_load_weights()
      File "c:\users\\desktop\rlunity\ml-agents-release_3\ml-agents\mlagents\trainers\policy\tf_policy.py", line 222, in init_load_weights
        assign_ph = tf.placeholder(var.dtype, shape=value.shape)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\ops\array_ops.py", line 2630, in placeholder
        return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\ops\gen_array_ops.py", line 8041, in placeholder
        "Placeholder", dtype=dtype, shape=shape, name=name)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 793, in _apply_op_helper
        op_def=op_def)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3360, in create_op
        attrs, op_def, compute_device)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3429, in _create_op_internal
        op_def=op_def)
      File "C:\Users\\.conda\envs\tf_2_torch_mlagents_3\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1751, in __init__
        self._traceback = tf_stack.extract_stack()
    Why does updating the policy change the weights so that I cannot set them anymore? Do you have any ideas?
     
  2. awjuliani

    Unity Technologies

    Joined:
    Mar 1, 2017
    Posts:
    37
    Hello. Unfortunately we do not provide help with modifying the trainer in this way. I would actually recommend trying out a pre-existing implementation of MAML, using our gym wrapper. This is likely the most straightforward way of getting this working with a Unity environment. Is there a reason you wanted to work directly with the ML-Agents trainers codebase?
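    To illustrate the gym wrapper route: something along these lines should let an off-the-shelf MAML implementation drive a Unity executable through the standard gym interface (rough sketch; the executable path is a placeholder and any extra wrapper arguments depend on your environment):
    Code (Python):
    from mlagents_envs.environment import UnityEnvironment
    from gym_unity.envs import UnityToGymWrapper

    # wrap a built Unity environment as a standard gym environment
    unity_env = UnityEnvironment(file_name="path/to/YourEnvironment")  # placeholder path
    env = UnityToGymWrapper(unity_env)

    obs = env.reset()
    for _ in range(100):
        # random actions just to show the gym-style interaction loop
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    env.close()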
     
  3. seboz123

    Joined:
    Mar 7, 2020
    Posts:
    18
    I was trying to use the ml-agents trainers because of their ability to parallelize: in contrast to the gym wrapper, I can run multiple agents and multiple instances of the executable. The ml-agents trainers also allow the use of multiple agent behaviours.
    I already have a working example of DQN-Rainbow using the low-level Python API, so I will probably implement the other algorithms with the Python API as well.
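    For reference, the interaction loop I build on with the low-level API looks roughly like this (simplified sketch from memory; the executable path is a placeholder, it assumes a continuous action space, and the exact property names can differ slightly between releases):
    Code (Python):
    import numpy as np
    from mlagents_envs.environment import UnityEnvironment

    # placeholder path; older releases expose get_behavior_names()/get_behavior_spec()
    # instead of the behavior_specs mapping used here
    env = UnityEnvironment(file_name="path/to/YourEnvironment")
    env.reset()
    behavior_name = list(env.behavior_specs)[0]
    spec = env.behavior_specs[behavior_name]

    for _ in range(100):
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        # random continuous actions, one row per agent that requested a decision
        action = np.random.uniform(-1, 1, size=(len(decision_steps), spec.action_shape))
        env.set_actions(behavior_name, action)
        env.step()
    env.close()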
    Thank you for your ideas!
     