Are ML-agents on policy?

Discussion in 'ML-Agents' started by smartriver97, Jun 9, 2020.

  1. smartriver97

    Joined:
    Jun 3, 2020
    Posts:
    11
    In the code, I see the following code and comment in rl_trainer.py at line 133:

    with hierarchical_timer("process_trajectory"):
        for traj_queue in self.trajectory_queues:
            # We grab at most the maximum length of the queue.
            # This ensures that even if the queue is being filled faster than it is
            # being emptied, the trajectories in the queue are on-policy.
            _queried = False
            for _ in range(traj_queue.qsize()):
                _queried = True
                try:
                    t = traj_queue.get_nowait()
                    self._process_trajectory(t)
                except AgentManagerQueue.Empty:
                    break

    I can't understand why the trajectories in the queue are on-policy. A trajectory that arrives later, while the policy has not yet been updated, still seems to have been generated by the old policy.
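
    To make the loop in question easier to reason about, here is a minimal standalone sketch of the same "read the size once, then take at most that many items" pattern. It uses Python's standard queue.Queue as a stand-in for AgentManagerQueue, and drain_snapshot and the process callback are hypothetical names, so it only illustrates the bound on the loop, not how ML-Agents itself keeps data on-policy.

```python
import queue


def drain_snapshot(q, process):
    """Process at most the number of items that were queued when the call began."""
    snapshot_size = q.qsize()  # read once; items added later wait for the next call
    handled = 0
    for _ in range(snapshot_size):
        try:
            item = q.get_nowait()
        except queue.Empty:
            break  # another consumer emptied the queue first
        process(item)
        handled += 1
    return handled


# Demo: the "actor" adds a new item every time one is processed, so the
# queue is refilled as fast as it is emptied.
q = queue.Queue()
for i in range(3):
    q.put(f"traj-{i}")

seen = []


def process(item):
    seen.append(item)
    q.put(f"late-{item}")  # simulated trajectory arriving during the drain


handled = drain_snapshot(q, process)
```

    In this sketch the call handles only the three items that were present at the start, and the three items added during the drain stay in the queue for the next call. Without the bound, a consumer looping until the queue is empty might never return when the producer is faster.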
     
    Last edited: Jun 9, 2020
  2. andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    'On-policy' is a reinforcement learning term: it means that a policy update should only be computed from trajectories sampled from that same policy.
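
    As a rough illustration of that definition only (not of how ML-Agents tracks it), the sketch below tags each trajectory with the version of the policy that generated it and keeps only those matching the current policy. The Trajectory class, the policy_version field, and select_on_policy are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    policy_version: int  # hypothetical tag: which policy generated this data
    rewards: List[float]


def select_on_policy(trajectories: List[Trajectory], current_version: int) -> List[Trajectory]:
    """Keep only trajectories sampled from the policy that is about to be updated."""
    return [t for t in trajectories if t.policy_version == current_version]


buffer = [
    Trajectory(policy_version=3, rewards=[1.0]),       # stale: from the previous policy
    Trajectory(policy_version=4, rewards=[0.5, 0.2]),  # from the current policy
    Trajectory(policy_version=4, rewards=[0.0]),
]
usable = select_on_policy(buffer, current_version=4)
```

    Under this definition, an update for policy version 4 would use the two matching trajectories and ignore the one generated by version 3.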

    Does this answer your question or are you concerned with the implementation?
     
  3. smartriver97

    Joined:
    Jun 3, 2020
    Posts:
    11
    I am concerned with the implementation, because I don't see how training can stay on-policy when a queue is used. GA3C and IMPALA, which use a queue to communicate between the trainer and the actors, are both off-policy.
    Does ML-Agents stop stepping and train the policy on all the trajectories collected so far, even if the number of experience steps hasn't reached the maximum?