
Are ML-Agents on-policy?

Discussion in 'ML-Agents' started by smartriver97, Jun 9, 2020.

  1. smartriver97

    smartriver97

    Joined:
    Jun 3, 2020
    Posts:
    11
    In the code, I see the following snippet and comment in rl_trainer.py at line 133:

    with hierarchical_timer("process_trajectory"):
        for traj_queue in self.trajectory_queues:
            # We grab at most the maximum length of the queue.
            # This ensures that even if the queue is being filled faster than it is
            # being emptied, the trajectories in the queue are on-policy.
            _queried = False
            for _ in range(traj_queue.qsize()):
                _queried = True
                try:
                    t = traj_queue.get_nowait()
                    self._process_trajectory(t)
                except AgentManagerQueue.Empty:
                    break

    I can't understand why the trajectories in the queue are on-policy. Trajectories that are still sitting in the queue when the policy gets updated seem to have been sampled from the old policy.
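
    To make my concern concrete, here is a toy illustration (made-up code, not the actual ML-Agents implementation; the policy_version field is invented for the example) of the situation I'm worried about:

    import queue

    # Toy illustration of my concern: trajectories left in the queue after
    # an update were sampled from the old policy, so training on them
    # looks off-policy.
    trajectory_queue = queue.Queue()
    policy_version = 0

    # Actors fill the queue while the policy is at version 0.
    for step in range(5):
        trajectory_queue.put({"policy_version": policy_version})

    policy_version += 1  # trainer updates the policy

    # Anything still in the queue was produced by the *previous* version.
    while not trajectory_queue.empty():
        t = trajectory_queue.get_nowait()
        print(t["policy_version"], "vs current", policy_version)  # prints 0 vs 1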
     
    Last edited: Jun 9, 2020
  2. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    'On-policy' is a reinforcement learning technical term meaning that a policy update should be computed only with trajectories sampled from that same policy.
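
    As a minimal sketch of what that looks like in a training loop (toy code, not the ML-Agents implementation; ToyPolicy, collect_trajectory, and the loop sizes are all made up for illustration):

    import random

    class ToyPolicy:
        def __init__(self):
            self.version = 0

        def act(self):
            # Behavior depends on the current parameters (here: just the version).
            return random.random() + self.version

        def update(self, trajectories):
            self.version += 1  # stand-in for a gradient step

    def collect_trajectory(policy):
        # Tag each trajectory with the policy version that produced it.
        return {"policy_version": policy.version,
                "steps": [policy.act() for _ in range(4)]}

    policy = ToyPolicy()
    for iteration in range(3):
        # Collect with the *current* policy only.
        buffer = [collect_trajectory(policy) for _ in range(8)]
        # On-policy invariant: every trajectory came from the policy being updated.
        assert all(t["policy_version"] == policy.version for t in buffer)
        policy.update(buffer)
        # buffer is discarded each iteration; reusing it after the update
        # would make the next update off-policy.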

    Does this answer your question or are you concerned with the implementation?
     
  3. smartriver97

    smartriver97

    Joined:
    Jun 3, 2020
    Posts:
    11
    I am concerned with the implementation, because I don't see how on-policy training is preserved when a queue is used. GA3C and IMPALA also use queues to communicate between the trainer and the actors, and both are off-policy.
    Does ML-Agents stop stepping the environment and train the policy on all collected trajectories, even when the number of experience steps hasn't reached the maximum?