[Resolved] PPO: On-Policy or Off-Policy?

Discussion in 'ML-Agents' started by mateolopezareal, Jun 15, 2020.

  1. mateolopezareal (Joined: Jun 4, 2020, Posts: 54)
  2. vincentgao88, Unity Technologies (Joined: Feb 7, 2018, Posts: 21)
    PPO is an on-policy algorithm. You can learn more about it in this paper: https://arxiv.org/abs/1707.06347
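For readers who want to see where the on-policy part shows up, here is a minimal sketch of the clipped surrogate objective described in the linked paper. It is not ML-Agents code: the function name `ppo_clipped_objective` and its arguments are made up for illustration, and it uses plain Python lists instead of tensors. The key point is that `old_logp` holds the log-probabilities under the policy that actually collected the batch, and the clip keeps the updated policy close to it.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logp:   log-probs of the taken actions under the policy being updated
    old_logp:   log-probs of the same actions under the policy that collected the data
    advantages: advantage estimates for those actions
    clip_eps:   clip range; 0.2 is an assumed default for this sketch
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Clamp the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the updated policy equals the data-collecting policy, every ratio is 1 and the objective reduces to the mean advantage. Once the ratio leaves the clip range in the direction that would increase the objective, that sample stops contributing gradient, which is why the batch is only reused for a few update passes before fresh data is collected with the current policy.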
     
  3. TulioMMo (Joined: Dec 30, 2020, Posts: 29)
    Does the "actor" network for the agent in PPO share its weights with the "critic" network when estimating the baseline for the advantage function? Or does Unity generate a copy of the neural network, with independent weights for the actor and the critic? Many thanks!
     
  4. ervteng_unity, Unity Technologies (Joined: Dec 6, 2018, Posts: 150)
    In the latest version, ML-Agents generates a copy of the network with independent weights for the actor and the critic. In some older versions the weights were shared, but only for discrete-action networks.
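To make the difference between the two layouts concrete, here is a rough sketch in plain Python. It is not the ML-Agents implementation: the class names `SeparateActorCritic` and `SharedActorCritic`, the layer sizes, and the tiny bias-free layers are all assumptions chosen to keep the example short.

```python
import random


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def linear(layer, x):
    """Matrix-vector product (no bias, to keep the sketch short)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in layer]


def relu(values):
    return [max(0.0, v) for v in values]


class SeparateActorCritic:
    """Actor and critic each own a full set of weights (hypothetical class)."""

    def __init__(self, obs_size, hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.actor_body = make_layer(obs_size, hidden, rng)
        self.actor_head = make_layer(hidden, n_actions, rng)
        self.critic_body = make_layer(obs_size, hidden, rng)
        self.critic_head = make_layer(hidden, 1, rng)

    def policy_logits(self, obs):
        return linear(self.actor_head, relu(linear(self.actor_body, obs)))

    def value(self, obs):
        return linear(self.critic_head, relu(linear(self.critic_body, obs)))[0]


class SharedActorCritic:
    """One shared body feeds a policy head and a value head (hypothetical class)."""

    def __init__(self, obs_size, hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.body = make_layer(obs_size, hidden, rng)
        self.actor_head = make_layer(hidden, n_actions, rng)
        self.critic_head = make_layer(hidden, 1, rng)

    def features(self, obs):
        return relu(linear(self.body, obs))

    def policy_logits(self, obs):
        return linear(self.actor_head, self.features(obs))

    def value(self, obs):
        return linear(self.critic_head, self.features(obs))[0]
```

In the separate layout, changing the critic's weights leaves the policy output untouched, so a value-loss update cannot disturb the actor. In the shared layout, any change to `body` moves both the policy logits and the value estimate, which saves parameters but couples the two losses.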
     
  5. TulioMMo (Joined: Dec 30, 2020, Posts: 29)
    Thank you for the reply!