Resolved PPO On-Policy or Off-policy

mateolopezareal · Jun 15, 2020

I have read the book "Learn Unity ML-Agents – Fundamentals of Unity Machine Learning: Incorporate new powerful ML algorithms such as Deep Reinforcement Learning for games", and it says that PPO is off-policy. However, in this link:
https://stats.stackexchange.com/que...PO are both,far from the underlying objective.
they say that it is on-policy.
Can someone help me? Why is it on-policy or off-policy?

vincentgao88 · Jun 15, 2020

mateolopezareal said:

I have read the book "Learn Unity ML-Agents – Fundamentals of Unity Machine Learning: Incorporate new powerful ML algorithms such as Deep Reinforcement Learning for games", and it says that PPO is off-policy. However, in this link:
https://stats.stackexchange.com/questions/427140/is-proximal-policy-optimization-ppo-an-on-policy-reinforcement-learning-algori#:~:text=TRPO and PPO are both,far from the underlying objective.
they say that it is on-policy.
Can someone help me? Why is it on-policy or off-policy?

PPO is an on-policy algorithm. You can learn more about it in this paper: https://arxiv.org/abs/1707.06347
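To make the "why" a bit more concrete, here is a minimal sketch of the clipped surrogate objective from the paper linked above. PPO collects a batch with the current policy, reuses it for a few gradient steps while the probability ratio between the updated and the data-collecting policy is clipped to stay near 1, and then discards the batch. That is why it is usually classed as on-policy even though it contains an importance ratio. The function name `clipped_surrogate` and its argument names are made up for illustration; this is not ML-Agents code.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logp:   log-probabilities of the taken actions under the policy being updated
    old_logp:   log-probabilities under the policy that collected the batch
    advantages: advantage estimates for the same (state, action) pairs
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies are identical the ratio is 1 and the objective reduces to the mean advantage. Once the updated policy drifts far from the one that collected the data, the clipped term stops rewarding further drift, which is what keeps the reused batch "nearly on-policy".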

TulioMMo · Dec 30, 2020

Does the "actor" network for the agent in PPO share its weights with the "critic" network when estimating the baseline for the advantage function? Or does Unity generate a copy of the neural network with independent weights for actor and critic? Many thanks!

ervteng_unity · Jan 5, 2021

In the latest version, ML-Agents generates a copy of the network with independent weights for actor and critic. In some older versions, the weights were shared, but only for discrete action networks.
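As a rough illustration of the difference, the sketch below builds a tiny actor-critic pair in plain Python, once with independent body weights and once with a shared body. Every name here (`make_layer`, `SeparateActorCritic`, `SharedActorCritic`, the layer sizes) is hypothetical and chosen only for this example; it is not how ML-Agents implements its networks.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (no biases)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, x):
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in layer]


def relu(v):
    return [max(0.0, a) for a in v]


class _ActorCriticBase:
    def policy_logits(self, obs):
        return forward(self.policy_head, relu(forward(self.actor_body, obs)))

    def value(self, obs):
        return forward(self.value_head, relu(forward(self.critic_body, obs)))[0]


class SeparateActorCritic(_ActorCriticBase):
    """Actor and critic each own an independent body."""

    def __init__(self, obs_size, hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.actor_body = make_layer(obs_size, hidden, rng)
        self.critic_body = make_layer(obs_size, hidden, rng)  # independent weights
        self.policy_head = make_layer(hidden, n_actions, rng)
        self.value_head = make_layer(hidden, 1, rng)


class SharedActorCritic(_ActorCriticBase):
    """Actor and critic read from the same body; only the heads differ."""

    def __init__(self, obs_size, hidden, n_actions, seed=0):
        rng = random.Random(seed)
        body = make_layer(obs_size, hidden, rng)
        self.actor_body = body
        self.critic_body = body  # same object, so the weights are shared
        self.policy_head = make_layer(hidden, n_actions, rng)
        self.value_head = make_layer(hidden, 1, rng)
```

With separate bodies, an update to the critic cannot change the policy output; with a shared body, the value loss and the policy loss both push on the same weights.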

TulioMMo · Jan 8, 2021

Thank you for the reply!
