
Proximal Policy Optimisation - visualization with Hyperparameter and Tensorboard

Discussion in 'ML-Agents' started by Schnubbi, Apr 29, 2020.

  1. Schnubbi


    Hello everybody,

To visualize how the hyperparameters relate to the TensorBoard output for PPO in ML-Agents, I created a graphic and would like some feedback. Do you see any errors, especially regarding the TensorBoard part?

Here is a short explanation:

First, the neural network creates a new policy from samples in the buffer and estimates a new value function. Two hyperparameters shape this step: beta sets the strength of the entropy regularization (how strongly exploration is encouraged), and lambda (the GAE parameter) sets how much the advantage estimate relies on the current value estimate versus the actual returns. The new policy is then applied in the Unity environment. When time_horizon is reached, the collected results are evaluated and delivered back to the neural network. The probability ratio (policy_new/policy_old) and the advantage Â (Q − V) are calculated. If the ratio moves too far from 1, it is clipped (controlled by epsilon). The clipped objective is then used when the network calculates the next policy from the new data.
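The ratio-and-clipping step above can be sketched in code. This is a minimal NumPy illustration of the standard PPO clipped surrogate objective, not the actual ML-Agents implementation; the function name and the epsilon default of 0.2 are my own choices for the example.

```python
import numpy as np

def ppo_clipped_objective(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    # Probability ratio r = policy_new / policy_old, computed from log-probs
    # for numerical stability.
    ratio = np.exp(log_prob_new - log_prob_old)
    # Unclipped surrogate: ratio scaled by the advantage Â (= Q - V).
    unclipped = ratio * advantage
    # Clipped surrogate: the ratio may not move further than epsilon from 1,
    # which limits how far a single update can push the policy.
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # PPO takes the pessimistic (smaller) of the two, so clipping only ever
    # removes incentive to move too far, never adds it.
    return np.minimum(unclipped, clipped)
```

For example, if the new policy makes an action 1.8× as likely as before (ratio 1.8) with a positive advantage, the objective is capped at 1.2 × Â, so the gradient gives no extra reward for moving further than the clip range.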

    Best regards

Markus

(attached graphic: attachment 609727)

Edit: I have updated this post with a new graphic.


    Last edited: May 4, 2020