
Proximal Policy Optimisation - visualization with hyperparameters and TensorBoard

Discussion in 'ML-Agents' started by markus-weiss, Apr 29, 2020.

  1. markus-weiss

    Joined:
    Apr 4, 2018
    Posts:
    1
    Hello everybody,

    To visualize the connection between the hyperparameters and the TensorBoard output for PPO in ML-Agents, I created a graphic and would like some feedback. Do you see any errors, especially in the TensorBoard part?

    Here is a short explanation:

    First, the neural network computes an updated policy from samples in the buffer and estimates a new value function. The beta hyperparameter sets the strength of the entropy regularization, which keeps the policy exploratory, and lambda is the GAE parameter that trades off bias against variance in the advantage estimate. The updated policy is then run in the Unity environment. When time_horizon is reached, the collected experiences are evaluated and handed back to the neural network. The probability ratio (policy_new/policy_old) and the advantage Â (Q - V) are calculated. If the ratio moves too far away from 1, it is clipped, so a single update cannot change the policy too much. The neural network then computes the next update on the basis of new data, and the clipped objective is again used to compare the old and the new policy.
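
    To make the two mechanisms above concrete, here is a minimal Python sketch of GAE (where lambda appears) and of PPO's clipped surrogate objective (where epsilon appears). This is my own illustration, not the actual ML-Agents trainer code; the function names, argument names, and default values are all chosen just for readability.

        import numpy as np

        def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
            """Generalized Advantage Estimation over one trajectory segment.

            lam is the lambda hyperparameter: lam=0 gives one-step TD
            advantages (low variance, more bias), lam=1 gives Monte Carlo
            returns minus the value baseline. values needs one extra entry
            for the bootstrap value of the state after the last step, which
            matters when time_horizon cuts a trajectory mid-episode.
            """
            T = len(rewards)
            advantages = np.zeros(T)
            gae = 0.0
            for t in reversed(range(T)):
                delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
                gae = delta + gamma * lam * gae
                advantages[t] = gae
            return advantages

        def clipped_surrogate(log_prob_new, log_prob_old, advantages, epsilon=0.2):
            """PPO's clipped surrogate objective L^CLIP (to be maximized)."""
            ratio = np.exp(log_prob_new - log_prob_old)  # pi_new / pi_old
            unclipped = ratio * advantages
            clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
            # Elementwise minimum: samples whose ratio strays outside
            # [1 - epsilon, 1 + epsilon] get no extra incentive, which is
            # the "clipping" described above.
            return np.mean(np.minimum(unclipped, clipped))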


    Best regards

    Markus

    [Attachment 609727: the graphic described above]


    I edited this post with an updated graphic.
     

    Last edited: May 4, 2020