PPO: What is the reward I see in Tensorboard?

Discussion in 'ML-Agents' started by fedetask, Apr 4, 2020.

  1. fedetask

    I don't fully understand the Cumulative Reward plot when training with PPO, especially when I have multiple agents.
    Does it plot the average cumulative reward of the N agents, or the sum of their rewards? When and how is it computed? Is it an average over separate "validation" episodes, or is it the reward obtained during training?
     
  2. awjuliani

    Unity Technologies
    Hello,

    It is the episodic reward averaged over all the agents. There are no separate validation episodes; the values come from the same training episodes used to collect data for updating the policy.
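
To make the difference between "average" and "sum" concrete, here is a minimal sketch in plain Python. It is not ML-Agents code: the function names and the reward values are made up for illustration, and it only assumes that each finished episode contributes one cumulative reward that is then averaged across agents.

```python
from statistics import mean

def cumulative_reward(step_rewards):
    """Total reward one agent collected over a single episode."""
    return sum(step_rewards)

def mean_episodic_reward(episodes):
    """Average of the per-episode cumulative rewards (hypothetical helper)."""
    return mean([cumulative_reward(ep) for ep in episodes])

# Made-up per-step rewards for three agents, one finished episode each.
episodes = [
    [0.5, 0.5, 1.0],  # agent A: cumulative reward 2.0
    [0.0, -0.5],      # agent B: cumulative reward -0.5
    [1.0, 0.5],       # agent C: cumulative reward 1.5
]

average = mean_episodic_reward(episodes)                # 1.0
total = sum(cumulative_reward(ep) for ep in episodes)   # 3.0
print(f"average over agents: {average}, sum over agents: {total}")
```

With these numbers the plotted value would correspond to the average (1.0), not the sum (3.0), and it would come from the same episodes that feed the policy update.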

    Hopefully that clarifies everything for you.
     
  3. betike

    Hello,

    Is it possible to view the reward for each agent individually, instead of the average over all agents?

    Thanks
    B
     
  4. ervteng_unity

    Unity Technologies