Question: Cumulative Reward goes down

Discussion in 'ML-Agents' started by mateolopezareal, Jul 12, 2020.

  1. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    Taking this image into account:
    [Attached image: Environment/Cumulative Reward plot]

    Is there a problem with this training? The final result my agents end up with is the one I was looking for. What does it mean that at 3.5M steps the reward starts to go down? Is this a problem? How can I explain this behaviour in my document?
    Here is the entropy as well, in case someone needs it. Could this be because of the reward it receives from the entropy?
    [Attached image: Policy/Entropy plot]
     
  2. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
    The entropy looks good, but from my understanding I would expect the reward to increase as the entropy goes down. Is there any randomness in your environment, and how many steps does one episode last on average? Was the training resumed, or does it already reach a reward of 220 during the first 250k steps?
     
  3. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    Yes, there is randomness in my environment, but the reward always goes down at the end. The episode length is 250. The training was not resumed; the summary_freq is 250000, so the first point on the curve is only logged at that step, and by then it already reaches that reward.
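    For context, summary_freq is set in the trainer configuration YAML. A minimal sketch, assuming the behaviors-style config layout of recent ML-Agents releases (the behavior name MyAgentBehavior and the max_steps value are placeholders, not from this thread):

        behaviors:
          MyAgentBehavior:           # placeholder behavior name
            trainer_type: ppo
            summary_freq: 250000     # stats (cumulative reward, entropy) are written every 250k steps,
                                     # so the first point on the TensorBoard curve appears at 250k
            max_steps: 5.0e6         # illustrative value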
     
  4. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    If the final behavior matches your expectations, I wouldn't say this is a problem. Is it possible that your environment is evolving in some way over time such that this reward curve makes sense?
     
  5. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    Yep, but how can I explain that this was a good training run if the graphs look bad? It seems not to be stable.
     
  6. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
    Maybe it's because of the lack of exploration at the end of training, combined with an environment that still changes into unknown shapes? Are you using PPO or SAC? Setting the learning rate schedule to constant instead of linear might prevent the reward decrease.
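    In case it helps, a minimal sketch of where that setting lives in the ML-Agents trainer config, again assuming the behaviors-style YAML of recent releases (the behavior name and the other hyperparameter values are placeholders, not from this thread):

        behaviors:
          MyAgentBehavior:                        # placeholder behavior name
            trainer_type: ppo
            hyperparameters:
              learning_rate: 3.0e-4               # illustrative value
              learning_rate_schedule: constant    # default is linear, which decays the LR to 0 by max_steps
              beta: 5.0e-3                        # entropy regularization strength, illustrative value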
     
  7. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    PPO
     
  8. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
    Does setting it to constant help?