Learning Rate, Epsilon and Policy Loss

Discussion in 'ML-Agents' started by unity_-DoCqyPS6-iU3A, Aug 8, 2020.

  1. unity_-DoCqyPS6-iU3A

    Joined:
    Aug 18, 2018
    Posts:
    26
    Hello everyone,

    this is a question for those users who've managed to build up intuition about reading TensorBoard graphs,
    or for those who have collected data from enough different runs that they can look back at their results for different parameters.

    How are learning-rate, epsilon and policy loss related?

    I understand that the "learning rate" dictates how much the policy changes with each update.
    "Epsilon" then caps that change, so the policy doesn't move more than epsilon allows.

    And I thought that "policy loss" in TensorBoard would give an indication of how much the policy changed.

    However, with the settings I tried (increasing the learning rate from 0.0003 to 0.003, and increasing epsilon from 0.2 to 0.4), the "policy loss" always stays at an average of 6E-3. It oscillates a bit, and my 20M steps are probably not enough to get the policy loss to drop significantly for my environment. But I would expect the "policy loss" to start at, or stay at, a higher level with higher values for "learning rate" and "epsilon".

    So, at which point in my reasoning am I wrong?
     
  2. ReinierJ

    Joined:
    Jul 10, 2020
    Posts:
    10
    Hi,

    Your understanding of the learning rate and epsilon is correct!

    Policy loss is not a direct indication of how much the policy changed. Instead, the policy loss is the loss on the network's prediction of the best action.

    The agent tries to predict what the best action would be in the current situation; that mapping from situations to actions is the policy. The policy loss measures how wrong those predictions were. If your agent makes very bad predictions, the policy loss will be high. If your agent is perfect and always makes the perfect decision, the policy loss will be 0.
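
    For reference, ML-Agents trains with PPO, where the logged policy loss comes from a clipped surrogate objective; here is a minimal numpy sketch under that assumption (my own names, not ML-Agents' actual code):

    ```python
    import numpy as np

    def ppo_policy_loss(ratios, advantages, epsilon=0.2):
        # ratios     = pi_new(a|s) / pi_old(a|s) for each sampled action
        # advantages = how much better each action was than expected
        unclipped = ratios * advantages
        clipped = np.clip(ratios, 1.0 - epsilon, 1.0 + epsilon) * advantages
        # Minimum plus sign flip: lower loss = better surrogate objective,
        # and movement beyond the epsilon clip range earns no extra credit.
        return -np.mean(np.minimum(unclipped, clipped))
    ```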

    Similarly, there is the value loss. The value loss is the loss on the prediction of what the final outcome (the cumulative reward) will be. So if the value loss is lower, the agent makes better predictions about the final outcome.
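
    As a minimal sketch, assuming the usual mean-squared-error form (ML-Agents' actual implementation may differ in details):

    ```python
    import numpy as np

    def value_loss(predicted_values, returns):
        # Squared error between the critic's value estimates and the
        # discounted returns actually observed; 0 for perfect predictions.
        return np.mean((predicted_values - returns) ** 2)

    print(value_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0])))  # 0.0
    print(value_loss(np.array([0.0, 0.0]), np.array([1.0, 2.0])))  # 2.5
    ```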