Hello everyone, this is a question for those of you who have built up intuition about reading TensorBoard graphs, or who have collected data from enough different runs that you can compare your results across parameter settings.

How are learning rate, epsilon, and policy loss related? My understanding is that the learning rate dictates how much the policy changes per update, and that epsilon caps that change, so the policy can't move further than epsilon allows. I thought that the "policy loss" curve in TensorBoard would then give an indication of how much the policy is changing.

However, with the settings I tried (learning rate increased from 0.0003 to 0.003, and epsilon increased from 0.2 to 0.4), the policy loss always stays at an average of about 6e-3. It oscillates a bit, and my 20M steps are probably not enough for the policy loss to drop significantly in my environment. But I would have expected the policy loss to start at, or settle at, a noticeably higher level with higher values for learning rate and epsilon.

So, at what point in my reasoning am I wrong?
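For context, here is my mental model of how epsilon caps the update, written as a minimal sketch of PPO's clipped surrogate objective for a single sample. This is just my own illustration (the function name and the use of plain Python floats are mine, not taken from any particular library), under the assumption that "epsilon" is the clip range of the probability ratio:

```python
def clipped_surrogate(ratio, advantage, epsilon):
    """Per-sample PPO clipped surrogate objective (sketch).

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- estimated advantage A(s, a)
    epsilon   -- clip range, e.g. 0.2
    """
    unclipped = ratio * advantage
    # Clip the ratio into [1 - epsilon, 1 + epsilon] before weighting.
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon) * advantage
    # Taking the minimum means that once the ratio leaves the clip range
    # (in the direction the advantage favors), the objective stops growing,
    # so the gradient there is zero and the update is effectively capped.
    return min(unclipped, clipped)

# With a positive advantage, pushing the ratio beyond 1 + epsilon
# yields no additional objective value:
print(clipped_surrogate(1.5, 1.0, 0.2))  # -> 1.2 (capped at 1 + epsilon)
print(clipped_surrogate(1.1, 1.0, 0.2))  # -> 1.1 (inside the clip range)
```

My expectation was that a larger epsilon widens this clip range, and a larger learning rate takes bigger steps on the resulting objective, so together they should let the reported policy loss sit at a higher level. Is that where my picture breaks down?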