
Beta

Discussion in 'ML-Agents' started by mateolopezareal, Jul 7, 2020.

  1. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    Hello,
    I am training some agents giving them complex observations and more simple ones. When I train the complex one I have to give them a big beta, 0.2, but when I train them with more simple observations with the same beta, the entropy drops more quickly. Why is this? Is it because the observation space is smaller so when it runs all the possibilities the entropy drops by a lot?
    I am quite confused about beta, and I have already read https://github.com/Unity-Technologi...iguration-File.md#ppo-specific-configurations and also some articles about PPO.
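    For reference, beta is set in the trainer configuration file. A minimal sketch, assuming the behaviors-based config schema; the behavior name `MyAgent` and the other hyperparameter values are illustrative, not taken from this thread:

    ```yaml
    behaviors:
      MyAgent:                # illustrative behavior name
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          learning_rate: 3.0e-4
          beta: 0.2           # entropy regularization strength; higher keeps the policy more random
          epsilon: 0.2
    ```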
     
  2. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
    I am quite sure that the agent needs to explore more with the complex observations because there are more possibilities, so it takes longer to find an optimal solution.
    I talk briefly about Beta in my latest video. Might help you.
     
  3. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Can you clarify the difference between complex and simple observation spaces?

    Beta controls how much the agent will favor random actions in proportion to reward. With the 'simple' observations, it's possible the agent can discover highly rewarding states more easily so it becomes less random in favor of visiting those highly valuable states. However, this is just speculation and I cannot be sure without knowing what the actual observation spaces are.
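    To make that concrete: in PPO, beta is the weight on an entropy bonus added to the loss, so a policy that concentrates probability on a few high-reward actions has lower entropy than one that stays random. A minimal NumPy sketch of that relationship (the function names are illustrative, not ML-Agents internals):

    ```python
    import numpy as np

    def categorical_entropy(probs):
        """Shannon entropy of a categorical action distribution."""
        probs = np.asarray(probs, dtype=float)
        return -np.sum(probs * np.log(probs + 1e-10))

    def ppo_loss_with_entropy(policy_loss, probs, beta):
        # Subtracting beta * entropy from the loss rewards randomness:
        # a larger beta pushes the optimizer toward higher-entropy policies.
        return policy_loss - beta * categorical_entropy(probs)

    # A uniform policy over 4 actions is maximally random (entropy = ln 4).
    uniform = [0.25, 0.25, 0.25, 0.25]
    # A near-deterministic policy, as when the agent has found highly
    # rewarding states, has much lower entropy.
    peaked = [0.97, 0.01, 0.01, 0.01]
    ```

    So when rewarding states are easy to find, the policy peaks quickly and the measured entropy drops, even with the same beta.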
     
  4. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    The simple one is the agent's surroundings: the colors of 9 GameObjects, passed as one-hot observations (with four possible colors).
    The complex one is the colors of all 144 GameObjects, also passed as one-hot observations (with four possible colors).
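    In other words, the two observation vectors differ a lot in size: 9 objects × 4 colors = 36 values versus 144 × 4 = 576. A small sketch of that one-hot encoding (the function is illustrative, not the thread author's actual code):

    ```python
    import numpy as np

    NUM_COLORS = 4

    def one_hot_colors(color_ids):
        """Encode a list of color indices (0-3) as one flat one-hot vector."""
        obs = np.zeros((len(color_ids), NUM_COLORS))
        obs[np.arange(len(color_ids)), color_ids] = 1.0
        return obs.flatten()

    simple_obs = one_hot_colors([0] * 9)     # 9 surrounding objects -> 36 values
    complex_obs = one_hot_colors([0] * 144)  # all 144 objects -> 576 values
    ```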
     
  5. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Ah. It might be my explanation then: it's easier for the agent to associate reward with states in the simpler case, so the policy becomes less random faster.