ML-Agents: How to implement it?

Discussion in 'ML-Agents' started by Demo_101, Apr 3, 2021.

  1. Demo_101

    Demo_101

    Joined:
    Jan 26, 2020
    Posts:
    8
    Hey, this is my first ML-Agents project and I have some questions regarding the implementation.
    • If the agent makes a decision that is impossible (e.g. jumping with no stamina), how do I tell the agent that it's impossible? Do I reward the agent with -1 or just leave it?
    • Assume a Unity scene can have 5-20 enemies and the agent should attack one of them. What branch size do I choose? Do I choose a branch size of 20 and mask 15 out (if there are only 5 enemies), or is there a way to change the branch size at runtime?
    • If the agent makes a decision, it can be rewarded. What if the reward can only be assigned 5 seconds after taking that decision? The agent may have taken other steps in the meantime; how does it know which reward belongs to which action?
    Thank you very much for taking the time to read.
     
  2. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    > If the agent makes a decision that is impossible (e.g. jumping with no stamina), how do I tell the agent that it's impossible? Do I reward the agent with -1 or just leave it?

    For discrete actions you can set the action_mask to prevent the agent from taking invalid actions. Please see this doc.
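    As a rough sketch of what that masking looks like on the agent side (the agent class, branch layout, and `stamina` field here are made up for illustration; depending on your ML-Agents version the override is `WriteDiscreteActionMask(IDiscreteActionMask)` in newer releases or `CollectDiscreteActionMasks(DiscreteActionMasker)` in older ones):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class JumperAgent : Agent
{
    public float stamina;

    // Assumed branch layout: branch 0, action 0 = do nothing, action 1 = jump.
    public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
    {
        if (stamina <= 0f)
        {
            // Disable "jump" so the policy can never select it this step.
            actionMask.SetActionEnabled(0, 1, false);
        }
    }
}
```

The mask is rewritten every decision step, so the jump action becomes available again as soon as stamina recovers.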

    > Assume a Unity scene can have 5-20 enemies and the agent should attack one of them. What branch size do I choose? Do I choose a branch size of 20 and mask 15 out (if there are only 5 enemies), or is there a way to change the branch size at runtime?

    The action spec defines all the actions you can possibly take in the game and shouldn't change throughout the game. If an action is not valid at a certain step you should mask it using action_mask. The branch size defines all possible values for a certain action, so if what you want is "choose one of 20", then yes, you should use a branch size of 20.
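    A minimal sketch of that approach, assuming a fixed branch of size 20 where action i means "attack enemy slot i", and a hypothetical `activeEnemyCount` field your spawner keeps up to date:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class AttackerAgent : Agent
{
    const int MaxEnemies = 20;      // fixed branch size declared in the action spec
    public int activeEnemyCount;    // current number of live enemies, e.g. 5..20

    public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
    {
        // Mask out the slots that currently have no enemy in them.
        for (var i = activeEnemyCount; i < MaxEnemies; i++)
        {
            actionMask.SetActionEnabled(0, i, false);
        }
    }
}
```

This keeps the action spec constant across the whole training run while still restricting the agent to valid targets each step.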

    > If the agent makes a decision, it can be rewarded. What if the reward can only be assigned 5 seconds after taking that decision? The agent may have taken other steps in the meantime; how does it know which reward belongs to which action?

    Delayed rewards are very common in reinforcement learning. While the reward is not directly linked to that particular action, the agent should be able to learn that "taking this action in a certain state will lead me to a next state which has a higher chance of getting a reward", and therefore learn to take the right actions.
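    In practice this means you simply call `AddReward` at the moment the outcome actually happens, rather than at the step the action was taken; the trainer's discounted return propagates credit back to the earlier actions. A sketch, with a made-up callback name for when a delayed hit lands:

```csharp
using Unity.MLAgents;

public class DelayedRewardAgent : Agent
{
    // Hypothetical callback: invoked by your game logic ~5 seconds after
    // the attack action, when the projectile actually hits.
    public void OnAttackLanded()
    {
        // Add the reward when the outcome materializes. No manual bookkeeping
        // of "which action earned this" is needed; the discounted return
        // computed during training handles credit assignment over time.
        AddReward(1.0f);
    }
}
```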
     
    Demo_101 likes this.
  3. Demo_101

    Demo_101

    Joined:
    Jan 26, 2020
    Posts:
    8
    Hey @ruoping_unity, thank you very much for your answer, it really helped me out!

    But there is one thing I haven't figured out. For example, I have 2 branches, both delivering discrete values from 0 to 2, so the values can be:
    0-0, 0-1, 0-2, 1-0, 1-1, 1-2, 2-0, 2-1, 2-2.

    How do I mask a combination of both branches? For example, the combination 1-0 should be masked out, but 1-1 should work, as should 0-0.

    The only way I can think of to solve this is creating many, many more branches and using the right branch when needed.
     
  4. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    The API design assumes each branch is independent, so unfortunately there's no provided way to mask a combination of branches. If you really need to do that, you would probably need to make each combination a separate action, i.e. flatten the branches into one branch whose size is the product of the original branch sizes.
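    That flattening can be sketched like this for the 3×3 case above (class and layout are assumptions for illustration): one branch of size 9, where the flat index is `a * 3 + b`, so the combination 1-0 is index 3 and can be masked directly:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class CombinedBranchAgent : Agent
{
    // Assumed action spec: a single discrete branch of size 9, encoding
    // the pair (a, b) with a, b in {0, 1, 2} as flatIndex = a * 3 + b.
    public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
    {
        // Mask the combination a=1, b=0: flat index 1 * 3 + 0 = 3.
        actionMask.SetActionEnabled(0, 3, false);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        var flat = actions.DiscreteActions[0];
        var a = flat / 3;  // recover the first value
        var b = flat % 3;  // recover the second value
        // ... act on (a, b) ...
    }
}
```

The trade-off is that the flattened branch grows multiplicatively with the number of combined branches, so this only stays practical for a few small branches.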