ML-Agents: How to implement it?

Discussion in 'ML-Agents' started by Demo_101, Apr 3, 2021.

  1. Demo_101

    Joined:
    Jan 26, 2020
    Posts:
    7
    Hey, this is my first ML-Agents project and I have some questions regarding the implementation.
    • If the agent makes a decision that is impossible (e.g. jumping with no stamina), how do I tell the agent it's impossible? Do I punish it with a reward of -1, or just ignore the action?
    • Assume a Unity scene can have 5-20 enemies and the agent should attack one of them. What branch size do I choose? Do I choose a branch size of 20 and mask 15 out (if there are only 5 enemies), or is there a way of changing the branch size at runtime?
    • When the agent makes a decision, it can be rewarded. What if the reward can only be assigned 5 seconds after taking that decision? The agent may have taken other steps in the meantime; how does it know which reward belongs to which action?
    Thank you very much for taking the time to read.
     
  2. ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    77
    > If the agent makes a decision that is impossible (e.g. jumping with no stamina), how do I tell the agent it's impossible? Do I punish it with a reward of -1, or just ignore the action?

    For discrete actions you can set the action_mask to prevent the agent from taking invalid actions. Please see this doc.
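To illustrate what the mask does conceptually (this is a minimal Python sketch of the idea, not ML-Agents' actual C# code): masked actions have their logits set to -inf before the softmax, so they get probability 0 and can never be sampled. The branch contents ("walk/jump/attack") are made up for the example.

```python
import math

def masked_probs(logits, mask):
    """Set invalid actions' logits to -inf, then softmax: masked actions get probability 0."""
    logits = [l if m else float("-inf") for l, m in zip(logits, mask)]
    mx = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(l - mx) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical branch = [walk, jump, attack]; jumping is invalid (no stamina)
p = masked_probs([1.0, 2.0, 0.5], [True, False, True])
# p[1] is exactly 0.0, so "jump" can never be sampled this step
```

This is why masking is preferable to punishing invalid actions with -1: the agent never wastes exploration on actions that cannot be taken.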

    > Assume a Unity scene can have 5-20 enemies and the agent should attack one of them. What branch size do I choose? Do I choose a branch size of 20 and mask 15 out (if there are only 5 enemies), or is there a way of changing the branch size at runtime?

    The action spec defines all the actions the agent can possibly take in the game and shouldn't change throughout the game. If an action is not valid at a certain step you should mask it using action_mask. Branch size defines the number of possible values for a certain action. So if what you want is "choose one out of 20", then yes, you should use branch size 20 and mask out the indices that aren't valid at that step.
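Concretely, for the 5-20 enemies case you could keep a fixed-size array of 20 enemy slots and rebuild the mask every decision step. A sketch (the slot layout and `attack_mask` helper are illustrative, not part of the ML-Agents API):

```python
MAX_ENEMIES = 20  # worst case -> branch size stays fixed at 20

def attack_mask(enemies):
    """Boolean mask over the "attack target" branch.

    enemies: list of length MAX_ENEMIES; None marks an empty slot.
    True = index selectable this step, False = masked out.
    """
    assert len(enemies) == MAX_ENEMIES
    return [e is not None for e in enemies]

# 5 enemies alive, slots 5..19 empty -> 15 of the 20 indices are masked out
enemies = ["orc"] * 5 + [None] * 15
mask = attack_mask(enemies)
```

In the agent code you would then disable each index where the mask is False inside the action-masking callback, once per decision step.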

    > When the agent makes a decision, it can be rewarded. What if the reward can only be assigned 5 seconds after taking that decision? The agent may have taken other steps in the meantime; how does it know which reward belongs to which action?

    Delayed reward is very common in reinforcement learning. Even though the reward is not directly linked to that particular action, the agent should be able to learn that "taking this action in a certain state leads me to a next state with a higher chance of getting a reward", and therefore learn to take the right actions.
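The mechanism that propagates a late reward back to the earlier action is the discounted return: each step's return G_t = r_t + gamma * G_{t+1} folds in all future rewards, discounted by gamma (the `gamma` hyperparameter in the trainer config). A small sketch, assuming a reward that lands 5 steps after the decisive action:

```python
GAMMA = 0.99  # discount factor; hypothetical value for the example

def returns(rewards, gamma=GAMMA):
    """Discounted return G_t for every step: G_t = r_t + gamma * G_{t+1}."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

# The decisive action happens at step 0; the reward arrives 5 steps later.
rewards = [0, 0, 0, 0, 0, 1.0]
G = returns(rewards)
# G[0] = 0.99**5, roughly 0.951: the early action still gets credit, just discounted
```

So the agent doesn't need to match rewards to actions one-to-one; the trainer credits every action along the trajectory with the (discounted) rewards that followed it, and over many episodes the actions that reliably precede the reward get reinforced.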
     
    Demo_101 likes this.