ml-agents, team self play, handling dead agents during episode.

Discussion in 'ML-Agents' started by m4l4, Jul 28, 2020.

  1. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    Hi everyone, I've been experimenting with self-play and the idea of a cooperative/competitive environment.
    The first version of the env is basically a team-based "Food Collector" (the one in the ml-agents examples).

    2 teams, 3 agents each.
    Agents' health decreases over time, there's food all over the ground, and the last team standing wins the match.
    Agents can shoot lasers to freeze other agents in place.

    The reward is really simple: an agent gets -1 if its health drops to zero, and each agent on the winning team gets +1.
    (It's probably a good idea to also give -1 to the losing team.)

    My question is about the agents with 0 health. How should I handle them while the episode is still in progress?
    Right now, I just use gameObject.SetActive(false) and reactivate them in OnEpisodeBegin().

    How does being deactivated affect the agents' training? How does an agent perceive what happens while it is inactive? How can it interpret being deactivated for most of the match and then getting rewarded at the end of the episode? I don't even know if it can receive AddReward() while in that state.

    Should I leave them on the ground, change their tag to "deadAgent", add an isDead bool to the observation space, and just ignore their outputs while isDead (as I do for isFrozen)?

    Or put them in a cage until the end of the match :D ??? Like a hockey penalty box :D
     
  2. awjuliani

    Unity Technologies

    Joined:
    Mar 1, 2017
    Posts:
    69
    Hello. If you deactivate an agent, then it no longer sends or receives observations, actions, and rewards. As long as you punish the dead agent before deactivating it, the reward should be received. The issue with keeping the agent around is that it might learn the wrong relationships between the observations and actions.
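    The "punish, then deactivate" ordering described above could look roughly like the sketch below. The class name, the health field, and the ApplyDamage/Die methods are hypothetical names chosen for illustration; only AddReward() and gameObject.SetActive(false) come from the thread.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical agent class; names other than AddReward/SetActive are illustrative.
public class TeamCollectorAgent : Agent
{
    // Assumed health pool; the thread does not give actual values.
    public float health = 100f;

    // Called by (hypothetical) game logic whenever the agent loses health.
    public void ApplyDamage(float amount)
    {
        health -= amount;
        if (health <= 0f)
        {
            Die();
        }
    }

    void Die()
    {
        // Assign the penalty first, while the agent is still active...
        AddReward(-1f);
        // ...then deactivate it, so the penalty is already recorded.
        gameObject.SetActive(false);
    }
}
```

    The point of the ordering is simply that the penalty is added while the agent can still report rewards; after deactivation it no longer sends or receives anything.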
     
  3. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    That's what I was thinking. Punishing the agent before deactivation is obviously not an issue (nor is reactivating it before the final reward).
    But the problem remains: how does an agent "interpret" being deactivated?
    Any suggestions on how to handle them properly?
     
  4. Hsgngr

    Joined:
    Dec 28, 2015
    Posts:
    61
    If you add a dead/alive state (an enum or a boolean) as an observation, that will solve the problem. Just freeze the agent so it doesn't go anywhere; it can continue to train, but it will know that it's dead. That way it will separate the alive and dead states.
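    As a minimal sketch of the observation idea, assuming the agent keeps isDead and isFrozen flags (the field and class names are assumptions, borrowed from the wording earlier in the thread):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

// Hypothetical agent class showing only the observation side.
public class TeamCollectorAgent : Agent
{
    // Assumed state flags, set elsewhere by game logic.
    bool isDead;
    bool isFrozen;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Each bool is added as a single observation value,
        // so the policy can tell the dead and alive states apart.
        sensor.AddObservation(isDead);
        sensor.AddObservation(isFrozen);
    }
}
```

    Each added observation increases the vector observation size, so the Space Size in the agent's Behavior Parameters has to be raised to match.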
     
  5. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    Thanks, that seems like a good compromise.
    By "freezing" the agent, do you mean just ignoring its outputs?

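    public override void OnActionReceived(float[] vectorAction)
    {
        // Ignore the policy's outputs while the agent is dead.
        // (Newer ML-Agents releases pass an ActionBuffers argument instead of float[].)
        if (!alive)
        {
            return;
        }

        // normal movement / shooting logic goes here
    }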
     
  6. Hsgngr

    Joined:
    Dec 28, 2015
    Posts:
    61
    Yes, this, and also deactivate the GameObject's colliders, since you don't want any collisions with it in the environment.
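    One possible way to toggle the colliders, sketched as a small helper component. The component and method names are hypothetical; it only relies on the standard Collider.enabled property.

```csharp
using UnityEngine;

// Hypothetical helper attached to the agent's GameObject.
public class DeadAgentColliders : MonoBehaviour
{
    Collider[] colliders;

    void Awake()
    {
        // Collect the colliders on this object and on its children once.
        colliders = GetComponentsInChildren<Collider>();
    }

    // Pass false when the agent dies, true when the next episode starts.
    public void SetCollidersEnabled(bool value)
    {
        foreach (var c in colliders)
        {
            c.enabled = value;
        }
    }
}
```

    The agent would call SetCollidersEnabled(false) where it sets its dead flag and SetCollidersEnabled(true) when it is reset, so the body stays in the scene without blocking or triggering anything.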
     