Question: Why does one team always dominate?

Discussion in 'ML-Agents' started by csmehmetyazici99, Jan 10, 2022.

  1. csmehmetyazici99

    Joined: Feb 1, 2021 · Posts: 4
    I have created a capture-the-flag environment with fairly simple rewards. Bringing the enemy's flag to your base gives you a reward of 1 plus a time bonus of up to 0.4, gives the enemy -0.5 for losing, and the environment then restarts. Around the 40-minute mark the red team has won roughly 800 times and the blue team somewhere in the 500s. After 2 hours the red team has won about 80,000 times while blue has won only 800. What causes this? Is my game too simple? Why isn't blue competitive with red even though it did win 800 times?

    I am a complete beginner and would appreciate any kind of help; I feel like I could be misunderstanding something really simple.
    PS: I have tried numerous YAML configs with lots of variations, mostly taken from credible sources building similar games.
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Sounds like your agents' observations might not be fully symmetrical. Maybe they are observing positions / velocities, but you didn't invert the x/z vectors for one side?
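    For illustration, a minimal sketch of what that mirroring could look like, assuming a hypothetical isBlue flag and Rigidbody rb on the agent, and that the two bases are mirrored along the local x axis (flip a different component if your layout differs):

    Code (CSharp):
    // Mirror observations for one team so that both teams "see" the field
    // from the same point of view.
    private Vector3 Mirror(Vector3 v)
    {
        return isBlue ? v : new Vector3(-v.x, v.y, v.z);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(Mirror(transform.localPosition));
        sensor.AddObservation(Mirror(rb.velocity));
    }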
     
  3. csmehmetyazici99

    Joined: Feb 1, 2021 · Posts: 4
    Thank you for your answer. I wasn't aware that I needed to do this, and after some digging I haven't been able to find a source that does it. Is this definitely necessary?

    Below are my observations and actions. I would really appreciate it if you could take a look and tell me if you see anything wrong with them.


    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(BlueFlagTransform.localPosition);
        sensor.AddObservation(RedFlagTransform.localPosition);
        Vector3 dirToFlag = ((isBlue ? RedFlagTransform.localPosition : BlueFlagTransform.localPosition) - transform.localPosition).normalized;
        sensor.AddObservation(dirToFlag);
        sensor.AddObservation(rb.velocity);
        sensor.AddObservation(frozen);
    }



    Code (CSharp):
    public override void OnActionReceived(ActionBuffers actions)
    {
        //base.OnActionReceived(actions);
        float moveX = actions.ContinuousActions[0];
        float moveY = actions.ContinuousActions[1];
        float rotation = actions.ContinuousActions[2];
        if (frozen)
        {
            moveX = 0;
            moveY = 0;
        }

        if (transform.localPosition.y >= 5f && moveY > 0f)
        {
            moveY = 0f;
            AddReward(-0.00002f);
        }
        if (transform.localPosition.y <= -5f && moveY < 0f)
        {
            moveY = 0f;
            AddReward(-0.00002f);
        }
        if (transform.localPosition.x >= 8.7f && moveX > 0f)
        {
            moveX = 0f;
            AddReward(-0.00002f);
        }
        if (transform.localPosition.x <= -8.7f && moveX < 0f)
        {
            moveX = 0f;
            AddReward(-0.00002f);
        }

        transform.Rotate(Vector3.forward, rotation * Time.deltaTime * rotateSpeed);
        transform.localPosition += new Vector3(moveX, moveY, 0) * Time.deltaTime * moveSpeed;
    }
     
  4. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Let's say you have two agent instances on opposite sides of some playing field, facing each other. Perhaps tennis players, each right in the center of their half of the court. With the observations above, these agents would actually observe different values, although they are in equivalent positions as far as gameplay is concerned.
    For self-play, you'll need to make sure observations are identical if agents are in identical situations. Use positions and directions relative to the agent where possible, for instance agent_transform.InverseTransformVector(rb.velocity) instead of rb.velocity.
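    A minimal sketch of what fully agent-relative observations could look like (myFlag and enemyFlag are placeholder Transform references, not names from the code above):

    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        // Everything is expressed in the agent's own frame, so two agents in
        // mirrored situations produce identical observation vectors.
        sensor.AddObservation(transform.InverseTransformPoint(enemyFlag.position)); // flag to capture
        sensor.AddObservation(transform.InverseTransformPoint(myFlag.position));    // own flag
        sensor.AddObservation(transform.InverseTransformVector(rb.velocity));       // velocity in the agent's frame
    }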
     
  5. csmehmetyazici99

    Joined: Feb 1, 2021 · Posts: 4
    Thank you for the quick replies. Yes, that makes sense. My agents play in a symmetric field but can go into each other's halves and rotate as well. I have changed my observations to the ones below.

    Code (CSharp):
    sensor.AddObservation(!isBlue ? transform.localPosition : transform.InverseTransformPoint(transform.localPosition));
    sensor.AddObservation(!isBlue ? transform.forward : transform.InverseTransformVector(transform.forward));
    sensor.AddObservation(Vector3.Distance(otherAgents[0].transform.localPosition, transform.localPosition));
    sensor.AddObservation(!isBlue ? (otherAgents[0].transform.localPosition - transform.localPosition).normalized : transform.InverseTransformDirection((otherAgents[0].transform.localPosition - transform.localPosition).normalized));
    sensor.AddObservation(othersAgentScript.carryingFlag);
    sensor.AddObservation(carryingFlag);

    if (isBlue)
    {
        sensor.AddObservation(Vector3.Distance(RedFlagTransform.transform.localPosition, transform.localPosition));
        sensor.AddObservation(transform.InverseTransformDirection((RedFlagTransform.transform.localPosition - transform.localPosition).normalized));
    }
    else
    {
        sensor.AddObservation(Vector3.Distance(BlueFlagTransform.transform.localPosition, transform.localPosition));
        sensor.AddObservation((BlueFlagTransform.transform.localPosition - transform.localPosition).normalized);
    }
    sensor.AddObservation(frozen);


    Do I have to change my movement functions accordingly? And can you think of any other reason why my training could be failing?
     
  6. ChillX

    Joined: Jun 16, 2016 · Posts: 145
    Code (CSharp):
    private Vector3 TranslateRelativeToSelfOnAllAxis(Vector3 Movement)
    {
        Vector3 TranslatedMovement;
        TranslatedMovement = transform.rotation * Movement;
        return TranslatedMovement;
    }

    private Vector3 TranslateRelativeToSelfOnYAxis(Vector3 Movement)
    {
        Vector3 TranslatedMovement;
        Vector3 PivotDirection;
        PivotDirection = new Vector3(0f, transform.rotation.eulerAngles.y, 0f);
        TranslatedMovement = Quaternion.Euler(PivotDirection) * Movement;
        return TranslatedMovement;
    }

    // If using discrete actions Left, Right, Up, Down
    private Vector3 TranslateRelativeToSelfOnYAxisSnappedTo90Degrees(Vector3 Movement)
    {
        Vector3 TranslatedMovement;
        Vector3 PivotDirection;
        PivotDirection = new Vector3(0f, Mathf.Round(transform.rotation.eulerAngles.y / 90f) * 90f, 0f);
        TranslatedMovement = Quaternion.Euler(PivotDirection) * Movement;
        return TranslatedMovement;
    }
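    For reference, one way these helpers could be wired into the movement code from earlier in the thread (just a sketch; moveX, moveY and moveSpeed refer to the names used in the OnActionReceived posted above):

    Code (CSharp):
    // Interpret the policy's movement output as relative to where the agent is
    // facing, then translate it into world space before applying it.
    Vector3 localMove = new Vector3(moveX, moveY, 0f);
    Vector3 worldMove = TranslateRelativeToSelfOnAllAxis(localMove);
    transform.localPosition += worldMove * Time.deltaTime * moveSpeed;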
     
  7. ChillX

    Joined: Jun 16, 2016 · Posts: 145
    With asymmetric simulations, one team will usually dominate and suffocate the other team's ability to even start learning.

    To solve this, I run training for both agents as normal. But as soon as one agent starts winning about 75% of the games, I start a new run with --initialize-from pointing at the previous run and with training effectively disabled for the winning agent.

    The way to do this is to set max_steps for the winning agent to a very small number in the second run (I use the buffer size), while leaving the losing agent with a large max_steps.

    Example first run:
      WinningAgent:
        buffer_size: 10000
        max_steps: 1234000
      LosingAgent:
        buffer_size: 10000
        max_steps: 1234000

    Second run:
      WinningAgent:
        buffer_size: 10000
        max_steps: 10000
      LosingAgent:
        buffer_size: 10000
        max_steps: 1234000

    Note: I used 1234000 just to show a clearly bigger number; max_steps should be large enough to sufficiently train the agent.

    In the second run, after 10,000 steps the winning agent switches to inference and ML-Agents will report "Not Training" for that agent.

    Rinse and repeat, switching training between sides.

    In practice, the second run usually trains the otherwise-losing agent to its maximum performance (the point where it cannot improve any more). After that I train both agents simultaneously, without one drowning out the other, until their performance improvement flatlines.
     
  8. ChillX

    Joined: Jun 16, 2016 · Posts: 145
    Another note: the curiosity module is very helpful for asymmetric adversarial games or simulations.

    In this experiment, the blue-line agent has to solve a puzzle while the red-line agent has to prevent that. Without curiosity, the blue agent learns how to solve the puzzle and then dominates; eventually the red agent loses everything it has learned and just sits there doing nothing useful, getting worse with each step.

    However, with the curiosity module enabled this does not happen: the blue agent learns how to win, but then the red agent learns how to counter.
    [Attached image: Capture_GenAdv2.JPG]

    Notice the huge leap at the end. I moved to a new training run (--initialize-from the previous run) to capture in detail what happens after that.

    500K steps later the two have converged again and red starts winning slightly more. Then, within another 100K steps, blue learns how to counter.

    [Attached image: Capture_GenAdv1.JPG]
    And the cycle repeats with the agents getting better and better.

    Note: buffer_size is set to 10240, episode max steps is 300, and the average episode length ranges between 150 and 250. This means the buffer contains at minimum about 34 full games, but on average between roughly 40 and 70.
    Also note that I'm using a batch size of 512 even though the agents use discrete actions. The only way to determine the best batch size is to test, as it differs based on simulation dynamics.