Bug Training slows to a crawl after reaching team_change value

Discussion in 'ML-Agents' started by ClemCa, Sep 24, 2022.

  1. ClemCa

    Joined:
    Mar 9, 2021
    Posts:
    1
    Hi,
    I could not find any reference to that issue, and I've checked and rechecked my project for a few hours now.
    I already confirmed my config file is bog-standard. I manually call RequestDecision() on each agent, and both CollectObservations and OnActionReceived are triggered correctly.

    The issue seems specific to self-play. Other than their team IDs, both agents have the same script, behavior name, and settings. RequestDecision is called once every 0.1 s, alternating between the two agents.
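
    For reference, the setup looks roughly like the sketch below. It assumes the standard ML-Agents C# API (Agent.RequestDecision() and a BehaviorParameters.TeamId per agent); the class and field names are simplified for illustration and are not the actual project code.

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Illustrative driver: two self-play agents share one behavior name but
    // have different TeamIds, and are asked for decisions manually,
    // alternating every 0.1 s, instead of using a DecisionRequester component.
    public class AlternatingDecisionRequester : MonoBehaviour
    {
        public Agent agentTeamA;   // BehaviorParameters.TeamId = 0
        public Agent agentTeamB;   // BehaviorParameters.TeamId = 1

        const float Interval = 0.1f;
        float _timer;
        bool _requestFromA = true;

        void FixedUpdate()
        {
            _timer += Time.fixedDeltaTime;
            if (_timer < Interval) return;
            _timer = 0f;

            // Alternate which agent requests the next action from the trainer.
            (_requestFromA ? agentTeamA : agentTeamB).RequestDecision();
            _requestFromA = !_requestFromA;
        }
    }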

    Everything seems fine until the step count reaches the team_change value. Once it does, training stops progressing. Stopping and resuming the run advances the step count by a few tens of steps before it stalls again, so I can't tell whether training has actually stopped or has merely slowed to a near standstill.

    While this happens, Unity keeps collecting observations and receiving actions at the same pace, with no sign that anything has changed other than the lack of progress.
    The console doesn't log anything either, and the process keeps using the exact same amount of computing resources.
    The only difference is the missing progress summaries (since the step count doesn't change).

    I confirmed the team_change value was the value in question by reducing it to a low number and observing the same behavior.
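
    For context, team_change lives in the self_play section of the trainer config. A bog-standard block, with illustrative placeholder values rather than the exact ones from my run, looks something like this:

    Code (YAML):
    behaviors:
      MyBehavior:                               # placeholder behavior name
        trainer_type: ppo
        # ...usual PPO hyperparameters, network settings, etc...
        self_play:
          save_steps: 20000                     # steps between policy snapshots
          team_change: 100000                   # steps before the learning team switches
          swap_steps: 2000                      # steps between opponent snapshot swaps
          window: 10                            # number of past snapshots sampled from
          play_against_latest_model_ratio: 0.5  # chance of playing the current policy
          initial_elo: 1200.0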

    If anyone knows of a step I might have missed while implementing self-play, or has already encountered a similar issue, please do help me.
     
  2. Menion-Leah

    Joined:
    Nov 5, 2014
    Posts:
    189
    Facing the same issue here: upon hitting the team_change step count (no matter if it is 1000 or 50000), the steps stop increasing, and after a short while the mlagents-learn command fails with a mlagents_envs.exception.UnityTimeOutException.

    I've also noticed that when the Ghost Trainer performs a swap (so every swap_steps steps), if it randomly selects the current policy instead of a previous snapshot, the ELO flattens completely for the entire duration of that swap, despite the reward mean/std behaving normally.
     
    Last edited: Aug 7, 2023