
Question: Is there any way to train models iteratively, one by one?

Discussion in 'ML-Agents' started by xileFlex, Feb 10, 2023.

  1. xileFlex


    Using ML-Agents 2.0.1.

    I am currently working on a project involving two agents that work adversarially against each other. One agent generates the level while the other tries to solve it.

    Currently, when I run a new training session, both models train in sync, but the only one that learns and completes episodes is the solver, because it is constantly trying new things, whereas the level generator only completes an episode once the solver has progressed. It takes a long time for the solver to train enough to progress, so the generator doesn't really get to do much. (The cumulative reward also gets messed up for the generator, since it has so few chances to complete episodes.)

    Therefore, I would like to find a way to pause training for the generator until the solver manages to progress, then resume training for the generator, and pause it again once more of the level has been made.

    Is there any established way of doing this kind of pause/resume training, where I freeze one model and unfreeze it when I need it?

    I have seen that people call SetActive(true) or SetActive(false) to prevent a model from running. I haven't tried this yet, and I'm not sure whether it restarts the training or hinders it in some way.

    Another solution I can think of is heavily increasing the reward for the model that doesn't complete many episodes, and possibly reducing the gamma hyperparameter so that feedback is more immediate and the agent focuses on short-term reward.
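    To make the gamma idea concrete (a standalone sketch of the standard discounted-return formula, not ML-Agents code): a reward received t steps in the future is weighted by gamma^t, so lowering gamma from PPO's usual 0.99 default makes distant rewards count for far less.

    ```python
    # Sketch: how the discount factor gamma weights a future reward.
    # Standard discounted-return arithmetic, not ML-Agents API code.

    def discounted_value(reward: float, steps_in_future: int, gamma: float) -> float:
        """Present value of a reward received `steps_in_future` steps from now."""
        return reward * gamma ** steps_in_future

    # A reward of 1.0 arriving 100 steps from now:
    print(discounted_value(1.0, 100, 0.99))  # ~0.366  -> still significant
    print(discounted_value(1.0, 100, 0.90))  # ~2.7e-5 -> effectively ignored
    ```

    So with gamma around 0.9, an agent is pushed hard toward rewards it can reach within a few dozen steps, which is the "short-term focus" effect described above.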

    I would love to hear anyone's thoughts on this problem, and whether anyone has innovative solutions that have gone over my head.

    (Both models are running on default hyperparameters at the moment, and I have only worked on reward shaping so far.)
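    For reference, if gamma is worth experimenting with, it sits under reward_signals in the trainer config YAML; the behavior names below are placeholders, and the values are the PPO defaults except for the lowered generator gamma:

    ```yaml
    behaviors:
      GeneratorBehavior:        # placeholder; use your generator's behavior name
        trainer_type: ppo
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.9          # lowered from the 0.99 default for short-term focus
      SolverBehavior:           # placeholder; use your solver's behavior name
        trainer_type: ppo
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99         # default
    ```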