Question Training for adaptability

Discussion in 'ML-Agents' started by mcdenyer, Apr 15, 2021.

  1. mcdenyer

    mcdenyer

    Joined:
    Mar 4, 2014
    Posts:
    48
    I have been screwing around with ML-Agents on my 2D platformer for about a month, with the goal of using agents to help in the level design process: I'd like to make really hard levels and then use ML-Agents to test whether they are even possible.

    My platformer involves some finesse mechanics where the player must swing from targets and such.
    I have had tons of success training agents to learn the mechanics using imitation learning on a very simple level. Then, once trained and transitioned from 100% GAIL rewards to 100% extrinsic rewards on that simple level, the agents are able to quickly learn how to complete more advanced levels.
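    (For reference, the switch I'm describing is basically a change to the reward_signals block of the trainer config. A rough sketch is below; the behavior name, demo path, and values are placeholders, not my exact settings.)

    ```yaml
    behaviors:
      PlatformerAgent:                          # placeholder behavior name
        trainer_type: ppo
        # --- Phase 1: 100% GAIL (imitation only) on the simple level ---
        reward_signals:
          gail:
            strength: 1.0
            gamma: 0.99
            demo_path: Demos/SwingSimple.demo   # placeholder demo path
        # --- Phase 2: resume training with 100% extrinsic (checkpoint) rewards ---
        # reward_signals:
        #   extrinsic:
        #     strength: 1.0
        #     gamma: 0.99
    ```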

    The problem I have had is getting the agents to figure out how to complete levels where the checkpoints are in different directions. Basically, I can either get the agents to traverse the level to the Left OR to the Right. I cannot seem to train them to tackle a level where they travel to the left at some points and to the right at other points.

    This issue leads me to my more general question: how do you train for adaptability in your agents? I seem to be able to train the core skills the agents need, but I cannot get them to combine those skills.
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Do your basic levels and demos contain situations where checkpoints are on both sides? Agents should have an easier time generalizing their behaviour if they're introduced to moving left and right early on. Otherwise the policy might overfit to moving in one direction, making it harder to adapt later. I found that combining curriculum learning and environment randomization helps with adaptability, so that agents have to deal with incrementally more difficult, but not entirely different, situations.
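    If you go the curriculum route, ML-Agents lets you drive it from the environment_parameters section of the trainer config. A minimal sketch, assuming a difficulty parameter your scene reads at reset (parameter name, behavior name, and thresholds are just examples):

    ```yaml
    environment_parameters:
      obstacle_difficulty:                  # example parameter, read by the level at reset
        curriculum:
          - name: EasyBothDirections
            completion_criteria:
              measure: reward
              behavior: PlatformerAgent     # example behavior name
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 0.8
            value: 0.0
          - name: HarderBothDirections      # final lesson needs no completion_criteria
            value: 1.0
    ```

    On the Unity side, the agent (or level generator) would read the current value with Academy.Instance.EnvironmentParameters.GetWithDefault at the start of each episode and set up the level accordingly.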
     
  3. mcdenyer

    mcdenyer

    Joined:
    Mar 4, 2014
    Posts:
    48
    Thanks for the response, mbaske. I have read many of the ML forum posts you've been part of over the past month while trying to learn ML-Agents.

    Yes, I am using lots of checkpoints. I trained them to clear an obstacle going to the left and then an obstacle to the right. Would it then be better to use curriculum learning so that the agent randomly needs to clear either the obstacle to the left OR the obstacle to the right? Does the model essentially get dumber the longer it spends in the same environment, because it can only learn that one environment, EVEN if the environment becomes more and more complicated and introduces new problems that it learns to solve?
    My concern is that it takes a while to train the agent to learn a swing in a single direction, and that if I make it learn the swing to the left and to the right at the same time it may never get it.

    When it comes to curriculum learning, is it alright to completely change the environment instead of something simple like randomizing the position of a certain object?

    To teach the agents to 'swing' I had to use imitation learning for the left and the right. First I used GAIL with demos only showing me swinging to the left, and once they had that down I switched to demos of me swinging to the right. If I randomize the direction of the environment, do I need demonstrations of myself completing the levels randomly? As in, as I play each demo episode I am randomly given a Left or Right level, or can I record 20 episodes of going left in one demonstration and 20 episodes of going right in another?
     
  4. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Yes, I think it would be better to randomize left and right obstacles early on, trying to prevent the agent from getting stuck in a preferred direction. I wouldn't say it gets dumber, just that the agent is more likely to vary its actions around what it has already learnt when it is confronted with different observations.
    If possible, I would try changing the environment incrementally. Like gradually increasing randomization ranges for spawning obstacles etc. If you can vary the swinging difficulty, then start with the simplest scenario and make it harder as the agent gets better.
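    For the randomization part, environment_parameters also supports samplers, e.g. something like this (parameter name and ranges purely illustrative):

    ```yaml
    environment_parameters:
      swing_target_offset:          # example parameter, read by the scene at reset
        sampler_type: uniform
        sampler_parameters:
          min_value: -1.0           # e.g. target placed to the left
          max_value: 1.0            # e.g. target placed to the right
    ```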
    Not 100% sure about this. My guess is that the order shouldn't matter, as long as all relevant behaviour is included in the demonstration.
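    One thing that might help: as far as I know, demo_path can also point at a folder of .demo files, so separately recorded left and right demos could both be fed to GAIL, e.g. (paths illustrative):

    ```yaml
    reward_signals:
      gail:
        strength: 1.0
        gamma: 0.99
        demo_path: Demos/Swings/    # folder containing e.g. SwingLeft.demo and SwingRight.demo
    ```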
     
  5. mcdenyer

    mcdenyer

    Joined:
    Mar 4, 2014
    Posts:
    48
    What about how the checkpoints are set up for the varying environments? Do I need to keep the net reward (the maximum possible reward per episode) consistent? For example, as the environments get harder I need to use more checkpoints... should I decrease the reward per checkpoint so that the sum of all checkpoints cleared remains the same?
    Again, appreciate the feedback :D