AI forgets what it learned, rollercoaster graph

Discussion in 'ML-Agents' started by hamik166, Sep 28, 2020.

  1. hamik166

    Joined: Oct 19, 2019
    Posts: 31
    Hey, I'm training an AI to kick a ball as far as possible. Whenever I train it, it gets better and better for the first 15 minutes, then gets worse, and the reward graph turns into a rollercoaster where it keeps getting good and bad again!

    I'm using continuous actions, with 15 inputs and 4 outputs. There are 16 agents training at the same time, each with a max_step of 1000.

    Here is my config:

    behaviors:
      NAME(CENSORED):
        trainer_type: ppo
        hyperparameters:
          batch_size: 2000
          buffer_size: 56000
          learning_rate: 0.0003
          beta: 0.008
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 96
          num_layers: 4
          vis_encode_type: simple
          memory:
            sequence_length: 20
            memory_size: 40
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 20166166
        time_horizon: 1000
        summary_freq: 10000
        threaded: true



    What could cause this to happen? Why would it suddenly become worse after it got so good?

    The reward it gets is based on how far the ball is from where it began. It is normalized to 1, but will never get close to one unless the bot becomes a Ronaldo.
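
    To make this concrete, the agent looks roughly like this (a simplified C# sketch using the ML-Agents 1.x API; names like KickerAgent, ball and maxKickDistance are placeholders, not my exact code):

    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class KickerAgent : Agent
    {
        public Transform ball;               // the ball to kick, assigned in the Inspector
        public float maxKickDistance = 30f;  // distance that would count as reward = 1
        private Vector3 ballStart;

        public override void OnEpisodeBegin()
        {
            // Ball and agent spawn at the same place every episode.
            ballStart = ball.position;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // 15 observations in total in the real setup; these are just examples.
            sensor.AddObservation(transform.localPosition);
            sensor.AddObservation(ball.localPosition);
            // ... remaining observations ...
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            // ... apply the 4 continuous actions (movement / kick) here ...

            // Reward: how far the ball is from where it began, scaled into [0, 1].
            float dist = Vector3.Distance(ball.position, ballStart);
            SetReward(Mathf.Clamp01(dist / maxKickDistance));
        }
    }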

    This is the result I get every time I run this.

    I suspected memory was to blame, but it's worse without it (actually much, MUCH worse, because the agent needs to remember where the ball is when it can't see it).

    What could be the reason?
     
  2. ruoping_unity

    Unity Technologies

    Joined: Jul 10, 2020
    Posts: 134
    It looks like your policy loss and value loss are not going down, and the training is quite unstable. It might be that it hasn't actually become worse after doing well; the rewards may just be pretty random. Without more detailed information it's hard to tell the cause, since there are many things that could have gone wrong. More details would be helpful here, like how you set up your environment, the design of the reward, etc.

    I would suggest you first try to simplify your scene as much as possible to make sure it is properly designed for the agent to be able to learn the task. You should see a trend of the losses going down with more training steps.
     
  3. hamik166

    Joined: Oct 19, 2019
    Posts: 31
    It's the simplest thing ever: just a ball. The agent goes to the ball and kicks it, and the reward is based on how far the ball is from its original location.

    Meanwhile it got much, much better, then worse again, and now it's learning again:

    It does learn, and it becomes quite good, but something makes it forget what it learned. The environment is static: the ball and the agent spawn at the same place every time, and I made sure there are no bugs in the environment.

    I feel like it has to be something with my config. Also, what does the policy loss mean? I do see that it's still unchanged; what does that mean? Is it because I don't give the agent punishments?
     
  4. TimHogklint

    Joined: Nov 24, 2017
    Posts: 44
    I just want to chime in here too. I've seen this pattern while training a biped trying to stay upright while double supported and single supported, while also controlling step height and step interval.

    The environment contains a perturbation platform which hits it with balls, and a shifting platform. I'm using articulation bodies. I'm going to try upgrading to version 7 of ML-Agents.

    EDIT /

    In my case I think it's the score policy, as mentioned. I'm going to investigate without having a policy for the steps (step height and lift time). I created it because I didn't want the agent to tap its feet or slide.
     
    Last edited: Oct 2, 2020