
Inconsistent training with v0.14

Discussion in 'ML-Agents' started by akTwelve, Feb 24, 2020.

  1. akTwelve

    I'm updating a project I teach in my AI Flight Udemy course from ML-Agents v0.11 to v0.14, and I'm confused by how inconsistent my training runs have been since the upgrade. I had to account for the Academy singleton and RayPerception changes, but apart from that it is the same project that trained very reliably before.

    The training seems to be working, and then it spontaneously flatlines for up to an hour before picking back up. I made no changes between the two runs in the TensorBoard graph below. (Each 5M steps took about 1 hour and 10 minutes; a reward of -1 means an airplane crashed immediately, and a reward above 20 means it flew through 40 checkpoints without crashing.)

    Does anyone know what would cause these strange flatline dips? Thanks!!

     
  2. akTwelve

    I turned off Curiosity and did two more training runs, both of which worked great. So maybe the agents got curious about what would happen if they crashed into rocks for an hour...?
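Turning Curiosity off removes one term from what the agents optimize. The sketch below is a simplified illustration, assuming each enabled reward signal's per-step reward is scaled by that signal's `strength` and summed; the real trainer also keeps a separate value estimate and `gamma` per signal, which this ignores. The function name, reward values, and strengths are all hypothetical.

```python
def combined_reward(signal_rewards, strengths):
    """Sum each enabled signal's reward scaled by its strength (simplified)."""
    return sum(
        strengths[name] * reward
        for name, reward in signal_rewards.items()
        if name in strengths  # disabled signals contribute nothing
    )


# Hypothetical crash step: extrinsic penalty plus a large curiosity bonus.
crash_step = {"extrinsic": -1.0, "curiosity": 75.0}

with_curiosity = {"extrinsic": 1.0, "curiosity": 0.02}  # hypothetical strengths
without_curiosity = {"extrinsic": 1.0}

print(combined_reward(crash_step, with_curiosity))     # net positive: crashing "pays"
print(combined_reward(crash_step, without_curiosity))  # -1.0
```

With these made-up numbers, a large enough curiosity bonus outweighs the crash penalty, while dropping the signal leaves only the -1 extrinsic reward.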
     
  3. jeffrey_unity538

    Unity Technologies
    Hmm, interesting... let me post in our internal thread.
     
  4. ervteng_unity

    Unity Technologies
    That type of behavior with Curiosity actually makes sense. Are you seeing a big spike in the Curiosity reward around those plateaus? Crashes and failures tend to be very unpredictable (and result in all sorts of weird states), so the Curiosity module tends to find them extremely interesting.
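Why unpredictable crashes look "interesting" can be shown with a toy sketch. It assumes an ICM-style (Intrinsic Curiosity Module) bonus, where the intrinsic reward is proportional to the squared prediction error of a forward model. The real module measures that error in a learned feature space rather than on raw observations and trains the forward model alongside the policy. The `forward_model`, states, and crash transition below are hypothetical.

```python
def intrinsic_reward(predicted_next, actual_next, eta=1.0):
    """ICM-style curiosity bonus: eta/2 times the squared forward-model error."""
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * sq_err


def forward_model(state, action):
    """Hypothetical forward model that has only learned smooth flight:
    it predicts next position = current position + commanded velocity."""
    return [s + a for s, a in zip(state, action)]


state = [0.0, 10.0]        # hypothetical (x, altitude)
action = [1.0, 0.0]        # hypothetical velocity command: fly level
smooth_next = [1.0, 10.0]  # the plane moved exactly as commanded
crash_next = [0.3, 2.0]    # hypothetical crash: abrupt, unmodeled drop

predicted = forward_model(state, action)
r_smooth = intrinsic_reward(predicted, smooth_next)
r_crash = intrinsic_reward(predicted, crash_next)
print(f"smooth flight bonus: {r_smooth:.3f}")
print(f"crash bonus:         {r_crash:.3f}")
```

Smooth, well-predicted flight earns no bonus here, while the crash earns a large one. Because the forward model keeps improving on transitions it sees often, chaotic crash states tend to stay harder to predict and keep paying out.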
     
  5. akTwelve

    Here's the same graph with Curiosity Inverse Loss overlaid. Obviously the scale on the Y-axis is different. Not sure if this explains much, but your logic does.
     
  6. akTwelve

    akTwelve

    Joined:
    Oct 7, 2013
    Posts:
    8
    Ohhh, now I see it. The Curiosity Value Estimate graph was hidden in TensorBoard for some reason. Yeah, it totally spikes during the troughs of the Cumulative Reward graph.
     