Inconsistent training with v0.14

akTwelve · Feb 24, 2020

I'm updating a project I teach in my AI Flight Udemy course from v0.11 to v0.14, and I'm confused by how inconsistent my training runs have been since upgrading. I had to account for the Academy singleton and RayPerception changes, but that's basically all that has changed in a project that used to train very reliably.

The training seems to be working, and then it spontaneously flatlines for up to an hour before picking back up. I made no changes between the two runs in the TensorBoard graph below. Each run of 5M steps took about 1 hour and 10 minutes. A reward of -1 means an airplane crashed immediately, and a reward above 20 means it flew through 40 checkpoints without crashing.

Does anyone know what would cause these strange flatline dips? Thanks!!

akTwelve · Feb 24, 2020

I turned off curiosity and did two more training runs which both worked great. So maybe the agents got curious about what would happen if they crashed into rocks for an hour...?

jeffrey_unity538 · Feb 25, 2020

Hmm, interesting... let me post in our internal thread.

ervteng_unity · Feb 25, 2020

That type of behavior with Curiosity actually makes sense. Are you seeing a big spike in the Curiosity reward around those plateaus? Crashes and failures tend to be very unpredictable (and result in all sorts of weird states), so the Curiosity module tends to find them extremely interesting.
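To illustrate the idea, here is a toy sketch in Python. It is not the ML-Agents Curiosity implementation; it only assumes that the intrinsic reward grows with the error of a model that tries to predict the next observation. The `ToyForwardModel` class, the state and action names, and the noise level are all hypothetical.

```python
import random


class ToyForwardModel:
    """Hypothetical predictor: running mean of the next observation per (state, action)."""

    def __init__(self):
        self.mean = {}
        self.count = {}

    def intrinsic_reward(self, state, action, next_obs):
        key = (state, action)
        pred = self.mean.get(key, 0.0)
        # Curiosity-style reward: squared prediction error before the update.
        error = (next_obs - pred) ** 2
        # Update the running mean so predictable transitions stop being rewarding.
        n = self.count.get(key, 0) + 1
        self.count[key] = n
        self.mean[key] = pred + (next_obs - pred) / n
        return error


def average_reward(model, state, action, sample_next_obs, steps=2000):
    """Average intrinsic reward over repeated visits to one (state, action) pair."""
    total = 0.0
    for _ in range(steps):
        total += model.intrinsic_reward(state, action, sample_next_obs())
    return total / steps


rng = random.Random(0)
model = ToyForwardModel()

# Stable flight: the next observation is always the same (assumed value 1.0).
flying = average_reward(model, "cruise", "hold", lambda: 1.0)

# Crash: the next observation is noisy and cannot be predicted (assumed unit Gaussian).
crashing = average_reward(model, "near_rock", "dive", lambda: rng.gauss(0.0, 1.0))

print(f"stable flight: {flying:.4f}, crashing: {crashing:.4f}")
```

In this toy setup the reward for the predictable transition decays toward zero once the model has seen it, while the noisy transition keeps paying out, which is the pattern described above.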

akTwelve · Feb 26, 2020

Here's the same graph with Curiosity Inverse Loss overlaid. Note that the Y-axis scales of the two curves are different. I'm not sure the graph explains much, but your reasoning does.

akTwelve · Feb 27, 2020

Ohhh, now I see it. The Curiosity Value Estimate graph was hidden in TensorBoard for some reason. Yeah, it totally spikes during the troughs of the Cumulative Reward graph.
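For anyone who wants to check this numerically rather than by eye, a minimal sketch is to compute the correlation between the two curves. The numbers below are made up for illustration and are not values from my runs; the `pearson` helper and both variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Illustrative, made-up samples taken at the same training steps.
# The -1.0 entries mimic a flatline where every airplane crashes immediately.
cumulative_reward = [5.0, 12.0, 18.0, -1.0, -1.0, -1.0, 15.0, 21.0]
curiosity_value_estimate = [0.2, 0.2, 0.3, 1.5, 1.8, 1.6, 0.4, 0.2]

r = pearson(cumulative_reward, curiosity_value_estimate)
print(f"correlation: {r:.2f}")
```

A strongly negative value would match what the graphs show: the curiosity estimate is high exactly where the cumulative reward bottoms out.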
