Agent behavior with different algorithms

Discussion in 'ML-Agents' started by m4l4, Sep 20, 2020.

  1. m4l4

    Hi everyone. After completing my car racing minigame, I'm starting to train the agent to drive on the track.
    I'm quite new to training algorithms, so I'm doing my best to understand their differences and properties.
    I made several different tracks to avoid overfitting on the same level over and over.
    At first I tried PPO. It took 1 million steps to see the car move for the first time, but once the agent started scoring some points (reward = forward velocity, if on track), every car (one per track, 6 in total) slowly started moving and making progress. Some of them even managed to complete the track 3-4 times before flipping onto their side, which triggers EndEpisode. After another 1.5 million steps, the agent started braking after the first turn on every track and never recovered.
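For reference, the reward scheme described above can be sketched roughly like this. It is written in Python only for brevity (the real project would be C# inside the Unity Agent), and the function name, parameters, and the zero reward when off track are all assumptions, not the actual project code.

```python
from typing import Tuple


def step_reward(forward_velocity: float, on_track: bool, flipped: bool) -> Tuple[float, bool]:
    """Hypothetical per-step reward: returns (reward, episode_done).

    Assumptions (not confirmed by the post):
    - off-track steps give zero reward rather than a penalty;
    - forward_velocity is passed through unchanged, so driving
      backwards on the track would yield a negative reward;
    - a flipped car ends the episode with no extra penalty.
    """
    if flipped:
        # Corresponds to calling EndEpisode when the car lands on its side.
        return 0.0, True
    reward = forward_velocity if on_track else 0.0
    return reward, False


# Example: a car moving forward at 3.0 units/s while on the track.
reward, done = step_reward(3.0, on_track=True, flipped=False)
```

With a reward like this, standing still (or braking) yields exactly zero, so nothing in the signal itself discourages the agent from stopping unless a time penalty or similar term is added.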
    I've read that SAC should be better for continuous control problems like this one.
    I switched to SAC, and another weird behavior emerged. After the usual million steps, the agent starts to understand how to use the brake and throttle pedals. The problem is that only 3 out of 6 cars started improving; the others just kept steering in place. Now I'm at 4.5 million steps and nothing has changed: 3 cars are able to reach the halfway point of their track, and the other 3 never moved. I tried this twice and got the same result (3 move, 3 don't), but the cars that move are not the same ones as in the previous run.

    How could this be? Have you ever experienced something like that? It's a single brain for 6 agents, so why are they behaving so differently?