Reward stuck for continuous but not discrete actions

Discussion in 'ML-Agents' started by fedetask, Jan 31, 2020.

  1. fedetask (joined Jan 17, 2020; 7 posts)

    Hi, I'm training a car to drive around some obstacles. I'm running two versions of the agent: one with continuous actions (steering and acceleration) and one with discrete actions, where steering and acceleration take the same values as in the continuous version, discretized into N levels.
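    A minimal sketch of that discretization, assuming evenly spaced levels over the same ranges the continuous agent uses. The ranges ([-1, 1] for steering, [0, 1] for acceleration), the level counts, and the helper names are hypothetical; the thread does not say how N or the ranges were chosen.

```python
from typing import List

def make_levels(low: float, high: float, n: int) -> List[float]:
    """Return n evenly spaced action values covering [low, high] inclusive."""
    if n < 2:
        raise ValueError("need at least two levels to span a range")
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]

def index_to_value(index: int, levels: List[float]) -> float:
    """Discrete agent: the chosen branch index becomes a continuous control value."""
    return levels[index]

def nearest_index(value: float, levels: List[float]) -> int:
    """Inverse mapping: the discrete level closest to a continuous action."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))

# Hypothetical ranges and level counts (N = 5 for steering, 3 for acceleration).
steering_levels = make_levels(-1.0, 1.0, 5)   # [-1.0, -0.5, 0.0, 0.5, 1.0]
accel_levels = make_levels(0.0, 1.0, 3)       # [0.0, 0.5, 1.0]

steer = index_to_value(1, steering_levels)    # -0.5
print(steer, nearest_index(0.3, steering_levels))
```

    With this mapping the discrete agent can only ever output these N values per branch, so it searches a much smaller action space than the continuous agent; that is one plausible reason, not a confirmed one, for the two versions training differently.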

    I'm training with PPO, and I get these two plots for the reward: https://imgur.com/a/7ilFpNF

    As you can see, while the discrete version keeps improving, the continuous one is completely stuck.

    I'm not asking about anything specific to my code, but I'd like some insight into how this can happen. Could it be due to insufficient exploration?
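    One way to test the exploration hypothesis is to look at how spread out each policy's action distribution is. Assuming the usual PPO setup, where continuous actions are sampled from a Gaussian with a learned standard deviation and discrete actions from a categorical distribution, the entropy of each can be computed as below. The function names and example values are illustrative only. Note that the Gaussian (differential) entropy goes negative once sigma drops below about 0.24, so the continuous and discrete numbers are not directly comparable; what matters is whether the continuous policy's entropy collapses early in training while the reward is still low.

```python
import math
from typing import Sequence

def gaussian_entropy(sigma: float) -> float:
    """Differential entropy (nats) of a 1-D Gaussian action with std `sigma`."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def categorical_entropy(probs: Sequence[float]) -> float:
    """Entropy (nats) of a discrete action distribution; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Continuous policy: entropy shrinks as the learned std shrinks.
for sigma in (1.0, 0.5, 0.1):
    print(f"sigma={sigma}: {gaussian_entropy(sigma):.3f}")

# Discrete policy over 5 steering levels: uniform vs. nearly deterministic.
print(f"uniform: {categorical_entropy([0.2] * 5):.3f}")
print(f"peaked:  {categorical_entropy([0.96, 0.01, 0.01, 0.01, 0.01]):.3f}")
```

    If the continuous agent's entropy curve in TensorBoard falls off much faster than the discrete one, raising the entropy bonus (the `beta` hyperparameter in the ML-Agents PPO config, if your version uses that key) is a reasonable first experiment.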

    If you need any other information, just ask.
     
  2. JPhilipp (joined Oct 7, 2014; 56 posts)

    Interesting. Some questions:

    - Is it at all possible (maybe due to a bug or side effect) that -1 is the maximum reward achievable over an episode in your Continuous version?
    - What's your observation stack size on the Continuous version? Today I had a variant of my training where, after I added 2 more observations, it started acting weirdly and plateaued after a while with some random spikes, not unlike your case (though mine plateaued at the bottom). Removing those 2 observations fixed it again (bringing the count from 18 back to 16, not counting ray sensors).
    - Just in case it's relevant, what num_layers value do you use in your YAML config for this training? (I use 3.)

    On a side note, for my target-finding helicopter I ended up using Continuous, as Discrete never worked properly. I had figured that restricting it to what are basically WASD keypress bools would shrink the decision space, but that wasn't successful. I also ended up using more ray sensors rather than fewer. I'm not 100% sure, though, which change brought the big improvements: the switch to Continuous or the added sensors.