Hi, I'm training a car to drive around some obstacles. I'm running two versions of the agent: one with continuous actions (steering and acceleration), and one with discrete actions, where steering and acceleration take the same values as in the continuous version, discretized into N values each. I'm training both with PPO, and I get these two reward plots: https://imgur.com/a/7ilFpNF

As you can see, while the discrete version keeps improving, the continuous version is completely stuck. I'm not asking about anything specific to my code; I'd just like some insight into how this can happen. Could it be due to insufficient exploration? If you need any other information, ask me.
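For reference, the discretization is conceptually something like the sketch below (a simplified illustration, not my actual code: the value ranges, N, and function names are made up here):

```python
import numpy as np

# Each continuous dimension (steering, acceleration) is split into N evenly
# spaced values over the same range the continuous agent uses, giving a flat
# discrete action space of N*N choices. Ranges here are illustrative.
N = 5
steering_values = np.linspace(-1.0, 1.0, N)      # full left .. full right
acceleration_values = np.linspace(-1.0, 1.0, N)  # full brake .. full throttle

def discrete_to_continuous(action_index: int) -> tuple[float, float]:
    """Map a flat discrete action (0 .. N*N-1) to a (steering, accel) pair."""
    steer_idx, accel_idx = divmod(action_index, N)
    return float(steering_values[steer_idx]), float(acceleration_values[accel_idx])

# The middle action of the 25 maps to (0.0, 0.0): straight, coasting.
print(discrete_to_continuous(12))
```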