Hi, I'm trying to train multiple behaviors at the same time. In my use case, the agent also needs to handle transitions between behaviors: behavior 1 may lead to a certain state, from which behavior 2 needs to recover and continue operating.

I'm currently switching between behaviors at random by calling SetModel() mid-episode, and the performance of the behaviors deteriorates significantly compared to training them separately. Looking at the WallJump example, the behaviors seem to be changed only when an episode begins (or am I misreading it?). Is there a reason to end an episode before changing behaviors? Does it affect the returns in any way? Am I missing anything about PPO in general, or about the ML-Agents implementation specifically? Thank you!
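For context, the mid-episode switching I'm doing looks roughly like this (a simplified sketch; the behavior names, model fields, and the `SwitchBehavior` helper are placeholders from my setup, not from the WallJump example):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Policies;
using UnityEngine;

public class SwitchingAgent : Agent
{
    // Placeholder model assets for the two behaviors (assigned in the Inspector).
    public NNModel behavior1Model;
    public NNModel behavior2Model;

    // Called at random points DURING an episode, not in OnEpisodeBegin().
    void SwitchBehavior(bool useFirst)
    {
        // SetModel() changes the agent's behavior name (and the model used
        // at inference time) without ending the current episode.
        if (useFirst)
            SetModel("Behavior1", behavior1Model);
        else
            SetModel("Behavior2", behavior2Model);
    }
}
```

In contrast, the WallJump example appears to call SetModel() only around episode boundaries, which is what prompted my question about whether ending the episode first matters for the returns.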