
Reducing physics jitter caused by policy randomness?

Discussion in 'ML-Agents' started by mbaske, Nov 16, 2020.

  1. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Hi, I have a humanoid agent rigged with 20 configurable joints. The policy controls joint rotations and maximum forces with continuous actions, similar to the walker example environment. My agent starts out in T-pose and can use imitation learning with demonstrations generated from animation sequences.

    Ideally, what I would like to see is that it first learns how to keep its balance and then gradually explores the range of possible movement. But of course, due to the initial randomness of the policy, it loses balance quickly and then just bounces around on all fours. So far, I couldn't get my agent to recover again, once it falls over. If I reward it for keeping upright, it will just keep bouncing in such a way as to minimize its y-axis inclination, but it never ends up in a stable standing position again.

    Here's what I'm doing to reduce motion randomness:
    - Use relative rather than absolute rotation values. Agent actions control rotation deltas. Maximum rotation angles are constrained by joint limits.
    - Set maximum per-step rotation deltas for each joint and axis, based on what's happening in the animation clips. For instance, a lower leg joint can do a larger per-step rotation than an upper leg joint. Rotation steps can be larger on its x-axis than on its y and z-axes.
    - Exponential rather than linear mapping of actions to rotation deltas. The idea here is to shift the average rotation delta closer to zero, given a completely random action distribution. With an exponent of 2, for instance, an action value of 0.5 would rotate a joint by only 25% of its maximum per-step delta.
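
The three measures above can be sketched roughly as follows. This is a minimal illustration in Python rather than the poster's actual Unity code; the function names, the degree units, and the clamping to [-1, 1] are assumptions made for the example.

```python
import math


def action_to_delta(action, max_delta_deg, exponent=2.0):
    """Map a continuous action in [-1, 1] to a signed rotation delta (degrees).

    Hypothetical helper: raises |action| to `exponent` while keeping the sign,
    then scales by the joint's maximum per-step delta.
    """
    a = max(-1.0, min(1.0, action))  # assume actions may arrive slightly out of range
    return math.copysign(abs(a) ** exponent, a) * max_delta_deg


def step_joint_angle(current_deg, action, max_delta_deg, low_deg, high_deg, exponent=2.0):
    """Apply the relative rotation, then constrain the result to the joint limits."""
    target = current_deg + action_to_delta(action, max_delta_deg, exponent)
    return max(low_deg, min(high_deg, target))
```

With an exponent of 2, `action_to_delta(0.5, 10.0)` gives 2.5 degrees, i.e. 25% of the maximum step. For actions drawn uniformly from [-1, 1], the mean absolute delta is 1/(exponent + 1) of the maximum, so an exponent of 2 lowers it from 1/2 (linear) to 1/3.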

    The rigidbody masses are somewhat realistic. I thought about shifting the mass distribution towards the legs and feet, in order to make the agent more stable overall. But I'm afraid this might cause rather unrealistic motion in the long run.

    I think my general physics setup (solver iterations etc) is sound. In order to record demonstrations, I built a heuristic which attempts to sync the configurable joints to animation clips. With that, I tweaked the physics settings until the difference between heuristic and animated movement was minimal.

    Is there anything else I can do to stabilize my agent and keep it from thrashing around in early training? Thanks!
     
  2. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    What you describe seems natural to me. The policy is a bunch of random numbers at first, and the jitter is part of the policy's exploration to find out what's good and what's bad. The agent may never recover once it falls over because it's too hard for it to reach a good state even once. The steps you took could help a bit, since you constrain the agent from doing anything too crazy, but it still needs to go through the process of trying things out randomly and learning the good ones.

    If you keep training for much longer, so the agent has more chances to reach a good state and receive a good signal, the agent should be able to learn eventually. Another thing is the design of your reward function. As you said, the current design doesn't seem to drive the model in the right direction and the agent keeps bouncing, so tweaking the reward might help.

    If you want to help the training more explicitly, you could try imitation learning to bootstrap the model so the agent heads in the right direction much more quickly at the beginning, or use curriculum learning to make the task much easier at first (for example by shifting the mass distribution) and then gradually make it more difficult.
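
As a rough illustration of the curriculum idea, the schedule below blends an "easy" bottom-heavy mass distribution into the realistic one as training progresses. It is only a sketch in Python: the body-part names, the mass values, and the `progress` variable are hypothetical, and in ML-Agents a curriculum would normally be driven through environment parameters in the trainer configuration rather than hand-rolled like this.

```python
def lerp(a, b, t):
    """Linear interpolation from a (t = 0) to b (t = 1)."""
    return a + (b - a) * t


def curriculum_masses(easy, realistic, progress):
    """Blend per-body masses from the easy setup toward the realistic one.

    `progress` is assumed to be the fraction of training completed, clamped
    to [0, 1]. Both dicts are assumed to share the same (hypothetical) keys.
    """
    t = max(0.0, min(1.0, progress))
    return {name: lerp(easy[name], realistic[name], t) for name in realistic}


# Made-up masses in kg: heavy feet and a light torso early on.
EASY = {"torso": 20.0, "foot": 4.0}
REALISTIC = {"torso": 30.0, "foot": 1.0}
```

Halfway through the schedule, `curriculum_masses(EASY, REALISTIC, 0.5)` yields a 25 kg torso and 2.5 kg feet. Whether a linear ramp or discrete lessons works better is something that would have to be found by experiment.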
     
  3. sohojoe

    sohojoe

    Joined:
    Feb 13, 2015
    Posts:
    21
    @mbaske I have done a bunch of work on humanoids; it is hard to start with them, and I found it easier to start with a 2D hopper / walker and work my way up to the humanoid. That way, I was making small incremental changes, which made it easier to understand the causes of issues. That said, your general approach is good; I would expect the problem has something to do with the joints you are using and/or the frequency at which you are updating. I found that with the old joints I had to update at 200+ fps with a frame skip of 5 frames or so. With the newer ArticulationBody joints I think I'm running at 40 fps.
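
To make the relationship between update frequency and frame skip concrete, here is a small Python sketch. It assumes "frame skip" means that the policy picks a new action only every N physics steps and the last action is repeated in between; `policy` and `physics_step` are hypothetical callables standing in for the agent and the simulator.

```python
def decision_rate_hz(physics_hz, frame_skip):
    """Decisions per second when one action is chosen every `frame_skip` physics steps."""
    return physics_hz / frame_skip


def run_with_frame_skip(policy, physics_step, num_physics_steps, frame_skip):
    """Step the simulation, querying the policy only on every `frame_skip`-th step.

    Returns the number of decisions taken. The most recent action is re-applied
    on the skipped steps.
    """
    action = None
    decisions = 0
    for step in range(num_physics_steps):
        if step % frame_skip == 0:
            action = policy(step)  # new decision
            decisions += 1
        physics_step(action)  # physics always advances
    return decisions
```

Under this reading, physics at 200 steps per second with a frame skip of 5 works out to 40 decisions per second. In ML-Agents the equivalent knob is most likely the decision period on the Decision Requester component, though the thread does not say which settings were actually used.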

    You can see my code here, but note that the rig is a mess. If you are looking for something that will map to a more normal Unity character, then we have some WIP code here, and you can follow along by joining the Discord channel.

    Good luck.
     
  4. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Thanks for your feedback @unity_ruoping & @sohojoe!

    Joe, thanks for pointing out the ArticulationBody joints. I somehow managed to miss that new feature. I'll give it a try and see if it yields better results than using configurable joints.
    I did check out the marathon-envs a while back (great work!) and remember being a little overwhelmed by your style transfer approach at first. I definitely need to take a closer look again.