Question: Training an Agent to Compare Rotations

Discussion in 'ML-Agents' started by nom4d, Jul 4, 2020.

  1. nom4d

    Joined: Oct 11, 2018
    Posts: 5
    Hello.

    We have a project that uses simplified motion-capture data to reconstruct the pose of a human arm in real time. We receive the data in Unity as a collection of three rotations, each representing a major joint of the arm (shoulder, elbow, wrist).
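
    For reference, the incoming data could be held in something like the following; the type and field names are illustrative, not our actual code:

        using UnityEngine;

        // One rotation per major joint of the arm.
        public struct ArmPose
        {
            public Quaternion Shoulder;
            public Quaternion Elbow;
            public Quaternion Wrist;
        }
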
    We are attempting to create a solution that, given a predefined set of reference poses, can tell us which pose the performer's arm is closest to. Our intention is to use Imitation Learning (GAIL and Behavioral Cloning) to observe the rotations of each joint in the input pose and make a discrete selection of the closest reference pose.

    We are working in iterative steps towards this solution, and we are currently attempting to have an ML agent understand when its rotation is near another object's rotation.
    We're having trouble training this agent: in our tests it is occasionally able to guess correctly, but with a lot of noise and false positives.
    This is making it hard to continue working towards our overall goal. Any advice is appreciated.
    Our vector action space is discrete, as mentioned, and currently consists of two values, which in our logic correspond to "I am near the other object's rotation" and "I am far from the other object's rotation"; a rough sketch follows.
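
    As a minimal sketch, assuming the mid-2020 ML-Agents API (newer releases pass an ActionBuffers object instead of float[]), the mapping looks roughly like this; the class name is illustrative:

        using Unity.MLAgents;
        using UnityEngine;

        public class RotationCompareAgent : Agent
        {
            // Sketch only: one discrete branch with two actions.
            public override void OnActionReceived(float[] vectorAction)
            {
                bool saysNear = Mathf.FloorToInt(vectorAction[0]) == 0;
                // 0 = "I am near the other object's rotation"
                // 1 = "I am far from the other object's rotation"
            }
        }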

    To create the demonstration for our agent to learn from, we have a scene consisting of a sphere with two objects on its surface. One of these objects is the ML agent, which we treat as the goal, and the other is an object whose rotation we control.
    We turn off automatic stepping within the Academy instance.
    At the start of an episode we randomize the rotation of the goal on the surface of the sphere, then move the other object to match it. When the other object reaches the goal, we start to request steps and feed the agent heuristic inputs telling the brain that the two objects are near (sketched after this paragraph).
    We have experimented with ending the episode at that point and collecting no "far" decisions, as well as with manually ending the episode after providing a handful of variations of both "far" and "near" decisions.
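
    Roughly, the recording setup looks like the sketch below, again assuming the mid-2020 ML-Agents API (newer releases write heuristic actions into ActionBuffers); the isNearGoal flag and class names are illustrative:

        using Unity.MLAgents;
        using UnityEngine;

        public class RecordingAgent : Agent
        {
            public bool isNearGoal; // set by our episode logic (illustrative)

            // Supplies the "near"/"far" label while recording demonstrations.
            public override void Heuristic(float[] actionsOut)
            {
                actionsOut[0] = isNearGoal ? 0f : 1f; // 0 = near, 1 = far
            }
        }

        public class DemoStepper : MonoBehaviour
        {
            public RecordingAgent agent;

            void Awake()
            {
                // We drive stepping ourselves instead of letting the Academy run.
                Academy.Instance.AutomaticSteppingEnabled = false;
            }

            // Called whenever we want one labelled decision in the demo.
            public void RecordOneDecision()
            {
                agent.RequestDecision();
                Academy.Instance.EnvironmentStep();
            }
        }
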
    We have tried models with a variety of different observations as well (a sketch of the last variant follows this list):
    - Only the rotations of the goal and the other game object.
    - The dot product on each plane, after applying each object's rotation to basis vectors.
    - The rotations of both objects, the dot products, and the angle from Quaternion.Angle between the two rotations, normalized from degrees.
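
    A sketch of that last observation set, assuming "goal" and "other" are the two objects on the sphere (names illustrative):

        using Unity.MLAgents;
        using Unity.MLAgents.Sensors;
        using UnityEngine;

        public class ObservingAgent : Agent
        {
            public Transform goal;
            public Transform other;

            public override void CollectObservations(VectorSensor sensor)
            {
                sensor.AddObservation(goal.rotation);  // 4 floats
                sensor.AddObservation(other.rotation); // 4 floats

                // Dot products of the rotated basis vectors on each plane.
                sensor.AddObservation(Vector3.Dot(goal.forward, other.forward));
                sensor.AddObservation(Vector3.Dot(goal.up, other.up));
                sensor.AddObservation(Vector3.Dot(goal.right, other.right));

                // Angle between the two rotations, normalized from degrees to [0, 1].
                sensor.AddObservation(Quaternion.Angle(goal.rotation, other.rotation) / 180f);
            }
        }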

    We have used demonstrations consisting of between 150 and 30,000 steps.
    Our training runs normally go past one million steps, with a handful of exceptions, and occasionally reach over three million steps.
    Our config file is included below.

    behaviors:
      My Behavior:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2024
          buffer_size: 20240
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          gail:
            gamma: 0.99
            strength: 1.0
            encoding_size: 128
            learning_rate: 0.0003
            use_actions: false
            use_vail: false
            demo_path: Demos/Imitation1Dot.demo
        keep_checkpoints: 5
        max_steps: 10000000
        time_horizon: 1000
        summary_freq: 30000
        threaded: true
        behavioral_cloning:
          demo_path: Demos/Imitation1Dot.demo
          steps: 0
          strength: 1
          samples_per_update: 0