
Trying to train an agent to match velocity

Discussion in 'ML-Agents' started by wx3labs, Apr 10, 2021.

  1. wx3labs

    wx3labs

    Joined:
    Apr 5, 2014
    Posts:
    72
    In my latest experiments I'm trying to train an agent to reach and hold a specific velocity as quickly as possible using 3D physics constrained to a plane (so y position clamped and x/z rotation clamped).

This seemed like a simple goal, but the agents never reach what I'd call a satisfactory result.

    Actions:
    • Torque -1 to +1
    • Forward force 0 to 1
    Observations:
    • Current velocity (split into normalized direction and magnitude)
    • Target velocity (same)
    • Y angular velocity
    Goal:
    • Get the dot product (of current vs. target direction) and the speed within a small range of the target, while angular velocity stays low (without that last requirement the agent can just twirl with increasing thrust)
    I've tried just having that goal with an existential penalty, but it seems too sparse for the agent to ever find it.
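To make the sparse setup concrete, here is a minimal sketch of that reward structure: a small per-step (existential) penalty until all three success conditions hold at once. This is Python rather than the C# an ML-Agents agent would actually use, and all thresholds and names are illustrative guesses, not the poster's values.

```python
import numpy as np

def sparse_step_reward(vel, target_vel, ang_vel_y,
                       dot_min=0.95, speed_tol=0.05, spin_max=0.1,
                       step_penalty=-0.0005):
    """Sparse reward: tiny existential penalty each step, +1 only when
    direction, speed, and spin are all within tolerance simultaneously.

    vel, target_vel: 3D velocity vectors (numpy arrays), target nonzero.
    ang_vel_y: angular velocity about the y axis (rad/s).
    """
    speed = np.linalg.norm(vel)
    target_speed = np.linalg.norm(target_vel)

    # Short-circuit avoids dividing a zero-length velocity vector.
    ok_dir = (speed > 1e-6
              and np.dot(vel / speed, target_vel / target_speed) >= dot_min)
    ok_speed = abs(speed - target_speed) <= speed_tol * target_speed
    ok_spin = abs(ang_vel_y) <= spin_max

    return 1.0 if (ok_dir and ok_speed and ok_spin) else step_penalty
```

With tolerances this tight, random exploration almost never stumbles into the all-three-conditions region, which matches the "too sparse" behavior described above.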

    I've also tried giving continuous small reward/penalties for matching the components of the goal. This leads to the agent finding the goal eventually, but not quickly.
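The dense variant can be sketched as a weighted sum of the three components, each mapped into a bounded range so no single term dominates. Again a Python illustration under assumed weights, not the poster's implementation:

```python
import numpy as np

def shaped_reward(vel, target_vel, ang_vel_y,
                  w_dir=0.4, w_speed=0.4, w_spin=0.2):
    """Dense per-step reward for velocity matching, roughly in [-0.4, 1].

    vel, target_vel: 3D velocity vectors (numpy arrays), target nonzero.
    ang_vel_y: angular velocity about the y axis (rad/s).
    The weights are hypothetical and would need tuning.
    """
    speed = np.linalg.norm(vel)
    target_speed = np.linalg.norm(target_vel)

    # Direction term: dot of normalized vectors, in [-1, 1].
    if speed > 1e-6:
        dir_term = float(np.dot(vel / speed, target_vel / target_speed))
    else:
        dir_term = 0.0

    # Speed term: 1 when magnitudes match, falling off linearly to 0.
    speed_term = 1.0 - min(abs(speed - target_speed) / target_speed, 1.0)

    # Spin term: penalizes the "twirl with increasing thrust" exploit.
    spin_term = 1.0 - min(abs(ang_vel_y) / 5.0, 1.0)

    return w_dir * dir_term + w_speed * speed_term + w_spin * spin_term
```

A sum like this creates a gradient toward the goal from anywhere in state space, but it is also exactly the kind of shaping agents learn to exploit: e.g. parking at a partial-credit equilibrium where the per-step shaped reward is decent without ever satisfying all three conditions at once.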

    Any suggestions for different strategies? More generally, is there a good resource for understanding how to structure rewards?
     
  2. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    134
    AngrySamsquanch likes this.
  3. wx3labs

    wx3labs

    Joined:
    Apr 5, 2014
    Posts:
    72
    The agents' rigidbodies have drag, so they do slow down if their forward thrust isn't maxed out.

    I've gotten them to find the reward by making the goal very relaxed (e.g., dot product > 0.6, magnitude within 30%, angular velocity less than 0.5 rad/s), but as soon as I constrict the reward more than that, they never find it.

    Alternatively, I've gotten mediocre success by setting continuous rewards/penalties for the 3 components, but the agents seem to always find an exploit. Maybe this task isn't a good fit for ML?
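One common answer to the "relaxed goal works, strict goal never gets found" problem is a curriculum: start with the loose tolerances the agents can already solve, then shrink them as the success rate climbs. ML-Agents supports curricula through its trainer configuration, but the idea can be shown as a hand-rolled sketch (all names and constants here are hypothetical):

```python
def update_tolerances(success_rate, tol, min_tol, decay=0.9, threshold=0.8):
    """Tighten success tolerances once the agent is reliable at the
    current difficulty level; leave them alone otherwise.

    success_rate: fraction of recent episodes that reached the goal.
    tol: current tolerances, e.g. {"dot": 0.4, "speed": 0.3, "spin": 0.5},
         where "dot" is (1 - required dot product).
    min_tol: the final, strict tolerances to converge toward.
    """
    if success_rate >= threshold:
        for k in tol:
            tol[k] = max(tol[k] * decay, min_tol[k])
    return tol
```

Called periodically during training, this walks the task from the "dot product > 0.6, magnitude within 30%" regime toward the strict goal in small steps, so the agent is never asked to find a reward region it has no gradient toward.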
     