Reward Agent finding target, but there's obstacles?

Discussion in 'ML-Agents' started by JPhilipp, Jan 30, 2020.

  1. JPhilipp


    Oct 7, 2014
    I have a conceptual question. When training an agent to find a target while there are obstacles in the way -- in my case, it's a rigidbody helicopter in a city of skyscrapers, trying to find a random transform -- how can I properly reward & punish the agent so that it's not punished for taking justified detours to get around a skyscraper?

    My current approach is to reward as the agent gets closer to the target, but I see the challenge of it then constantly bumping against skyscrapers as it found a local maximum of sorts:

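    Code (CSharp):
    // Success: the helicopter is near the target and nearly at rest.
    if (isCloseToTarget && isSlow)
    {
        SetReward(1f);
        Done();
    }
    // previousDistanceToTarget is assumed to be a nullable float that is unset on the first step.
    else if (previousDistanceToTarget != null)
    {
        if (distanceToTarget < previousDistanceToTarget)
        {
            // Moved closer to the target since the last step.
            SetReward(0.01f);
        }
        else if (distanceToTarget > previousDistanceToTarget)
        {
            // Moved away from the target; this also penalizes detours around obstacles.
            SetReward(-0.01f);
        }
        else
        {
            // No progress: small penalty for wasted time.
            const float punishmentPerTimeWasted = -0.001f;
            SetReward(punishmentPerTimeWasted);
        }
    }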
    I suppose another approach would be to not reward for getting closer or further at all (and just reward once on win), but I understand this can make training much longer.

    Many thanks!
  2. JPhilipp


    Oct 7, 2014
    Not sure if it'll help, but I'm adding a bit of curiosity into the reward signal mix. Still looking for help if anyone knows more.

    Code (YAML):
    HelicopterSteerable:
        max_steps: 500000
        normalize: true
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99
            curiosity:
                strength: 0.02
                gamma: 0.99
                encoding_size: 256
  3. mbaske


    Dec 31, 2017
    I've tried something similar a while back. My reward was based on target direction and speed (assuming the agent is aware of the target position).
    Code (CSharp):
    Vector3 targetDirection = (target.position - agent.position).normalized;
    AddReward(Vector3.Dot(agent.rigidbody.velocity, targetDirection) * rewardFactor);
    For penalties, you could use collisions or raycasts or both. If you have raycast detection for your observations anyway, then you can set a proximity threshold. If hit.distance goes below that value, meaning the agent flies too close to an obstacle without hitting it, you can penalize inversely proportional to that distance.
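    As a rough illustration of that proximity penalty, here is a minimal sketch in plain C#. The class ProximityPenalty and the parameters threshold, scale and minDistance are hypothetical names, not part of ML-Agents. It assumes minDistance is smaller than threshold and clamps the distance from below so that the 1/distance term stays finite at contact.

    ```csharp
    using System;

    // Hypothetical helper (not an ML-Agents API): turns a raycast hit distance into a penalty.
    public static class ProximityPenalty
    {
        // hitDistance: distance reported by the raycast.
        // threshold:   distance below which the agent counts as "too close" (assumed tuning value).
        // scale:       overall strength of the penalty (assumed tuning value).
        // minDistance: lower clamp so 1/distance cannot blow up; assumed to be < threshold.
        public static float Compute(float hitDistance, float threshold, float scale, float minDistance)
        {
            float d = Math.Max(hitDistance, minDistance);
            if (d >= threshold)
            {
                return 0f; // Far enough away: no penalty.
            }
            // Inverse-distance term, shifted so the penalty is exactly 0 at the threshold.
            return -scale * (1f / d - 1f / threshold);
        }
    }
    ```

    Inside an agent this could be used as something like AddReward(ProximityPenalty.Compute(hit.distance, 2f, 0.01f, 0.1f)) for each raycast hit. The constants are placeholders and would need tuning against the size of the other rewards.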
    My agent managed to fly some minor detours through the city grid, but struggled with larger ones. I guess it comes down to the target angle when using the vector dot product: flying orthogonal to the target direction doesn't yield any positive reward in this case.
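    To make the orthogonal case concrete, here is a small standalone sketch of the same dot-product reward. It uses System.Numerics.Vector3 instead of UnityEngine.Vector3 so it runs outside Unity; the class name DirectionReward and its parameters are hypothetical.

    ```csharp
    using System;
    using System.Numerics;

    // Hypothetical standalone version of the direction/speed reward from the snippet above.
    public static class DirectionReward
    {
        // Projects the velocity onto the unit vector pointing from the agent to the target.
        // Assumes the agent is not exactly at the target position (zero-length direction).
        public static float Compute(Vector3 agentPosition, Vector3 targetPosition, Vector3 velocity, float rewardFactor)
        {
            Vector3 targetDirection = Vector3.Normalize(targetPosition - agentPosition);
            return Vector3.Dot(velocity, targetDirection) * rewardFactor;
        }
    }
    ```

    With the agent at the origin, the target at (10, 0, 0) and a rewardFactor of 0.01, a velocity of (5, 0, 0) yields about 0.05, a velocity of (-5, 0, 0) yields about -0.05, and a sideways velocity of (0, 0, 5) yields 0. A detour straight around a building therefore earns nothing, and any detour that leads away from the target is penalized.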
  4. JPhilipp


    Oct 7, 2014
    Thanks! Penalizing for obstacles makes sense. Maybe for starters I can collect isColliding as a new observed bool signal, and then penalize for colliding -- after all, that would also damage a real helicopter, so it makes double sense.
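    One possible shape for that isColliding signal, as a sketch in plain C#: the CollisionTracker class and its members are hypothetical, and it assumes OnContactBegan and OnContactEnded would be called from Unity's OnCollisionEnter and OnCollisionExit messages on the helicopter. It counts contacts rather than storing a single bool, so touching two buildings and leaving one still reads as colliding.

    ```csharp
    using System;

    // Hypothetical helper (not an ML-Agents API): tracks whether the agent is touching anything.
    public sealed class CollisionTracker
    {
        private int contactCount;

        // True while at least one contact is active.
        public bool IsColliding => contactCount > 0;

        // Value to feed into the agent's observations: 1 while colliding, otherwise 0.
        public float ObservationValue => IsColliding ? 1f : 0f;

        // Intended to be called from OnCollisionEnter.
        public void OnContactBegan()
        {
            contactCount++;
        }

        // Intended to be called from OnCollisionExit; never drops below zero.
        public void OnContactEnded()
        {
            if (contactCount > 0)
            {
                contactCount--;
            }
        }

        // Per-step penalty while colliding; penaltyPerStep is an assumed tuning value.
        public float StepPenalty(float penaltyPerStep)
        {
            return IsColliding ? -Math.Abs(penaltyPerStep) : 0f;
        }
    }
    ```

    In the agent, the observation value would go in with the other observations and StepPenalty would be passed to AddReward each step. Whether a per-step penalty or a one-off penalty on first contact trains better is something to find out by experiment.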