Question: Questions about Flight AI

Discussion in 'ML-Agents' started by InfinityGameDev, May 1, 2021.

  1. InfinityGameDev

    InfinityGameDev

    Joined:
    Jun 28, 2016
    Posts:
    54
    Hi all, I’m trying to make a flight AI that follows a target, but I’m having trouble. I figured I would set the reward equal to the distance from the target at the end of each episode, but after over 1M steps all the planes do is fly in loops. If they collide with the ground they lose 100 points. I’m confused why the AI hasn’t improved. I’ve tried random positioning, random rotation, a single target, random targets, and still nothing. I’ve also tried imitation learning. Also, is there a reason why SetReward() keeps adding to itself? I called it each step, thinking it would override the cumulative reward, but it just accumulates. Does anyone have any ideas of how I can get this working? Thanks :)
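
    For reference, this is roughly what my reward code looks like (a simplified sketch of the per-step variant I mention above; FlightAgent and the "Ground" tag are placeholders for my actual setup):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    // Simplified sketch of my current reward setup
    public class FlightAgent : Agent
    {
        public Transform target;

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Called each step -- I expected this to override the cumulative
            // reward, but the per-step values keep adding up instead
            SetReward(Vector3.Distance(transform.position, target.position));
        }

        void OnCollisionEnter(Collision collision)
        {
            // Lose 100 points for hitting the ground, then restart
            if (collision.gameObject.CompareTag("Ground"))
            {
                AddReward(-100f);
                EndEpisode();
            }
        }
    }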
     
  2. LexVolkov

    LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    What observations do the agents receive, and what actions can they take?
     
  3. InfinityGameDev

    InfinityGameDev

    Joined:
    Jun 28, 2016
    Posts:
    54
    Hi @LexVolkov, I observe the distance between the agent GameObject and the target GameObject, and the dot product between the agent's forward direction and the direction to the target, as well as the transform positions and rotations of both. The agent can pitch up, pitch down, or do nothing, and likewise roll left, roll right, or not at all. I keep adjusting the reward values, but nothing seems to be improving 12+ hours into training.

    Edit: There are also RaycastSensors: one pointing down and one forward, with about 10 rays each in an arc.
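
    In code, the observations and actions look roughly like this (a simplified sketch, continuing the placeholder class from my first post; the ray sensors are separate components on the prefab):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    // Simplified sketch of my observations and discrete actions
    public class FlightAgent : Agent
    {
        public Transform target;

        public override void CollectObservations(VectorSensor sensor)
        {
            // Distance to the target and how directly the nose points at it
            Vector3 toTarget = target.position - transform.position;
            sensor.AddObservation(toTarget.magnitude);
            sensor.AddObservation(Vector3.Dot(transform.forward, toTarget.normalized));

            // Positions and rotations of both objects
            sensor.AddObservation(transform.position);
            sensor.AddObservation(transform.rotation);
            sensor.AddObservation(target.position);
            sensor.AddObservation(target.rotation);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Branch 0: pitch (0 = none, 1 = up, 2 = down)
            // Branch 1: roll  (0 = none, 1 = left, 2 = right)
            int pitch = actions.DiscreteActions[0];
            int roll = actions.DiscreteActions[1];

            if (pitch == 1) transform.Rotate(-1f, 0f, 0f);
            else if (pitch == 2) transform.Rotate(1f, 0f, 0f);

            if (roll == 1) transform.Rotate(0f, 0f, 1f);
            else if (roll == 2) transform.Rotate(0f, 0f, -1f);
        }
    }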
     
    Last edited: May 1, 2021
  4. LexVolkov

    LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    SetReward() overrides the reward for the current step, not the whole episode; the per-step values still accumulate into the cumulative reward, which is why it looks like it's adding to itself.
    I think you need to remove the unnecessary garbage data first. Leave, for example, one sensor.
    Then remove the unnecessary rewards and leave +0.1 for shortening the distance. There is a similar setup in the ML-Agents examples, where the agent (the Worm) follows a target.
    You can also try teaching from recorded actions (imitation learning).
    As the documentation says, you don't need to think for the agent and tell it what to do. Give it purpose and freedom =)
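
    Something like this (a rough sketch, not tested; the class name and reward scale are placeholders):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    // Rough sketch of a dense "getting closer" reward
    public class FollowTargetAgent : Agent
    {
        public Transform target;
        float m_PreviousDistance;

        public override void OnEpisodeBegin()
        {
            m_PreviousDistance = Vector3.Distance(transform.position, target.position);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            float distance = Vector3.Distance(transform.position, target.position);

            // Small reward every step the agent shortens the distance;
            // AddReward accumulates these over the episode
            if (distance < m_PreviousDistance)
            {
                AddReward(0.1f);
            }

            m_PreviousDistance = distance;
        }
    }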