Profound agent behavior while training

Discussion in 'ML-Agents' started by Xysch, Apr 1, 2020.

  1. Xysch

    Xysch

    Joined:
    Sep 2, 2013
    Posts:
    9
    Hello everyone,
    Just to cover the basics, I am using ML-Agents version 0.14, Python 3.6.8, and Unity 2019.2.8.
    My current game is set up as follows: I have an agent that I want to jump through rings along a path in an infinite runner. The rings have random heights and random distances between them, within certain limits of course, and the objective is for the agent to learn how to make it through this obstacle course.
    It is currently set up to take 14 observations:
    1. The player's position (Vector3)
    2. The closest ring (Vector3)
    3. The second closest ring (Vector3)
    4. The third closest ring (Vector3)
    5. The remaining jumps (for double jump)
    6. The current slope between the player and the nearest ring
    The rewards are +0.25 if the agent makes it through the ring perfectly, -0.25 if it misses the ring, +0.01 if it hits part of the ring, and +0.01 if it jumps within +/- 10% of the linear height, so it may have chosen the correct jump value but the wrong timing. The environment I am training in has 4 rings, and actions are discrete and only allowed when in the correct slope for the nearest ring.
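    For context, here is a rough sketch of what that setup looks like in code, using the 0.14 Agent API (CollectObservations / AddVectorObs / AddReward). The field and method names (closestRings, jumpsRemaining, OnRingResult, etc.) are simplified placeholders rather than my exact code:

    ```csharp
    using MLAgents;
    using UnityEngine;

    public class RingJumperAgent : Agent
    {
        // Placeholder fields standing in for the real setup.
        public Transform[] closestRings = new Transform[3]; // three nearest rings, nearest first
        public int jumpsRemaining;                           // for the double jump
        public float slopeToNearestRing;

        public override void CollectObservations()
        {
            // 3 (player position) + 3 * 3 (ring positions) + 1 + 1 = 14 values
            AddVectorObs(transform.position);
            foreach (var ring in closestRings)
                AddVectorObs(ring.position);
            AddVectorObs(jumpsRemaining);
            AddVectorObs(slopeToNearestRing);
        }

        // Reward values as described above; when each case triggers is game
        // logic elsewhere, so this method is just illustrative.
        public void OnRingResult(bool perfect, bool clipped, bool heightWithinTenPercent)
        {
            if (perfect) AddReward(0.25f);
            else if (clipped) AddReward(0.01f);
            else AddReward(-0.25f);
            if (heightWithinTenPercent) AddReward(0.01f);
        }
    }
    ```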

    For whatever reason, it always trains so that the agent's action becomes either -1 or 1, all the agents perform the exact same behavior within 10,000 steps, and it then never changes its behavior to improve its average. It simply either jumps at 100% or doesn't jump at all.
    What may cause this and how would I fix something like this?
    Thanks!
     
  2. Xysch

    Xysch

    Joined:
    Sep 2, 2013
    Posts:
    9
    If, however, I reduce the number of ring observations to 2, it works perfectly fine and trains as normal.
     
  3. LexVolkov

    LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    Hi.
    If he jumps perfectly, then what is the problem? :D

    It may just be standing idle because it thinks that is more profitable.

    Would it be more interesting to use Ray Perception Sensors?
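    (Just to illustrate the idea: a ray sensor is usually added as a component on the agent's GameObject and configured in the Inspector. The snippet below uses the class name from the current ML-Agents package; in 0.14 the component lives under a different namespace, so treat this purely as a sketch.)

    ```csharp
    using UnityEngine;
    using Unity.MLAgents.Sensors; // namespace in the current package; 0.14 differs

    public class AddRingRaySensor : MonoBehaviour
    {
        void Awake()
        {
            // Attach a 3D ray perception sensor to the agent's GameObject.
            // Ray length, rays per direction, and the detectable tags
            // (e.g. a "Ring" tag) are then configured in the Inspector.
            gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        }
    }
    ```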
     
  4. Xysch

    Xysch

    Joined:
    Sep 2, 2013
    Posts:
    9
    When I say it jumps at 100%, I mean that its output is 1 when it should be somewhere between 0 and 1.
    And not jumping at all is the least profitable option, since that case results in the largest penalty, yet the behavior never tries to improve.
    Like I said, it works when I give it the observations for rings 1 and 2, but as soon as I give it the third, it does not train properly.
     
  5. TreyK-47

    TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,822
    I'll flag this for the team to have a look!