Rewards Definition

Discussion in 'ML-Agents' started by dani_kal, Apr 26, 2020.

  1. dani_kal


    Mar 25, 2020
    I need your help one more time!
    My goal is to train an agent to navigate from a starting point to a target point while avoiding the obstacles placed between the two, which make it hard for the agent to reach the goal position.

    My mistake may be in how I define the rewards.

    The agent acts by moving, and the rewards are:

    1. Reward = -0.01 in each step of the algorithm
    2. Reward = -1, when the agent hits on the obstacle
    3. Reward = 1, when the agent reaches the goal-position.

    The agent resets with Done() when it reaches the goal position.

    After 50,000 steps the agent still cannot navigate from the initial point to the final one; it covers only about 1/4 of the way. That is, it never reaches the final goal.
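The reward scheme listed above can be written out as a small sketch. This is plain Python rather than the Unity C# agent script, and the names (`step_reward`, `StepOutcome`, the constants) are hypothetical. It also assumes the per-step penalty is applied on every step, including the one where the agent hits the obstacle or reaches the goal, which the post does not state explicitly.

```python
from dataclasses import dataclass

# Values taken from the post; the constant names are illustrative only.
STEP_PENALTY = -0.01
OBSTACLE_PENALTY = -1.0
GOAL_REWARD = 1.0


@dataclass
class StepOutcome:
    reward: float  # reward granted for this step
    done: bool     # True when the episode should be reset


def step_reward(hit_obstacle: bool, reached_goal: bool) -> StepOutcome:
    """Combine the three reward terms for a single step."""
    reward = STEP_PENALTY            # assumed to apply on every step
    if hit_obstacle:
        reward += OBSTACLE_PENALTY
    if reached_goal:
        reward += GOAL_REWARD
    # As in the post, only reaching the goal ends the episode.
    return StepOutcome(reward=reward, done=reached_goal)
```

With these numbers an episode that never reaches the goal can only accumulate negative reward, so the agent gets no positive signal until it finds the goal by chance.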

    I have tried also more steps, steps = 100000, but the result is exactly the same.

    I would be grateful if anyone could help me, because I do not know what else to do.
    Thank you in advance!!!.
  2. unity_-DoCqyPS6-iU3A


    Aug 18, 2018
    Hello dani_kal,

    how does your agent "see" obstacles and goals?
    absolute positions? (i.e. "you are at this x,y position" and "your 2 closest obstacles are at these x,y-positions" and "your goal is at this x,y position")
    Or relative positions? (i.e. "your closest 2 obstacles are dx, dy away from you" and "your goal is dx, dy away from you")
    From what I've read, relative positions are preferable (agent learns faster).

    What other observations does your agent have to get to the goal?

    How many steps does it take you (as a human player) from start to goal position?

    Depending on the answers above you may need to tune your hyperparameters (baby steps. Please answer the questions above first)

    You can also try to set the obstacle-"punishment" to a smaller value (like -0.1). If the punishment is too high the agent is discouraged from trying anything at all.
  3. dani_kal


    Mar 25, 2020
    Thank you very much for your answer!!!
    I use absolute positions and only these three observations (the x,y coordinates of the agent's position, of the obstacle and of the goal).

    I also tried calculating the agent-goal distance at every step: if the distance increases, the agent is moving away from the goal position and gets reward = -0.5; otherwise it gets reward = 0.5.
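The distance check described here could look roughly like the following. It is a plain Python sketch, not the poster's Unity code; the function names and the 2D tuple positions are assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def progress_reward(prev_pos, new_pos, goal, magnitude=0.5):
    """Return -magnitude if the step moved the agent away from the goal,
    +magnitude otherwise (including when the distance is unchanged)."""
    prev_d = distance(prev_pos, goal)
    new_d = distance(new_pos, goal)
    return -magnitude if new_d > prev_d else magnitude


# Moving from (0, 0) to (1, 0) with the goal at (5, 0) reduces the distance.
print(progress_reward((0, 0), (1, 0), (5, 0)))
```

Note that ±0.5 per step is large next to the goal reward of 1, so this term dominates the total unless it is scaled down.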

    I tried your suggestion by defining the obstacle-"punishment" = -0.1.

    With these changes the result is more or less the same.

    Do you have anything else in mind that I could try?
    It looks like the agent doesn't recognize the goal and that's why he never gets there.
  4. unity_-DoCqyPS6-iU3A


    Aug 18, 2018
    This is a good approach. It's considered a "dense reward" system, since the agent receives feedback about its actions after every step (as opposed to just at the end of the episode or when it bumps into an obstacle). "Dense reward" scenarios usually train faster. Be sure to ALSO include a small punishment for every step taken in this case. Otherwise the final reward of an episode will always be the same, no matter how long the agent took to solve the problem.
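To see why the step punishment matters, here is a small worked sketch in plain Python. It assumes a reward equal to the decrease in distance at each step (a simplification of the fixed ±0.5 used in the thread), and the two distance sequences are made-up examples.

```python
def episode_return(distances, step_penalty=0.0):
    """Sum of (previous distance - new distance) over an episode,
    plus an optional fixed penalty per step."""
    total = 0.0
    for prev_d, new_d in zip(distances, distances[1:]):
        total += (prev_d - new_d) + step_penalty
    return total


# Distance to the goal after each step; both episodes start at 4 and end at 0.
direct = [4.0, 3.0, 2.0, 1.0, 0.0]                           # 4 steps
wandering = [4.0, 3.0, 4.0, 3.0, 2.0, 3.0, 2.0, 1.0, 0.0]    # 8 steps

# Without a step penalty the per-step terms cancel out and both
# episodes earn the same return.
print(episode_return(direct), episode_return(wandering))

# With a small step penalty the shorter episode earns more.
print(episode_return(direct, -0.01), episode_return(wandering, -0.01))
```

Without the penalty both returns equal the starting distance (4.0 here), so the agent has no reason to prefer the shorter path.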

    Do your positions for start, goal, obstacle always stay the same, or are they randomized?
    If they stay the same, absolute positions could work.
    If they are randomized, you're better off with relative positions.
    (Think of it as the number of experiences needed for the agent to successfully learn every scenario. If he sees relative positions, he can use every single run to improve his network. If he sees absolute coordinates, he's in a different scenario every time.)

    Do you normalize your coordinates? (i.e. instead of giving your agent the position at coordinate 7, 3, it would be better to tell the agent: you're 70% of the playing field width from the left side of the area (0.7) and 30% of the height from the top of the area (0.3).)
    Observations ranging from -1 to 1 improve the algorithm's performance.
    Even better of course would be relative coordinates ("The distance between you and your goal is 20% of the playing field width in horizontal direction, and 10% of the playing field height in vertical direction" (0.2 / 0.1)).
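A minimal sketch of both ideas, in plain Python rather than Unity C#. The helper names are hypothetical, and the 10 x 10 playing field is an assumption chosen so that coordinate 7 maps to 0.7 as in the example above.

```python
def normalize(value, low, high):
    """Map a coordinate from [low, high] to [0, 1]."""
    return (value - low) / (high - low)


def relative_observation(agent, goal, field_width, field_height):
    """Offset from agent to goal as a fraction of the field size.
    Each component lies in [-1, 1] while both points are inside the field."""
    dx = (goal[0] - agent[0]) / field_width
    dy = (goal[1] - agent[1]) / field_height
    return dx, dy


# Assumed field: x and y both range from 0 to 10.
agent = (5.0, 4.0)
goal = (7.0, 5.0)

print(normalize(7.0, 0.0, 10.0))                      # normalized absolute x
print(relative_observation(agent, goal, 10.0, 10.0))  # normalized relative offset
```

The relative form gives the network the same input whenever the agent is the same offset from the goal, regardless of where on the field that happens.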

    Your agent DOES move kind of randomly, correct? It's not just standing at the start, or moving into the same corner every time?
  5. dani_kal


    Mar 25, 2020
    Thank you indeed for the time you spent to help me!!!!

    The positions of the start, goal and obstacle are the same every time.
    Oh no!! I don't normalize the coordinates...
    Yes, you are right. My agent moves randomly every time.
    Can I ask something else?

    I went back to the Basic example to implement it from the beginning, in order to understand what I am doing wrong.
    I deleted the FixedUpdate() and WaitTimeInference() functions and made some of the other changes I had made in my own code, to see if it works.
    It seems that it is training, but very, very fast.
    I think I have to change something here so that the steps run at a normal speed and not so quickly.

    I deleted these two functions because one of my changes to the Basic example is that I also define
    public GameObject cube, apart from
    public GameObject largeGoal;
    public GameObject smallGoal;

    and with cube.transform.position I can move the cube.
    But with this definition an error occurs in the two functions mentioned above:

    NullReferenceException: Object reference not set to an instance of an object
    BasicAgent.WaitTimeInference () (at Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs:94)
    BasicAgent.FixedUpdate () (at Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs:89)

    so I deleted them...
  6. unity_-DoCqyPS6-iU3A


    Aug 18, 2018

    the speed of the simulation can be changed with the "TimeScale"-slider in your picture above.
    You have two sliders (one for training, and one for inference).

    If you're still trying to train your model with mlagents, "Training Configuration" is used.
    If you've already got an *.nn model, "Inference Configuration" is used.

    But I think they removed the configuration from your picture in one of the latest releases.
    Which version of ML-Agents are you using?
    Look for a string called k_APIVersion in your Academy.cs
  7. dani_kal


    Mar 25, 2020
    Ooh, thank you very much for the explanation!!!
    I had confused the two and didn't know which was which.
    My version of ML-Agents is 0.10.
    I think I've almost managed to train it.
    It helps to define "Max steps" in the Academy, to prevent the agent from going too far.