
Advice on Navigation Research Project

Discussion in 'ML-Agents' started by asad133, Apr 1, 2020.

  1. asad133


    Joined:
    Apr 1, 2020
    Posts:
    22
    I have been using Unity ML-Agents for a while for reinforcement learning research. I am training agents to navigate to targets in environments with different obstacles, and I am trying to show generalisation in the learned policy.


    The action space is discrete: the agent can move up-down and left-right, and it can move diagonally by choosing two actions simultaneously. The agent's observations are its own position and the goal position, and it is equipped with rays cast in 8 directions around it.
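
    For context, here is a simplified sketch of this kind of setup using the newer ML-Agents C# API (the 2020-era API passed float[] actions instead of ActionBuffers, but the structure is the same). The branch sizes, field names and the ray sensor choice are illustrative assumptions, not my exact code:

        using Unity.MLAgents;
        using Unity.MLAgents.Actuators;
        using Unity.MLAgents.Sensors;
        using UnityEngine;

        public class NavigationAgent : Agent
        {
            public Transform goal;          // assumed reference to the target
            public float moveSpeed = 2f;    // assumed movement speed

            // Two discrete branches of size 3 each:
            // branch 0 = {no-op, up, down}, branch 1 = {no-op, left, right}.
            // Choosing a non-zero action on both branches gives diagonal movement.
            public override void OnActionReceived(ActionBuffers actions)
            {
                int vertical = actions.DiscreteActions[0];
                int horizontal = actions.DiscreteActions[1];

                Vector3 dir = Vector3.zero;
                if (vertical == 1) dir += Vector3.forward;      // "up" on the floor plane
                else if (vertical == 2) dir += Vector3.back;    // "down"
                if (horizontal == 1) dir += Vector3.left;
                else if (horizontal == 2) dir += Vector3.right;

                transform.position += dir.normalized * moveSpeed * Time.fixedDeltaTime;
            }

            // Agent and goal positions; the 8 rays come from a
            // RayPerceptionSensorComponent3D attached to the same GameObject.
            public override void CollectObservations(VectorSensor sensor)
            {
                sensor.AddObservation(transform.localPosition);
                sensor.AddObservation(goal.localPosition);
            }
        }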


    Here is an example of one of the difficult environments:

    [Image: example of a difficult environment]

    I am using curiosity as well as curriculum learning: curiosity because this is a sparse-reward problem, and curriculum learning by progressively making the environment larger.


    My goal is to have the agent learn by placing the agent and target at random positions across different environments (with different obstacles), and then to test it in unseen but similar environments.

    First issue:
    [Image: entropy plot]

    Why is entropy behaving so weirdly? In my previous round of experiments it decreased and, more importantly, it varied over a wide range of values. In this instance it stays around 1.95. I believe this is a major reason why my agents are not learning properly.
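
    As a sanity check on that number (assuming my two discrete branches have three options each, e.g. {no-op, up, down} and {no-op, left, right}, and that the reported entropy is summed over the branches in nats), a completely uniform policy would sit at

        ln(3) + ln(3) ≈ 1.10 + 1.10 ≈ 2.20 nats

    so a value stuck around 1.95 means the policy is still almost uniform, i.e. the agent is barely committing to any particular actions.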


    Secondly, please advise on how to reward the agent. I have tried many different reward functions with varied success. I am giving a larger reward for finding the target and penalising the agent on every timestep so that it ends the episode as fast as possible. I am also giving an additional shaped reward that rewards the agent as it moves closer to the target.


    I have also looked at penalising the agent if it moves further away, penalising collisions with obstacles, and several other variants. The Unity GitHub issue discussions advised against penalising specific behaviours.


    Any other advice on the hyperparameters? The GitHub issues advised a larger buffer and batch size.


    I know I have asked a lot, but any guidance is greatly appreciated. I have looked through the GitHub issues at length and used them for guidance whenever I run into problems.
     


  2. asad133


    Joined:
    Apr 1, 2020
    Posts:
    22
    Environment Image
     

    Attached Files:

    • Env.PNG (10.5 KB)
  3. awjuliani


    Unity Technologies

    Joined:
    Mar 1, 2017
    Posts:
    69
    Hi asad133,

    Can you share a little more about your hyperparameters and reward function? There are a number of reasons why the entropy might not be dropping. They are likely related to the lack of learning performance as well, but that is difficult to tell without knowing the reward function.
     
  4. asad133


    Joined:
    Apr 1, 2020
    Posts:
    22
    These are the current parameters that I have tried to tune:

    RollerBallBrainBranch:
        summary_freq: 5000
        batch_size: 512
        buffer_size: 5120
        learning_rate: 1.0e-5
        max_steps: 5.0e7
        hidden_units: 256
        time_horizon: 128
        beta: 1.0e-2
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99
            curiosity:
                strength: 0.1
                gamma: 0.99
                encoding_size: 64
                learning_rate: 1.0e-4
     
  5. asad133


    Joined:
    Apr 1, 2020
    Posts:
    22
    Currently my reward function is: +1 if the agent finds the goal; -1/max_steps on every step as an existential penalty to encourage the agent to finish as fast as possible; and a shaped reward based on distance:
    (distance from the target to the agent / maximum diagonal distance) / max_steps
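
    In code, that reward scheme would look roughly like the sketch below, continuing the hypothetical NavigationAgent sketch from my first post (the sign of the distance term and the field names goal/maxDiagonal are illustrative assumptions):

        // Called every step while the episode runs (e.g. at the end of OnActionReceived).
        // MaxStep is the Agent's built-in maximum episode length.
        void ApplyStepRewards()
        {
            // Existential penalty: -1 spread over the maximum episode length.
            AddReward(-1f / MaxStep);

            // Distance-shaped term, normalised by the largest possible distance
            // (the floor diagonal) and by the episode length. Written here as a
            // penalty proportional to distance, so being closer is better.
            float dist = Vector3.Distance(transform.localPosition, goal.localPosition);
            AddReward(-(dist / maxDiagonal) / MaxStep);
        }

        // Called when the agent reaches the target (e.g. from OnTriggerEnter).
        void OnGoalReached()
        {
            AddReward(1f);
            EndEpisode();
        }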
     
  6. asad133


    Joined:
    Apr 1, 2020
    Posts:
    22
    I was inspired by the wall jump curriculum. The first mode has only a single obstacle, and the second mode uses maze-like environments. The subsequent 4 modes increase the floor size by a factor of 2 each time. It randomly decides which obstacles to use at each time step.
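
    In case it helps, the lesson can drive the environment reset roughly as in the sketch below, using the newer EnvironmentParameters API (the parameter names "floor_scale" and "obstacle_mode", and the reset logic, are illustrative assumptions rather than my exact setup):

        using Unity.MLAgents;
        using UnityEngine;

        public class NavigationEnvController : MonoBehaviour
        {
            public Transform floor;
            public GameObject[] obstacleSets;   // e.g. single obstacle, maze layouts, ...

            // Called whenever the area resets (e.g. from the agent's OnEpisodeBegin).
            public void ResetArea()
            {
                var envParams = Academy.Instance.EnvironmentParameters;

                // Values supplied by the curriculum; defaults apply when no curriculum is active.
                float floorScale = envParams.GetWithDefault("floor_scale", 1.0f);
                int obstacleMode = (int)envParams.GetWithDefault("obstacle_mode", 0.0f);

                floor.localScale = new Vector3(floorScale, 1f, floorScale);

                // Disable everything, then enable one layout at random from the
                // set of layouts allowed by the current mode.
                foreach (var set in obstacleSets)
                    set.SetActive(false);
                int choice = Random.Range(0, Mathf.Min(obstacleMode + 1, obstacleSets.Length));
                obstacleSets[choice].SetActive(true);
            }
        }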
     

    Attached Files:

    • Env2.PNG (65.2 KB)
  7. asad133


    Joined:
    Apr 1, 2020
    Posts:
    22
    There are so many different things that I have tried, but I keep running into different issues each time. Maybe there is too much randomness in the environments for the agent to learn, i.e. random spawn positions, different obstacles, and different floor sizes.

    The reason for this is that I previously wrote a paper showing a technique in which agents learn separately for each environment, with fixed goal and agent spawn points. I want to add more flexibility to extend that work.