'Maze' traversing ML Agent problems

Discussion in 'ML-Agents' started by StewedHarry, Jul 18, 2020.

  1. StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    I was wondering if I could get some help with my ML-Agents project. I'm new to machine learning and know nothing about ML theory.

    I have designed a very simple procedural environment made of blocks (tiles), with a perimeter and two exit points. Inside the perimeter I have placed more blocks at even intervals. Each time the environment resets, a chosen number of blocks is added to this skeleton to block off certain paths.

    e.g. here is an empty skeleton:
    empty 2.png

    and here is one additional tile added to the skeleton:

    empty.png

    The agent is spawned randomly in the first row. Its observations are the position of the nearest exit, its own position, and a Ray Perception Sensor with a ray for each direction around the agent (N, NE, E, SE, ...). The agent is given +1 reward for reaching the exit, and this reward is reduced according to how long the agent took to get there, via a small penalty each step:

    Code (CSharp):
    // Time penalty applied each step; over a full episode of MaxStep steps it sums to -1.
    AddReward(-1f / MaxStep);
    I have had some limited success using PPO with the config file of the Pyramids example from the ML-Agents repo. However, I found that if the agent went down a dead end, it would get stuck there until the end of the episode. To remedy this I tried adding the curiosity intrinsic reward, but the behaviour persisted.

    The training runs were not particularly long, although I didn't see any improvement in these dead-end scenarios after around 5 hours of training.

    I then came across a paper cited in the documentation which recommended SAC for maze-like or path-finding situations. However, when I tried this algorithm with the equivalent SAC config from the Pyramids example, the agents tended to lose any initial success in finding the exits and ended up consistently jittering about in a corner.

    I was wondering whether reinforcement learning is inherently not conducive to these types of problems.

    Otherwise, what could I do to improve the agent's learning, specifically for situations in which the agent may go down a dead end?

    Thanks for any help, and I am happy to provide more details regarding the project if what I have given is not enough to gauge where I may be going wrong.
     
  2. mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    If curiosity doesn't help with encouraging exploration, you can try something like leaving breadcrumbs at visited tiles. Since you're building the maze on a grid, you could track your agent's 2D position on it and update a 2-dimensional array every time the agent enters a new coordinate. You would then assign small rewards for entering tiles that weren't visited before. The agent needs to observe the array values of the neighbouring tiles or groups of tiles (relative to its own position), in order to get a sense of where it has and hasn't been so far. You might even count the number of visits to each tile and have the agent observe those values.
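
    A minimal sketch of the breadcrumb idea in plain C#, with no Unity dependencies so it can be tried on its own. Everything here is an assumption made for illustration: the VisitGrid class and its method names are hypothetical (not part of ML-Agents), as are the bonus size, the visit-count cap of 3, and the choice to report tiles outside the grid as "fully visited".

```csharp
using System;

// Hypothetical helper (not part of ML-Agents): counts how often each
// grid tile has been entered during the current episode.
public class VisitGrid
{
    private readonly int[,] visits;
    private int lastX = -1;
    private int lastY = -1;

    public int Width { get; }
    public int Height { get; }

    public VisitGrid(int width, int height)
    {
        Width = width;
        Height = height;
        visits = new int[width, height];
    }

    // Call at the start of every episode to wipe the breadcrumbs.
    public void Reset()
    {
        Array.Clear(visits, 0, visits.Length);
        lastX = -1;
        lastY = -1;
    }

    // Call every step with the agent's current tile coordinate.
    // Returns `bonus` the first time a tile is entered, otherwise 0.
    public float Enter(int x, int y, float bonus)
    {
        if (!InBounds(x, y)) return 0f;
        if (x == lastX && y == lastY) return 0f; // still standing on the same tile
        lastX = x;
        lastY = y;
        visits[x, y]++;
        return visits[x, y] == 1 ? bonus : 0f;
    }

    // Visit count of a tile, or -1 if the coordinate is outside the grid.
    public int Count(int x, int y) => InBounds(x, y) ? visits[x, y] : -1;

    // Observation values for the 8 neighbours in the order
    // N, NE, E, SE, S, SW, W, NW. Each value is the visit count clamped
    // to `cap` and divided by `cap`, so it lies in [0, 1].
    // Assumption: tiles outside the grid are reported as 1.
    public float[] NeighbourObservations(int x, int y, int cap = 3)
    {
        int[] dx = { 0, 1, 1, 1, 0, -1, -1, -1 };
        int[] dy = { 1, 1, 0, -1, -1, -1, 0, 1 };
        var obs = new float[8];
        for (int i = 0; i < 8; i++)
        {
            int nx = x + dx[i];
            int ny = y + dy[i];
            obs[i] = InBounds(nx, ny)
                ? Math.Min(visits[nx, ny], cap) / (float)cap
                : 1f;
        }
        return obs;
    }

    private bool InBounds(int x, int y) =>
        x >= 0 && x < Width && y >= 0 && y < Height;
}
```

    In the agent you would convert its world position to a tile coordinate (how depends on your tile size), pass the result of Enter to AddReward each step, add the eight neighbour values in CollectObservations, and call Reset when the episode begins. The vector observation size in Behavior Parameters has to grow by 8 to match. Keeping the bonus small relative to the +1 exit reward (something like 0.01 is a guess, not a tuned value) should stop the agent from preferring wandering over actually leaving.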
     
  3. StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    I went back to PPO and gradually increased the complexity of the maze. The agents seem to be faring better (although they sometimes still get stuck). I'm going to create a curriculum and then mess around with the curiosity rewards to see what happens. Are there any other params that would be worth adjusting for this type of problem? I was also thinking about increasing the step count for each episode.