'Maze' traversing ML Agent problems

Discussion in 'ML-Agents' started by StewedHarry, Jul 18, 2020.

  1. StewedHarry


    Jan 20, 2020
I was wondering if I could get some help with my ML-Agents project. I'm new to machine learning and know nothing about ML theory.

I have designed a very simple procedural environment made of blocks (tiles), with a perimeter and two exit points. Inside the perimeter I have placed more blocks at even distances apart. Each time the environment resets, a chosen number of blocks are added to this skeleton to block off certain paths.

    e.g. here is an empty skeleton:
    [Attachment: empty 2.png]

    and here is one additional tile added to the skeleton:


The agent is spawned randomly in the first row. Its observations are the position of the nearest exit, its own position, and a Ray Perception Sensor with a ray for each direction around the agent (N, NE, E, SE, ...). The agent is given +1 reward for reaching the exit, and this reward is tapered according to how long the agent took, via a small per-step penalty:

    Code (CSharp):
    AddReward(-1f / MaxStep);
I have had some limited success using PPO with the config file of the Pyramids example from the ML-Agents repo. However, I found that if the agent went down a dead end, it would get stuck there until the end of the episode. To remedy this I tried adding the curiosity intrinsic reward, but the behaviour persisted.
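For reference, curiosity is enabled as an extra reward signal in the trainer config alongside the extrinsic reward. A sketch in the flat config format ML-Agents used around mid-2020 (the behavior name and values here are illustrative, not my actual settings; the Pyramids example in the repo has tuned values):

```yaml
MazeAgent:                # illustrative behavior name
  trainer: ppo
  max_steps: 5.0e6
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    curiosity:            # intrinsic reward; raising strength pushes exploration harder
      strength: 0.02
      gamma: 0.99
      encoding_size: 256
      learning_rate: 3.0e-4
```

The exact schema has changed between ML-Agents releases, so it's worth checking the docs for whichever version you're on.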

    The training runs were not particularly long, although I didn't see any improvement in these dead-end scenarios after around 5 hours of training.

I then came across a paper cited in the documentation which recommended SAC for maze-like or path-finding situations. However, when I tried this algorithm with the equivalent SAC config from the Pyramids example, the agents tended to lose any initial success in finding the exits and proceeded to consistently jitter about in the corner.

I was wondering whether reinforcement learning is inherently not conducive to these types of problems.

Otherwise, what could I do to improve the agent's learning, specifically for situations in which it goes down a dead end?

    Thanks for any help, and I am happy to provide more details regarding the project if what I have given is not enough to gauge where I may be going wrong.
  2. mbaske


    Dec 31, 2017
If curiosity doesn't help with encouraging exploration, you can try something like leaving breadcrumbs at visited tiles. Since you're building the maze on a grid, you could track your agent's 2D position on it and update a 2-dimensional array every time the agent enters a new coordinate. You would then assign small rewards for entering tiles that weren't visited before. The agent needs to observe the array values of the neighbouring tiles or groups of tiles (relative to its own position) in order to get a sense of where it has and hasn't been so far. You might even count the number of visits to each tile and have the agent observe those values.
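A minimal sketch of that breadcrumb idea, kept separate from the Unity classes so it's easy to test. The class name, the reward value, and the method names are all illustrative, not ML-Agents API; you'd call `Visit` when the agent enters a tile and pass the result to `AddReward`, call `Count` on the neighbouring tiles when collecting observations, and call `Clear` in `OnEpisodeBegin`:

```csharp
using System;

// Hypothetical helper: tracks how often each grid tile has been entered this
// episode and hands out a small one-time reward for novel tiles.
public class VisitGrid
{
    private readonly int[,] visits;     // visit count per tile
    private readonly float noveltyReward;

    public VisitGrid(int width, int height, float noveltyReward = 0.01f)
    {
        visits = new int[width, height];
        this.noveltyReward = noveltyReward;
    }

    // Call when the agent enters tile (x, y); returns the shaping reward
    // to hand to AddReward() -- positive only on the first visit.
    public float Visit(int x, int y)
    {
        visits[x, y]++;
        return visits[x, y] == 1 ? noveltyReward : 0f;
    }

    // Visit count of a tile, for feeding neighbouring-tile observations
    // (normalise before adding them as observations).
    public int Count(int x, int y) => visits[x, y];

    // Reset at the start of each episode.
    public void Clear() => Array.Clear(visits, 0, visits.Length);
}
```

Keeping the grid logic outside the Agent subclass also means you only need to convert the agent's world position to grid coordinates in one place.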
  3. StewedHarry


    Jan 20, 2020
I went back to PPO and gradually increased the complexity of the maze, and the agents seem to be faring better (although they sometimes get stuck). I'm going to create a curriculum and then experiment with the curiosity rewards to see what happens. Are there any other params worth adjusting for this type of problem? I was also thinking about increasing the step count for each episode.
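Since the maze difficulty is already driven by how many blocks get added on reset, that count is a natural curriculum parameter. A rough sketch in the release-1-era curriculum file format (the behavior name and `num_blocks` parameter are placeholders for whatever your environment actually reads; the format has changed across ML-Agents releases, so check the docs for your version):

```yaml
MazeAgent:
  measure: reward
  thresholds: [0.5, 0.7, 0.8]     # advance a lesson when mean reward passes each value
  min_lesson_length: 100
  signal_smoothing: true
  parameters:
    num_blocks: [2, 4, 6, 8]      # blocks added per reset, easiest lesson first
```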