Visual observations

Discussion in 'ML-Agents' started by andrzej_, Jun 26, 2020.

  1. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    Quite a few people are asking about using visual observations with ML-Agents, and even about how to train the example environments with visual obs (Hallway and Pyramids). I've spent some time expanding on the idea in the Hallway example and even managed to get some proper generalization.


    The agent was trained with a set of 8 different symbols and is then able to recognize new, unseen symbols (30 total used in testing) with a varying success rate (but definitely better than random). It even manages to recognize the Unity logo, which the agent hadn't seen before.

    As for the hyperparameters and RL model, I used PPO with LSTM memory; the most significant change was using the ResNet backbone for the visual observations.
    I also used curriculum learning to ramp up the number of simultaneous symbols and some other environment properties (not shown in the video, but there's some domain randomization too).

    I haven't done a full hyperparameter search, so there's definitely room for improvement, but it took on the order of 10-15M steps to get decent results.
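    If it helps as a starting point, the config entry looked roughly like this, in the old flat trainer_config.yaml format. I'm going from memory here, so treat the numbers as illustrative placeholders rather than my exact values:

        # Sketch of the config -- values are placeholders, not my exact settings
        VisualHallway:
          trainer: ppo
          batch_size: 128
          buffer_size: 2048
          hidden_units: 256
          learning_rate: 3.0e-4
          max_steps: 1.5e7          # it took on the order of 10-15M steps
          time_horizon: 64
          vis_encode_type: resnet   # the change that mattered most
          use_recurrent: true       # LSTM memory
          memory_size: 256
          sequence_length: 64

    The curriculum for the number of symbols was a separate curriculum YAML, along the same lines as the Wall Jump example's.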
     
    celion_unity likes this.
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    That's awesome! Would you be ok with sharing the project files?
     
  3. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    Sure, I'd have to do a lot of cleanup first, but there isn't anything that special in the project, except maybe the random placement of the symbols, where I used Poisson disc sampling (which I implemented very lazily - once in a while it throws an error when it can't find coordinates matching the requirements; a more robust sketch is below). The ML-Agents-relevant parts are, I'd say, very similar to the Hallway example.
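    If anyone wants to roll their own placement, a standard Bridson-style sampler along these lines (a sketch, not the code from my project; names and parameters are illustrative) avoids the error case by simply returning fewer points when it runs out of room:

        using System.Collections.Generic;
        using UnityEngine;

        // Bridson-style Poisson disc sampling on a rectangle. Instead of
        // throwing when no valid spot exists, it simply returns fewer points.
        public static class PoissonDisc
        {
            public static List<Vector2> Sample(float width, float height,
                float minDist, int maxPoints, int attemptsPerPoint = 30)
            {
                // Grid cells sized so each cell holds at most one sample.
                float cell = minDist / Mathf.Sqrt(2f);
                int gw = Mathf.CeilToInt(width / cell);
                int gh = Mathf.CeilToInt(height / cell);
                int[,] grid = new int[gw, gh];        // 0 = empty, else index + 1
                var points = new List<Vector2>();
                var active = new List<Vector2>();

                void Store(Vector2 p)
                {
                    points.Add(p);
                    active.Add(p);
                    int gx = Mathf.Min((int)(p.x / cell), gw - 1);
                    int gy = Mathf.Min((int)(p.y / cell), gh - 1);
                    grid[gx, gy] = points.Count;
                }

                Store(new Vector2(Random.value * width, Random.value * height));

                while (active.Count > 0 && points.Count < maxPoints)
                {
                    int i = Random.Range(0, active.Count);
                    bool placed = false;
                    for (int k = 0; k < attemptsPerPoint; k++)
                    {
                        // Candidate in the annulus [minDist, 2 * minDist].
                        float a = Random.value * 2f * Mathf.PI;
                        float r = minDist * (1f + Random.value);
                        Vector2 c = active[i] + r * new Vector2(Mathf.Cos(a), Mathf.Sin(a));
                        if (c.x < 0f || c.x >= width || c.y < 0f || c.y >= height)
                            continue;
                        if (FarFromNeighbors(c, grid, points, cell, minDist, gw, gh))
                        {
                            Store(c);
                            placed = true;
                            break;
                        }
                    }
                    if (!placed) active.RemoveAt(i);  // this seed is exhausted
                }
                return points;                        // may be fewer than maxPoints
            }

            static bool FarFromNeighbors(Vector2 c, int[,] grid, List<Vector2> pts,
                float cell, float minDist, int gw, int gh)
            {
                int cx = (int)(c.x / cell), cy = (int)(c.y / cell);
                for (int x = Mathf.Max(cx - 2, 0); x <= Mathf.Min(cx + 2, gw - 1); x++)
                    for (int y = Mathf.Max(cy - 2, 0); y <= Mathf.Min(cy + 2, gh - 1); y++)
                        if (grid[x, y] != 0 &&
                            Vector2.Distance(pts[grid[x, y] - 1], c) < minDist)
                            return false;
                return true;
            }
        }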
    One more new thing I had in there, which I didn't use in the end, was an experience recorder working with Unity's navigation system. My idea was to randomly spawn obstacles, record playthroughs of an agent avoiding those objects, and then use that data with GAIL. But in the end I left it for another project, as matching the agent's possible actions to the navigation-based system might be a bit problematic.
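    For reference, once you have recorded demos (ML-Agents has a DemonstrationRecorder component for that), hooking them into GAIL is just a reward_signals entry in the trainer config. I never finalized this part, so the snippet below is a sketch and the demo path is made up:

        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99
          gail:                             # sketch only; never used in the final run
            strength: 0.1
            gamma: 0.99
            demo_path: Demos/NavRuns.demo   # hypothetical path to recorded demos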
     
  4. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    This is fantastic work - thanks for sharing!
     
    andrzej_ likes this.
  5. Jincraftohk

    Jincraftohk

    Joined:
    Jun 24, 2020
    Posts:
    3
    I'm so surprised!
    How did you train the Hallway with visual obs? I tried many hyperparameters, but it just didn't work. I have very little experience with this. Would you please share the details of your experiment? What's the main factor? The ResNet?
    Thank you for sharing!
     
  6. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    I think the biggest change was switching to ResNet. Other than that, unfortunately I haven't kept proper documentation for the experiments, and I even removed most of the experimental runs, so I can't give you specific hyperparameter values. In the end it was pretty similar to the example config files.
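    Concretely, relative to the example VisualHallway config it's basically a one-key change (the default encoder is 'simple'):

        VisualHallway:
          # ... keep the example hyperparameters ...
          vis_encode_type: resnet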
     
  7. Jincraftohk

    Jincraftohk

    Joined:
    Jun 24, 2020
    Posts:
    3
    Thanks! I will try the ResNet right now.
     
  8. hestia_p

    hestia_p

    Joined:
    Aug 11, 2020
    Posts:
    5
    @andrzej_ Hello, what you did is exactly what I was looking for. If you don't mind, could you share the project with me? I'd like to try it.
     
  9. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    Hey ... I haven't touched that project for a while and it was based on an old version of ML-Agents (still using TF), so most likely a lot of it won't work now. Also, the only 'trick' here was using ResNet for the visual observations.