What is the most complex AI agent you have trained and how long did it take?

Discussion in 'ML-Agents' started by lukem123, Jun 17, 2020.

  1. lukem123

    Joined:
    Oct 30, 2019
    Posts:
    5
    The example environments are great, but they're very simple. I've managed to train a self-driving car and a simple 3D drone that moves to a target, both using hundreds of vector observations. But stepping up to anything more complex, or switching to visual observations, seems to either fail outright or take far too long.
    I've detailed my situation and issue here: https://github.com/Unity-Technologies/ml-agents/issues/4129

    My question is: does anyone have working examples of environments and agents that are more complex than the samples? What is the realistic upper limit to what we can train with ML-Agents, and has anyone figured out a training setup that can run millions of steps in 12 hours? For me, 1 million steps alone takes 12 hours, and my friend's gaming computer was only about 2x faster (the RTX 2070 didn't help, as everything seemed CPU-bound).

    I'd really love to see real-world examples of what can be done, and how they were trained! Are the simple example environments the limit of what ML-Agents can do, or can we do more?
     
    Adham9 and kpalko like this.
  2. mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Hi, I just read the discussion over on GitHub and couldn't agree more that details on how and when GPUs can be leveraged for training should feature more prominently in the docs. Admittedly, I haven't trained models much larger than the example ones yet, with the number of vector observations rarely going beyond 100. I did get some decent results, however, by splitting complex behaviour into subtasks and training dedicated policies for each of them.
    I generally try to avoid visual observations whenever possible. I usually have around nine agents in a scene and run eight executables in parallel (see the command below). Training the resulting 72 agents for a couple of million to tens of millions of steps takes my somewhat dated CPU a few hours; I mostly let runs go overnight.
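    For reference, the parallel part is just the trainer's --num-envs option; the config path, build path and run id below are placeholders for whatever you use:

        mlagents-learn config/trainer_config.yaml --env=Builds/MyEnv --num-envs=8 --run-id=overnight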
    I'm also avoiding the RayPerceptionSensor, because it blows up the observation space by creating one-hot encodings for each detectable object type. Like you suggested on GitHub, I'm encoding detection results as float values, so for instance five object types would be represented as -1, -0.5, 0, +0.5, +1 (see the first sketch below). I guess the drawback of this approach is that the agent might take longer, in terms of training time, to discern object types. But I haven't done any A/B comparisons yet to test which method is more economical overall.
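    To make that concrete, here's a minimal sketch (untested as posted; the ray length and the type-lookup helper are made up for the example):

        using UnityEngine;
        using Unity.MLAgents;
        using Unity.MLAgents.Sensors;

        public class ScalarTypeAgent : Agent
        {
            const int k_NumTypes = 5;      // detectable object types
            const float k_RayLength = 20f; // made-up detection range

            // Map type index 0..4 to -1, -0.5, 0, +0.5, +1
            static float EncodeType(int typeIndex)
            {
                return -1f + 2f * typeIndex / (k_NumTypes - 1);
            }

            public override void CollectObservations(VectorSensor sensor)
            {
                // One forward ray costs 2 floats (distance + type code)
                // instead of 1 + k_NumTypes floats with one-hot encoding.
                if (Physics.Raycast(transform.position, transform.forward,
                    out RaycastHit hit, k_RayLength))
                {
                    sensor.AddObservation(hit.distance / k_RayLength);
                    sensor.AddObservation(EncodeType(TypeIndex(hit.collider)));
                }
                else
                {
                    sensor.AddObservation(1f); // nothing hit
                    sensor.AddObservation(0f); // neutral type code
                }
            }

            // Hypothetical helper - identify the type however you tag your objects.
            int TypeIndex(Collider col) { return 0; }
        }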
    As for the number of raycasts: in some situations it can be reduced by hardcoding a detection method using e.g. Physics.OverlapSphere and then feeding the agent direction vectors. In my experience, it very much depends on how dynamic the environment is. This approach seems to work best for stationary objects that I can simply sort by distance and angle. For those, I reserve a fixed number of vector observations and assign them the sorted detection data; if fewer objects were detected, I just set neutral values (second sketch below).
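    And a rough sketch of the fixed-slot idea (again untested here; the slot count, radius and 4-floats-per-slot layout are just what I would start with):

        using System.Linq;
        using UnityEngine;
        using Unity.MLAgents;
        using Unity.MLAgents.Sensors;

        public class OverlapDetectionAgent : Agent
        {
            const int k_MaxSlots = 4;   // fixed number of observation slots
            const float k_Radius = 10f; // made-up detection range

            public override void CollectObservations(VectorSensor sensor)
            {
                // Collect offsets to nearby objects (excluding ourselves)
                // and sort them by distance.
                var offsets = Physics.OverlapSphere(transform.position, k_Radius)
                    .Where(col => col.gameObject != gameObject)
                    .Select(col => col.transform.position - transform.position)
                    .OrderBy(offset => offset.sqrMagnitude)
                    .Take(k_MaxSlots)
                    .ToArray();

                // Always write k_MaxSlots * 4 floats so the observation size stays constant.
                for (int i = 0; i < k_MaxSlots; i++)
                {
                    if (i < offsets.Length)
                    {
                        Vector3 local = transform.InverseTransformVector(offsets[i]);
                        sensor.AddObservation(local.normalized);           // 3 floats: direction
                        sensor.AddObservation(local.magnitude / k_Radius); // 1 float: distance
                    }
                    else
                    {
                        sensor.AddObservation(Vector3.zero); // neutral direction
                        sensor.AddObservation(1f);           // "max distance"
                    }
                }
            }
        }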
     
    kpalko likes this.