Question Training ML Agents in cloud

Discussion in 'ML-Agents' started by Pavel_Arkhipov, Aug 9, 2023.

  1. Pavel_Arkhipov

    Pavel_Arkhipov

    Joined:
    Jul 19, 2020
    Posts:
    12
    Hello everyone,

    I'm wondering what the current state of cloud training is. In other words, can I move the training process to another computer?
     
  2. CloudyVR

    CloudyVR

    Joined:
    Mar 26, 2017
    Posts:
    709
    I succeeded in rewriting the Python modules in these library folders:

    Project\venv\Lib\site-packages\mlagents\trainers\
    Project\venv\Lib\site-packages\mlagents_envs\


    I had to allow remote connections, basically by disabling the subprocess calls in ML-Agents, and then wrote a custom subprocess manager that runs on each remote PC.

    I also had to modify the ML-Agents Unity package, adding an additional -mlagents-address argument that can be passed when launching the built game. It forces the ML-Agents Academy scripts to connect to a custom IP rather than the default localhost address.
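
    For anyone trying something similar, here is a rough Python sketch of how a manager on a remote PC might assemble the launch command for each build. It is not the actual code from the modified trainer. The function names, the example paths and the IP address are made up for illustration. As far as I know, --mlagents-port is the flag stock ML-Agents passes to a build and 5005 is its default base port for builds; -mlagents-address is the custom flag described above and does nothing without a modified Unity package.

```python
import subprocess
from typing import List


def build_launch_command(
    executable: str,
    trainer_address: str,
    port: int,
    no_graphics: bool = True,
) -> List[str]:
    """Assemble the argument list for one Unity build on a remote worker."""
    cmd = [executable]
    if no_graphics:
        # Standard Unity standalone player flags for headless runs.
        cmd += ["-batchmode", "-nographics"]
    # Assumption: stock ML-Agents passes the port with this flag.
    cmd += ["--mlagents-port", str(port)]
    # Custom flag from the post above; requires a modified ML-Agents
    # Unity package that reads it and connects to this address.
    cmd += ["-mlagents-address", trainer_address]
    return cmd


def launch_workers(
    executable: str,
    trainer_address: str,
    base_port: int,
    count: int,
) -> List[subprocess.Popen]:
    """Start `count` builds, one port per build (base_port + index)."""
    return [
        subprocess.Popen(
            build_launch_command(executable, trainer_address, base_port + i)
        )
        for i in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical values: print the command instead of launching anything.
    print(build_launch_command("./Build/Game.exe", "192.168.1.10", 5005))
```

    Giving each build its own port (base port plus worker index) mirrors how the stock trainer separates parallel environments, so the trainer side needs fewer changes.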

    This is my parallel trainer (you can see the custom mlagents-address command displayed in the prompts):
    IMG_0018.JPG
    It took a few days of work to rewrite the trainer for networked, distributed training, but it was worth the investment. Being able to train in parallel has saved me countless days of training time. When using all four i9 CPUs with proper gigabit Ethernet connections, what used to take 16 hours now takes only a few hours.

    So it should be possible for anyone to customize the trainer. I am surprised Unity does not already offer this: I had no idea how any of their Academy code worked, yet I was able to develop a workable solution in only a few days. I am pretty sure the developers would be able to do something similar.

    Please, Unity, add native support for parallel training!
     
    Last edited: Aug 9, 2023
  3. Pavel_Arkhipov

    Pavel_Arkhipov

    Joined:
    Jul 19, 2020
    Posts:
    12
    Wow! Very impressive, although I was thinking of renting compute power from a third-party cloud. But your approach is inspiring.

    I stumbled upon an AWS solution and am currently running tests on their free micro instance. Everything works fine, although I wonder if there are any better solutions.