Hyperparameter tuning in ML-Agents

Discussion in 'ML-Agents' started by Wolf00007, Jul 18, 2021.

  1. Wolf00007

    Wolf00007

    Joined:
    Jan 26, 2019
    Posts:
    24
    Hi there,

    Is there a good method for tuning hyperparameters with ML-Agents? I know the docs suggest ranges for some of the values, and I can just try out combinations one by one. But maybe there is a way to automate this process and test every combination (from a limited set of values, of course). For example, I would like to test a few hyperparameters with three different values each. Am I wrong, or is the only way to do this (the easiest way I found) to build a learning environment executable, prepare A LOT of configuration files, and run all of those trainings in the Python virtual environment? Or is there an easier way to test all these values?
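
    Just to illustrate what I mean - something like this rough sketch could generate all the config files automatically (assuming PyYAML, a base config.yaml with a behavior named "MyBehavior", and made-up example values; 3 parameters x 3 values = 27 files):

        # Generate one ML-Agents config file per grid point.
        # Assumes the release-style schema: behaviors -> MyBehavior -> hyperparameters.
        import itertools
        import yaml  # pip install pyyaml

        grid = {
            "learning_rate": [1e-4, 3e-4, 1e-3],
            "beta": [1e-3, 5e-3, 1e-2],
            "epsilon": [0.1, 0.2, 0.3],
        }

        for i, combo in enumerate(itertools.product(*grid.values())):
            with open("config.yaml") as f:
                cfg = yaml.safe_load(f)  # fresh copy for every run
            cfg["behaviors"]["MyBehavior"]["hyperparameters"].update(
                dict(zip(grid.keys(), combo)))
            with open(f"config_{i:03d}.yaml", "w") as f:
                yaml.safe_dump(cfg, f)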

    Thanks!
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    This is hopelessly outdated (https://github.com/mbaske/ml-agents-hyperparams), but maybe the basic idea still has some merit: write a batch runner that launches mlagents-learn Python processes. Back then I hacked trainer_controller.py and injected the hyperparameters directly, although simply generating a bunch of config.yaml files up front would probably be the better approach. One advantage of hooking into the trainer_controller was that I could check for stop conditions being met; I'm not sure how you could do that with a batch runner that merely launches processes.
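
    The "merely launches processes" part could look roughly like this - a minimal sketch, assuming the generated configs from above and a built environment (the paths and run-ids are made up):

        # Minimal batch runner: launch one mlagents-learn process per
        # config file, sequentially. Requires the ml-agents Python
        # package to be installed in the active environment.
        import glob
        import subprocess

        for cfg in sorted(glob.glob("config_*.yaml")):
            run_id = cfg.replace(".yaml", "")  # e.g. config_000
            subprocess.run(
                ["mlagents-learn", cfg,
                 "--env=builds/MyEnv",
                 f"--run-id={run_id}",
                 "--no-graphics"],
                check=True,  # stop the batch if a run errors out
            )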

    Btw, I have no idea why all the ML-Agents folks are listed as contributors in my repo. I must have done some weird forking and rebasing at some point, and GitHub somehow copied them over.
     
  3. Wolf00007

    Wolf00007

    Joined:
    Jan 26, 2019
    Posts:
    24
    So essentially, if I prepare some configs and run the learning process through an executable (launching all the processes one by one), I will achieve the same result, i.e. have all these trainings listed in TensorBoard so I can compare them? I'm pretty new to this topic, so I'm wondering whether my idea will give me results similar to what those search algorithms do.

    Regarding running the mlagents-learn processes: I assume you can do that by pasting the commands into CMD/PowerShell with '&&' between them, right?
     
  4. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Well, I thought it would be nice to have a front end where you can specify the search, like "do a grid search for hyperparameters x and y", and then have the batch runner create the config files and launch Python for you. Perhaps one could use TensorBoard's HTTP API for tracking training progress and for conditional early stopping.
    https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/scalar/http_api.md
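
    A rough sketch of hitting that scalars endpoint - assuming TensorBoard is serving the results folder on localhost:6006; the run and tag names are just examples (ML-Agents logs rewards under "Environment/Cumulative Reward"):

        # Query TensorBoard's scalar plugin for one run/tag pair.
        import json
        import urllib.parse
        import urllib.request

        def get_scalars(run, tag, host="http://localhost:6006"):
            query = urllib.parse.urlencode({"run": run, "tag": tag})
            url = f"{host}/data/plugin/scalars/scalars?{query}"
            with urllib.request.urlopen(url) as resp:
                # Each entry is [wall_time, step, value].
                return json.loads(resp.read())

        points = get_scalars("config_000/MyBehavior", "Environment/Cumulative Reward")
        print("latest step/reward:", points[-1][1], points[-1][2])
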
    Haven't looked into CMD/PowerShell yet, though.
     
  5. Wolf00007

    Wolf00007

    Joined:
    Jan 26, 2019
    Posts:
    24
    @mbaske When you were running your hyperparameter tuner, did the whole process take a long time? I mean, if I were to test many different combinations of parameters, each around 10 times, it would probably take days, considering one training session lasts around 10 minutes. I'm asking because, if I understand correctly, doing a grid search on my own over three parameters with three values each would give me 27 training runs (x10 is 270...). Do you know how I could speed this process up? A lower max_steps value?

    Also, is there a different way of obtaining all the training data that gets fed to TensorBoard (average reward, step, etc.)? Currently I would have to download each training result as a CSV file, convert it to .xlsx, and compute, for example, the average reward for that run. That would take a very long time...

    @Edit I have just noticed there are two .json files in the results folder after running trainings, where you can see the mean, max and min values of all the data involved in the training... I am learning something new every day :) So you can disregard my second question :p
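
    In case it helps anyone, this is roughly how I'm reading those values now - a sketch; the file path matches my runs, but treat the "gauges" key and the gauge names as assumptions for your ML-Agents version:

        # Pull the reward gauges out of run_logs/timers.json after a run.
        import json

        with open("results/config_000/run_logs/timers.json") as f:
            timers = json.load(f)

        # Each gauge holds the mean/min/max stats mentioned above,
        # e.g. under a name like "Environment.CumulativeReward.mean".
        for name, gauge in timers["gauges"].items():
            if "CumulativeReward" in name:
                print(name, gauge)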
     
    Last edited: Jul 23, 2021
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    @Wolf00007 You're right, the total training time for a grid search grows exponentially with the number of parameters. I was hoping that setting stop conditions could filter out training runs that produce bad results early. I'm currently looking into updating the project, and I think it's possible to get it working without having to change any of the ml-agents files.
    Btw, you can get all the training scalars from the TensorBoard HTTP API. A separate Python process could query the API and write the metrics you're interested in to a file.
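
    Roughly like this - a sketch that builds on the get_scalars function above; the CSV layout and the stop threshold are just illustrations:

        # Watcher process: poll the TensorBoard scalars API, append new
        # points to a CSV, and bail out when a stop condition is met
        # (a real runner would then terminate the training process).
        import csv
        import time

        def watch(run, tag, out_path, poll_secs=30):
            seen = 0
            while True:
                points = get_scalars(run, tag)  # from the earlier sketch
                with open(out_path, "a", newline="") as f:
                    writer = csv.writer(f)
                    for wall_time, step, value in points[seen:]:
                        writer.writerow([wall_time, step, value])
                seen = len(points)
                if points and points[-1][2] > 0.9:  # example stop condition
                    break
                time.sleep(poll_secs)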
     
  7. Wolf00007

    Wolf00007

    Joined:
    Jan 26, 2019
    Posts:
    24
    @mbaske Sounds cool, please let me know in this thread if you manage to finish updating your project :)
     
  8. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Update: https://github.com/mbaske/ml-agents-hyperparams
     
  9. Wolf00007

    Wolf00007

    Joined:
    Jan 26, 2019
    Posts:
    24
    @mbaske Looks good. I see you're using the values that get generated in TensorBoard. I found something weird with the timers.json file in the run_logs folder. I was mainly looking there to see the average reward for the whole training run, but the values there sometimes seem incorrect:

    [Screenshot of timers.json: the mean reward gauge shows the same value as the minimum reward]

    How can the mean reward be the same as the lowest reward received? Are these values bugged? For some runs I also got mean rewards that did not look correct (compared to their graphs in TensorBoard). Do you have the same problem in your training runs? Or do you only look at the graphs and decide on hyperparameters based on the final reward received?

    Another example: I did two runs with the same config file, one after the other. The first run showed the mean cumulative reward as 3.77256006114184 in run_logs/timers.json, but the second one showed it as -0.9996000453829765, which makes no sense.
     
  10. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Thanks, I haven't looked into run_logs/timers.json in detail yet, but I agree it looks a bit weird. For one of my runs, I'm seeing "value" being the same as "max" for an Environment.CumulativeReward.mean. Not sure how to interpret that, really. For my batch runner, I'm relying only on the TensorBoard scalar values, mainly because I wanted to track the training progress and abort runs when they meet some stop condition. I don't think that would be possible with timers.json; AFAIK it's only written when training is complete.
     
  11. Wolf00007

    Wolf00007

    Joined:
    Jan 26, 2019
    Posts:
    24
    Are you looking at the graphs only and picking the runs with the highest consistent rewards? Or do you look at the last reward only?
     
  12. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    I'm only checking the latest value for whatever TensorBoard tag is set in a stop condition (my yaml opt_stop param). It doesn't have to be rewards necessarily; it could just as well be some custom metric you're sending via the StatsRecorder. But yeah, it's basically a 'dumb' grid search, because it doesn't do any evaluation of the overall training performance. It would be interesting, though, to dynamically pick or even generate config params in order to home in on the best value combinations. Well, maybe some other time.
     
    Wolf00007 likes this.
  13. MidnightGameDeveloper

    MidnightGameDeveloper

    Joined:
    Apr 26, 2014
    Posts:
    123
    Hi mbaske,
    that is a very interesting project. Is it still usable with current versions of ML-Agents (Release 19-20)?
     
  14. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Hi - I haven't used this for a while and don't have Unity/ML-Agents set up right now. My guess is that it should still work, but I can't make any promises.
     
    MidnightGameDeveloper likes this.