
Question: Is there a general sequence in which the configurations should be tweaked?

Discussion in 'ML-Agents' started by mcdenyer, Jan 16, 2022.

  1. mcdenyer

    mcdenyer

    Joined:
    Mar 4, 2014
    Posts:
    48
    I am curious if anyone has thoughts on the general order of tweaking configurations?

    For example: first keep all values at their defaults and try different values of batch_size. Once you seem to have identified the optimal batch_size, begin tweaking buffer_size... once that is optimized, begin tweaking the next parameter, and so on.
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    I don't think there is any preferred order for doing manual config optimization. Some of the hyperparameters are interdependent, so unfortunately, it's hard to tweak them in isolation. I usually stick very close to the example project configurations that I feel best match what I'm trying to do. Then adjust the buffer size a bit depending on how many agents I'm training in parallel. If I'm only using PPO, I set time_horizon, num_layers and hidden_units, and that's about it.
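    For anyone unfamiliar with where those knobs live, a PPO trainer config along those lines might look roughly like the sketch below (the behavior name and all values are illustrative placeholders, not recommendations):

    behaviors:
      MyBehavior:              # placeholder behavior name
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024     # start from the closest example project's value
          buffer_size: 10240   # scale up with more parallel agents / environments
          learning_rate: 3.0e-4
        network_settings:
          num_layers: 2
          hidden_units: 256
        time_horizon: 64
        max_steps: 5.0e5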
    Besides doing this manually, there are a couple of systematic approaches for finding hyperparameters, namely grid search, random search and Bayesian optimization (https://towardsdatascience.com/a-co...ptimization-for-machine-learning-b8172278050f).
    If you have a lot of time on your hands, you can try running grid searches with my python script for ml-agents:
    https://github.com/mbaske/ml-agents-hyperparams
     
    mcdenyer likes this.
  3. mcdenyer

    mcdenyer

    Joined:
    Mar 4, 2014
    Posts:
    48
    Interesting, thanks for the input.

    How are you calculating your buffer_size based on the number of agents training in parallel?

    I'm actually pretty happy with how quickly the agents are learning, but they hit a wall with some problems that they cannot get past, so I possibly shouldn't be using the term 'optimal'.
     
  4. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    No fixed rule really, I just tend to increase the buffer_size if there are way more agents in my project than in the examples, or if I run multiple environments. Don't take my word on this though - I think I read something along those lines once, but haven't really checked if it makes that much of a difference.
     
    mcdenyer likes this.
  5. mcdenyer

    mcdenyer

    Joined:
    Mar 4, 2014
    Posts:
    48
    And what guides your time_horizon setting?

    When I originally looked at time_horizon I was excited, because I need my agents to 'understand' that what they do at the start of the episode will affect them (position and velocity) at later stages in the level. I thought time_horizon basically made them look at a larger sequence of actions, but from the questions I posted about time_horizon here, that does not seem to be the case.

    Below is a gif of the most advanced obstacle that I have not been able to get an agent to complete. In my game you have two fixed-angle grappling hooks that can be fired at the colored rings, which lets you swing (while the grapple retracts). The tricky part about this game (for humans and ml-agents alike) is that your previous actions determine your subsequent velocity and position vectors. In this particular case the agent needs to let itself fall a little longer before grappling to the first circle, in order to have the right angle of attack when bouncing off the diagonal wall to get over the tall hazard.

    So the agent is progressing through the earlier points and going through the checkpoints, but it needs to pass through those earlier checkpoints in a way that will allow it to succeed later. Basically I need the agents to not be shortsighted about reward seeking. I need them to try to alter the way they pass through earlier checkpoints, even though they are already getting through those checkpoints and getting rewarded... if that makes sense.

    [Attachment: Animation.gif]
     
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
  7. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    Usually I get better results when I use the complete time horizon, meaning time_horizon matched to the agent's Max Step. Keyword: usually.
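    In config terms, if the Agent component's Max Step is set to, say, 1000 in the editor, matching it would mean roughly the following (note this is the trainer's time_horizon, not its max_steps, which counts total training steps):

    behaviors:
      MyBehavior:             # placeholder behavior name
        trainer_type: ppo
        time_horizon: 1000    # matched to the agent's Max Step (episode length)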
     
  8. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    BTW, just found this in the docs:

    "Buffer Size - If you are having trouble getting an agent to train, even with multiple concurrent Unity instances, you could increase buffer_size in the trainer config file. A common practice is to multiply buffer_size by num-envs."

    https://github.com/Unity-Technologi....md#training-using-concurrent-unity-instances
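    As a rough illustration of that rule of thumb (numbers made up): with a single-environment buffer_size of 10240 and four concurrent environments launched via --num-envs=4, the scaled config would be something like:

    # mlagents-learn config.yaml --run-id=my_run --num-envs=4
    behaviors:
      MyBehavior:               # placeholder behavior name
        hyperparameters:
          buffer_size: 40960    # 10240 * 4 concurrent environments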
     
    mcdenyer likes this.
  9. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    I use buffer sizes from 5K all the way to 250K, training on 10 environment instances.
    However, when the task is too complex for an agent to figure out, I've found that the only options are to manually record how to do it (using the demonstration recorder) and then feed that to the agent via GAIL, to use the Curiosity module, or to do both GAIL and Curiosity.

    When using GAIL I do one run with GAIL enabled, then do a second run initialized from that first run (initialize-from) with GAIL disabled.
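    Roughly, that two-stage setup could look like the sketch below (demo path, run IDs and strength values are just placeholders):

    # Run 1: GAIL plus the extrinsic reward
    #   mlagents-learn config_gail.yaml --run-id=run_gail
    behaviors:
      MyBehavior:
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          gail:
            strength: 0.5
            demo_path: Demos/MyDemo.demo   # recorded with the Demonstration Recorder
    # Run 2: remove the gail block from the config and continue from run 1
    #   mlagents-learn config_no_gail.yaml --run-id=run_final --initialize-from=run_gail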

    Also, LSTM does help with complex pathfinding tasks.

    However, I have been having lots of issues trying to get Curiosity working when the built-in sensor (vector sensor) on the agent has stacking enabled. With stacking disabled on the built-in sensor, Curiosity works fine. Curiosity also works really well with the GridSensor, with or without stacking, but it does not do so well with the ray cast sensor.
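    For reference, Curiosity and the LSTM memory mentioned above are switched on in the trainer config roughly like this (values are illustrative only):

    behaviors:
      MyBehavior:                 # placeholder behavior name
        network_settings:
          memory:                 # enables the LSTM
            memory_size: 128
            sequence_length: 64
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:
            strength: 0.02
            gamma: 0.99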
     
    mcdenyer likes this.