
Asking for advice and best practices for ml-agents and my own project

Discussion in 'ML-Agents' started by Xiromtz, Mar 22, 2020.

  1. Xiromtz

    Joined:
    Feb 1, 2015
    Posts:
    65
    Hey guys,

    I'm writing here because I feel a bit overwhelmed by many aspects of machine learning and by the implementation specifics of an agent I'm currently writing for my game in Unity.

    I've tried to find information on the net, but it's very hard to find specific implementation details and best practices for implementing agents in a game, since this still seems to be quite a recent thing.
    I'm not very fluent with TensorFlow and neural-network specifics; I understand the theory behind most of it, but I lack practical experience.

    So I have a few specific questions:

    1. Using visual observations and how Unity handles it

    I understand that Unity uses some type of CNN to handle image input, and using the sensor component in Unity to connect a render texture to the mlagents-learn interface was easy enough. What I don't know is how this CNN translates into a state observation. I've seen a few implementations use a classifier to detect things such as doors, keys, etc. in their Python code and use that as state input. My question is: can I just use the Unity implementation as is? Is it best practice to implement this yourself and build a classifier or something else? There's so much I don't understand that I wouldn't even know how to compare results.
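
    For reference, the wiring is basically just a RenderTextureSensorComponent next to the Agent. I set it up in the Inspector, but in code it would look roughly like this (a rough sketch; the exact namespace and field names differ between ML-Agents releases, so treat them as assumptions):

        // Attach a render-texture sensor so the texture is fed to the trainer's CNN
        // as a visual observation (0.15-era names assumed).
        using MLAgents.Sensors;
        using UnityEngine;

        public class VisualObservationSetup : MonoBehaviour
        {
            public RenderTexture observationTexture; // the texture my observation camera renders into

            void Awake()
            {
                var sensor = gameObject.AddComponent<RenderTextureSensorComponent>();
                sensor.renderTexture = observationTexture;
                sensor.sensorName = "VisualObservation";
                sensor.grayscale = false; // keep color, my pixel art relies on it
            }
        }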

    2. Action masking

    According to RL theory with tabular methods and such, for each state we have a set of possible actions we can perform. I'm guessing that, due to the nature of function approximation and neural-network inputs, in practice we must use a fixed set of actions, continuous or not, and stick with it. For this purpose, I'm guessing action masking might come in handy. I haven't seen any literature on action masking outside of Unity ml-agents, and I'm currently questioning whether I should even use it. For my use case the agent has contextual actions, such as a "pull lever" action that should only be possible when close enough to a lever. Would I use action masking for this?

    3. Unity implementations of PPO and SAC

    Unity provides implementations of PPO and SAC through the mlagents-learn interface. My question is about the usability of these algorithms: is it best practice to implement one's own trainer in Python, or do Unity's implementations suffice? As I said, I'm not very fluent with TensorFlow, and writing my own algorithm would require learning the whole library; I'm not sure it's worth it.

    I hope someone can help me here. Thanks in advance.
     
  2. celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    289
    You can use this as it is. It doesn't directly classify things, but you can think of it as learning visual features that affect the reward. If you wanted, you could also have an additional network trained as a classifier, and feed the results of this in as a vector observation (for example, whether or not there's a dog or cat on screen right now).
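
    For example, roughly like this (just a sketch inside your Agent subclass; ClassifyFrame and observationTexture are placeholders for whatever classifier and input you'd use, and I'm using the 0.15-era AddVectorObs call, which was renamed in later releases):

        // Feed the output of your own classifier in as a vector observation,
        // alongside (or instead of) the raw visual observation.
        public override void CollectObservations()
        {
            // Placeholder helper: e.g. returns 1.0 if the classifier sees a dog on screen, 0.0 otherwise.
            float dogOnScreen = ClassifyFrame(observationTexture);
            AddVectorObs(dogOnScreen);
        }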

    Yes, the lever pulling is a good example. Note that action masking only works with discrete actions (the naming was changed in 0.15.0 to help make this clearer). I'm not sure if there's any literature on this; I'll follow up with some folks who would know better than I do.

    These are meant to be usable without any additional RL knowledge. It's possible to implement your own trainer, but this requires a lot of additional background, and is only recommended for RL researchers.
     
  3. celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    289
    Regarding action masking, I'm told that the approach ML-Agents takes is based on DeepMind's StarCraft II paper:
    https://arxiv.org/abs/1708.04782

     
  4. Xiromtz

    Joined:
    Feb 1, 2015
    Posts:
    65
    @celion_unity Thanks for the great replies! I did implement action masking, and it seems to be working as intended; I was already using discrete actions anyway.

    Another problem I'm currently having is with using SAC in my game; this also seems to be a problem with PPO, but it's less noticeable. I believe the gradient-descent update for SAC occurs almost every timestep, while PPO batches more experiences together before updating? For me, SAC freezes up almost completely and then runs at barely 10 FPS, while PPO runs fine until an update comes, at which point it also freezes for up to a whole minute.

    I'm guessing the problem here is simply my neural network size? I have a 96x64 image as input; this is the smallest I could go without the whole image being distorted beyond recognition. Additionally, I have about 5 raycasts, 2 float observations, and 3 action branches.

    I have another question regarding memory and recurrent networks. I want my agent to be able to navigate through a rather big, procedurally generated level with a few enemies and obstacles to avoid. Is the agent even able to handle these kinds of navigational tasks on the order of thousands of steps? The level-generation algorithm is rather simple: the goal is always on the right-hand side of every level and the agent spawns on the left-hand side.
     
  5. LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    Could you share your mask implementation? I have similar actions, but I only thought about masking just now; perhaps it would be more appropriate for me.

    Perhaps in this case it would be better to use NavMeshComponents.
     
  6. Xiromtz

    Joined:
    Feb 1, 2015
    Posts:
    65
    @LexVolkov
    Do you mean I should move the agent control to a higher level? So instead of moving like a player would via up/down/left/right, the agent would set a point it wants to navigate towards, while my pathfinding algorithm does all the low-level work?
    That might help lower the action space and the number of observations required, though I still don't know whether it would help with finding a path to the goal.
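    Roughly what I picture, as a sketch inside my Agent subclass (candidateWaypoints, pathfinder and motor stand in for my own level/pathfinding/movement code, and I'm using the 0.15-style float[] action signature):

        // "High-level" control: the discrete action picks a nearby waypoint and my own
        // pathfinding/movement code does the low-level work between agent decisions.
        public override void OnActionReceived(float[] vectorAction)
        {
            int waypointChoice = (int)vectorAction[0];                  // branch 0: which candidate waypoint to head for
            Vector3 target = candidateWaypoints[waypointChoice];        // waypoints sampled from the level around the agent
            var path = pathfinder.FindPath(transform.position, target); // existing pathfinding, nothing learned here
            motor.FollowPath(path);                                     // low-level movement outside the policy
        }
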
    In practice, the problem I'm currently having is that it's impossible to add the whole level as a single visual observation: the game is made in pixel art, and compressing the image to below roughly 500x500 pixels renders all features impossible to distinguish (the current max level size is about 2048x2048 pixels).
    I've also tried a sort of "minimap" as input, where the agent receives a basic outline of the level layout as an image for navigation purposes. I'm not sure whether this really works, though.
     
  7. Xiromtz

    Joined:
    Feb 1, 2015
    Posts:
    65
    @LexVolkov
    The action masking example I gave was not one I'm actually using. I have a rather large number of actions; the agent can interact with things the player would normally be able to interact with and can use items from its inventory. This means I reuse a lot of code from my base player implementation, which makes it very specific to my use case.
    The documentation for action masking is very good; look at the Agent Design docs in the ml-agents GitHub repo. All you need to do is add something like "mask.Add(branch_num, action_num)" and the agent can't perform that action anymore. Most of the work for action masking will be specific to your use case.
     
  8. LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    Yes, I saw the documentation.
    Then I have a question for you: should I mask actions in OnActionReceived or CollectObservations?
     
  9. LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    Is it impossible to use NavMesh in your level? Then try doing something like NavMesh yourself: make waypoints for the bot to navigate between.
    Or divide a difficult level into simpler rooms.
    Or is the problem with training?
     
  10. Xiromtz

    Joined:
    Feb 1, 2015
    Posts:
    65
    @LexVolkov If you update to v0.15, they added a new function, "CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)". In older versions, I did the masking in CollectObservations. Since the collected observations are the input to the neural network, and the network's output is the parameter passed to OnActionReceived, masking in OnActionReceived would be useless: the network has already done its calculations by then.
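
    For the lever example from earlier, it looks roughly like this in my Agent subclass (assuming the 0.15 DiscreteActionMasker.SetMask signature; the branch and action indices are made up, and IsCloseEnoughToLever is a placeholder for my own check):

        // Disable "pull lever" on its branch whenever the agent isn't next to a lever;
        // the policy then can't pick that action on the next decision.
        public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)
        {
            if (!IsCloseEnoughToLever())
            {
                actionMasker.SetMask(2, new[] { 1 }); // branch 2 = "interact", action index 1 = "pull lever"
            }
        }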
     
  11. Xiromtz

    Joined:
    Feb 1, 2015
    Posts:
    65
    @LexVolkov The levels are procedurally generated; there is no way to place static waypoints or split the level into smaller rooms, since the path to the goal is always random and there are multiple possible paths. I also don't want to modify the original game code too much, since everything should still work in an actual run of the game.