
Question: How to train MARL for more complex tasks

Discussion in 'ML-Agents' started by Jinwoobeen, Jun 26, 2023.

  1. Jinwoobeen


    Nov 1, 2022
    Hello, I am a student studying multi-agent reinforcement learning.
    I have a question about the multi-agent training process.

    The experiments are run in a 3 vs 3 naval battle environment, and the YAML file is as follows.

    During training, the agents performed well on fairly simple movements but poorly on more complex tasks. So I tried deepening the network and increasing the number of units per layer, but performance did not improve much.

    Is it worthwhile to make the network bigger to get good performance on both simple and complex tasks?
    If not, which methods can I use to make multi-agent training work well on more complex tasks?
    If so, how should I change the network configuration and design the experiments? Currently, we experiment with adding one layer or doubling the number of units per layer.
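    To see what "one more layer" versus "double the units" costs in practice, here is a rough parameter count for a plain fully connected network, sketched in Python. The observation size (64) and action size (4) are made-up values, not taken from the naval battle setup, and ML-Agents' real networks (value head, observation encoders, etc.) have more parameters than this, so treat the numbers as relative only. In ML-Agents trainer configs these two knobs are `hidden_units` and `num_layers` under `network_settings`.

```python
def mlp_param_count(obs_size: int, hidden_units: int, num_layers: int, action_size: int) -> int:
    """Count weights + biases of a plain MLP: obs -> num_layers hidden layers -> actions.

    A simplified model for comparison only; it ignores value heads, encoders, etc.
    """
    total = 0
    in_size = obs_size
    for _ in range(num_layers):
        total += in_size * hidden_units + hidden_units  # weight matrix + bias vector
        in_size = hidden_units
    total += in_size * action_size + action_size  # output layer
    return total


# Hypothetical sizes: 64 observations, 4 actions.
baseline = mlp_param_count(64, 256, 2, 4)  # 2 layers x 256 units
deeper = mlp_param_count(64, 256, 3, 4)    # one extra layer
wider = mlp_param_count(64, 512, 2, 4)     # doubled units per layer
print(baseline, deeper, wider)  # 83460 149252 297988
```

    Doubling the units roughly quadruples each hidden-to-hidden weight matrix, so in this sketch it adds over three times as many parameters as one extra layer does.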

    Thanks for reading.
  2. Energymover


    Mar 28, 2023
    I'd venture to guess it isn't the network size. Network size relates to the number of observations and the detail of the actions. The first thing that comes to mind is training time: 30 million steps isn't that much, though that is relative and depends on the complexity of the task. Whether that's the issue should be visible in TensorBoard: check whether Entropy is low and the reward has converged on a maximum. For more complex tasks, I would guess the problem is either a "lack of observations" (the agent can't detect the chance of a better reward) or a "lack of reward" (nothing promotes the behavior).
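    A rough way to apply that TensorBoard check to exported scalars is sketched below. It assumes you have already pulled the cumulative-reward and entropy curves out as plain Python lists (for example via TensorBoard's data download links); the window size, improvement tolerance and entropy floor are hypothetical thresholds, and a sensible entropy floor depends on your action space.

```python
from statistics import mean


def looks_converged(rewards, entropies, window=10, reward_tol=0.01, entropy_floor=0.1):
    """Heuristic: True if reward has stopped improving AND entropy is low.

    rewards, entropies: per-summary-step values, oldest first.
    All thresholds are illustrative assumptions, not ML-Agents defaults.
    """
    if len(rewards) < 2 * window or len(entropies) < window:
        return False  # not enough history to judge
    recent = mean(rewards[-window:])
    previous = mean(rewards[-2 * window:-window])
    # Relative change between the last two windows (abs() keeps the sign
    # meaningful when rewards are negative).
    improvement = (recent - previous) / max(abs(previous), 1e-8)
    low_entropy = mean(entropies[-window:]) < entropy_floor
    return improvement < reward_tol and low_entropy


plateau = [float(i) for i in range(20)] + [20.0] * 20
print(looks_converged(plateau, [0.05] * 40))  # True
```

    If it returns False because the reward is still climbing, more steps may help; if it returns True, more training time alone probably won't, and the observations and rewards are the next thing to look at.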

    Example: a grid with boxes of varying sizes, where the agent gets rewards for collecting boxes.
    • You need to "observe the box size", or the agent won't know that the reward isn't random but is based on box size.
    • You need to "reward based on box size"; otherwise the agent would see the box sizes but wouldn't care. Either it gets the same reward for every box size and only goes after the closest box, or it gets no reward for boxes and doesn't search for them at all.
    • To get an even better reward, you might add the "distance to box". Now the agent can factor in the distance and decide whether a larger box farther away is ACTUALLY less reward, because it could have collected two smaller boxes over a shorter distance for more total reward.
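    The box example can be sketched in plain Python (in a real ML-Agents agent the same logic would live in C#, in `CollectObservations` and `AddReward`). The `Box` class, the reward-per-size scale and the box positions are all hypothetical; the point is only that the observation carries both size and distance, and the reward scales with size.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    x: float
    y: float
    size: float  # hypothetical: reward scales with box size


def box_observation(agent_xy, box):
    """Per-box observation: size (so reward isn't 'random') and distance."""
    return [box.size, math.dist(agent_xy, (box.x, box.y))]


def collect_reward(box, reward_per_size=1.0):
    """Reward proportional to box size (illustrative scaling)."""
    return reward_per_size * box.size


def route_reward_and_distance(route, start_xy):
    """Total reward and total distance travelled when visiting boxes in order."""
    pos = start_xy
    total_reward, total_dist = 0.0, 0.0
    for box in route:
        total_dist += math.dist(pos, (box.x, box.y))
        total_reward += collect_reward(box)
        pos = (box.x, box.y)
    return total_reward, total_dist


start = (0.0, 0.0)
big = Box(10.0, 0.0, 3.0)
small_a, small_b = Box(2.0, 0.0, 2.0), Box(4.0, 0.0, 2.0)
print(box_observation(start, big))                         # [3.0, 10.0]
print(route_reward_and_distance([big], start))             # (3.0, 10.0)
print(route_reward_and_distance([small_a, small_b], start))  # (4.0, 4.0)
```

    Here the two small boxes give 4 reward over 4 units travelled versus 3 reward over 10 units for the big one, a trade-off the agent can only pick up if it observes both size and distance.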
    Side note: I use 3 layers with 512 nodes each to train a NASCAR-style racing game with 16 cars doing 3 laps around a 2-minute track, and I've been training it for over two weeks now. It's still learning tricks with drafting and wrecking.

    Sorry it's not a solution, but maybe it gives you some things to think about.