Need moba example

Discussion in 'ML-Agents' started by pinsir, Apr 2, 2021.

  1. pinsir

    pinsir

    Joined:
    Dec 15, 2020
    Posts:
    2
    Hi, I am studying AI for MOBA games. Although I have gone through the ML-Agents examples, it feels difficult to achieve this myself. I hope there are similar official examples for reference, maybe 2v2 with basic attacks and skill damage (with cooldowns) in an arena.
    I'd really appreciate any help, though it's kind of embarrassing to ask. >_<|||
     
  2. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    77
    Hi,

    Your question comes at a perfect time. We just released support for cooperative multi-agent training in our latest release (Release 15), along with several example environments.
    Although we don't have an example that exactly matches the MOBA game you described, the elements are all there in different examples. Dungeon Escape has a two-agent team attacking a dragon, and the 2v2 Soccer environment has teams that play against each other. These are both good references for starting on your own game.
     
    mamaorha likes this.
  3. mamaorha

    mamaorha

    Joined:
    Jun 16, 2015
    Posts:
    25
    I wonder about the reward cycle though. Currently the reward is linked to the last action, while in a MOBA a single action usually doesn't reward you, but a set of actions does.

    Also, how do you suggest enforcing the AI to try to win instead of farm? (Killing mobs gives rewards; finishing a game faster is usually better, but the AI will try to farm and win later to get higher rewards.)

    I might be completely wrong about what I stated above; it's just what I understood from the ML framework.
     
  4. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    77
    > reward is linked to the last action while in moba usually single action doesnt reward you but a set of actions does.
    This is essentially the credit assignment problem. The reward is given at a particular step, while that step is not the sole reason for getting the reward. Although the reward is not directly linked to the previous few steps that may also have contributed to it, the agent can still learn that taking an action leads to a next state with a better chance of getting reward, and therefore learn to perform a sequence of good actions. It is a more difficult problem for the agent to learn, though.
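    A minimal sketch of the usual mechanism behind this (illustrative only, not the ML-Agents API): discounted returns propagate a reward that arrives only at the final step back to the earlier steps, so the whole action sequence gets reinforced.

    ```python
    def discounted_returns(rewards, gamma=0.99):
        # G_t = r_t + gamma * G_{t+1}, computed backwards over the episode.
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # A 5-step episode where only the final action is rewarded:
    rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
    returns = discounted_returns(rewards)
    # Every earlier step still receives a (discounted) share of that final
    # reward, which is what lets the learner credit the whole sequence.
    ```

    The closer `gamma` is to 1, the more credit flows back to early actions, at the cost of noisier learning signals.
    
    
    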

    > how do u suggest enforcing ai to try and win instead of farm
    The intermediate rewards should be much smaller than the reward for winning the game. Sometimes you might also want a small time penalty at each step to encourage the agent to finish the game faster. However, adding a time penalty can be tricky. If the winning condition is very hard and the agent might never reach it by exploring randomly, while it keeps collecting time penalties every step, the agent might learn to end the game by killing itself right away, since it doesn't believe there is any way to get positive reward. You might need to bootstrap the training, or make the task easier at the beginning, so that the agent can learn how to win.
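    As a concrete sketch of that advice (the function name and all magnitudes below are assumptions for illustration, not values from ML-Agents): keep the farming reward tiny relative to the win reward, and add a small per-step penalty.

    ```python
    # Hypothetical reward shaping for a MOBA-like agent.
    WIN_REWARD = 1.0
    KILL_MOB_REWARD = 0.01   # farming reward: ~100x smaller than winning
    TIME_PENALTY = -0.0005   # gentle per-step nudge toward ending the game

    def step_reward(killed_mob: bool, won: bool, lost: bool) -> float:
        reward = TIME_PENALTY
        if killed_mob:
            reward += KILL_MOB_REWARD
        if won:
            reward += WIN_REWARD
        elif lost:
            reward -= WIN_REWARD  # symmetric penalty discourages suiciding
        return reward
    ```

    With these magnitudes, an agent would need to farm roughly a hundred mobs to match one win, so winning dominates; the losing penalty is what guards against the "end the game quickly by dying" failure mode described above.
    
    
    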
     
  5. mamaorha

    mamaorha

    Joined:
    Jun 16, 2015
    Posts:
    25
    Thanks for the detailed response. Two follow-up questions:
    1. Are there any plans to add a feature to dynamically stack "actions" based on one observation?
    2. The second part is indeed tricky, as the agent would rather lose quickly than reach the "win stage".
     
  6. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    77
    > are there any plans to add a feature to dynamically stack "actions" based on 1 observation?

    The whole reinforcement learning formulation models the problem as a collection of <observation, action, reward> tuples, so having multiple sequential actions in one step would be a harder problem. I'd say it's not on our roadmap for the near future, as there's no known effective way of doing this.
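    To make that formulation concrete, here is a minimal sketch of the standard interaction loop it implies: exactly one action per observation per step (`env` and `policy` are placeholder names, not ML-Agents API).

    ```python
    def run_episode(env, policy, max_steps=1000):
        # Standard RL loop: observe -> act -> receive reward, one action
        # per step. Multi-action "combos" are not part of this interface;
        # the policy can only learn them as sequences across steps.
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = policy(obs)              # one action per observation
            obs, reward, done = env.step(action)
            total += reward
            if done:
                break
        return total
    ```
    
    
    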
     