
When learning Match3, the reward in the built-in Unity ML-Agents algorithm is different from the reward in gym

Discussion in 'ML-Agents' started by i1baranov9, Apr 24, 2021.

  1. i1baranov9

    i1baranov9

    Joined:
    Mar 1, 2021
    Posts:
    3
  2. TreyK-47

    TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,238
    I'll bounce this off the team for some guidance!
     
  3. i1baranov9

    i1baranov9

    Joined:
    Mar 1, 2021
    Posts:
    3
    Thank you! I'll wait for an answer.
     
  4. celion_unity

    celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    286
    Hi,
    Are you using the example scene here? I think what's happening is:
    1. gym has no concept of "masked" discrete actions. We use these a lot for the match3 integration, because many moves are not valid.
    2. The example scene ignores any moves that are not valid. The code for this is here. Since it ignores the move, no points are awarded.
    If you need this to work in gym, you'd have to convert the action mask to an observation, and probably apply a penalty when the agent tries to make an invalid move. The agent should eventually learn what the action-mask observation means and start avoiding those moves, but it will be harder for it to learn.
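    For gym, that could look something like the sketch below. This isn't part of the ML-Agents API; it's a minimal wrapper assuming a flat Box observation and a single discrete action branch, and `get_action_mask` is a placeholder for however your setup exposes the mask each step.

    ```python
    import gym
    import numpy as np

    class MaskAsObservationWrapper(gym.Wrapper):
        """Append the current action mask to the observation and penalize
        masked-out actions, since gym itself has no notion of action masks.

        Assumes a flat Box observation space and a Discrete action space.
        `get_action_mask` is a hypothetical callable returning a 0/1 array
        of length action_space.n (1 = valid move).
        """

        def __init__(self, env, get_action_mask, invalid_move_penalty=-0.05):
            super().__init__(env)
            self._get_mask = get_action_mask
            self._penalty = invalid_move_penalty
            n = env.action_space.n
            # Widen the observation bounds to cover the appended mask bits.
            low = np.concatenate([env.observation_space.low, np.zeros(n, np.float32)])
            high = np.concatenate([env.observation_space.high, np.ones(n, np.float32)])
            self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)
            self._mask = np.ones(n, dtype=np.float32)

        def _augment(self, obs):
            self._mask = np.asarray(self._get_mask(), dtype=np.float32)
            return np.concatenate([np.asarray(obs, dtype=np.float32), self._mask])

        def reset(self, **kwargs):
            return self._augment(self.env.reset(**kwargs))

        def step(self, action):
            # The example scene ignores invalid moves (no points awarded), so
            # we still step, but add a penalty to discourage choosing them.
            invalid = self._mask[int(action)] < 0.5
            obs, reward, done, info = self.env.step(action)
            if invalid:
                reward += self._penalty  # penalty is negative
            return self._augment(obs), reward, done, info
    ```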
     
  5. i1baranov9

    i1baranov9

    Joined:
    Mar 1, 2021
    Posts:
    3
    I didn't quite understand where exactly I need to change the action mask.
    Should I just remove the return from the code and give the agent a negative reward instead?
     
  6. celion_unity

    celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    286
    You don't need to change the action mask, but I don't think there's any way for gym to use it, so you should provide it as an observation instead.

    Giving a negative reward instead will still be very hard to learn from, since the agent has to basically learn the rules of what's a valid move, instead of just being told what's valid or not.
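    To be clear on the wiring, something like this (the env construction and mask accessor are placeholders for your own setup):

    ```python
    # base_env: your Unity environment already exposed through a gym wrapper.
    # my_mask_fn: placeholder for however your setup reads the current mask.
    env = MaskAsObservationWrapper(base_env, get_action_mask=my_mask_fn,
                                   invalid_move_penalty=-0.05)

    obs = env.reset()  # observation now ends with the mask bits
    obs, reward, done, info = env.step(env.action_space.sample())
    ```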
     