Self Play and Action mask problems on Turn Based Game

Discussion in 'ML-Agents' started by Nicolas-Vidal-Iscool, Feb 22, 2020.

  1. Nicolas-Vidal-Iscool

    Joined:
    Mar 4, 2015
    Posts:
    6
    Hey there,

    I wanted to give Unity ML-Agents another shot for a multi-agent turn-based game, and to start simply I tried to build a basic Tic-Tac-Toe.

    With one Agent versus a random opponent controlled automatically by the environment (the random opponent plays immediately and synchronously after the learner's Agent action), I managed to get a near-optimal policy with SAC. (I used the Decision Requester.)

    But then I wanted to replace the random agent with another learning agent, since "Self Play" was recently introduced.

    For testing purposes I used a random Heuristic for both agents, and with some manual calls to RequestDecision I managed to have them play a bunch of games in Heuristic mode.

    Then I tried to let them play in Learn mode (but on the same team and without Self Play hyperparameters in the config). It kind of works (even if I had to sync the RequestDecision calls with the Environment OnAgentStatus callback).

    Then I tried to enable self-play by setting one of the agents to Team 1 (instead of 0) and setting some self-play hyperparameters in the config. And that's where I'm stuck. If I do this, one of the agents completely ignores the action mask and seems to play randomly. The other agent doesn't seem to learn anything; the reported ELO stays flat at 1200.
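
    For context on the flat 1200: under the textbook Elo update, a rating only moves when a game result differs from the expected score. The sketch below is generic Elo arithmetic, not the ML-Agents trainer code; the function names and the K-factor of 16 are assumptions for illustration. A rating stuck at its starting value would be consistent with episodes that never end in a recorded win or loss, or with nothing but draws between equally rated players.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating_a: float, rating_b: float,
                   score_a: float, k: float = 16.0) -> float:
    """New rating for A after one game.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k is an assumed K-factor, chosen only for this illustration.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    start = 1200.0  # the flat value reported in the post
    # A draw between equally rated players leaves the rating unchanged.
    print(updated_rating(start, start, 0.5))
    # Any decisive result moves it away from the starting value.
    print(updated_rating(start, start, 1.0))
```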

    I tried this on the latest release branch and on the master branch.

    Any idea what I should do to be able to run some tests with self-play on turn-based multi-agent games?

    Thanks a lot.

    Cheers.

    Nicolas
     
  2. TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,822
    Hey there @Nicolas-Vidal-Iscool - we'll circulate this for the team to review. Could you tell us which C# and Python versions you're running?
     
  3. p2b

    Joined:
    Jan 29, 2018
    Posts:
    2
    Hi @Nicolas-Vidal-Iscool, I'm working on a similar project (as a matter of fact, it's the exact same idea) and I faced the same issue as you did. Sadly, I don't have a solution and am looking for one myself. I was working with ml-agents v0.13.1, and in a similar fashion one of the agents would completely ignore the mask and act randomly. A few Debug statements revealed that the masks were being sent to the Python API correctly for only one agent.
    I switched to the v0.14.0 latest release today and tried the same setup, but now I am facing a new issue: the first agent collects observations and makes a move, but when it's the second agent's turn, it collects observations and then stops indefinitely. It seems as if it does not receive actions from the Python API from this point forward.
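
    To check whether an agent is really ignoring its mask, it can help to compare its chosen cells with what a correct mask would allow. The following is a minimal, engine-independent sketch of a Tic-Tac-Toe legal-move mask and a masked random move. The board encoding and the function names are hypothetical and are not part of ML-Agents. An agent that ignores the mask would show up as one that picks a cell flagged False here.

```python
import random
from typing import List, Optional

EMPTY = 0  # hypothetical encoding: 0 = empty, 1 = player X, 2 = player O


def legal_move_mask(board: List[int]) -> List[bool]:
    """Return one flag per cell: True if the cell is free and may be played."""
    return [cell == EMPTY for cell in board]


def sample_masked_action(board: List[int], rng: random.Random) -> Optional[int]:
    """Pick a uniformly random legal cell, or None when the board is full."""
    legal = [i for i, allowed in enumerate(legal_move_mask(board)) if allowed]
    return rng.choice(legal) if legal else None


if __name__ == "__main__":
    board = [1, 2, 0,
             0, 1, 0,
             2, 0, 0]
    rng = random.Random(0)
    # Every sampled move should be one of the free cells: 2, 3, 5, 7, 8.
    moves = {sample_masked_action(board, rng) for _ in range(200)}
    print(sorted(moves))
```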

    I would really appreciate some help and guidance regarding the issue.
    The project can be found here: https://github.com/Prabhav2B/Tac-Tic
     
    Last edited: Feb 27, 2020
  4. Nicolas-Vidal-Iscool

    Joined:
    Mar 4, 2015
    Posts:
    6
    Hi @p2b and @TreyK-47. Indeed, with 0.14 Python and C#, the second agent was stopping indefinitely. When I switched to the master version of C# and Python (commit ff3c5e013ee0b585b1a7713baecaa46baa9a7da4), I ran into the issue I described (the mask is ignored).

    I will try to post the issue on GitHub tomorrow.

    Thank you all for looking into this.
     
  5. p2b

    Joined:
    Jan 29, 2018
    Posts:
    2
    Hey @TreyK-47, any updates regarding this issue?
     
  6. bobchalmers

    Joined:
    Jun 2, 2013
    Posts:
    6
    Unity ml-agents version 1.2, Python 0.18; basically the same issue. Self-play with the correct fields in the config .yaml file, two teams, and a turn-based game, so I don't really want the Decision Requester. Only team 0 acts. No replies to RequestDecision come back for the second player (team 1).
     
  7. ScrubRasta

    Joined:
    Jun 18, 2016
    Posts:
    7
    @TreyK-47 Any updates on this? I'm running into the same issue: the action mask is ignored.
     
  8. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi, I've reached out to our research team to see if we can diagnose and solve this.
     
  9. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Also, are you sure that the agent on GameObject 2 is actually writing to the action mask?
     
  10. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    We did some internal testing and found that action masks on agents in self-play were going through.
     
  11. R0b_g

    Joined:
    Jul 14, 2019
    Posts:
    3
    Hi there, I bet this is too late and it might not even work for you, but I had a somewhat similar problem, and setting
    threaded: false
    in the .yaml configuration file worked for me.
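
    For anyone unsure where that line goes, here is a sketch of a trainer config with the setting in place. It assumes the `behaviors:` layout used by more recent ML-Agents releases (older releases used a different file layout). The behavior name TicTacToe is hypothetical, and the self_play values are placeholders rather than recommendations.

```yaml
behaviors:
  TicTacToe:            # hypothetical behavior name; use the one set on your agents
    trainer_type: ppo
    threaded: false     # the setting reported to help in the post above
    self_play:
      save_steps: 20000                      # placeholder value
      swap_steps: 10000                      # placeholder value
      window: 10                             # placeholder value
      play_against_latest_model_ratio: 0.5   # placeholder value
      initial_elo: 1200.0
```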