Best way to design a player for trained agents to play with?

Discussion in 'ML-Agents' started by Alca-Lynn, Apr 12, 2021.

  1. Alca-Lynn


    Joined: Sep 8, 2020

    I am doing a project where players fight each other, like a deathmatch. I have trained agents to play against each other using self-play, and now I have to implement the actual human player. Before I do this, I want to ask a few questions about the best way to implement the player.

    Is it OK to implement the player the way other games do, with ordinary MonoBehaviour scripts, or is it better to use methods in the Agent class, such as the Heuristic method, to control the player? I am also not sure whether adding a human player to the game will change how the agents behave.

    Perhaps the real question is: does the reward function still matter once the network has been trained?

    The description may be confusing, but I hope you understand what I mean. Feel free to ask if I haven't made things clear.

    Any advice will help. Thanks a lot. :)
  2. andrewcoh_unity


    Unity Technologies

    Joined: Sep 5, 2019
    Hi @Alca-Lynn

    Once the agent has been trained and the network is run in inference mode, the reward function no longer matters. This is because the agent's policy is no longer being updated, only executed.
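    To make this concrete, here is a toy sketch in Python, used only as a language-neutral illustration (in Unity the trained model is run by ML-Agents' C# inference code). `FrozenPolicy` and `run_episode` are hypothetical: a fixed linear map stands in for the trained network, actions are chosen from observations alone, and the reward function is at most logged, never fed back. Swapping in a completely different reward function leaves the actions unchanged.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class FrozenPolicy:
    """Hypothetical stand-in for a trained network: fixed weights, one row per discrete action."""
    weights: Tuple[Tuple[float, ...], ...]

    def act(self, obs: Sequence[float]) -> int:
        # Score each action from the observation only; no reward is consulted here.
        scores = [sum(w * o for w, o in zip(row, obs)) for row in self.weights]
        return max(range(len(scores)), key=scores.__getitem__)


def run_episode(policy: FrozenPolicy,
                observations: List[Sequence[float]],
                reward_fn: Callable[[int], float]) -> Tuple[List[int], float]:
    """Execute the policy as in inference: rewards may be computed for logging,
    but nothing ever feeds them back into the (frozen) weights."""
    actions: List[int] = []
    total_reward = 0.0
    for obs in observations:
        action = policy.act(obs)
        actions.append(action)
        total_reward += reward_fn(action)  # logged only
    return actions, total_reward


policy = FrozenPolicy(weights=((1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)))
obs = [(1.0, 0.2), (0.1, 0.9), (-0.5, -0.4)]

# Two very different reward functions, identical behavior.
a1, r1 = run_episode(policy, obs, reward_fn=lambda a: 1.0)
a2, r2 = run_episode(policy, obs, reward_fn=lambda a: -5.0 * a)
```

    Only the logged totals differ between the two runs; the chosen actions are the same because the weights never change.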

    In self-play environments, the agent adjusts its behavior based on what the opponent is doing in order to give itself the greatest chance of winning. From that perspective, yes, it will make a difference in how the agent behaves, since it will be reacting to the human player. It may be possible for the human to find ways of fooling the agent, though. If that happens, you could retrain with a larger self-play window size (so that opponents are sampled from a wider range of past snapshots), which forces the agent to be more 'opponent-agnostic'.
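    The effect of the window can be sketched as follows. This is a simplified, hypothetical `OpponentPool` class, not ML-Agents' own trainer code; it only mirrors the idea behind the trainer's `self_play` settings `window` (how many past snapshots are kept as potential opponents) and `play_against_latest_model_ratio` (how often the agent faces its latest policy instead). A larger window keeps older, more varied opponents in the pool, which is the intuition behind the agent becoming less tuned to any single opponent.

```python
import random
from collections import deque
from typing import Deque


class OpponentPool:
    """Simplified sketch of self-play opponent sampling over a sliding window of snapshots.

    window:       number of past snapshots kept as candidate opponents.
    latest_ratio: probability of facing the latest policy instead of a past snapshot.
    """

    def __init__(self, window: int, latest_ratio: float, seed: int = 0):
        # deque with maxlen drops the oldest snapshot once the window is full.
        self.snapshots: Deque[str] = deque(maxlen=window)
        self.latest_ratio = latest_ratio
        self.rng = random.Random(seed)

    def save_snapshot(self, policy_id: str) -> None:
        self.snapshots.append(policy_id)

    def sample_opponent(self, latest_policy: str) -> str:
        # Face the latest policy with probability latest_ratio (or if no snapshots exist yet).
        if not self.snapshots or self.rng.random() < self.latest_ratio:
            return latest_policy
        return self.rng.choice(list(self.snapshots))


small = OpponentPool(window=2, latest_ratio=0.5)
large = OpponentPool(window=10, latest_ratio=0.5)
for step in range(10):
    small.save_snapshot(f"snapshot_{step}")
    large.save_snapshot(f"snapshot_{step}")
```

    After ten snapshots, the small pool only remembers the two most recent ones, while the large pool can still match the agent against all ten.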

    That being said, it's totally fine to implement your player with the Heuristic method of ML-Agents. It won't affect the learned policy in any way, and it will likely be convenient because all the logic is already in place.
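    The idea of one agent class with a swappable action source can be sketched like this. It is a language-neutral toy in Python, not the ML-Agents API: in Unity you would override the Agent's Heuristic method in C# and set the human-controlled agent's Behavior Type to Heuristic Only in its Behavior Parameters. `Fighter`, `keyboard_heuristic`, the key names and the `[move_x, move_z, shoot]` action layout are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Set


class BehaviorType(Enum):
    INFERENCE_ONLY = "inference"  # action comes from the trained network
    HEURISTIC_ONLY = "heuristic"  # action comes from hand-written code (here: keyboard input)


def keyboard_heuristic(pressed: Set[str]) -> List[float]:
    """Hypothetical mapping from pressed keys to [move_x, move_z, shoot]."""
    move_x = (1.0 if "d" in pressed else 0.0) - (1.0 if "a" in pressed else 0.0)
    move_z = (1.0 if "w" in pressed else 0.0) - (1.0 if "s" in pressed else 0.0)
    shoot = 1.0 if "space" in pressed else 0.0
    return [move_x, move_z, shoot]


@dataclass
class Fighter:
    behavior: BehaviorType
    policy: Callable[[List[float]], List[float]]  # shared trained policy, only ever read

    def decide(self, obs: List[float], pressed: Set[str]) -> List[float]:
        # Same agent code path for everyone; only the source of the action differs.
        if self.behavior is BehaviorType.HEURISTIC_ONLY:
            return keyboard_heuristic(pressed)
        return self.policy(obs)


def trained_policy(obs: List[float]) -> List[float]:
    """Stand-in for the trained network's output."""
    return [obs[0], obs[1], 0.0]


human = Fighter(BehaviorType.HEURISTIC_ONLY, trained_policy)
bot = Fighter(BehaviorType.INFERENCE_ONLY, trained_policy)
obs = [0.3, -0.7]
```

    Because the heuristic path never touches the trained policy, the AI-controlled fighters keep behaving exactly as trained; they simply react to whatever the human does.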