Question Self-Play Win Condition

Discussion in 'ML-Agents' started by GamerLordMat, Dec 27, 2022.

  1. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    180
    Hello all,

    I do not understand how to set up self-play rewards. I am designing a boxing game where every hit should give points. But giving only positive rewards leads to a consistent increase in ELO, and I won't mess with negative rewards.
    So how do I just say: if agent 1's rewards > agent 2's rewards, agent 1 wins? ELO should have nothing to do with rewards.
     
  2. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    107
    Self-play in ML-Agents treats a reward of 1 as a win, so you don't need to worry about rewarding hits etc.; giving a single reward at the end of the match is fine.
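
A minimal sketch of the "single reward at the end of the match" idea, written in plain Python rather than as ML-Agents code. The function name `final_rewards` and the +1 / -1 / 0 values are assumptions for illustration: the hit totals are only used to decide who won, and the terminal rewards are zero-sum.

```python
from typing import Tuple


def final_rewards(hits_a: float, hits_b: float) -> Tuple[float, float]:
    """Map two accumulated hit scores to zero-sum terminal rewards.

    Hypothetical helper: returns (reward_a, reward_b).
    """
    if hits_a > hits_b:
        return 1.0, -1.0   # agent A wins, agent B loses
    if hits_a < hits_b:
        return -1.0, 1.0   # agent B wins, agent A loses
    return 0.0, 0.0        # equal scores count as a draw
```

With this shape the hit scores never reach the trainer directly; only the win, loss, or draw signal at the end of the episode does.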
     
  3. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    180
    Thanks for the answer. I know that works, but it doesn't seem optimal. I know what kind of behaviour I want. When I made my soccer game, you just couldn't give points for anything, because the goal is to score, not to touch the ball etc. But here I want them to hit each other as hard and as many times as possible.

    Giving both agents only positive points ends up with ELO saying both won.

    It really depends on what you want to achieve. Giving points for hits makes the agent learn faster, and he does what you want (hitting). Giving points only at the end optimizes for collecting points in the long term, which often produces more elaborate results.
    I ended up using POCA with only one agent. So I set the agent's score to something, and when he wins he gets one point. Let's see if it works.
     
  4. NanushTol

    NanushTol

    Joined:
    Jan 9, 2018
    Posts:
    121
    What is the "punch" reward value you are giving?
    You could try initializing your agent with a simple PPO curriculum to learn the basic rules of the game, and then move on to a different setup (self-play) with an already trained model that knows the very basics, to "optimize" for competitive behavior.
     
  5. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    180
    Hello,
    thanks for your idea! The problem is that the goal is to punch the other player (I use relative velocity). It also works: after 5 hours of training they do try to hit each other. The hits give about 0.1 reward on average, and they are symmetric (the same amount of minus points for getting hit).

    At the end of the timer, I look at who has scored more points (the rewards without the minus points) and give him 1 more point. So the points are between 1 and 2. I don't know if that is okay to do.

    Now they train, but the ELO doesn't improve.
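
To illustrate why positive-only final rewards can confuse a rating, here is a sketch of a standard Elo update in plain Python. It is not ML-Agents code. It assumes the match result is read from the sign of the final reward (positive = win, negative = loss, zero = draw); the function names, the starting rating of 1200 and the K-factor of 16 are all hypothetical.

```python
def result_from_final_reward(final_reward: float) -> float:
    """Assumed mapping: sign of the final reward decides the result."""
    if final_reward > 0:
        return 1.0   # win
    if final_reward < 0:
        return 0.0   # loss
    return 0.5       # draw


def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score for the player with `rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_change(rating: float, opponent_rating: float,
               result: float, k: float = 16.0) -> float:
    """Rating change for one match; k is a hypothetical K-factor."""
    return k * (result - expected_score(rating, opponent_rating))


# Both agents finish with a positive final reward: both are scored as winners.
winner_change = elo_change(1200, 1200, result_from_final_reward(1.5))
loser_change = elo_change(1200, 1200, result_from_final_reward(1.0))

# Zero-sum final rewards: the loser's rating moves down instead.
zero_sum_loser_change = elo_change(1200, 1200, result_from_final_reward(-1.0))
```

Under these assumptions the rating stops measuring relative strength as soon as the loser's final reward is above zero, because both sides of the match are recorded as a win.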
     
  6. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
  7. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    180
    Hello, thanks for the answer. I think I should start with this; you guys are right. But it is unfortunate, because I know exactly what is worth points. In their soccer example they used smaller winning rewards to reward faster wins.
    I will start with one hit and then maybe try waiting for one agent to collect enough MyPoints and then ending the episode with a win or a loss.
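
The threshold idea could look roughly like the sketch below, again in plain Python rather than ML-Agents code. `Boxer`, `register_hit` and `win_threshold` are hypothetical names, and the +1 / -1 terminal rewards are an assumption: hits only accumulate `my_points`, and the episode ends with a win for the attacker once the threshold is reached.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Boxer:
    my_points: float = 0.0  # accumulated hit score, not a training reward


def register_hit(attacker: Boxer, strength: float,
                 win_threshold: float = 5.0) -> Tuple[float, float, bool]:
    """Record a hit and return (attacker_reward, defender_reward, done)."""
    attacker.my_points += strength
    if attacker.my_points >= win_threshold:
        # Threshold reached: end the episode with zero-sum rewards.
        return 1.0, -1.0, True
    # Otherwise the match continues and no reward is handed out yet.
    return 0.0, 0.0, False


a = Boxer()
first = register_hit(a, 3.0)    # below the threshold, match continues
second = register_hit(a, 2.5)   # 5.5 points in total, attacker wins
```

Setting `win_threshold` to the strength of a single hit gives the "start with one hit" variant.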
     
  8. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    Seems like you need to at least make sure the loser gets a negative reward. You're saying they both get between 1 and 2 right now, right?