Question Reward setting in POCA algorithm

Discussion in 'ML-Agents' started by Dream_Surpass, Jun 5, 2023.

  1. Dream_Surpass

    Dream_Surpass

    Joined:
    Dec 2, 2022
    Posts:
    18
    I created a 2v1 environment and tried to train the agents with POCA (just like the Soccer example environment), but the policy did not converge in the direction I wanted.

    So I want to know how to design rewards properly for POCA. I use `AddReward` during the episode to guide each individual agent, and `SetGroupReward` when one side wins or loses. Is it appropriate to use `AddReward` the same way as in a 1v1 environment?

    Thanks for any ideas.
     
  2. Dream_Surpass

    Dream_Surpass

    Joined:
    Dec 2, 2022
    Posts:
    18
    Which reward should I focus on: the cumulative reward or the group reward?

    During training, the cumulative reward keeps rising toward zero while the group reward curve declines. In the end, the policy did not converge to the optimum (the agents on the team quickly commit suicide). How could this happen?

    upload_2023-6-8_11-4-3.png
    upload_2023-6-8_11-7-51.png
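
One possible reading of these curves, offered only as a hypothesis since the thread does not show the reward code: if the individual rewards include a small negative per-step term, an agent that ends its episode early accumulates less penalty, so its cumulative reward climbs toward zero even though the team loses more often. The toy arithmetic below illustrates that effect. The penalty values and the function name are assumptions made for illustration and are not taken from the poster's environment or from ML-Agents.

```python
# Toy arithmetic only: all values are hypothetical, not from the thread.
STEP_PENALTY = -0.01       # assumed per-step individual reward
LOSS_GROUP_REWARD = -1.0   # assumed group reward when the team loses
WIN_GROUP_REWARD = 1.0     # assumed group reward when the team wins


def individual_return(steps_alive, step_penalty=STEP_PENALTY):
    """Sum of per-step individual rewards for an agent alive `steps_alive` steps."""
    return steps_alive * step_penalty


# An agent that dies after 10 steps versus one that plays 500 steps.
early_death = individual_return(10)
full_episode = individual_return(500)

# Individual return looks better for the early death...
print("individual:", early_death, "vs", full_episode)
# ...while the group outcome is worse if dying early means losing.
print("group:", LOSS_GROUP_REWARD, "vs", WIN_GROUP_REWARD)
```

If the environment has no per-step penalty, this explanation does not apply and the reward calls themselves are the next thing to check.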
     
  3. kokimitsunami

    kokimitsunami

    Joined:
    Sep 2, 2021
    Posts:
    25
    This page says that "A positive group reward indicates the whole group's accomplishments or desired behaviors. Every agent in the group will receive the same group reward no matter whether the agent's act directly leads to the reward. Group rewards are meant to reinforce agents to act in the group's best interest instead of individual ones. Group rewards are treated differently than individual agent rewards during training, so calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent in the group."

    In my multi-agent environment, I use `AddReward` for each agent and `AddGroupReward` for each agent group. I use the individual rewards to encourage the desired behavior, but I mainly treat the group reward as the metric because it corresponds to the final outcome.

    I would check whether the rewards are actually being delivered as expected, and whether you should use `AddGroupReward` instead of `SetGroupReward`. Also, I found that when an agent was destroyed during an episode, it was removed from the group when the episode ended, so I had to add it back to the group after each episode.
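
To make the `AddGroupReward` versus `SetGroupReward` point concrete, here is a minimal toy model in Python. It is not the ML-Agents API: the `ToyGroup` class and its method names are hypothetical, and it assumes that the "add" call increments the reward pending for the current step while the "set" call overwrites it. Under that assumption, two reward calls in the same step give different totals.

```python
class ToyGroup:
    """Toy stand-in for an agent group. Hypothetical, NOT the ML-Agents API."""

    def __init__(self):
        self.pending = 0.0  # group reward gathered during the current step
        self.total = 0.0    # group reward summed over the episode

    def add_group_reward(self, reward):
        # Assumed "add" semantics: increment the pending step reward.
        self.pending += reward

    def set_group_reward(self, reward):
        # Assumed "set" semantics: overwrite the pending step reward.
        self.pending = reward

    def step(self):
        # At the end of a step, the pending reward is consumed and reset.
        self.total += self.pending
        self.pending = 0.0


def run(use_set):
    group = ToyGroup()
    give = group.set_group_reward if use_set else group.add_group_reward
    give(0.5)    # e.g. a bonus for an intermediate objective
    give(-1.0)   # e.g. the loss penalty, issued in the same step
    group.step()
    return group.total


add_total = run(use_set=False)  # both rewards are counted
set_total = run(use_set=True)   # the second call replaces the first
print(add_total, set_total)
```

Logging the values passed to each reward call in the real environment is a simple way to confirm which of the two behaviors the training run is actually seeing.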

    Hope this helps.
     
  4. Dream_Surpass

    Dream_Surpass

    Joined:
    Dec 2, 2022
    Posts:
    18
    Thanks for your reply.

    I have read the official docs and understand that 'Group rewards are meant to reinforce agents to act in the group's best interest instead of individual ones. Group rewards are treated differently than individual agent rewards during training, so calling AddGroupReward() is not equivalent to calling agent.AddReward() on each agent in the group'.

    But after training, my agents converged to suicidal behavior. I will check my environment's rewards later. So which curve should I pay attention to if I use both `AddReward` and `AddGroupReward`?
     
  5. kokimitsunami

    kokimitsunami

    Joined:
    Sep 2, 2021
    Posts:
    25
    Both rewards are important for understanding the agents' behavior, but I would pay more attention to the group reward because it is directly tied to whether the team wins or loses.