Question: Reward structure for an agent that uses objects to refill its needs (Sims agent)

Discussion in 'ML-Agents' started by sunnyCallum, Apr 4, 2024.

  1. sunnyCallum

    Hello there,

    For my university final-year project, I am using the Unity ML-Agents package to train an agent (in my scenario, a cat) with Sims-style needs such as hunger and thirst. The agent has to use objects within its environment to keep those needs high. It makes discrete decisions, such as drinking from a water bowl or eating from a bag of food, and continuous decisions for navigating around its environment.

    I was wondering if anyone has any tips on creating an optimal reward structure for this scenario. Currently, I am rewarding the agent based on the level of its needs (punishment for letting needs drop too low, rewards for its needs being high). Below is an example of the rewards an agent can gain from its health need.

    Code (CSharp):
    private float HealthReward(float health)
    {
        float reward = 0f;

        // Punish the agent while any need is depleted and it is still alive.
        // Note: this penalty is applied directly via AddReward rather than returned.
        if (agentThirst <= 0 || agentHunger <= 0 || agentFun <= 0)
        {
            if (agentHealth > 0)
            {
                AddReward(-0.2f);
            }
        }

        // Reward the agent for having full health
        if (health >= maxValue)
        {
            reward += 0.1f;
        }

        // Punish the agent if health is critically low
        if (health <= criticalValueNormalised)
        {
            reward -= 0.1f;
        }

        return reward;
    }
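To make the range of this signal easier to see, here is a minimal standalone sketch of the same logic with no Unity dependencies. It assumes that the `health` parameter and `agentHealth` hold the same value, that the returned reward is passed to `AddReward` by the caller, and that needs are normalised to [0, 1] with a critical threshold of 0.2. The class name, method name, and both constants are hypothetical and not taken from my project.

```csharp
using System;

// Standalone sketch of the per-step health reward (hypothetical names and thresholds).
public static class HealthRewardSketch
{
    public const float MaxValue = 1.0f;       // assumed: needs are normalised to [0, 1]
    public const float CriticalValue = 0.2f;  // assumed critical threshold

    // Returns the total reward for one step. The direct AddReward(-0.2f) penalty
    // is folded into the returned value so the whole signal is visible in one place.
    public static float StepReward(float health, float thirst, float hunger, float fun)
    {
        float reward = 0f;

        // Penalty while any need is depleted and the agent is still alive
        bool needDepleted = thirst <= 0f || hunger <= 0f || fun <= 0f;
        if (needDepleted && health > 0f)
        {
            reward -= 0.2f;
        }

        // Bonus for full health
        if (health >= MaxValue)
        {
            reward += 0.1f;
        }

        // Penalty for critically low health
        if (health <= CriticalValue)
        {
            reward -= 0.1f;
        }

        return reward;
    }
}
```

Under these assumptions the health term ranges from roughly -0.3 per step (a need at zero while health is critical) to +0.1 per step (full health with no need depleted).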
    In addition to this, I am providing it with rewards or punishments based on its distance to the target object. When the agent selects an action (e.g. drink water), it receives a reward if it has moved closer to the target (here, the water bowl) since the last step, and a punishment if it has moved further away.

    Code (CSharp):
    private float GuideAgent(Vector2 currentAgentPosition, Vector2 targetPosition, Vector2 previousAgentPosition)
    {
        float currentDistanceToTarget = Vector2.Distance(currentAgentPosition, targetPosition);
        float previousDistanceToTarget = Vector2.Distance(previousAgentPosition, targetPosition);

        // Positive when the agent has moved closer to the target since the last step
        float distanceChange = previousDistanceToTarget - currentDistanceToTarget;

        // Scale the change down so the guidance signal stays small
        return distanceChange / 1000f;
    }
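To put the scale of this guidance term in context, here is a minimal standalone sketch using `System.Numerics.Vector2` instead of Unity's `Vector2`. The positions and the step size of 0.1 units are made-up illustration values, and the class and method names are hypothetical.

```csharp
using System;
using System.Numerics;

// Standalone sketch of the distance-based guidance reward (hypothetical names).
public static class GuidanceScaleSketch
{
    // Same formula as GuideAgent: change in distance to the target, divided by 1000.
    public static float GuideReward(Vector2 current, Vector2 target, Vector2 previous)
    {
        float currentDistance = Vector2.Distance(current, target);
        float previousDistance = Vector2.Distance(previous, target);
        return (previousDistance - currentDistance) / 1000f;
    }

    // Sums the guidance reward along a path of positions towards a fixed target.
    public static float PathReward(Vector2[] path, Vector2 target)
    {
        float total = 0f;
        for (int i = 1; i < path.Length; i++)
        {
            total += GuideReward(path[i], target, path[i - 1]);
        }
        return total;
    }
}
```

For example, with the target at (10, 0), a single step from (0, 0) to (0.1, 0) yields about 0.0001. Because the per-step terms telescope for a fixed target, the total over any path is (start distance - end distance) / 1000, so walking all the way in from 10 units away earns about 0.01 in total. That is small next to the ±0.1 per-step health rewards.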
    Currently, at the start of training the agent runs off into the distance, never to be seen again, and subsequently dies. After some time it begins to hover around the objects it has to use to regenerate its needs, but it then reverts to frantically running away. My assumption is that the reward structure for guiding the agent towards its target isn't correct, but if anyone has any other suggestions, I would appreciate them.

    Many thanks,
    SunnyCallum