
Question ML-Agents warehouse robot not learning at all

Discussion in 'ML-Agents' started by Braeze, Dec 6, 2022.

  1. Braeze

    Braeze

    Joined:
    Mar 4, 2019
    Posts:
    2
    Hello, we're having trouble with our environment and have tried everything we could think of.

    Environment:
The agent should pick up a package (by touching one of the coloured racks) and then deliver it to the corresponding port at the end of the building. The problem is that the agent just doesn't seem to learn anything! We started with pure PPO, but added imitation learning because we thought the task was maybe too complex. So far we have tried tuning different parameters and changing the strength of GAIL (and adding behavioural cloning).

    Rewards:
The agent gets a reward of 0.5 for picking up a package and another 0.5 for delivering it. It also accumulates a penalty of up to -1 depending on how many steps it takes, and we have a max step count of 5000.
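A minimal sketch of that reward scheme, assuming an ML-Agents Agent subclass; the callback names are illustrative, not taken from the actual script:
Code (CSharp):
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class WarehouseAgent : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        // Existential penalty: -1/MaxStep per step sums to -1 over a full episode.
        AddReward(-1f / MaxStep);
        // ... movement code ...
    }

    void OnPackagePickedUp()   // hypothetical callback from the rack trigger
    {
        AddReward(0.5f);
    }

    void OnPackageDelivered()  // hypothetical callback from the matching port trigger
    {
        AddReward(0.5f);
        EndEpisode();
    }
}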

    Vision/Observations:
Our observations for the agent itself include a Boolean indicating whether it has picked up a package, an integer equal to the value of the package it picked up (defaults to -1 when holding no package), and lastly the agent's velocity in local space (sensor.AddObservation(transform.InverseTransformDirection(m_AgentRb.velocity))).
Then there are the objects it needs to interact with, and we have tried a lot of different approaches: first giving the agent the target locations of packages and ports, then giving the distance between the agent and the package. Now we have switched to raycasts via RayPerceptionSensor. One sensor detects walls, and another detects racks (with packages) and ports. They work on different layers so the agent can see through the racks.
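A minimal sketch of the vector part of those observations; hasPackage and packageValue are illustrative field names (the two RayPerceptionSensor components are configured on the GameObject and need no code here):
Code (CSharp):
public override void CollectObservations(VectorSensor sensor)
{
    sensor.AddObservation(hasPackage);    // bool: currently carrying a package?
    sensor.AddObservation(packageValue);  // int: value of held package, -1 if none
    // velocity in the agent's local frame, as in the quoted call
    sensor.AddObservation(transform.InverseTransformDirection(m_AgentRb.velocity));
}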
    Github to the agent script: https://github.com/WeAreVR/UnityRL/blob/main/Assets/Warehouse/Scripts/AgentMover.cs
The latest config and graph can be seen below.
[Attached screenshots: trainer config and training reward graph]
     
    Last edited: Dec 6, 2022
    seifmostafa7347 likes this.
  2. Braeze

    Braeze

    Joined:
    Mar 4, 2019
    Posts:
    2
After a week of trying different things, I figured out that the problem was the config; it is learning now. I changed a few things, but adding curiosity had the biggest impact.
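For anyone hitting the same wall, a minimal sketch of what enabling curiosity looks like in the trainer config; the behavior name and every value below are illustrative, not the actual config from the screenshot:
Code (YAML):
behaviors:
  WarehouseAgent:            # behavior name is an assumption
    trainer_type: ppo
    # ...hyperparameters, network_settings etc. as before...
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:             # intrinsic reward that encourages exploration
        gamma: 0.99
        strength: 0.02       # small relative to extrinsic; tune as needed
        learning_rate: 3.0e-4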
     
    GamerLordMat likes this.
  3. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    177
    Could you elaborate, I am curious (pun intended)?
So first off, your big negative reward seems to just hinder learning. If anything, it should get points for holding the package, because that is the goal, right (holding it and putting it in place)? I would give it reward = 1 - (timeItNeeded / MaxSteps) when it reaches the goal, combined with curiosity. Also ask yourself whether you need ML-Agents for such a simple problem. I know you are just learning, but try to pick something you have no chance of programming by hand (moving ragdolls, simple AI enemies, anything with physics works out fine).
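A minimal sketch of that suggestion, using the StepCount and MaxStep properties the Agent base class already provides; the callback name is hypothetical:
Code (CSharp):
void OnGoalReached()  // hypothetical: called when the package reaches its port
{
    // 1.0 for an instant solve, decaying toward 0 as the episode nears MaxStep
    AddReward(1f - (float)StepCount / MaxStep);
    EndEpisode();
}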
     
  4. ice_creamer

    ice_creamer

    Joined:
    Jul 28, 2022
    Posts:
    33
Hi, I am training my agent to reach a position with two continuous actions (forward and rotation). In Heuristic mode it moves according to my control, but when I start to train it, it keeps rotating and doesn't move forward. What should I do? Thank you!
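A minimal sketch of a Heuristic for the two continuous actions described, assuming index 0 is forward in [0, 1] and index 1 is rotation in [-1, 1] (the axis names are Unity defaults, used here as an assumption):
Code (CSharp):
public override void Heuristic(in ActionBuffers actionsOut)
{
    var continuous = actionsOut.ContinuousActions;
    // Must use the same indices and ranges that OnActionReceived reads.
    continuous[0] = Mathf.Clamp01(Input.GetAxis("Vertical"));  // forward
    continuous[1] = Input.GetAxis("Horizontal");               // rotation
}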
     
  5. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    177
    hey,
Seems like a bug in OnActionReceived?
     
  6. ice_creamer

    ice_creamer

    Joined:
    Jul 28, 2022
    Posts:
    33
Hmm... during training, there is an error:

    ArgumentException: NaN increment passed to AddReward.
    Unity.MLAgents.Utilities.DebugCheckNanAndInfinity (System.Single value, System.String valueCategory, System.String caller) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Utilities.cs:58)
    Unity.MLAgents.Agent.AddReward (System.Single increment) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Agent.cs:729)
    PurseAgent.MoveAgent (Unity.MLAgents.Actuators.ActionBuffers actionBuffers) (at Assets/Cooperation/Scripts/PurseAgent.cs:283)
    PurseAgent.OnActionReceived (Unity.MLAgents.Actuators.ActionBuffers actions) (at Assets/Cooperation/Scripts/PurseAgent.cs:239)
    Unity.MLAgents.Actuators.VectorActuator.OnActionReceived (Unity.MLAgents.Actuators.ActionBuffers actionBuffers) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Actuators/VectorActuator.cs:76)
    Unity.MLAgents.Actuators.ActuatorManager.ExecuteActions () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Actuators/ActuatorManager.cs:295)
    Unity.MLAgents.Agent.AgentStep () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Agent.cs:1344)
Unity.MLAgents.Academy.EnvironmentStep () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Academy.cs:589)
    Unity.MLAgents.AcademyFixedUpdateStepper.FixedUpdate () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Academy.cs:43)

In the script I Debug.Log if the reward equals infinity, but no message appears. Could it be related to my .yaml settings?
My action code is as follows:
    Code (CSharp):
var forwardGo = Vector3.zero;
var rotationGo = Vector3.zero;

var continueAction = actionBuffers.ContinuousActions;
var a1 = Mathf.Clamp(continueAction[0], 0, 1);
var a2 = Mathf.Clamp(continueAction[1], -1, 1);

forwardGo = transform.InverseTransformVector(transform.forward) * a1;
rotationGo = transform.InverseTransformVector(transform.up) * a2;

forwardForce = forwardGo * m_Setting.agentSpeed;
rotationTorque = rotationGo * m_Setting.agentAngularSpeed;

m_AgentRb.AddRelativeForce(forwardForce);
m_AgentRb.AddRelativeTorque(rotationTorque);

while (m_AgentRb.velocity.z >= 10)
{
    m_AgentRb.velocity *= 0.85f;
}

while (m_AgentRb.angularVelocity.y >= 0.9f)
{
    m_AgentRb.angularVelocity *= 0.95f;
}

// dense reward
DisReward();
AngleReward();
float rewardSingle = 0.7f * Rd + 0.3f * Rthet;
AddReward(rewardSingle);

if (rewardSingle == Mathf.Infinity)
{
    Debug.Log("rewardSingle:" + rewardSingle);
}
}
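One thing worth noting about the check above: NaN never compares equal to anything, including Mathf.Infinity, so that Debug.Log can never fire on the NaN that AddReward is rejecting. A minimal sketch of a guard that would catch it:
Code (CSharp):
// float.IsNaN is the only reliable NaN test; NaN == x is false for every x.
if (float.IsNaN(rewardSingle) || float.IsInfinity(rewardSingle))
{
    Debug.LogWarning("Bad reward: Rd=" + Rd + " Rthet=" + Rthet);
    rewardSingle = 0f;  // skip the bad increment instead of throwing
}
AddReward(rewardSingle);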
     
  7. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    177
@ice_creamer NaN can also be caused by dividing by zero, etc.
Your speed control shouldn't work, because rb.velocity is in world coordinates, so testing only the z speed works only in the 1D case.

I didn't know you had your reward function in your OnActionReceived; your reward seems to be broken.
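A minimal sketch of a local-frame speed clamp instead:
Code (CSharp):
// Bring the world-space velocity into the agent's local frame so z really
// means "forward speed" regardless of which way the agent is facing.
var localVel = transform.InverseTransformDirection(m_AgentRb.velocity);
if (localVel.z > 10f)
{
    localVel.z = 10f;
    m_AgentRb.velocity = transform.TransformDirection(localVel);
}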
     
    Last edited: Jun 11, 2023
  8. ice_creamer

    ice_creamer

    Joined:
    Jul 28, 2022
    Posts:
    33
For the NaN problem, I checked my reward in the Console to see whether I was dividing by zero; it wasn't. The error occurs intermittently.
My reward is as follows (per-agent: distance and angle; group: collision, boundary, goal reached, decay penalty):
    Code (CSharp):
void DisReward()
{
    float sum = 0;
    sum = Mathf.Pow(distanceBetween - range - dsum / 3, 2)
        + Mathf.Pow(distanceU1 - range - dsum / 3, 2)
        + Mathf.Pow(distanceU2 - range - dsum / 3, 2);

    var meanSqr = Mathf.Pow(sum / 3, 0.5f);

    // close to the target and harmonization among the three agents
    Rd = -(0.05f * (agentTt.magnitude - range)
         + 0.2f * Mathf.Exp((agentTt.magnitude - dsum / 3) / meanSqr) - 1);
}

void AngleReward()
{
    // agents surround the target at a certain angle
    var Rthet1 = Mathf.Exp(-Mathf.Abs(Mathf.Acos(adAngle[0]) - (2 * Mathf.PI / 3)))
               + Mathf.Exp(-Mathf.Abs(Mathf.Acos(adAngle[1]) - 2 * (Mathf.PI / 3))) - 2;
    var Rthet2 = Mathf.Exp(-(Mathf.Abs(Mathf.Acos(adAngle[0])) - Mathf.Abs(Mathf.Acos(adAngle[1])))) - 1;
    Rthet = 0.3f * Rthet1 + 0.1f * Rthet2;
}
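Two candidate NaN sources in the functions above, worth checking: Mathf.Acos returns NaN for any input outside [-1, 1], which floating-point error in a dot product can intermittently produce, and the division by meanSqr inside Mathf.Exp blows up if all three distance terms reach zero. A minimal sketch of guards for both, assuming adAngle holds dot products:
Code (CSharp):
// Clamp before Acos: a dot product can drift to 1.0000001, and Acos of that is NaN.
var angle0 = Mathf.Acos(Mathf.Clamp(adAngle[0], -1f, 1f));

// Guard the divisor: meanSqr == 0 makes the Exp argument NaN or Infinity.
var safeMeanSqr = Mathf.Max(meanSqr, 1e-6f);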
     
  9. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    177
I am sorry, but I can't spot a bug. Often it's little things that break the code, like forgetting to assign an object in the Inspector or something like that.