
Fewer Observations (0) made than vector observation size

Discussion in 'ML-Agents' started by 0rigin93, Apr 28, 2020.

  1. 0rigin93

    0rigin93

    Joined:
    Dec 11, 2016
    Posts:
    8
    Hi! I am currently trying to train an agent to play my tower defense game with ml-agents-0.14.0. The agent should observe 10 different values:
    1. His own health (float)
    2. The enemy's health (float)
    3. His gold (float)
    4. The lane the agent will spawn towers or units on (float)
    5. Amount of enemy units on enemy lane0 (typecasted float)
    6. Amount of enemy units on enemy lane1 (typecasted float)
    7. Amount of enemy units on enemy lane2 (typecasted float)
    8. Amount of towers on enemy lane 0 (typecasted float)
    9. Amount of towers on enemy lane 1 (typecasted float)
    10. Amount of towers on enemy lane 2 (typecasted float)
    Yet, I always get the message "Fewer Observations (0) made than vector observation size (10). The observations will be padded". When I double-click the Console log, I get to a line in my TrainingEnvironment script in which agent.Done() is called. If I start the game without having run the --train command in Anaconda (mlagents-learn config/trainer_config.yaml --curriculum=config/curricula/TDLearningCurriculum.yaml --run-id), I get no message.
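    For reference, a rough sketch of how ten values like these are typically collected (this assumes the 0.14 Agent API with AddVectorObs; the class and field names below are placeholders, not the actual project code):
    Code (CSharp):
    using MLAgents;

    public class TDAgentSketch : Agent
    {
        // Placeholder game-state fields standing in for the real project's data.
        float myHealth, enemyHealth, gold, laneIndex;
        int[] enemyUnitsPerLane = new int[3];
        int[] towersPerEnemyLane = new int[3];

        public override void CollectObservations()
        {
            AddVectorObs(myHealth);    // 1. own health
            AddVectorObs(enemyHealth); // 2. enemy health
            AddVectorObs(gold);        // 3. gold
            AddVectorObs(laneIndex);   // 4. lane used for spawning
            for (int i = 0; i < 3; i++)
                AddVectorObs((float)enemyUnitsPerLane[i]);  // 5-7. enemy units per lane
            for (int i = 0; i < 3; i++)
                AddVectorObs((float)towersPerEnemyLane[i]); // 8-10. towers per enemy lane
        }
    }
    If fewer values are added for a step than the vector observation size configured on the behavior, the sensor pads the rest, which is exactly what the warning above says.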

    Any help would be very appreciated. Thanks a lot!

    [Screenshots attached]
     
  2. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    Hi, can you give us the full callstack? This will help us understand where the message comes from. If the message only appears once, I would not be too worried; it could be that the first observation is not ready at the beginning of the simulation, and the training would not suffer much from it.
     
  3. 0rigin93

    0rigin93

    Joined:
    Dec 11, 2016
    Posts:
    8
    Hey thanks for the fast reply! Do you mean this?
    Code (CSharp):
    Fewer observations (0) made than vector observation size (10). The observations will be padded.
    UnityEngine.Debug:LogWarningFormat(String, Object[])
    MLAgents.Sensor.VectorSensor:Write(WriteAdapter) (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Sensor/VectorSensor.cs:41)
    MLAgents.GrpcExtensions:GetObservationProto(ISensor, WriteAdapter) (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Grpc/GrpcExtensions.cs:217)
    MLAgents.RpcCommunicator:PutObservations(String, AgentInfo, List`1) (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Grpc/RpcCommunicator.cs:256)
    MLAgents.RemotePolicy:RequestDecision(AgentInfo, List`1) (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Policy/RemotePolicy.cs:34)
    MLAgents.Agent:NotifyAgentDone(Boolean) (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Agent.cs:279)
    MLAgents.Agent:Done() (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Agent.cs:377)
    TrainingEnvironment:ResetTrainingEnvironment() (at Assets/Scripts/ML/TrainingEnvironment.cs:161)
    MLAgents.Academy:EnvironmentReset() (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Academy.cs:415)
    MLAgents.Academy:ForcedFullReset() (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Academy.cs:370)
    MLAgents.Academy:EnvironmentStep() (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Academy.cs:383)
    MLAgents.AcademyFixedUpdateStepper:FixedUpdate() (at C:/Users/den-n/Desktop/ml-agents-0.14.0/com.unity.ml-agents/Runtime/Academy.cs:32)
    On another note: so far I haven't found an answer to this question, maybe you could help me. Do you think it is possible to adapt the reward function for each stage/threshold of the curriculum training? In my case, this could for example mean that in Stage 1 of the curriculum training, the agent only gets rewarded for being defensive (defending one particular lane), to learn to protect his own base. In Stage 3 etc. the agent would also be rewarded for being offensive and winning matches while also defending his base.

    Unfortunately, I haven't made much progress in training my agent so far, because the environment/game has some complexity to it.

    Thanks a lot!
     
    Last edited: Apr 28, 2020
  4. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    Hi,

    From the logs, it seems the warning comes from the first time the agent resets. I would not worry too much about it.
    Adapting rewards is hard because the critic might get confused if the reward system changes. When using a curriculum, you should try to make the task easier, not the reward higher.
     
  5. 0rigin93

    0rigin93

    Joined:
    Dec 11, 2016
    Posts:
    8
    Hey thanks for the answer!

    If I shouldn't adapt the reward function, is it possible to adapt the vector actions, so that the agent cannot do things he shouldn't do at this stage of the training? Unfortunately, I can't find anything in the docs about something this niche. The reason is that I want to train the agent to be defensive first. A hacky solution I came up with was to create cheat towers for his enemy, but that just meant he would waste a lot of actions and gold to send units that would inevitably die to those cheat towers. Therefore I cut his actions in half so that he could only spawn and upgrade towers. Would it later, once he has mastered his defense, be possible to adapt the actions to include spawning units as well? Or would that confuse the hell out of him?

    I have another problem concerning --load at the moment. I trained for about 1 million steps and wanted to resume the training afterwards with

    mlagents-learn config/sac_trainer_config.yaml --curriculum=config/curricula/TDLearningCurSac.yaml --run-id TDAgent14_CurSac --train --load --lesson 0


    When I pressed Play in Unity, this came up and Play mode closed immediately.


    WARNING:tensorflow:From C:\Users\den-n\.conda\envs\ml-agents1\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating:
    non-resource variables are not supported in the long term
    INFO:mlagents_envs:Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
    INFO:mlagents_envs:Connected new brain:
    TDLearningCurSac?team=0
    INFO:mlagents.trainers:Hyperparameters for the SACTrainer of brain TDLearningCurSac:
    trainer: sac
    batch_size: 128
    buffer_size: 100000
    buffer_init_steps: 2000
    hidden_units: 128
    init_entcoef: 0.15
    learning_rate: 0.0001
    learning_rate_schedule: linear
    max_steps: 1.0e6
    memory_size: 256
    normalize: False
    num_update: 1
    train_interval: 4
    num_layers: 2
    time_horizon: 64
    sequence_length: 64
    summary_freq: 10000
    tau: 0.005
    use_recurrent: False
    vis_encode_type: simple
    reward_signals:
    extrinsic:
    strength: 1.0
    gamma: 0.99
    summary_path: TDAgent14_CurSac_TDLearningCurSac
    model_path: ./models/TDAgent14_CurSac/TDLearningCurSac
    keep_checkpoints: 5
    beta: 0.005
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    2020-04-30 16:53:14.997979: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    INFO:mlagents.trainers:Loading Model for brain TDLearningCurSac?team=0
    C:\Users\den-n\.conda\envs\ml-agents1\lib\site-packages\numpy\core\fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
    out=out, **kwargs)
    C:\Users\den-n\.conda\envs\ml-agents1\lib\site-packages\numpy\core\_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
    ret = ret.dtype.type(ret / rcount)
    INFO:mlagents.trainers:Saved Model
    INFO:mlagents.trainers:List of nodes to export for brain :TDLearningCurSac?team=0
    INFO:mlagents.trainers: is_continuous_control
    INFO:mlagents.trainers: version_number
    INFO:mlagents.trainers: memory_size
    INFO:mlagents.trainers: action_output_shape
    INFO:mlagents.trainers: action
    Converting ./models/TDAgent14_CurSac/TDLearningCurSac/frozen_graph_def.pb to ./models/TDAgent14_CurSac/TDLearningCurSac.nn
    GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
    IN: 'vector_observation': [-1, 1, 1, 10] => 'policy/encoder/hidden_0/BiasAdd'
    IN: 'action_masks': [-1, 1, 1, 5] => 'policy_1/strided_slice_1'
    OUT: 'policy_1/action_probs/action_probs', 'policy_1/concat_2/concat', 'action'
    DONE: wrote ./models/TDAgent14_CurSac/TDLearningCurSac.nn file.
    INFO:mlagents.trainers:Exported ./models/TDAgent14_CurSac/TDLearningCurSac.nn file

    The same thing happened in a similar fashion for every --load I tried after it.
    I would be immensely grateful if you could help me! I'm currently trying to train the agent for my thesis, and this is kind of demotivating.

    Thanks a lot!
     
  6. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    There is no way to add new actions during training.
    I guess if the reward change is small enough, you could adapt it during training.
    An alternative would be to train a first Agent with defensive rewards all the way, record some demonstrations and then use GAIL to start a new training (the agent will have the full reward but be encouraged to imitate the defensive agent).
    You can also modify the environment (for example, reducing the amount of gold the Agent has for attacking). But if attacking is not useful early in the curriculum, the Agent should learn not to do that action.
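    As a rough illustration of that last point (a sketch only; I'm assuming the 0.14/0.15-era FloatProperties API here, and the "attack_gold" key and class name are made up), the environment reset can read a curriculum parameter and scale the task with the current lesson:
    Code (CSharp):
    using MLAgents;
    using UnityEngine;

    public class CurriculumEnvironmentSketch : MonoBehaviour
    {
        public void ResetEnvironment()
        {
            // Read a parameter defined in the curriculum file; falls back to 100
            // if the trainer does not provide one (e.g. when running without --train).
            float attackGold = Academy.Instance.FloatProperties
                .GetPropertyWithDefault("attack_gold", 100f);

            // ... use attackGold to limit how much gold is available for attacking ...
            Debug.Log("Curriculum lesson allows " + attackGold + " gold for attacking.");
        }
    }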

    Regarding the second error, there is this issue on GitHub, https://github.com/Unity-Technologies/ml-agents/issues/3278, which explains that --load does not reset the number of training steps. If you use --load after the environment is done training, you need to increase max_steps in the config.
     
  7. 0rigin93

    0rigin93

    Joined:
    Dec 11, 2016
    Posts:
    8
    Does that mean that if I use --train without --load on a pre-trained model, the previously trained state of the model is retained and the model will continue to learn through training? I understood --load as in "continue where your training left off, instead of training from scratch". The alternative with GAIL sounds reasonable. I will try that out.


    Anyway, that helped me greatly, thank you so much!
     
  8. MapleMeese

    MapleMeese

    Joined:
    May 1, 2020
    Posts:
    18
    Hi, I'm currently trying to train my agent, and I'm getting the same error: "Fewer observations (0) made than vector observation size (10). The observations will be padded." The thing that confuses me the most is that this happens every time around 30 minutes into training, never at the beginning. Any insight would be appreciated. Here's the callstack:
    Code (CSharp):
    Fewer observations (0) made than vector observation size (10). The observations will be padded.
    UnityEngine.Debug:LogWarningFormat(String, Object[])
    Unity.MLAgents.Sensors.VectorSensor:Write(ObservationWriter) (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Sensors/VectorSensor.cs:51)
    Unity.MLAgents.GrpcExtensions:GetObservationProto(ISensor, ObservationWriter) (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Communicator/GrpcExtensions.cs:258)
    Unity.MLAgents.RpcCommunicator:PutObservations(String, AgentInfo, List`1) (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Communicator/RpcCommunicator.cs:326)
    Unity.MLAgents.Policies.RemotePolicy:RequestDecision(AgentInfo, List`1) (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Policies/RemotePolicy.cs:33)
    Unity.MLAgents.Agent:NotifyAgentDone(DoneReason) (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Agent.cs:487)
    Unity.MLAgents.Agent:OnDisable() (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Agent.cs:463)
     
  9. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    Looking at the stack trace, it looks like the warning appeared when the Agent was disabled. Maybe there is something that disables the Agent after 30 minutes of training. When the Agent is disabled, Unity sends a message to the trainer to signal that an Agent was disabled, and in that message the observations are collected. Maybe the Agent was unable to properly collect the observations and had to pad them.
     
  10. MapleMeese

    MapleMeese

    Joined:
    May 1, 2020
    Posts:
    18
    Thanks. Do you have any idea what could be disabling the agent? In my game, nothing is being destroyed, only moved. Also, if the observations are padded, is the training supposed to stop? My training stops when I get the error.
     
  11. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    I do not know what could be disabling the Agent; it could be due to the environment closing (due to an error?). This message is a warning (not an error) and should not cause any crashes or stop the training. I think there might be another bug somewhere that is causing a crash (and disables the Agent). Maybe there is something else happening higher in the stack trace or in Python.
     
  12. MapleMeese

    MapleMeese

    Joined:
    May 1, 2020
    Posts:
    18
    I looked at it again, and the root of the warning is:
    Code (CSharp):
    MissingReferenceException: The object of type 'Rigidbody' has been destroyed but you are still trying to access it.
    Your script should either check if it is null or you should not destroy the object.
    UnityEngine.Rigidbody.get_velocity () (at <f8918d0cc5fc41a1baba01b207a4c016>:0)
    Roll.CollectObservations (Unity.MLAgents.Sensors.VectorSensor sensor) (at Assets/Scripts/Roll.cs:95)
    Unity.MLAgents.Agent.NotifyAgentDone (Unity.MLAgents.Agent+DoneReason doneReason) (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Agent.cs:483)
    Unity.MLAgents.Agent.OnDisable () (at Library/PackageCache/com.unity.ml-agents@1.0.3/Runtime/Agent.cs:463)
    Since it says .get_velocity(), I'm assuming the Rigidbody refers to the sphere's Rigidbody. However, it never gets destroyed in my code. Additionally, when I try to resume training, I immediately get this MissingReferenceException again. To emphasize: I don't have Destroy anywhere in my code, no Rigidbody is ever getting destroyed, and I have no idea how I'm getting this error.


    Adding on to this: on another run of training, I didn't get any Rigidbody destroy error, and it stopped after getting the vector observation error. There were no errors in the terminal.
     
    Last edited: Oct 19, 2020
  13. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    Hi,

    I cannot reproduce this error with the information at hand. Can you send me your project so I can inspect it? The error seems to come from the Agent being disabled for some reason; this triggers a call to collect observations that uses a destroyed Rigidbody.
    A quick fix would be to check whether the Rigidbody is null in CollectObservations, but I will need the code if we want to get to the bottom of this.
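    A minimal sketch of that quick fix against the 1.0.3 API (the field name and the zero fallback are placeholders, not the actual project code):
    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class Roll : Agent
    {
        Rigidbody rBody; // placeholder name for the sphere's Rigidbody

        public override void CollectObservations(VectorSensor sensor)
        {
            // A destroyed UnityEngine.Object compares equal to null, so this check
            // also covers the "has been destroyed" case behind the MissingReferenceException.
            if (rBody != null)
                sensor.AddObservation(rBody.velocity); // 3 floats
            else
                sensor.AddObservation(Vector3.zero);   // keeps the observation count constant

            // ... remaining observations ...
        }
    }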
     
  14. MapleMeese

    MapleMeese

    Joined:
    May 1, 2020
    Posts:
    18
    Thanks. How would you prefer that I send the project? And which parts of it would you like?
     
  15. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    You can send a .zip. Send a minimal project that has the error you describe and a way to reproduce it.
     
  16. MapleMeese

    MapleMeese

    Joined:
    May 1, 2020
    Posts:
    18
    Thanks. Here it is. At around 40-60 minutes into training, it has consistently stopped and produced the errors that I previously mentioned.
    https://www.dropbox.com/s/orzy7ziqd7w7ieg/Ball.zip?dl=0
     
  17. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    I tried to reproduce the error, and I can see similar error messages as the ones you reported when I stop the training. I would not worry about these errors when they occur at the end of training, as it simply means that the scene is closing and there might be some loose callbacks in the loop. This makes me believe that these errors and warnings are not the cause of the crash, but a consequence of it.
    Are there ANY other error messages, in Unity OR in Python, that could explain a crash? Also, are you sure the trainer did not reach the maximum number of steps and is trying to exit training normally?
    I will keep investigating with the code you provided.
     
  18. MapleMeese

    MapleMeese

    Joined:
    May 1, 2020
    Posts:
    18
    Wow, that's embarrassing. It was max_steps. Another error that I'm getting is that the agent is producing NaN values. This doesn't always happen, but sometimes it will start producing NaN for a few seconds/minutes, then go back to normal. Sometimes this happens at the beginning of the training session, sometimes in the middle, sometimes not at all. Additionally, sometimes the agent just stops outputting values. What would be the cause of this?
     
    Last edited: Jan 11, 2021
  19. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    I think you should update to the latest ML-Agents; we have added some checks to make sure the trainer catches NaNs as early as possible and raises a warning. Since the NaNs sometimes return to normal values, there is probably a NaN being fed to the network through observations or rewards (if it were the network itself, it would not be able to recover at all). Make sure you do not have NaNs in your observations/rewards, for example from a division by 0.
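    A small sketch of that kind of check on the Unity side (the helper name here is made up, not part of ML-Agents):
    Code (CSharp):
    using UnityEngine;

    public static class ObservationGuard
    {
        // Replaces NaN/Infinity with a fallback and logs a warning, so bad values
        // show up in the Editor console instead of silently reaching the trainer.
        public static float Sanitize(float value, float fallback = 0f)
        {
            if (float.IsNaN(value) || float.IsInfinity(value))
            {
                Debug.LogWarning("Non-finite observation/reward value replaced with " + fallback);
                return fallback;
            }
            return value;
        }
    }
    For example, sensor.AddObservation(ObservationGuard.Sanitize(health / maxHealth)) would catch the division-by-zero case when maxHealth is 0.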
     
  20. MapleMeese

    MapleMeese

    Joined:
    May 1, 2020
    Posts:
    18
    I'll update versions. Thanks for all the help. Can't believe it was because I had max_steps set to something else.