Bug: ML-Agents always crashes after 500,000 steps

Discussion in 'ML-Agents' started by Yavvn, Aug 2, 2020.

  Yavvn

    Joined: May 8, 2019
    Posts: 18
    I'm having an issue where, no matter what I do, ML-Agents crashes at 500,000 steps. I have tried changing max_steps in the YAML config file, but haven't had any luck.

    Although it crashes, it still correctly writes the .nn model files at 500,000 steps.

    Code:
    mlagents-learn config.yaml --env=build/game --num-envs=6 --no-graphics --run-id=HunterAug1d

    Code:
    max_steps: 1.0e6 #also tried 1e6

    Code:
    'is_continuous_control', 'trainer_major_version', 'trainer_minor_version', 'trainer_patch_version', 'version_number', 'memory_size', 'action_output_shape'
    IN: 'vector_observation': [-1, 1, 1, 741] => 'policy/main_graph_0/hidden_0/BiasAdd'
    IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice'
    IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice_1'
    OUT: 'action'
    DONE: wrote results\HunterAug1d\PreyAgent.nn file.
    2020-08-02 03:36:20 INFO [model_serialization.py:83] Exported results\HunterAug1d\PreyAgent.nn file
    2020-08-02 03:36:20 INFO [model_serialization.py:203] List of nodes to export for brain :HunterAgent
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   is_continuous_control
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   trainer_major_version
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   trainer_minor_version
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   trainer_patch_version
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   version_number
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   memory_size
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   action_output_shape
    2020-08-02 03:36:20 INFO [model_serialization.py:205]   action
    Converting results\HunterAug1d\HunterAgent/frozen_graph_def.pb to results\HunterAug1d\HunterAgent.nn
    GLOBALS: 'is_continuous_control', 'trainer_major_version', 'trainer_minor_version', 'trainer_patch_version', 'version_number', 'memory_size', 'action_output_shape'
    IN: 'vector_observation': [-1, 1, 1, 741] => 'policy/main_graph_0/hidden_0/BiasAdd'
    IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice'
    IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice_1'
    OUT: 'action'
    DONE: wrote results\HunterAug1d\HunterAgent.nn file.
    2020-08-02 03:36:20 INFO [model_serialization.py:83] Exported results\HunterAug1d\HunterAgent.nn file
    2020-08-02 03:36:20 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:21 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:21 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:22 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:22 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:22 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:23 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:23 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    2020-08-02 03:36:24 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
    Traceback (most recent call last):
      File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 241, in _feed
        send_bytes(obj)
      File "C:\Program Files\Python38\lib\multiprocessing\connection.py", line 200, in send_bytes
        self._send_bytes(m[offset:offset + size])
      File "C:\Program Files\Python38\lib\multiprocessing\connection.py", line 290, in _send_bytes
        nwritten, err = ov.GetOverlappedResult(True)
    BrokenPipeError: [WinError 109] The pipe has been ended
    2020-08-02 03:36:24 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).

    Code:
    ML-Agents package: Release 4
    Windows 10 Environment
    Python                 3.8.5
    mlagents               0.18.0
    mlagents-envs          0.18.0
    tensorboard            2.3.0
    tensorboard-plugin-wit 1.7.0
    tensorflow             2.3.0
    tensorflow-estimator   2.3.0
    numpy                  1.18.5
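    For anyone reporting a similar problem, a version list like the one above can also be gathered from Python (`pip list` gives the same information). This is a small sketch assuming Python 3.8+ for `importlib.metadata`; `installed_versions` and the `PACKAGES` tuple are illustrative names, not part of ML-Agents.

```python
# Illustrative helper (not part of ML-Agents): report installed package versions.
# Requires Python 3.8+, where importlib.metadata is in the standard library.
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

PACKAGES = ("mlagents", "mlagents-envs", "tensorboard", "tensorflow", "numpy")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Package isn't installed in this interpreter's environment.
            result[name] = None
    return result


if __name__ == "__main__":
    print(f"{'Python':<22} {platform.python_version()}")
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<22} {ver or 'not installed'}")
```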
     
  celion_unity

    Unity Technologies

    Joined: Jun 12, 2019
    Posts: 289
    It sounds like we're not reading the number of steps correctly and are using the default (500,000) instead. Can you post the contents of your config.yaml file?
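    To make that hypothesis concrete, here is a minimal Python sketch. It is not ML-Agents source code; `effective_max_steps` and the sample dictionaries are hypothetical. It assumes the Release 4 config layout, where `max_steps` sits directly under `behaviors` > behavior name, and a default of 500,000. Under those assumptions, a behavior name in the YAML that doesn't match the one in the build (the export log above shows `HunterAgent` and `PreyAgent`), or a `max_steps` placed at the wrong level such as under `hyperparameters`, both silently end up at the default.

```python
# Hypothetical illustration, not ML-Agents source code: a trainer that looks for
# max_steps in one place falls back to the default when it isn't found there.
from typing import Any, Dict

# Assumed default for max_steps (500,000, matching where training stops).
DEFAULT_MAX_STEPS = 500_000


def effective_max_steps(config: Dict[str, Any], behavior_name: str) -> int:
    """Return max_steps for behavior_name, falling back to the default.

    Assumes the Release 4 layout: behaviors -> <behavior name> -> max_steps.
    """
    behaviors = config.get("behaviors") or {}
    settings = behaviors.get(behavior_name)
    if not settings:
        # The name in the YAML doesn't match the behavior name in the build.
        return DEFAULT_MAX_STEPS
    # max_steps nested anywhere else (e.g. under hyperparameters) is not seen.
    value = settings.get("max_steps", DEFAULT_MAX_STEPS)
    # float() first so 1000000, 1e6 and the string "1.0e6" are all accepted.
    return int(float(value))


if __name__ == "__main__":
    good = {"behaviors": {"HunterAgent": {"trainer_type": "ppo", "max_steps": 1.0e6}}}
    misplaced = {"behaviors": {"HunterAgent": {"hyperparameters": {"max_steps": 1.0e6}}}}
    print(effective_max_steps(good, "HunterAgent"))       # 1000000
    print(effective_max_steps(good, "PreyAgent"))         # 500000 (no entry for it)
    print(effective_max_steps(misplaced, "HunterAgent"))  # 500000 (wrong level)
```

    If the trainer behaves anything like this, the first things to check would be that each behavior name in config.yaml exactly matches the Behavior Name set on the agents, and that max_steps is set for every behavior (here both HunterAgent and PreyAgent).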

    Also, at the start of training, you should see something like this:
    2020-08-03 11:39:36 INFO [stats.py:131] Hyperparameters for behavior name HunterAug1d


    Can you post the hyperparameters printed there, too?

    Thanks!