Resolved invalid API version number and recv failed

Discussion in 'ML-Agents' started by charli51, Sep 17, 2020.

  1. charli51

    Joined: Feb 2, 2019. Posts: 4
    Hi, and first of all, sorry for my poor English.
    I tried to follow the Karting Microgame tutorial to learn how ML-Agents works. I first tried with CPU TensorFlow, but my Intel processor doesn't support AVX/AVX2, so I followed a tutorial to install tensorflow-gpu.

    I use Anaconda with a virtual environment: Python 3.7.9, tensorflow-gpu 2.1.0, and ML-Agents 0.19.0.

    Steps I did in Unity: I downloaded the Karting Microgame project and opened the ML-Agents training scene.

    Steps I did in Python, with Anaconda3:

    -> create a virtual environment (Python 3.7.9)
    -> install tensorflow-gpu in that environment, after installing cudatoolkit and cudnn
    -> install mlagents from the latest_release branch (with pip install -e . in each directory, ml-agents-envs and ml-agents)
    -> run the mlagents-learn command from the Karting Microgame folder

    I ran the Python command first and then pressed Play. I got the error below.
    Thanks in advance for your help.

    Code (CSharp):
    2020-09-17 15:22:32.529850: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
    WARNING:tensorflow:From C:\Users\-\.conda\envs\chameleon\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating:
    non-resource variables are not supported in the long term


                            ▄▄▄▓▓▓▓
                       ╓▓▓▓▓▓▓█▓▓▓▓▓
                  ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
                ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
              ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
            ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
            ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
              ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
                '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
                   ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                       `▀█▓▓▓▓▓▓▓▓▓▌
                            ¬`▀▀▀█▓


    Version information:
      ml-agents: 0.19.0,
      ml-agents-envs: 0.19.0,
      Communicator API: 1.0.0,
      TensorFlow: 2.1.0
    2020-09-17 15:22:38 WARNING [learn.py:256] The --train option has been deprecated. Train mode is now the default. Use --inference to run in inference mode.
    2020-09-17 15:22:38 INFO [learn.py:271] run_seed set to 2703
    2020-09-17 15:22:39.414271: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
    WARNING:tensorflow:From C:\Users\-\.conda\envs\chameleon\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating:
    non-resource variables are not supported in the long term
    2020-09-17 15:22:41 INFO [environment.py:199] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
    Process Process-1:
    Traceback (most recent call last):
      File "C:\Users\-\.conda\envs\chameleon\lib\multiprocessing\process.py", line 297, in _bootstrap
        self.run()
      File "C:\Users\-\.conda\envs\chameleon\lib\multiprocessing\process.py", line 99, in run
        self._target(*self._args, **self._kwargs)
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 139, in worker
        worker_id, [env_parameters, engine_configuration_channel, stats_channel]
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\learn.py", line 207, in create_unity_environment
        log_folder=log_folder,
      File "h:\tutos\ml-agents-latest_release\ml-agents-envs\mlagents_envs\environment.py", line 220, in __init__
        aca_params.package_version,
      File "h:\tutos\ml-agents-latest_release\ml-agents-envs\mlagents_envs\environment.py", line 85, in _check_communication_compatibility
        unity_communicator_version = StrictVersion(unity_com_ver)
      File "C:\Users\-\.conda\envs\chameleon\lib\distutils\version.py", line 40, in __init__
        self.parse(vstring)
      File "C:\Users\-\.conda\envs\chameleon\lib\distutils\version.py", line 137, in parse
        raise ValueError("invalid version number '%s'" % vstring)
    ValueError: invalid version number 'API-13'
    2020-09-17 15:22:57 INFO [trainer_controller.py:192] Learning was interrupted. Please wait while the graph is generated.
    2020-09-17 15:22:57 INFO [trainer_controller.py:76] Saved Model
    Traceback (most recent call last):
      File "C:\Users\-\.conda\envs\chameleon\lib\multiprocessing\connection.py", line 312, in _recv_bytes
        nread, err = ov.GetOverlappedResult(True)
    BrokenPipeError: [WinError 109] The pipe has been ended

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 88, in recv
        response: EnvironmentResponse = self.conn.recv()
      File "C:\Users\-\.conda\envs\chameleon\lib\multiprocessing\connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "C:\Users\-\.conda\envs\chameleon\lib\multiprocessing\connection.py", line 321, in _recv_bytes
        raise EOFError
    EOFError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\-\.conda\envs\chameleon\Scripts\mlagents-learn-script.py", line 33, in <module>
        sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\learn.py", line 276, in main
        run_cli(parse_command_line())
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\learn.py", line 272, in run_cli
        run_training(run_seed, options)
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\learn.py", line 149, in run_training
        tc.start_learning(env_manager)
      File "h:\tutos\ml-agents-latest_release\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\trainer_controller.py", line 201, in start_learning
        raise ex
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\trainer_controller.py", line 177, in start_learning
        self._reset_env(env_manager)
      File "h:\tutos\ml-agents-latest_release\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\trainer_controller.py", line 113, in _reset_env
        env_manager.reset(config=new_config)
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\env_manager.py", line 66, in reset
        self.first_step_infos = self._reset_env(config)
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 290, in _reset_env
        ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
      File "h:\tutos\ml-agents-latest_release\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 94, in recv
        raise UnityCommunicationException("UnityEnvironment worker: recv failed.")
    mlagents_envs.exception.UnityCommunicationException: UnityEnvironment worker: recv failed.
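Reading the log above from top to bottom, the "recv failed" exception appears to be a follow-on error: the worker process stops on the ValueError, its end of the pipe closes, and the trainer's next read hits end-of-file. The sketch below reproduces that chain with the standard library only. It is a minimal illustration, not the ML-Agents code: `WorkerCommunicationError`, `recv_from_worker` and `demo` are hypothetical names, and the crashed worker is simulated by closing one end of the pipe in the same process.

```python
import multiprocessing


class WorkerCommunicationError(Exception):
    """Hypothetical stand-in for the trainer's communication exception."""


def recv_from_worker(conn):
    """Read one message, wrapping a closed pipe in a higher-level error."""
    try:
        return conn.recv()
    except (EOFError, BrokenPipeError) as exc:
        # The peer went away without sending anything.
        raise WorkerCommunicationError("worker: recv failed.") from exc


def demo():
    """Return the name of the low-level exception behind the failure."""
    parent_conn, child_conn = multiprocessing.Pipe()
    child_conn.close()  # simulate a worker that died before replying
    try:
        recv_from_worker(parent_conn)
    except WorkerCommunicationError as exc:
        return type(exc.__cause__).__name__
    finally:
        parent_conn.close()
    return None


if __name__ == "__main__":
    print(demo())
```

In other words, the first traceback in the log (the version check) is the one to fix; the later ones should disappear once the worker no longer crashes.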
     
  2. charli51

    Hi again,
    I tested the ML-Agents examples (3DBall) in a new conda virtual env with ml-agents 0.19.0, ml-agents-envs 0.19.0, tensorflow 2.1.0 and tensorflow-gpu 2.3.0, and IT WORKED!!!
    I could run a training session, get the .nn file, and then run inference.

    But when I went back to the Kart project with the same env, I got the same error:
    Code (CSharp):
    Version information:
      ml-agents: 0.19.0,
      ml-agents-envs: 0.19.0,
      Communicator API: 1.0.0,
      TensorFlow: 2.1.0
    2020-09-18 15:23:11 INFO [learn.py:271] run_seed set to 3872
    2020-09-18 15:23:11.999721: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
    WARNING:tensorflow:From c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating:
    non-resource variables are not supported in the long term
    2020-09-18 15:23:14 INFO [environment.py:199] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
    Process Process-1:
    Traceback (most recent call last):
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\multiprocessing\process.py", line 297, in _bootstrap
        self.run()
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\multiprocessing\process.py", line 99, in run
        self._target(*self._args, **self._kwargs)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 139, in worker
        worker_id, [env_parameters, engine_configuration_channel, stats_channel]
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\learn.py", line 207, in create_unity_environment
        log_folder=log_folder,
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents_envs\environment.py", line 220, in __init__
        aca_params.package_version,
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents_envs\environment.py", line 85, in _check_communication_compatibility
        unity_communicator_version = StrictVersion(unity_com_ver)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\distutils\version.py", line 40, in __init__
        self.parse(vstring)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\distutils\version.py", line 137, in parse
        raise ValueError("invalid version number '%s'" % vstring)
    ValueError: invalid version number 'API-13'
    2020-09-18 15:23:25 INFO [trainer_controller.py:192] Learning was interrupted. Please wait while the graph is generated.
    2020-09-18 15:23:25 INFO [trainer_controller.py:76] Saved Model
    Traceback (most recent call last):
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\multiprocessing\connection.py", line 312, in _recv_bytes
        nread, err = ov.GetOverlappedResult(True)
    BrokenPipeError: [WinError 109] The pipe has been ended

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 88, in recv
        response: EnvironmentResponse = self.conn.recv()
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\multiprocessing\connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\multiprocessing\connection.py", line 321, in _recv_bytes
        raise EOFError
    EOFError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\-\.conda\envs\MLAgent_training_env\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\learn.py", line 276, in main
        run_cli(parse_command_line())
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\learn.py", line 272, in run_cli
        run_training(run_seed, options)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\learn.py", line 149, in run_training
        tc.start_learning(env_manager)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 201, in start_learning
        raise ex
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 177, in start_learning
        self._reset_env(env_manager)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\trainer_controller.py", line 113, in _reset_env
        env_manager.reset(config=new_config)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\env_manager.py", line 66, in reset
        self.first_step_infos = self._reset_env(config)
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 290, in _reset_env
        ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
      File "c:\users\-\.conda\envs\mlagent_training_env\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 94, in recv
        raise UnityCommunicationException("UnityEnvironment worker: recv failed.")
    mlagents_envs.exception.UnityCommunicationException: UnityEnvironment worker: recv failed.


    I saw in different threads that "recv failed" can be caused by an http_proxy setting, but I don't have any proxy. Do I have to check something inside Unity?
    Why do I get an invalid version number with the Kart project and not with the 3DBall project? Do I have to downgrade ML-Agents in the Unity Package Manager?
    Sorry, I'm a total beginner, and sorry for any grammatical mistakes ;)
    And of course, thanks in advance for your answers.
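To rule out the proxy explanation mentioned above, one quick check is to look at the proxy-related environment variables in the same prompt that runs the trainer. This is a small sketch under assumptions: the list of variable names is the conventional one, not something taken from the ML-Agents documentation, and `find_proxy_settings` is a hypothetical helper.

```python
import os

# Conventional proxy variable names (an assumption, not an ML-Agents list).
PROXY_VARIABLES = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def find_proxy_settings(environ=None):
    """Return the proxy variables that are set to a non-empty value."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in PROXY_VARIABLES if environ.get(name)}


if __name__ == "__main__":
    settings = find_proxy_settings()
    if settings:
        for name, value in settings.items():
            print("%s is set to %s" % (name, value))
    else:
        print("No proxy variables are set in this environment.")
```

If nothing is reported, the proxy is probably not the cause, which matches the log: the worker stops on the version check before any data is exchanged.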
     
  3. charli51

    I also got this message in the Unity console:

    Couldn't connect to trainer on port 5004 using API version API-13. Will perform inference instead.
    UnityEngine.Debug:Log(Object)
    MLAgents.Academy:InitializeEnvironment() (at Assets/Karting/ML_Agents/Scripts/Academy.cs:228)
    MLAgents.Academy:LazyInitialization() (at Assets/Karting/ML_Agents/Scripts/Academy.cs:147)
    MLAgents.Agent:OnEnable() (at Assets/Karting/ML_Agents/Scripts/Agent.cs:255)

    I guess it's because the Python API is 14 and the Unity ML-Agents is 13, but if I upgrade ML-Agents in Unity it doesn't work.
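The traceback shows where the mismatch surfaces: the Unity project reports its API version as the string 'API-13', while the Python side passes that string to `StrictVersion`, which only accepts dotted numbers such as 1.0.0 (the "Communicator API" value printed by the trainer). The sketch below mimics that check with a regular expression. It is a simplified illustration, not the `distutils` or ML-Agents implementation: `parse_strict_version` and `describe_handshake` are hypothetical names, and pre-release suffixes that the real `StrictVersion` accepts are left out.

```python
import re

# Dotted numeric versions only: MAJOR.MINOR or MAJOR.MINOR.PATCH.
_STRICT_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_strict_version(vstring):
    """Parse 'MAJOR.MINOR[.PATCH]' into a tuple, or raise ValueError."""
    match = _STRICT_VERSION.match(vstring)
    if match is None:
        raise ValueError("invalid version number '%s'" % vstring)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def describe_handshake(unity_api_version):
    """Say whether a version string reported by Unity could be parsed."""
    try:
        version = parse_strict_version(unity_api_version)
    except ValueError as exc:
        return "rejected: %s" % exc
    return "parsed as %d.%d.%d" % version


if __name__ == "__main__":
    for reported in ("1.0.0", "API-13"):
        print(reported, "->", describe_handshake(reported))
```

So the two sides do not merely disagree on a number; they use different version formats, which is why the Python package and the ML-Agents code inside the Unity project have to come from matching releases.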
     
  4. rjhd30

    Joined: Sep 10, 2020. Posts: 1
    I'm having similar issues: both the API-13 and recv failed errors. I'm on TF 2.3, Python 3.7.4 and Unity 2019.4, with the same ML-Agents version. As in your case, training works fine in the ML-Agents examples (3D Ball, Roller Ball), but not in the Microgame. The main difference is that the kart game doesn't use the ML-Agents package and its dependencies. Playing around with that breaks a whole lot of stuff.
     
  5. charli51

    OK, problem solved. After a couple of days trying to figure it out, I solved it as follows.
    If you are using tensorflow-gpu and hit this issue, do the following, in this order:

    - create a new environment with Python 3.7.9
    - install mlagents 0.13.1 with the development method, that is:
      - download or clone the repo from the GitHub release (be careful to take version 0.13, because the latest release doesn't match the API used by the Kart project)
      - in your conda prompt, go to the folder of the downloaded release (cd + the path to your folder)
      - type "cd .\ml-agents-envs", then type "pip install -e ."
      - type "cd .."
      - type "cd .\ml-agents", then type "pip install -e ." again

    - install tensorflow-gpu with the command: "conda install -c conda-forge tensorflow-gpu"

    - type "conda remove tensorflow"
    - type "conda install tensorflow-estimator=2.1.0"
    - type "conda install tensorflow-gpu=2.1.0"
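After running the steps above, it can help to confirm that the environment really ended up with the versions named in this post. The following is a rough sketch, not an official tool: `EXPECTED_PREFIXES` only restates the versions from the steps, `check_pins` is a hypothetical helper, and the installed versions are passed in as a plain dictionary (for example copied by hand from the output of `pip list`).

```python
# Version pins taken from the steps in this post (prefix match).
EXPECTED_PREFIXES = {
    "mlagents": "0.13",
    "tensorflow-gpu": "2.1.0",
    "tensorflow-estimator": "2.1.0",
}


def check_pins(installed, expected=EXPECTED_PREFIXES):
    """Return a list of human-readable problems; empty means all pins match."""
    problems = []
    for package, prefix in expected.items():
        found = installed.get(package)
        if found is None:
            problems.append("%s: not installed" % package)
        elif not (found == prefix or found.startswith(prefix + ".")):
            problems.append("%s: found %s, expected %s" % (package, found, prefix))
    return problems


if __name__ == "__main__":
    # Example values only; replace them with what `pip list` reports.
    example = {
        "mlagents": "0.13.1",
        "tensorflow-gpu": "2.1.0",
        "tensorflow-estimator": "2.1.0",
    }
    for line in check_pins(example) or ["all pins match"]:
        print(line)
```

A mismatch on `mlagents` is the one to look at first, since the version error in this thread comes from the Python package and the Unity project being from different releases.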

    I did all these steps in that order, and now I can train my karts and run inference as explained in the tutorial. Still, I regret that the tutorial hasn't been updated for the latest release and that it's hard to find good explanations for training with GPU TensorFlow (even more so in French, which is my native language).

    By the way, thank you for all the courses
     