I am having two problems when training. First, I am unable to watch the training reward progress with TensorBoard, and the updates on the earned reward are not printing to the console. I think this may be connected to no summaries folder being generated. (I am basing this on the Hummingbird example and how he did it, so let me know if things have just changed since then, because I was able to get TensorBoard to load and show output using results instead of summaries.) Second, after I run the test for a couple of minutes I get this error:

```
c:\users\capstone\.conda\envs\ml-agents-node\lib\site-packages\mlagents\trainers\torch\utils.py:242: UserWarning: This overload of nonzero is deprecated:
    nonzero()
Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at ..\torch\csrc\utils\python_arg_parser.cpp:882.)
  res += [data[(partitions == i).nonzero().squeeze(1)]]
```

If I let the training proceed, it trains fine for about 10 minutes, then Python crashes, so I have to stop training and restart it. My configuration file is:

```yaml
behaviors:
  Node_AI:
    trainer_type: sac
    summary_freq: 50000
    time_horizon: 128
    max_steps: 5.0e6
    keep_checkpoints: 5
    checkpoint_interval: 500000
    init_path: null
    threaded: true
    hyperparameters:
      learning_rate: 3e-4
      batch_size: 100 # this is a guess; avg is 32 - 512
      buffer_size: 50000
      learning_rate_schedule: constant
      buffer_init_steps: 0
      init_entcoef: 0.5
      save_replay_buffer: true
      tau: 0.005
      steps_per_update: 1
    network_settings:
      hidden_units: 256
      num_layers: 2 # typical is 1 - 3
      normalize: false
      vis_encoder_type: match3
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        strength: 0.05
        gamma: 0.99
    self_play:
      save_steps: 20000
      team_change: 80000
      swap_steps: 5000
      play_against_latest_model_ratio: 0.5
      window: 10
```
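For reference, the warning itself is harmless: it comes from ML-Agents' own `utils.py`, not from user code, and it only flags a deprecated PyTorch call signature. A minimal sketch of the pattern the warning points at, and the equivalent call with the explicit `as_tuple` keyword that newer PyTorch versions expect (the tensors here are made up for illustration):

```python
import torch

# Toy stand-ins for the tensors used in mlagents' utils.py
partitions = torch.tensor([0, 1, 0, 1, 1])
data = torch.arange(10).reshape(5, 2)

# Deprecated form that triggers the UserWarning on newer PyTorch:
#   idx = (partitions == 1).nonzero().squeeze(1)

# Equivalent call with the explicit keyword, which avoids the warning:
idx = (partitions == 1).nonzero(as_tuple=False).squeeze(1)
rows = data[idx]  # select the rows belonging to partition 1
print(rows.tolist())  # → [[2, 3], [6, 7], [8, 9]]
```

Both forms return the same indices, so the warning does not affect training behavior; a Python crash after ~10 minutes is a separate problem.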
I believe this is the same post as the GitHub issue https://github.com/Unity-Technologies/ml-agents/issues/5229. Since there is a Python crash, let's handle it in the issue thread.