Bug Unity Crashes after 15-20hrs of training

Discussion in 'ML-Agents' started by MrpHDanny, Apr 4, 2022.

  1. MrpHDanny

    Joined:
    Oct 20, 2020
    Posts:
    23
    I've been trying to leave a model training overnight for several days now, and on every attempt Unity has thrown an error message that terminates the training after about 15-20 hours. Before stopping, the console prints the message "The Unity Environment took too long to respond".

    Here is the Unity Error box:

    unity_error.png

    Also, TensorBoard only shows 2.5M steps' worth of training data, even though the training ran for 6.5M steps before stopping.

    How could I fix these issues?

    Edit: Unity just printed out some warnings about TensorFlow while I was working on the Unity project:
    what.png
     

    Last edited: Apr 4, 2022
  2. mmmbop

    Joined:
    Jan 22, 2022
    Posts:
    20
    I had a different crash problem: Unity just crashed without any message.
    Removing all the Debug.Log calls from my code helped.
    Also try to solve the TensorFlow problem.
     
  3. ditlevrisdahl

    Joined:
    May 30, 2017
    Posts:
    26
    Have you made sure to refresh your TensorBoard? Sometimes I experience the same thing, but a simple refresh fixes it. Sometimes it takes ages for TensorBoard to load the data. Try stopping and restarting TensorBoard if a refresh doesn't work. How many steps are shown when you look under "Histograms"? Sometimes my "Scalars" tab shows only a few million steps while "Histograms" shows the correct number, which means the data is there somewhere but still needs to be loaded into TensorBoard.

    I have been training for 48+ hours without ever crashing, so unfortunately I cannot help you there. Do you use the ml-agents training option, or have you written your own training script?

    EDIT: Do you build and train on the .exe file? Or do you train with unity running and showing everything?
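To check whether the missing steps are really in the event files rather than just not displayed, one option is to read the run folder directly instead of going through the TensorBoard UI. This is only a rough sketch: it assumes the `tensorboard` Python package is installed, the run folder path is passed on the command line, and the helper names `load_scalars` and `last_step_per_tag` are made up for this example.

```python
import sys


def last_step_per_tag(scalars_by_tag):
    """Return the highest recorded step for each scalar tag that has data."""
    return {
        tag: max(event.step for event in events)
        for tag, events in scalars_by_tag.items()
        if events
    }


def load_scalars(run_dir):
    """Read every scalar series from the event files in run_dir."""
    # Imported here so last_step_per_tag works without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # size_guidance of 0 asks the accumulator to keep every point
    # instead of downsampling long series.
    acc = EventAccumulator(run_dir, size_guidance={"scalars": 0})
    acc.Reload()
    return {tag: acc.Scalars(tag) for tag in acc.Tags()["scalars"]}


if __name__ == "__main__":
    # Usage: python check_steps.py <path to the run's results folder>
    for tag, step in sorted(last_step_per_tag(load_scalars(sys.argv[1])).items()):
        print(f"{tag}: last step {step}")
```

If the last step printed here matches the step count where training stopped, the data was written and the problem is on the display side.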
     
  4. MrpHDanny

    Joined:
    Oct 20, 2020
    Posts:
    23
    @ditlevrisdahl Thanks for the suggestion to look in the other TensorBoard tabs: "Histograms" and "Time Series" show the correct number of steps. I'm not sure what is happening with "Scalars" though, as sometimes the data doesn't show up at all, i.e. the graphs are completely empty.

    As for the training, I used the ml-agents training option and have been running it in Unity, not an exe build.
     
  5. GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    185
    If you train for longer, export the project as a server build and train it from the console. It's a lot faster.

    Best
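For anyone wanting to try the build-and-console route, here is a rough sketch of how the training command could be put together. It uses the `mlagents-learn` options `--run-id`, `--env`, `--no-graphics` and `--resume`; the config path, run id and build path below are placeholders, and the helper `build_train_command` is made up for this example.

```python
import shlex


def build_train_command(config_path, run_id, env_path, resume=False):
    """Return the argument list for a headless mlagents-learn run on a built player."""
    cmd = [
        "mlagents-learn",
        config_path,
        f"--run-id={run_id}",
        f"--env={env_path}",   # path to the exported build instead of the Editor
        "--no-graphics",       # skip rendering; not suitable for visual observations
    ]
    if resume:
        # Continue an earlier run with the same run id, e.g. after a crash.
        cmd.append("--resume")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own config and build.
    cmd = build_train_command("config/trainer.yaml", "overnight_01", "Builds/TrainingEnv")
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Passing `resume=True` with the same run id should let an interrupted overnight run pick up from its last checkpoint rather than starting over.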