Disk space keeps reducing during the training

Discussion in 'ML-Agents' started by WalkLearner, Apr 14, 2020.

  1. WalkLearner

    Joined: Mar 12, 2020
    Posts: 10
    I noticed that free disk space kept shrinking during training and tried, in vain, to find out what exactly caused it. The size of the ml-agents folder stays approximately the same, and the generated checkpoints and summaries do not take up much space. I'm starting to wonder where exactly the cache produced by the training is stored, and I worry that deleting it would affect the training results, which presumably shouldn't happen.

    Has anyone else had the same issue? I'm currently training adversarial agents with self-play for over 10 million steps.
     
  2. TreyK-47

    Unity Technologies
    Joined: Oct 22, 2019
    Posts: 1,822
    I'll flag this for the team to take a look. Can you share any console logs, as well as your python version & C# version?
     
  3. WalkLearner

    Joined: Mar 12, 2020
    Posts: 10
    My Python version is 3.7.7 and my C# version is 7.0. The training was running in an Anaconda environment. The console logs looked normal for a self-play run: ELO, mean rewards, steps, etc. No errors occurred. Thanks in advance.
     
  4. ChrissCrass

    Joined: Mar 19, 2020
    Posts: 31
    If you are using visual observations, this can cause large buffer sizes, but there should be an upper limit on how much data is saved. If very large models and summary files are accumulating, that could also be a cause.

    Recently I had an issue where disk space was rapidly being used up, but in the end it was caused by my own C# mistakes (I was creating a new mesh every fixed update instead of editing existing meshes). You should probably check that your scene itself isn't the cause of the problem.
     
  5. WalkLearner

    Joined: Mar 12, 2020
    Posts: 10
    Thanks for the reply. In my case, I don't use visual observations, nor do I create or modify game objects. I later ran an experiment, training in the original scene from 0 to 12M steps, and the training again ate up 5 GB of disk space. Surprisingly, when I stopped the training, closed the terminal, reopened it, and restarted the training, free disk space jumped from 17 GB to 22 GB, an instant 5 GB freed. I assume it really is related to some cache produced by the training.
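
    To pin down when the space disappears and when it comes back, free space can be sampled while training runs. The following is only a minimal sketch using Python's standard library; `free_gib` and `watch_free_space` are hypothetical helper names, and the path and interval are assumptions to adjust.

```python
import shutil
import time


def free_gib(path="."):
    """Return free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def watch_free_space(path=".", interval_s=60.0, samples=5):
    """Print free space every `interval_s` seconds and return the readings."""
    readings = []
    for i in range(samples):
        readings.append(free_gib(path))
        print(f"{time.strftime('%H:%M:%S')}  free: {readings[-1]:.2f} GiB")
        if i < samples - 1:
            time.sleep(interval_s)  # no sleep after the final sample
    return readings


# Example (assumed values): sample the current drive once a minute for an hour.
# watch_free_space(".", interval_s=60.0, samples=60)
```

    Running this in a second terminal next to the training shows whether the drop is steady (something is being written continuously) or happens in steps.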
     
  6. ChrissCrass

    Joined: Mar 19, 2020
    Posts: 31
    It sounds like the problem is with your scene. There is no way you are building up more than a gig of observations...
     
  7. nights007

    Joined: Aug 12, 2014
    Posts: 17
    I have the same problem: disk space on C: is slowly being used up. I wonder if it has something to do with CUDA...
     
  8. nights007

    Joined: Aug 12, 2014
    Posts: 17
    It seems that ML-Agents pushes enormous amounts of data to Editor.log; mine was over 20 GB when I deleted it (I could see that Unity was writing to it).

    @WalkLearner Have a look at the file and see if it's causing a problem for you too.
    C:\Users\<User>\AppData\Local\Unity\Editor\Editor.log
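
    A quick way to check that file's size without opening it is sketched below. It assumes the Windows location given above, resolved through the `%LOCALAPPDATA%` environment variable; `log_size_gib` is a hypothetical helper name, and on other platforms the log lives elsewhere, so the path would need to be changed.

```python
import os


def log_size_gib(log_path):
    """Return the size of `log_path` in GiB, or None if the file does not exist."""
    try:
        return os.path.getsize(log_path) / 2**30
    except FileNotFoundError:
        return None


# Assumed location: the Windows path from the post, resolved via %LOCALAPPDATA%.
# On non-Windows systems the variable is not expanded and no file will be found.
EDITOR_LOG = os.path.expandvars(r"%LOCALAPPDATA%\Unity\Editor\Editor.log")

size = log_size_gib(EDITOR_LOG)
if size is None:
    print(f"No log found at {EDITOR_LOG}")
else:
    print(f"{EDITOR_LOG}: {size:.2f} GiB")
```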

    @ChrissCrass, why do you think there can't be issues beyond his scene? That seems a bit naive considering ML-Agents just came out of preview a few weeks ago...
     

  9. Dorianosrex1

    Joined: Mar 18, 2018
    Posts: 7
    Hi!
    I just ran into the same issue, in 2021, so this is somehow still a problem. I checked the log file and it was 66 GB, taking a huge chunk of my petite 512 GB M.2 drive. Thank you, nights007, for suggesting this!
     
  10. celion_unity

    Joined: Jun 12, 2019
    Posts: 289
    What were the messages in the log? Were they coming from ML-Agents or something else?
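
    One way to answer this for a log that is too large to open in an editor is to stream it and count which lines repeat most often. This is only a sketch: `most_common_lines` is a hypothetical helper, and truncating each line to its first 80 characters is an arbitrary assumption meant to group near-identical messages.

```python
from collections import Counter


def most_common_lines(log_path, top=10, prefix_len=80):
    """Count how often each non-empty line (cut to `prefix_len` chars) occurs.

    The file is read line by line, so the log itself is never loaded into
    memory at once; only the distinct line prefixes are kept.
    """
    counts = Counter()
    with open(log_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            key = line.strip()[:prefix_len]
            if key:
                counts[key] += 1
    return counts.most_common(top)


# Example (path is an assumption, see the post above):
# for text, n in most_common_lines(r"C:\path\to\Editor.log"):
#     print(n, text)
```

    If a handful of messages account for most of the lines, their text usually shows whether they come from ML-Agents or from something else in the project.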
     
  11. arjun_blaze

    Joined: Mar 3, 2021
    Posts: 2
    I am facing a similar issue. Can you elaborate more on this?
     
  12. arjun_blaze

    Joined: Mar 3, 2021
    Posts: 2
    Editor.log does not seem to be the issue for me; that file is only kilobytes in size.
     
  13. NIZ_DX

    Joined: Mar 11, 2021
    Posts: 4
    This worked for me. I found that my Editor.log file was more than 13 GB.
     
  14. NicoBacc

    Joined: Dec 10, 2020
    Posts: 18
    Hi guys, I faced the same problem with the Editor.log file during training. Is there a way to log less information, or not log at all, during training?
     
  15. OmarVector

    Joined: Apr 18, 2018
    Posts: 130
    I have the same issue. Even when training in the cloud, it crashes with the error: No space left on device

    Did anyone manage to find a solution, or figure out why this is even happening?

    Edit: to be more accurate, this also affects release builds, not just development builds.
     
    Last edited: Feb 13, 2023
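
    For the "No space left on device" case, where Editor.log may not be the culprit, listing the largest files under a directory can show what is actually filling the disk. This is a minimal sketch; `largest_files` is a hypothetical helper, not part of ML-Agents or Unity, and the directory to scan is an assumption.

```python
import heapq
import os


def largest_files(root, top=10):
    """Return the `top` largest regular files under `root` as (size_bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):  # skip symlinks to avoid double counting
                    sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return heapq.nlargest(top, sizes)


# Example (directory is an assumption): scan the current user's home folder.
# for size, path in largest_files(os.path.expanduser("~")):
#     print(f"{size / 2**30:8.2f} GiB  {path}")
```

    Space held by files that were deleted while a process still has them open would not appear in such a listing.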