Question training stops with RuntimeError: dictionary changed size during iteration

Discussion in 'ML-Agents' started by MarkTension, Jul 6, 2020.

  1. MarkTension

    MarkTension

    Joined:
    Aug 17, 2019
    Posts:
    43
    Hi, sometimes my training stops with this "dictionary changed size during iteration" error. Is anyone familiar with it? I'm using concurrent environments and ML-Agents Release 2.

    This was from command prompt:
    File "c:\users\hello\desktop\project\ml-agents-release_2\ml-agents\mlagents\trainers\stats.py", line 344, in write_stats
    for key in StatsReporter.stats_dict[self.category]:
    RuntimeError: dictionary changed size during iteration

    This was from my build's debug log:
    Unable to save timers to file C:/Users/hello/Desktop/project/builds/7_3_2/agents2_Data\ML-Agents\Timers\Clay3D_timers.json
    (Filename: C:\buildslave\unity\build\Runtime/Export/Debug/Debug.bindings.h Line: 35)

    Any idea what's happening, or ways to stop this behavior?
     
  2. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    The stats.py error is really weird. I understand that modifying a dictionary while iterating over it is bad, but I don't see how that could be happening here. Could you open a GitHub issue with the full callstack (and maybe some more info about your Python version)?

    The "Unable to save timers to file" message should be harmless.
     
  3. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
    I had the same issue. Used Release 3 and the default 3DBall environment with SAC. Only changed the max_steps to 1 million. Occurred at around 700k steps or so. So it should (hopefully) be easy to reproduce.
     
  4. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    I tried a few times but can't reproduce the problem (3DBall, Release 3, SAC, max_steps=1000000). Can you please post the full callstack of the error, the command line args you're using to run, and the output from "python --version"?
     
  5. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    Still can't reproduce it, but I have a theory - I think StatsReporter is getting called from different threads simultaneously, so one thread causes a new key to be added (via add_stat or set_stat) while write_stats is being called.
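
To illustrate the theory above, here is a minimal single-threaded sketch of the failure mode. It is not the ML-Agents code: `stats_dict`, `write_stats_unsafe`, and `simulate_other_thread` are hypothetical stand-ins, and the callback only imitates what a second thread calling add_stat or set_stat might do in the middle of the loop.

```python
from collections import defaultdict

# Hypothetical stand-in for StatsReporter.stats_dict: category -> key -> values.
stats_dict = defaultdict(lambda: defaultdict(list))
stats_dict["run"]["reward"].append(1.0)


def write_stats_unsafe(category, on_each_key=None):
    """Iterate directly over the live dictionary, like the loop in the traceback."""
    seen = []
    for key in stats_dict[category]:
        seen.append(key)
        if on_each_key is not None:
            on_each_key(key)
    return seen


def simulate_other_thread(_key):
    # Imitates another thread adding a brand-new stat key mid-iteration.
    stats_dict["run"]["new_stat"].append(0.5)


try:
    write_stats_unsafe("run", simulate_other_thread)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

With real threads the same thing would only happen when the timing lines up, which could explain why the error shows up rarely and is hard to reproduce.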
     
  6. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
  7. MarkTension

    MarkTension

    Joined:
    Aug 17, 2019
    Posts:
    43
    Great! Happy it got solved
     
  8. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    If this is causing a problem for training, and you're comfortable modifying the Python code, a simpler workaround is to change the loop in question to:
    for key in tuple(StatsReporter.stats_dict[self.category].keys()):


    The fix will be in the next release, tentatively scheduled for next week.
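
As a rough sketch of why the snapshot workaround helps (again with hypothetical names, not the actual stats.py code): iterating over a tuple copy of the keys means that keys added during the loop no longer disturb the iteration.

```python
# Hypothetical stand-in for one category of StatsReporter.stats_dict.
stats = {"reward": [1.0]}


def write_stats_snapshot(stats, on_each_key=None):
    """Iterate over a snapshot of the keys instead of the live dictionary."""
    written = []
    for key in tuple(stats.keys()):
        written.append(key)
        if on_each_key is not None:
            on_each_key(key)
    return written


def add_key(_key):
    # Imitates another thread adding a new stat key mid-iteration.
    stats["new_stat"] = [0.5]


written = write_stats_snapshot(stats, add_key)
print(written)
print(sorted(stats))
```

Keys added while the loop is running are simply not part of the current pass; they would be picked up the next time the function is called. Guarding both the writers and the loop with a `threading.Lock` would be a stricter alternative, at the cost of touching more code.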
     