Resolved "Zero Division Error" when training

Discussion in 'ML-Agents' started by Neohun, May 19, 2021.

  1. Neohun

    Neohun

    Joined:
    Oct 30, 2018
    Posts:
    69
    Hi, I have migrated ML-Agents from 1.07 to 2.0, and the following error occurs whenever I start training.
    I assume it's a bug:

    \venv\lib\site-packages\torch\nn\init.py", line 376, in kaiming_uniform_
        std = gain / math.sqrt(fan)
    ZeroDivisionError: float division by zero

    I have tried PyTorch 1.7.1 and 1.8.0, but it makes no difference.
     
  2. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    We have seen a similar issue recently. It is caused by using a CameraSensor/GridSensor with a visual encoder (set in the config file) that is too large for the observation size. The solution is to either increase the sensor size or use a smaller visual encoder type.

    Is that your case? If not, please share the details of your setup and how to reproduce the error.
     
  3. Neohun

    Neohun

    Joined:
    Oct 30, 2018
    Posts:
    69
    OK, that's my case. I changed the "vis_encode_type" parameter from "simple" to "match3" and the zero division problem went away, but it breaks again whenever I use GAIL. The renderTexture is 15x25.
    If I increase the renderTexture size to 20x33, the problem goes away completely and it works with the simple visual encoder as well. I think the minimum size must be 20x20, because it doesn't work below that resolution. But this is not a good solution, since it increases the number of observations. Is there another workaround, so that I can train without increasing the size?
     
    Last edited: May 20, 2021
  4. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    The smallest CNN encoder we have right now is match3. If that still doesn't work, we recently added a fully connected visual encoder that can handle visual input of any size, including very small input. It's on the main branch now, so you're welcome to try it out; it will also come in the next release.

    Note that the fully connected encoder doesn't have convolution kernels, so it can't capture local relationships among pixels, which results in a less powerful representation. It might affect performance, depending on the complexity of the observation and the difficulty of the task.
     
  5. Neohun

    Neohun

    Joined:
    Oct 30, 2018
    Posts:
    69
    OK, I see. That is not a good option for me, because the relationship between neighboring pixels is important in my case. I have now realized that we can modify the network_settings of GAIL as well. I think this feature was added later, because I only just found out about it. GAIL now has its own network settings, which we can modify however we want. Absolutely great! So I added "vis_encode_type: match3" under "gail", and now it works without any problem. The problem is fixed and I didn't have to increase the resolution.

    I also want to ask how large a resolution the match3 CNN can handle. The docs say that "match3 is a smaller CNN that is optimized for board games, and can be used down to visual observation sizes of 5x5." For example, if we use a 50x30 resolution, which encoder is preferred, "simple" or "match3"? And if we want to use a bigger CNN encoder like "nature_cnn" or "resnet", do they offer more benefits over the smaller encoders, at the cost of more computation? I know that match3 is a CNN optimized for board games, and in my case the levels are grid based, so I think match3 will be better.
     
    Last edited: May 20, 2021
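The fix described in this post could look roughly like the following trainer config fragment. This is only a sketch: the behavior name `MyBehavior`, the `demo_path`, and the strength and gamma values are placeholders that do not come from this thread. Only the two `vis_encode_type: match3` entries reflect what was reported to work.

```yaml
behaviors:
  MyBehavior:                        # hypothetical behavior name
    trainer_type: ppo
    network_settings:
      vis_encode_type: match3        # encoder for the policy network
    reward_signals:
      extrinsic:
        strength: 1.0                # placeholder value
        gamma: 0.99                  # placeholder value
      gail:
        strength: 0.5                # placeholder value
        demo_path: Demos/MyDemo.demo # hypothetical path to a demonstration file
        network_settings:
          vis_encode_type: match3    # encoder for the GAIL discriminator
```

The point is that the GAIL discriminator builds its own network, so setting the encoder only at the top level leaves GAIL on its default encoder, which can still fail on a 15x25 input.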
  6. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    So "nature_cnn" is basically a bigger version of the simple CNN, and "resnet" has some extra residual layers and is also a bigger network, so they are definitely more powerful. Bigger networks have more representational power, and a bigger CNN captures information over a larger local area and thus gives better high-level information.

    I'd say the simple CNN is generally a good default to start with. If your observation size is really small, try match3. If your observation is very large and complex and you need a more powerful encoder, move up to "nature_cnn" or "resnet".
     
  7. Neohun

    Neohun

    Joined:
    Oct 30, 2018
    Posts:
    69
    Alright, thanks a lot for the information. It turned out that the "match3" encoder takes up a lot of disk space and, for some reason, causes problems when training. So you're right: in my case I should use the simple encoder and increase the resolution a little, and then I'm good to go.
     
  8. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    @Neohun could you share your training config?

    My solution to the division by zero with grid sensors was to make the GridSensor bigger. It seems the visual encoder cannot take small input shapes. If anyone could point me to why, so I can learn, that would be great!
     
    Last edited: Sep 13, 2021
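On the "why": a likely explanation is that each unpadded convolution shrinks the image, and if any dimension reaches zero the following dense layer has zero inputs, so `gain / math.sqrt(fan)` divides by zero during weight initialization. The sketch below works through the arithmetic. It assumes the "simple" encoder is an 8x8 convolution with stride 4 followed by a 4x4 convolution with stride 2; those layer sizes are an assumption for illustration and are not stated in this thread, and the helper names are made up.

```python
from typing import List, Tuple

# ASSUMPTION: (kernel, stride) pairs for the "simple" encoder, no padding.
SIMPLE_LAYERS: List[Tuple[int, int]] = [(8, 4), (4, 2)]


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded convolution along one dimension (never below 0)."""
    return max(0, (size - kernel) // stride + 1)


def encoded_size(height: int, width: int,
                 layers: List[Tuple[int, int]] = SIMPLE_LAYERS) -> Tuple[int, int]:
    """Spatial size left after passing an image through every conv layer."""
    for kernel, stride in layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
    return height, width


if __name__ == "__main__":
    for h, w in [(15, 25), (20, 33)]:
        out_h, out_w = encoded_size(h, w)
        # A zero here means the flattened feature vector is empty (fan == 0).
        print(f"{h}x{w} -> {out_h}x{out_w}, flattened positions: {out_h * out_w}")
```

Under these assumptions a 15x25 input collapses to 0x1, so nothing is left to feed the dense layer, while 20x33 leaves 1x2. It also gives 20 as the smallest side length that survives both layers, which matches the 20x20 minimum observed earlier in the thread.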