Minimum visual observation size

Discussion in 'ML-Agents' started by mbaske, Mar 28, 2020.

  1. mbaske

    Joined: Dec 31, 2017
    Posts: 473
    Hi, I've seen a Python error message saying the minimum width/height for "vis_encode_type: simple" is 20. Is this an arbitrary number, or are there computational reasons for it?

    I'm writing a 64x4 band of depth texture values into a render texture, and the min limit prevents me from using it as a sensor. It's not a problem to increase the texture size, but for the sake of low memory and processing usage, I'm trying to keep things as small and simple as possible. Would it be preferable to implement a 2D float sensor for my use case instead? Thanks!
     
  2. celion_unity

    Joined: Jun 12, 2019
    Posts: 289
    Anything smaller than those values would result in a TensorFlow error (I'm fuzzy on the details, but it's something about the strides in the convolutional layers). FWIW, the minimum dimension for resnet is 15, so that might be enough for you. I'll log a feature request to support smaller dimensions too.
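    Rough sketch of the arithmetic, going from memory (assuming the simple encoder is two valid-padding conv layers, an 8x8 kernel with stride 4 followed by a 4x4 kernel with stride 2 -- don't quote me on the exact shapes):

    Code (Python):

        def conv_out(size, kernel, stride):
            # Output size along one axis for a valid (no-padding) convolution.
            return (size - kernel) // stride + 1

        for h in (19, 20, 64):
            h1 = conv_out(h, kernel=8, stride=4)
            h2 = conv_out(h1, kernel=4, stride=2)
            print(f"input {h:2d} -> conv1 {h1:2d} -> conv2 {h2:2d}")

        # input 19 -> conv1  3 -> conv2  0   (zero-sized tensor, hence the error)
        # input 20 -> conv1  4 -> conv2  1   (smallest input that survives)
        # input 64 -> conv1 15 -> conv2  6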

    I'm not sure that a 2D float sensor would be any better; it's still subject to the same dimension restrictions.
     
    mbaske likes this.
  3. celion_unity

    Joined: Jun 12, 2019
    Posts: 289
    Tracking ID for the feature request is MLA-808
     
  4. ChrissCrass

    Joined: Mar 19, 2020
    Posts: 31
    The problem is that if you pool and convolve a small image too many times, it ends up with zero, negative, or tiny dimensions on the other end. There's also an issue with the divisibility of texture resolutions: only certain resolutions avoid padding being added to the edges of the image (not a big deal though...)
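    Toy illustration of both effects (made-up layer sizes, not the actual ML-Agents stack):

    Code (Python):

        def pool_out(size, kernel=2, stride=2):
            # Output size along one axis for a kernel-2 / stride-2 pooling step.
            return (size - kernel) // stride + 1

        size = 6
        for step in range(4):
            print(f"step {step}: {size}x{size}")
            size = pool_out(size)
            if size <= 0:
                print("degenerate dimension -> framework error")
                break

        # step 0: 6x6
        # step 1: 3x3   (odd size: the next step pads or drops a row/column)
        # step 2: 1x1
        # degenerate dimension -> framework error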
     
  5. mbaske

    Joined: Dec 31, 2017
    Posts: 473
    Do the hidden layers only consider the resulting image after all convolution and pooling steps? Or do they also work with intermediate data?

    I have another use case where I'm training an agent to map a dungeon with different room sizes. The visual observation here is a top-down orthographic 32x32 b/w view of the mapped area. It starts out all black, and as the agent moves around the rooms, grid cells get filled with grey values representing their accessibility (number of neighbouring walls). The agent receives a discrete reward for each new cell it detects. So far I have been training it for 30M steps, and slowly but surely it gets more efficient overall. However, the agent still occasionally gets stuck in a looping pattern although it should be able to see a nearby exit to still-unmapped cells.
    My suspicion is that the agent is somewhat aware of the overall layout, but fails to detect critical details like a 2px wide gap representing a doorway. Are convolution and pooling doing more harm than good in this case?
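    Back-of-the-envelope for my 32x32 view, using the two-layer simple encoder sketched above (assuming that's what actually runs):

    Code (Python):

        def conv_out(size, kernel, stride):
            # Output size along one axis for a valid (no-padding) convolution.
            return (size - kernel) // stride + 1

        h1 = conv_out(32, kernel=8, stride=4)  # -> 7
        h2 = conv_out(h1, kernel=4, stride=2)  # -> 2
        print(h1, h2)  # 7 2

        # A 2px doorway spans 2/32 of the view. After the first layer the
        # spatial map is only 7 units wide, and after the second it's 2x2,
        # so a thin gap contributes a tiny fraction of any unit's activation.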
     
  6. ChrissCrass

    Joined: Mar 19, 2020
    Posts: 31
    Only the final encoded result.

    That sounds plausible, but I'm not at all sure whether thin features sometimes get washed out or not; too little experience with CNNs.
     
    mbaske likes this.