CNN with 4D observations?

Discussion in 'ML-Agents' started by mbaske, Jul 30, 2020.

  1. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    254
    Reading these posts (https://forum.unity.com/threads/con...ut-using-an-image-render.841828/#post-5570152 https://forum.unity.com/threads/can...for-a-visual-observation.832906/#post-5531560), I wonder if it's possible to encode 3 spatial dimensions + labels in a similar manner. My agent scans its surroundings, creating a voxel map with each voxel having a label/value, representing things like "walkable", "obstacle" etc. Could convolution help with handling the thousands of data points that are being generated this way? Thanks!
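For what it's worth, an observation like this could be laid out as a 4D tensor: three spatial axes plus a channel axis holding a one-hot label per voxel. A minimal NumPy sketch (the label set and grid size are made up for illustration, not ml-agents API):

```python
import numpy as np

# Hypothetical label set for the scanned voxels (names are illustrative).
LABELS = ["empty", "walkable", "obstacle"]

def encode_voxel_map(labels_3d):
    """One-hot encode a (D, H, W) grid of integer labels into a
    (D, H, W, C) float tensor: 3 spatial dimensions + a channel dimension."""
    labels_3d = np.asarray(labels_3d)
    return np.eye(len(LABELS), dtype=np.float32)[labels_3d]

# A tiny 4x4x4 scan with random labels:
rng = np.random.default_rng(0)
grid = rng.integers(0, len(LABELS), size=(4, 4, 4))
obs = encode_voxel_map(grid)
print(obs.shape)  # (4, 4, 4, 3)
```

A conv3d-style network would then slide its kernel over the three spatial axes while mixing the label channels, just as conv2d does for RGB images.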
     
  2. celion_unity

    celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    187
    Hi @mbaske,
There are two limiting factors to this right now:
1) The ml-agents trainer only supports 2D convolutions. 1D and 3D should be fairly easy to add, but we haven't had a chance to yet (and don't have a good example to show them off).
2) Barracuda inference currently has a hard limit of 4 dimensions (NHWC), but the team is actively working on relaxing this to allow more dimensions. No ETA on when this will be available, though.
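To put concrete shapes on point 2 (NumPy, illustrative sizes): a conv2d input already uses all four supported dimensions, while a conv3d input needs a fifth:

```python
import numpy as np

batch = 1

# conv2d input: (N, H, W, C) -- four dimensions, fits the NHWC limit.
rgb_obs = np.zeros((batch, 84, 84, 3), dtype=np.float32)

# conv3d input: (N, D, H, W, C) -- five dimensions, one more than the
# current 4-dimension tensor limit allows.
voxel_obs = np.zeros((batch, 16, 16, 16, 3), dtype=np.float32)

print(rgb_obs.ndim, voxel_obs.ndim)  # 4 5
```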

I'll make sure the request for 3D (and 1D) convolutions is logged in our tracker.
     
  3. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    254
  4. kpalko

    kpalko

    Joined:
    May 14, 2020
    Posts:
    4
@celion_unity can you expand on this some more? Do you mean that a 3-layer (RGB) image cannot be passed to the CNN within ML-Agents? I would consider this a 3D observation (RGB layers x width x height).
     
  5. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    78
    @mbaske
That is something that definitely should work. In terms of implementing this in ml-agents at the moment: is the environment truly 3D, in the sense that there's a lot of variation in all three dimensions, or would an equivalent of a minimap work just as well? I'm just wondering if you could circumvent the current limitations by pushing the same data as a 2D image.
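One way to sketch that workaround (assuming the third dimension is small, e.g. a handful of building levels) is to fold the depth axis into the channel axis, so a plain conv2d can consume the data. Illustrative NumPy only, not ml-agents API:

```python
import numpy as np

def voxels_to_image(voxel_grid):
    """Fold a (D, H, W) voxel grid into an (H, W, D) 'image' whose
    channels are the depth slices. This trades away convolution along
    the depth axis, so it works best when D is small."""
    return np.moveaxis(np.asarray(voxel_grid, dtype=np.float32), 0, -1)

grid = np.arange(3 * 8 * 8).reshape(3, 8, 8)  # 3 levels, 8x8 cells each
img = voxels_to_image(grid)
print(img.shape)  # (8, 8, 3)
```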
     
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    254
    I want my agent to explore and map the rooms of a building. Rooms can be on different levels, so that would be 3D data. But I can think of ways to simplify this, perhaps by providing detailed observations for the current level (2D), but only rough data for the rest of the building, something like "60% of the levels below were mapped".

Did you use a minimap type of visual observation successfully before? So far, I haven't been able to get good results with that; for more details, please see https://forum.unity.com/threads/minimum-visual-observation-size.855487/#post-5655583
     
  7. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    78
Not personally, but I know a few examples where this worked, even in production. A good example is SEED, who experimented with that approach in Battlefield 1: https://www.ea.com/en-gb/news/teaching-ai-agents-battlefield-1
The "minimap" approach mimics the partial observability of the environment and at the same time greatly simplifies what the agent gets as input, compared to a purely visual input.
     
    mbaske likes this.
  8. celion_unity

    celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    187
Sorry, I should have been more precise. RGB images use the conv2d operator in TensorFlow (example usage). At inference time, Barracuda tensors currently only support indexing by up to 4 dimensions; for RGB images, those dimensions are the agent index, image height, image width, and color channel (NHWC), respectively.

    @mbaske's request was to support 3 spatial dimensions + channels, which would require the conv3d operator, and additional support in Barracuda.

    conv1d operations are also a reasonable request; that's what OpenAI uses in their Hide-and-seek example for raycast observations.
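For intuition, a 1D convolution over raycast observations just slides a small kernel along the ring of ray distances. A toy hand-rolled NumPy sketch (illustrative only, not the OpenAI or Barracuda implementation):

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Minimal 'valid'-mode 1D convolution (cross-correlation) over a
    single-channel vector of ray distances."""
    n, k = len(signal), len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(n - k + 1)])

# 8 ray distances around the agent; a 3-tap difference kernel picks out
# sharp changes between neighbouring rays (e.g. the edge of an obstacle).
rays = np.array([1.0, 1.0, 1.0, 0.2, 0.2, 1.0, 1.0, 1.0])
edges = conv1d_valid(rays, np.array([-1.0, 0.0, 1.0]))
print(edges)
```

The large-magnitude outputs line up with the positions where the ray distances change abruptly, which is the kind of local pattern a learned conv1d filter can exploit.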

    Just for posterity, this is logged as MLA-1220 in our tracker.
     
    kpalko and mbaske like this.