CNN with 4D observations?

Discussion in 'ML-Agents' started by mbaske, Jul 30, 2020.

  1. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    363
    Reading these posts (https://forum.unity.com/threads/con...ut-using-an-image-render.841828/#post-5570152 https://forum.unity.com/threads/can...for-a-visual-observation.832906/#post-5531560), I wonder if it's possible to encode 3 spatial dimensions + labels in a similar manner. My agent scans its surroundings, creating a voxel map with each voxel having a label/value, representing things like "walkable", "obstacle" etc. Could convolution help with handling the thousands of data points that are being generated this way? Thanks!
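    Just to make the kind of data concrete, here's a rough numpy sketch of what such a labelled voxel grid could look like; the grid size and the label encoding are arbitrary placeholders, not what my agent actually produces:

    Code (Python):

        # Illustrative only: a labelled voxel grid, with made-up size and labels.
        import numpy as np

        # Hypothetical label encoding: 0 = unknown, 1 = walkable, 2 = obstacle.
        voxels = np.zeros((20, 10, 20), dtype=np.int8)   # x, y (height), z
        voxels[:, 0, :] = 1                              # mark the ground level walkable
        voxels[5, 1, 5] = 2                              # a single obstacle voxel

        print(voxels.size)  # 4000 values already, even for this small grid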
     
  2. celion_unity

    celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    289
    Hi @mbaske,
    There are two limiting factors to this right now:
    1) The mlagents trainer only supports 2D convolutions right now. 1D and 3D should be fairly easy to add, but we haven't had a chance to yet (and don't have a good example to show them off).
    2) Barracuda inference currently has a hard limit of 4 dimensions (NHWC), but they're actively working on relaxing this to allow more dimensions. No ETA on when this will be available though.

    I'll make sure the request for 3D (and 1D) convolutions is logged in our tracker.
     
  3. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    363
  4. kpalko

    kpalko

    Joined:
    May 14, 2020
    Posts:
    5
    @celion_unity Can you expand on this some more? Do you mean that a 3-layer (RGB) image cannot be passed to the CNN within ML-Agents? I would consider this a 3D observation (RGB layers x width x height).
     
  5. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    85
    @mbaske
    That is something that definitely should work. In terms of implementing this in ml-agents at the moment: is the environment truly 3D, in the sense that there's a lot of variation in all three dimensions, or would an equivalent of a minimap work just as well? I'm just wondering if you could circumvent the current limitations by pushing the same data as a 2D image.
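    To make that concrete, here's a rough numpy sketch of one way to do it: treat the vertical levels of the voxel grid as image channels, so a plain 2D convolution (and the NHWC layout) can consume it. The shapes and label values are just placeholders:

    Code (Python):

        import numpy as np

        # Hypothetical labelled voxel grid: (x, levels, z).
        voxels = np.random.randint(0, 3, size=(20, 10, 20))

        # Move the level axis last and normalise labels to [0, 1] so the result
        # looks like a multi-channel 20x20 "image" with 10 channels (HWC).
        obs_2d = np.moveaxis(voxels, 1, -1).astype(np.float32) / 2.0
        print(obs_2d.shape)  # (20, 20, 10)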
     
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    363
    I want my agent to explore and map the rooms of a building. Rooms can be on different levels, so that would be 3D data. But I can think of ways to simplify this, perhaps by providing detailed observations for the current level (2D), but only rough data for the rest of the building, something like "60% of the levels below were mapped".
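    Roughly something like this numpy sketch, where the sizes and the summary statistic are just placeholders for illustration:

    Code (Python):

        import numpy as np

        num_levels, h, w = 5, 20, 20
        voxels = np.random.randint(0, 3, size=(num_levels, h, w))   # hypothetical labels
        mapped = np.array([True, True, False, True, False])         # which levels are mapped
        current_level = 2

        # Detailed observation: the current level as a single-channel 2D "image".
        level_obs = voxels[current_level].astype(np.float32) / 2.0  # (20, 20)

        # Coarse observation: e.g. "x% of the levels below were mapped".
        below = mapped[:current_level]
        summary_obs = np.array([below.mean() if below.size else 0.0], dtype=np.float32)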

    Have you used minimap-type visual observations successfully before? So far, I haven't been able to get good results with that; for more details, please see https://forum.unity.com/threads/minimum-visual-observation-size.855487/#post-5655583
     
  7. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    85
    Not personally, but I know of a few examples where this worked, even in production. A good example is SEED, who experimented with that approach in Battlefield 1: https://www.ea.com/en-gb/news/teaching-ai-agents-battlefield-1
    The "minimap" approach mimics partial observability of the environment and at the same time greatly simplifies what the agent gets as input, compared to purely visual input.
     
    mbaske likes this.
  8. celion_unity

    celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    289
    Sorry, I should have been more precise. RGB images use the conv2d operator in tensorflow (example usage). At inference time, Barracuda tensors currently only support indexing by up to 4 dimensions; for RGB images, these dimensions are the index of the agent, image height, image width, and color channel (NHWC) respectively.

    @mbaske's request was to support 3 spatial dimensions + channels, which would require the conv3d operator, and additional support in Barracuda.

    conv1d operations are also a reasonable request; that's what OpenAI uses in their Hide-and-seek example for raycast observations.

    Just for posterity, this is logged as MLA-1220 in our tracker.
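    For reference, here's a minimal TensorFlow sketch of the difference between the two operators; the shapes are illustrative only, not what the ml-agents trainer actually builds:

    Code (Python):

        import tensorflow as tf

        # conv2d: input is NHWC -> (batch, height, width, channels), e.g. an RGB image.
        rgb = tf.random.uniform((1, 84, 84, 3))
        out_2d = tf.keras.layers.Conv2D(16, kernel_size=3)(rgb)

        # conv3d: input is NDHWC -> (batch, depth, height, width, channels),
        # which is what 3 spatial dimensions + label channels would need.
        voxels = tf.random.uniform((1, 20, 20, 20, 4))
        out_3d = tf.keras.layers.Conv3D(16, kernel_size=3)(voxels)

        print(out_2d.shape, out_3d.shape)  # (1, 82, 82, 16) (1, 18, 18, 18, 16)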
     
    kpalko and mbaske like this.
  9. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    363
    Is there any news on this? I'm once again looking for a way to observe spatial data, like point clouds or voxel grids. I guess a possible workaround would be doing polar projections of points and encoding distances as colour values. That would require a current POV though and therefore ignore any obscured regions, which likely makes it unsuitable for persistent mapping. Thanks!
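    Roughly what I have in mind, as a numpy sketch; the resolution, distance range, the "nothing seen" value, and the project_points helper are all made up for illustration:

    Code (Python):

        import numpy as np

        def project_points(points, h=32, w=64, max_dist=50.0):
            """points: (N, 3) positions (x, y, z) relative to the agent."""
            x, y, z = points[:, 0], points[:, 1], points[:, 2]
            dist = np.linalg.norm(points, axis=1)
            azimuth = np.arctan2(x, z)                                          # -pi..pi
            elevation = np.arcsin(np.clip(y / np.maximum(dist, 1e-6), -1, 1))   # -pi/2..pi/2

            col = ((azimuth + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
            row = ((elevation + np.pi / 2) / np.pi * (h - 1)).astype(int)

            image = np.full((h, w), 1.0, dtype=np.float32)       # 1.0 = nothing seen
            np.minimum.at(image, (row, col), dist / max_dist)    # keep nearest point per bin
            return image

        depth_image = project_points(np.random.uniform(-10.0, 10.0, size=(5000, 3)))
        print(depth_image.shape)  # (32, 64) single-channel visual observation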
     
    ademord likes this.
  10. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    30
    I'm actually also trying to observe spatial data, like @mbaske.
     