Bug Cannot make inference on neural network trained using ML-Agents

Discussion in 'Barracuda' started by Cold85, Sep 22, 2021.

  1. Cold85

    Cold85

    Joined:
    Sep 22, 2021
    Posts:
    6
    Hi,

    I trained a neural network using ML-Agents. Training ran without any problems on the PyTorch side, but running inference on the trained network with Barracuda gives me a whole list of errors.

    My RL agent uses 2 Buffer Sensors (one to encode teammate info, one to encode room feature info), 1 Grid Sensor with resnet (to encode spatial info about the agent's neighborhood), and a vector observation. The agent setup is in the attached file ("Agent Setup.jpg").

    Agent Setup.jpg

    Upon adding the .onnx file created by ML-Agents to the Editor, I see the error "Cannot reshape array of size 453152 into shape with multiple of 15232 elements" at Unity.Barracuda.TensorExtensions.Reshape. The full error stack is in "Error Stack_on onnx file added.jpg". Inspecting the onnx file, I see the warning "model detected as NCHW, but not natively in this layout, behaviour might be erroneous".

    Error Stack_on onnx file added.jpg
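
The numbers in that message can be checked by hand: a reshape into a shape with a fixed block size only works when the total element count is a whole multiple of that block. A minimal Python sketch of the check follows. The helper name `can_reshape` is made up for illustration and is not a Barracuda API.

```python
def can_reshape(total_elements: int, known_block: int) -> bool:
    """Return True if total_elements splits evenly into blocks of known_block."""
    return known_block > 0 and total_elements % known_block == 0


# Values copied from the error message.
total, block = 453152, 15232

print(can_reshape(total, block))  # False
print(divmod(total, block))       # (29, 11424): 29 full blocks plus a remainder
```

The non-zero remainder shows that the array size and the target shape disagree, which is consistent with the importer having inferred a wrong shape somewhere upstream of the Reshape.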

    When trying to run inference, I see another error: "Off-axis dimensions must match". The full error stack is in "Error Stack_on inference.jpg".

    Error Stack_on inference.jpg
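
An assumption, since the stack trace is only in the screenshot: this kind of message usually comes from a concatenation shape check. When tensors are joined along one axis, every other axis must have the same size. The Python sketch below illustrates such a check with made-up shapes. The helper `concat_shape` is hypothetical and is not taken from Barracuda.

```python
from typing import Sequence, Tuple


def concat_shape(shapes: Sequence[Tuple[int, ...]], axis: int) -> Tuple[int, ...]:
    """Compute the shape produced by concatenating tensors along a non-negative axis."""
    first = shapes[0]
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("All shapes must have the same rank")
        for i, (a, b) in enumerate(zip(first, shape)):
            # Every dimension except the concatenation axis must agree.
            if i != axis and a != b:
                raise ValueError("Off-axis dimensions must match")
        total += shape[axis]
    return first[:axis] + (total,) + first[axis + 1:]


# Illustrative shapes only, not taken from the model in this thread.
print(concat_shape([(1, 64), (1, 32)], axis=1))
try:
    concat_shape([(1, 64), (4, 32)], axis=1)
except ValueError as err:
    print(err)
```

If the importer mis-infers the layout of one branch (for example the Grid Sensor branch), its output shape could stop lining up with the Buffer Sensor branch at the point where the two are joined.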

    When using only Buffer Sensors or only the Grid Sensor, inference works without problems. Barracuda seems to fail only when both are used together. I have uploaded my trained model in "NNModel Onnx File.zip".

    I am using:
    ML-Agents Release 18 (uses Barracuda 2.0.0)
    PyTorch 1.7.1
    Unity Version 2019.4.1f1
    Windows 10 OS

    Urgently need experts' help on this.

    Thank you.
     


  2. fguinier

    fguinier

    Unity Technologies

    Joined:
    Sep 14, 2015
    Posts:
    146
    Hi Cold85,

    Some good news about Agent3_2_472_r18_resnet_and_attn.onnx:

    - On Barracuda 2.0.0 and up, the NN imports without any problem according to my tests.
    - On ML-Agents 2.0.0 (and thus Barracuda 2.0.0), it also imports without any problem according to my tests.
    - On ML-Agents 1.8.0 (and thus Barracuda 1.3.1), the import fails as you describe above.
    - On bleeding-edge Barracuda, inference matches the reference ONNX Runtime (apart from the RandomNormalLike node, more on this below). I expect this to be true since Barracuda 2.0.0.

    My guess is that you are using ML-Agents 1.8.0/Barracuda 1.3.1, and that the import bug was fixed in Barracuda 2.0.0 (itself used by ML-Agents 2.0.0). Does that make sense, and is it possible for you to give it a try with ML-Agents 2.0.0?

    As a side note: ML-Agents 2.0.0 is a verified release, while ML-Agents 1.8.0 is a preview package.

    Final note: Barracuda can't match RandomNormalLike for two reasons. First, the seed is not defined by the model and is up to the implementation. Second, the actual implementation of the random distribution is not standardized and is up to the inference library. However, replacing RandomNormalLike with Identity made inference match.
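
A rough sketch of how that substitution could be scripted in Python when comparing two runtimes. The helper only relies on each node exposing `op_type` and `attribute`, as ONNX `NodeProto` objects do. The function name is hypothetical, the demo uses stand-in objects rather than a real model, and the script has not been verified against the model discussed here.

```python
from types import SimpleNamespace


def replace_random_normal_like(graph) -> int:
    """Turn every RandomNormalLike node in `graph` into an Identity node.

    Both ops take one input and produce one output, so only the op type and
    the attributes (dtype, mean, scale, seed) need to change.
    Returns the number of nodes rewritten.
    """
    replaced = 0
    for node in graph.node:
        if node.op_type == "RandomNormalLike":
            node.op_type = "Identity"
            del node.attribute[:]  # Identity takes no attributes
            replaced += 1
    return replaced


# Stand-in graph for demonstration; a real graph would come from an ONNX model.
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(op_type="Conv", attribute=["kernel_shape"]),
    SimpleNamespace(op_type="RandomNormalLike", attribute=["seed", "scale"]),
])
print(replace_random_normal_like(demo_graph))
print([n.op_type for n in demo_graph.node])
```

Assuming the `onnx` Python package is installed, the same function could be applied to `onnx.load(path).graph` and the modified model written back with `onnx.save`. The rewritten model no longer samples noise, so it is only suitable for checking that two runtimes agree, not for deployment.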

    Hope it helps!

    Florent
     
  3. Cold85

    @fguinier big thanks for looking into my problem!

    First, I would like to point out that ML-Agents 2.0.0's dependency is Barracuda 1.4.0-preview instead (see the GitHub release page https://github.com/Unity-Technologies/ml-agents/releases).

    I was using ML-Agents 2.1.0-exp.1/Barracuda 2.0.0-pre.3 when I encountered the errors above. See the version info I captured from my Package Manager:
    Barracuda versioning.JPG ML-Agents Release 18 versioning.JPG

    I also posted this question in ML-Agents forum (https://forum.unity.com/threads/can...ural-network-trained-using-ml-agents.1173203/), to which @WaxyMcRivers replied saying Barracuda 2.1.0-preview seems to solve the import errors.

    I updated my project's Barracuda to 2.1.0-preview and got the same results as @WaxyMcRivers. So on my machine, at least, it was Barracuda 2.1.0-preview that resolved the errors.
     
  4. fguinier

    Hi @Cold85,

    Thanks for the info and followup!

    According to https://docs.unity3d.com/Packages/com.unity.ml-agents@2.1/changelog/CHANGELOG.html
    plus local tests with the Package Manager, it seems that we have:
    ML-Agents 2.0.0 --> Barracuda 2.0.0 --> model imports fine
    ML-Agents 2.1.0-exp.1 --> Barracuda 2.0.0-pre.3 --> error on import
    ML-Agents 2.0.0-exp.1 --> Barracuda 1.4.0-preview --> error on import
    ML-Agents 2.0.0-pre.3 --> Barracuda 2.0.0-pre.3 --> error on import
    Also, as you said, Barracuda 2.1.0-preview --> model imports fine

    So it seems that Barracuda 2.0.0 and 2.1.0-preview are the minimum versions with the fix, which matches the behavior you are seeing. It is also good news, as it means both the official and the latest versions contain the fix.
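
The observations above can be summed up as "the import works from the Barracuda 2.0.0 release onward". A small Python sketch of that rule follows, under the assumption that any pre-release tag (such as `-pre.3` or `-preview`) sorts before the release with the same number. The helper names are made up, and the ordering is a simplification of full semantic-version comparison because it does not rank pre-release tags against each other.

```python
import re

# Barracuda 2.0.0 release; last field: 1 = release, 0 = pre-release.
FIXED_IN = (2, 0, 0, 1)


def version_key(version: str):
    """Parse 'X.Y.Z' or 'X.Y.Z-tag' into a tuple where pre-releases sort first."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(-.+)?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    is_release = 0 if match.group(4) else 1
    return (major, minor, patch, is_release)


def has_import_fix(barracuda_version: str) -> bool:
    """Guess whether a Barracuda version includes the import fix reported in this thread."""
    return version_key(barracuda_version) >= FIXED_IN


# Versions mentioned in this thread.
for v in ["1.3.1", "1.4.0-preview", "2.0.0-pre.3", "2.0.0", "2.1.0-preview"]:
    print(v, has_import_fix(v))
```

For the five versions tested in this thread, the rule reproduces the reported results: only 2.0.0 and 2.1.0-preview are flagged as containing the fix.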

    However, the documentation about the dependencies between ML-Agents and Barracuda does indeed seem to be wrong (as you pointed out: https://github.com/Unity-Technologies/ml-agents/releases). I will raise this with the ML-Agents team.

    Thanks again for feedback!
    Florent
     
  5. Cold85

    Hi,

    To add the case that I tested:

    ML-Agents 2.1.0-exp.1 --> Barracuda 2.1.0-preview --> model imports fine, inference runs fine in game mode