
Question Performance issues with several IMGUIContainers

Discussion in 'UI Toolkit' started by Ghat-Smith, Sep 2, 2020.

  1. Ghat-Smith

    Joined:
    Aug 16, 2016
    Posts:
    53
    Hi,

    I'm using Unity 2020.1.1f1 and I'm making a node system UI based on the UIElements GraphView.
    I have a graph view, which contains nodes. Each node contains an IMGUIContainer.
    I understand that using IMGUIContainer means not taking full advantage of the new UI system, but it's a limitation I have to accept to ensure compatibility with preexisting UI logic.

    I noticed that performance gets rather bad as the number of nodes increases, for example when moving the graph view. Maybe someone could give me some tips to avoid this problem?

    After investigating, the problem seems to be related to the "worldTransform" property and the "UpdateWorldTransform" method of the VisualElement class.
    upload_2020-9-2_20-45-16.png

    By looking at the source code of VisualElement I isolated two issues:

    1)
    Code (CSharp):
    public Matrix4x4 worldTransform
    {
        get
        {
            if (isWorldTransformDirty)
            {
                UpdateWorldTransform();
            }
            return m_WorldTransformCache;
        }
    }
    It looks like moving the graph view marks the world transform of every child element (so every node) as dirty.
    That seems coherent, but I wonder whether there is a way to avoid it?
    I tried setting the usageHints property without success; I don't know if it really has anything to do with my problem.
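
    The dirtying behaviour described above can be illustrated with a minimal sketch. It is not Unity's implementation: SketchElement, its one-dimensional LocalX/WorldX "transform" and RecomputeCount are hypothetical names chosen for illustration. It only shows why changing a parent's transform forces every descendant to recompute its cached world transform on the next read.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a UI element; NOT Unity's VisualElement.
public class SketchElement
{
    private readonly List<SketchElement> children = new List<SketchElement>();
    private SketchElement parent;
    private float localX;            // 1D "transform" to keep the sketch short
    private float worldXCache;
    private bool isWorldDirty = true;

    // Counts how often the cached world value had to be rebuilt.
    public int RecomputeCount { get; private set; }

    public void Add(SketchElement child)
    {
        child.parent = this;
        children.Add(child);
        child.MarkWorldDirty();
    }

    public float LocalX
    {
        get { return localX; }
        set { localX = value; MarkWorldDirty(); }
    }

    // Same lazy pattern as the worldTransform getter quoted above.
    public float WorldX
    {
        get
        {
            if (isWorldDirty)
            {
                float parentWorld = parent != null ? parent.WorldX : 0f;
                worldXCache = parentWorld + localX;
                isWorldDirty = false;
                RecomputeCount++;
            }
            return worldXCache;
        }
    }

    // A change on this element invalidates the whole subtree.
    private void MarkWorldDirty()
    {
        isWorldDirty = true;
        foreach (SketchElement child in children)
        {
            child.MarkWorldDirty();
        }
    }
}
```

    In this model, setting LocalX on the element that plays the role of the graph view's content container dirties every node beneath it, so each node pays one recomputation the next time its world value is read.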

    2) Matrix operations seem to have a non-negligible impact on performance.
    I wonder whether Unity has already thought about optimizing these operations on critical paths (like this one, where we can quickly reach hundreds of multiplications)?
    I already did this once in my own code, using methods that take the matrices by ref to avoid needless copies of Matrix4x4 (which is a struct) when using the * operator. If I remember correctly, I was quite satisfied with the result.
    upload_2020-9-2_20-55-6.png
    upload_2020-9-2_20-53-49.png
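
    A minimal sketch of the by-reference multiplication idea mentioned above. To stay runnable outside Unity it uses System.Numerics.Matrix4x4 (fields M11..M44) rather than UnityEngine.Matrix4x4, and MatrixRefOps.Multiply is a hypothetical helper name, not an existing Unity or .NET API. Whether this is actually faster depends on the runtime and JIT, so it should be measured rather than assumed.

```csharp
using System.Numerics;

// Hypothetical helper; not part of Unity or .NET.
public static class MatrixRefOps
{
    // Computes result = a * b. The inputs are passed by reference, so the
    // two 64-byte structs are not copied at the call site.
    public static void Multiply(ref Matrix4x4 a, ref Matrix4x4 b, out Matrix4x4 result)
    {
        // Build the product in a local so the call stays correct even when
        // 'result' refers to the same variable as 'a' or 'b'.
        Matrix4x4 m = default;

        m.M11 = a.M11 * b.M11 + a.M12 * b.M21 + a.M13 * b.M31 + a.M14 * b.M41;
        m.M12 = a.M11 * b.M12 + a.M12 * b.M22 + a.M13 * b.M32 + a.M14 * b.M42;
        m.M13 = a.M11 * b.M13 + a.M12 * b.M23 + a.M13 * b.M33 + a.M14 * b.M43;
        m.M14 = a.M11 * b.M14 + a.M12 * b.M24 + a.M13 * b.M34 + a.M14 * b.M44;

        m.M21 = a.M21 * b.M11 + a.M22 * b.M21 + a.M23 * b.M31 + a.M24 * b.M41;
        m.M22 = a.M21 * b.M12 + a.M22 * b.M22 + a.M23 * b.M32 + a.M24 * b.M42;
        m.M23 = a.M21 * b.M13 + a.M22 * b.M23 + a.M23 * b.M33 + a.M24 * b.M43;
        m.M24 = a.M21 * b.M14 + a.M22 * b.M24 + a.M23 * b.M34 + a.M24 * b.M44;

        m.M31 = a.M31 * b.M11 + a.M32 * b.M21 + a.M33 * b.M31 + a.M34 * b.M41;
        m.M32 = a.M31 * b.M12 + a.M32 * b.M22 + a.M33 * b.M32 + a.M34 * b.M42;
        m.M33 = a.M31 * b.M13 + a.M32 * b.M23 + a.M33 * b.M33 + a.M34 * b.M43;
        m.M34 = a.M31 * b.M14 + a.M32 * b.M24 + a.M33 * b.M34 + a.M34 * b.M44;

        m.M41 = a.M41 * b.M11 + a.M42 * b.M21 + a.M43 * b.M31 + a.M44 * b.M41;
        m.M42 = a.M41 * b.M12 + a.M42 * b.M22 + a.M43 * b.M32 + a.M44 * b.M42;
        m.M43 = a.M41 * b.M13 + a.M42 * b.M23 + a.M43 * b.M33 + a.M44 * b.M43;
        m.M44 = a.M41 * b.M14 + a.M42 * b.M24 + a.M43 * b.M34 + a.M44 * b.M44;

        result = m;
    }
}
```

    A call then looks like MatrixRefOps.Multiply(ref parentWorld, ref local, out world); where the three variable names are placeholders. The operator form, by contrast, takes both operands by value.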
     
  2. antoine-unity

    Unity Technologies

    Joined:
    Sep 10, 2015
    Posts:
    780
    Hello,

    It's true that IMGUIContainers won't be as fast as regular visual elements, but as long as there are not many of them on screen at the same time they shouldn't be extremely costly. If your graph supports zooming out, though, they could become the bottleneck.

    However, the issues you're seeing look to me like they would happen even if you weren't using IMGUIContainer.

    1) The usageHints property helps optimize the rendering of regular visual elements on the GPU; it won't help with world transform calculations, which happen on the CPU.

    2) Your screenshots and insights are definitely interesting. It looks like we are paying an excessive cost for struct copies and maybe for unnecessary matrix multiplications. It's something we're aware of, but it looks like there may be some low-hanging fruit.

    We'll look into this issue and let you know what comes out of that.
     
  3. antoine-unity

    Also, this looks like a Deep Profile screenshot.

    Can you post measurements of the UIRepaint sampler without Deep Profile, along with the approximate number of nodes in your graph?
     
  4. Ghat-Smith

    Hello Antoine,
    Thank you for your quick reply, I appreciate it.

    There are exactly 439 nodes in my example graph (made just for the stress test).
    Here are the measurements without Deep Profile:
    upload_2020-9-2_22-19-27.png

    Yes, my graph supports zooming out. I thought about simply hiding the IMGUIContainers when too many nodes are visible at the same time (which probably also means the GUI is too small to be readable anyway). What would you suggest to check whether a VisualElement is visible on screen, and to disable an IMGUIContainer?
    Currently I check whether the node's worldBound overlaps the graph view's and insert or remove the IMGUIContainer from the hierarchy accordingly (disabling the IMGUIContainer doesn't seem to work). But maybe there is an easier or more optimized way to do that? Especially since IMGUIContainer now has a cullingEnabled property (which is curiously set to false by default).
     
  5. antoine-unity

    Setting IMGUIContainer.cullingEnabled = true is the way to go. It is disabled by default because it can have unintended side effects with legacy code that we have no control over (think of the Inspector window, for example).
    But you can definitely use it to your advantage. Do let us know if that doesn't work for you.
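
    A minimal sketch of both approaches discussed in this thread: turning on culling, and the manual worldBound overlap check. NodeImguiVisibility and its method names are hypothetical, and adding the container directly to the node is an assumption (a GraphView node may want it in one of its own child containers). Note also that reading worldBound may itself require up-to-date world transforms, so the manual check is not free.

```csharp
using System;
using UnityEngine;
using UnityEngine.UIElements;

// Hypothetical helper; class and method names are illustrative only.
public static class NodeImguiVisibility
{
    // Creates the container with culling turned on, as suggested above.
    public static IMGUIContainer CreateCulled(Action onGui)
    {
        var container = new IMGUIContainer(onGui);
        container.cullingEnabled = true; // false by default
        return container;
    }

    // Manual alternative: keep the container in the hierarchy only while
    // the node's world bounds overlap the view's world bounds.
    public static void UpdateAttachment(
        VisualElement view, VisualElement node, IMGUIContainer container)
    {
        bool visible = node.worldBound.Overlaps(view.worldBound);
        bool attached = container.parent != null;

        if (visible && !attached)
        {
            node.Add(container);
        }
        else if (!visible && attached)
        {
            container.RemoveFromHierarchy();
        }
    }
}
```

    UpdateAttachment would have to be called whenever the view is panned or zoomed; where to hook that call is left open here.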

    Otherwise, there should be nothing to do for regular elements. We're trying very hard to not maintain visibility flags on the hot path of rendering, so that scrolling/panning remains smooth.
     
  6. Ghat-Smith

    Culling has been enabled since the first post.
    My profiling screenshots were taken while viewing all the nodes. It is important for me to keep the editor smooth in that case, as it's common to zoom out a lot just to navigate through a big graph.

    I forgot to specify one thing: the issue is entirely related to IMGUIContainer.
    Here is the same graph, same view (all the nodes), with Deep Profile. I just removed the IMGUIContainer inside each node:
    upload_2020-9-3_11-35-27.png

    Note 1: you can still see IMGUIContainer in the profiling data, but only because I use one to draw a toolbar for the graph view (so it is independent of the number of nodes in the graph).

    Note 2: the source of the problem seems to be the RenderChainCommand.ExecuteNonDrawMesh method, which appears to be called for each IMGUIContainer in the hierarchy (called 448 times in the screenshot of the initial post).

    Note 3: the big spikes you can see in the profiler screenshot are related to the garbage collector. I selected a frame not affected by the GC for the screenshot.

    Reminder: the problem happens even when the IMGUIContainers draw nothing.

    Why does having empty IMGUIContainers in the hierarchy generate such overhead?
     
  7. mcoted3d

    Unity Technologies

    Joined:
    Feb 3, 2016
    Posts:
    1,003
    When you say the problem happens for IMGUIContainers that draw nothing, I assume this means an OnGUI callback that does nothing, correct?

    When we encounter an IMGUIContainer in the hierarchy, there's a lot of busywork that has to be executed even before reaching the OnGUI callback.
    • We must disable stencil clipping since the IMGUI system assumes it's off (this triggers a state change on the GPU)
    • The IMGUI clipping rect has to be computed and set
    • A new transform for the IMGUIContainer has to be computed
    • We must save the global IMGUI state (GUI.color, GUI.matrix, etc.)
    • The IMGUI event system has to be prepped up
    • We can now call "OnGUI".
    • Then we have to undo all of the above steps
    IMGUI rendering is itself very slow and inefficient, and UI Toolkit adds a layer of complexity on top of it. To reach peak performance, it is best to move away from IMGUI. The performance optimizations that can be achieved with IMGUI are quite limited.
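
    The save, call, restore pattern in the list above can be modelled with a small sketch. This is not Unity's code: FakeGuiState stands in for global IMGUI state such as GUI.color and GUI.matrix, and ImguiScopeSketch is a hypothetical name. It only illustrates that the bookkeeping runs once per container, whether or not the callback draws anything.

```csharp
using System;

// Hypothetical stand-in for global IMGUI state (GUI.color, GUI.matrix, ...).
public static class FakeGuiState
{
    public static string Color = "white";
    public static float Scale = 1f;
}

// Hypothetical model of the per-container work; NOT Unity's implementation.
public static class ImguiScopeSketch
{
    // Counts how many times the fixed setup/teardown cost was paid.
    public static int SetupCount { get; private set; }

    public static void Run(Action onGui)
    {
        // Save the global state before handing control to the callback.
        string savedColor = FakeGuiState.Color;
        float savedScale = FakeGuiState.Scale;
        SetupCount++;

        try
        {
            onGui();
        }
        finally
        {
            // Undo whatever the callback changed.
            FakeGuiState.Color = savedColor;
            FakeGuiState.Scale = savedScale;
        }
    }
}
```

    With several hundred containers, Run is entered several hundred times per repaint in this model, even if every callback is empty.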
     
  8. Ghat-Smith

    Yes.

    Thank you for the details. That's what I feared, unfortunately.
    In my case the bottleneck seems to be the transform computation.
    But there don't seem to be many solutions here... Optimizing UpdateWorldTransform doesn't seem easy in the general case (except for micro-optimizations like the one I suggested for the matrix operations).
    Did you think about wrapping the core logic of UpdateWorldTransform in a GetWorldTransform method that could be overridden? I guess it doesn't really make sense if it's only useful for an isolated case like mine :/.
     
  9. mcoted3d

    Letting users override UpdateWorldTransform() to provide their own faster implementation doesn't seem like an avenue we would pursue. World transform computation is a bottleneck for everyone, and we are working to improve it. It's very sad that the Mono-generated code is so slow, but we're investigating a few ideas to make it faster for everyone. We will let you know if we can provide a more short-term speed boost.
     
  10. Ghat-Smith

    Okay, I totally understand. Thank you for all the replies.