
Feedback [Showcase] Pure DOTS UI system (detailed description + feedback)

Discussion in 'Data Oriented Technology Stack' started by supron, Jun 2, 2019.

  1. supron


    Aug 24, 2013
    source code released:


    The current Unity UI solution is very powerful but struggles with performance (especially when instantiating many objects). DOTS seems like a great solution to this problem. I decided to recreate Unity's RectTransform with maximum performance in mind. Now it's time to share my results and thoughts after a few weeks with ECS.

    There's one thing I have to make clear before I start: currently there is no way to write "pure" UI in DOTS (there is no value-type Mesh, Texture, Sprite, Font, Material and CommandBuffer). I called it pure because the only game object in the scene is a Camera. All UI controls are entities with shared component data (SCDs) for the types I listed above.

    UI System Design

    Rect transforms

    I implemented a simplified RectTransform system. RectTransform is defined as:

    Code (CSharp):
    public struct RectTransform : IComponentData
    {
        public float2 AnchorMin;
        public float2 AnchorMax;
        public float2 Position;
        public float2 SizeDelta;
        public float2 Pivot;
    }

    I skipped scale and rotation in the first iteration to make things simpler. Parenting is the Parent System copy-pasted from Entities.Transforms (I replaced its components with my own). RectTransformSystem is fully jobified. Every canvas is calculated on its own worker thread. The canvas job walks the children tree recursively and fills WorldSpaceRect. Since there is no rotation, the rect can be defined as two float2 values:

    Code (CSharp):
    public struct WorldSpaceRect : IComponentData
    {
        public float2 Min;
        public float2 Max;
    }
    RectTransform system supports both fixed pixel size and physical size (calculated from Screen.dpi). My results match 1:1 Unity's CanvasScaler.
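    To make the anchor math concrete, here is a minimal sketch of how one child rect could be resolved from its parent's world-space rect. It is not the code from the project: it follows the usual UnityEngine.RectTransform convention (anchors, size delta, pivot, anchored position), uses System.Numerics.Vector2 in place of float2 so it runs outside Unity, and the names RectTransformData, WorldRect and RectMath are hypothetical.

```csharp
using System.Numerics;

// Hypothetical stand-in for the post's RectTransform component (Vector2 instead of float2).
public struct RectTransformData
{
    public Vector2 AnchorMin;
    public Vector2 AnchorMax;
    public Vector2 Position;
    public Vector2 SizeDelta;
    public Vector2 Pivot;
}

// Hypothetical stand-in for the post's WorldSpaceRect component.
public struct WorldRect
{
    public Vector2 Min;
    public Vector2 Max;
}

public static class RectMath
{
    // Resolves a child's rect from its parent's rect. No scale or rotation,
    // matching the simplification described in the post.
    public static WorldRect Resolve(WorldRect parent, RectTransformData t)
    {
        Vector2 parentSize = parent.Max - parent.Min;

        // Anchors are normalized positions inside the parent rect.
        Vector2 anchorMin = parent.Min + t.AnchorMin * parentSize;
        Vector2 anchorMax = parent.Min + t.AnchorMax * parentSize;

        // SizeDelta is added on top of the distance between the anchors.
        Vector2 size = (anchorMax - anchorMin) + t.SizeDelta;

        // Position is the offset of the pivot from the anchor reference point.
        Vector2 pivotPos = anchorMin + (anchorMax - anchorMin) * t.Pivot + t.Position;

        Vector2 min = pivotPos - size * t.Pivot;
        return new WorldRect { Min = min, Max = min + size };
    }
}
```

    A canvas job would call something like Resolve once per child while walking the hierarchy, passing the rect it just computed for the parent.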


    I keep sprites in SCD:

    Code (CSharp):
    public struct SpriteImage : ISharedComponentData
    {
        public Sprite Value;
    }
    Sprite vertices and triangles are calculated in jobs. The job takes WorldSpaceRect and SpriteVertexData as input:

    Code (CSharp):
    public struct SpriteVertexData
    {
        public float4 Outer;
        public float4 Inner;
        public float4 Padding;
        public float4 Border;
        public float PixelsPerUnit;
    }
    Sprite vertices and indices are stored in DynamicBuffers. It takes full advantage of multithreading and doesn't require jobs sync points.
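    As an illustration of the simplest case (a non-sliced sprite), the sketch below appends one quad for a rect to vertex and index lists. It is an assumption-laden stand-in rather than the project's job: managed List<T> replaces DynamicBuffer, System.Numerics types replace float2/float4, the outer UVs are assumed to be packed as (xMin, yMin, xMax, yMax), and UiVertex and SpriteQuad are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical vertex layout: position plus texture coordinate.
public struct UiVertex
{
    public Vector2 Position;
    public Vector2 Uv;
}

public static class SpriteQuad
{
    // Appends 4 vertices and 6 indices (two triangles) covering [min, max].
    // outerUv is assumed to hold (xMin, yMin, xMax, yMax) in X, Y, Z, W.
    public static void Append(Vector2 min, Vector2 max, Vector4 outerUv,
                              List<UiVertex> vertices, List<int> indices)
    {
        // Indices are relative to the whole batch, so remember where this quad starts.
        int first = vertices.Count;

        // Order: bottom-left, top-left, top-right, bottom-right.
        vertices.Add(new UiVertex { Position = new Vector2(min.X, min.Y), Uv = new Vector2(outerUv.X, outerUv.Y) });
        vertices.Add(new UiVertex { Position = new Vector2(min.X, max.Y), Uv = new Vector2(outerUv.X, outerUv.W) });
        vertices.Add(new UiVertex { Position = new Vector2(max.X, max.Y), Uv = new Vector2(outerUv.Z, outerUv.W) });
        vertices.Add(new UiVertex { Position = new Vector2(max.X, min.Y), Uv = new Vector2(outerUv.Z, outerUv.Y) });

        indices.Add(first);
        indices.Add(first + 1);
        indices.Add(first + 2);
        indices.Add(first + 2);
        indices.Add(first + 3);
        indices.Add(first);
    }
}
```

    A 9-sliced sprite would use the same idea with a 4x4 vertex grid built from the Inner and Border values, which is omitted here.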

    SDF Fonts

    I borrowed TextMeshPro SDF fonts and materials. I recreated simple text Mesh batcher with all the most important features. Currently, it supports:

    • Alignment (left/center/right, top/middle/bottom)
    • Different font sizes
    • Word wrapping
    • Bold style
    • Font color
    It's enough for most of my needs. Mesh generation is similar to the Sprite system. It's executed after RectTransformSystem, reads data from WorldSpaceRect, and writes vertices/indices to DynamicBuffers. Text jobs run in parallel to sprite batching jobs (no sync points required).

    UI Mesh batching

    Mesh batching is done in two stages (2 jobs). In the first stage, it creates a persistent HashMap<Entity, int> with material IDs. MaterialID is just the SCD index of the sprite or text font. Since the SCD index is accessible from a job, this stage is scheduled and the next job is prepared immediately. The second stage takes the hashmap from the previous job as input and goes through the parent->children tree to build dynamic buffers with vertices, indices, and submeshes. A new submesh is created when the next entity's MaterialID differs from the previous one. SubMesh contains the material type (sprite or text) and MaterialID. This job is also scheduled without sync points.
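    The submesh rule from the second stage can be sketched as follows. This is only an illustration under simplifying assumptions: the hierarchy is already flattened into draw order, managed lists replace the dynamic buffers, the material type field is left out, and DrawItem, SubMeshRange and Batcher are hypothetical names.

```csharp
using System.Collections.Generic;

// One drawable element in hierarchy order (hypothetical type).
public struct DrawItem
{
    public int MaterialId;
    public int IndexCount;
}

// A contiguous index range drawn with one material (hypothetical type).
public struct SubMeshRange
{
    public int MaterialId;
    public int IndexStart;
    public int IndexCount;
}

public static class Batcher
{
    // Starts a new submesh whenever the material ID differs from the previous item;
    // otherwise the current submesh is extended.
    public static List<SubMeshRange> BuildSubMeshes(IReadOnlyList<DrawItem> items)
    {
        var result = new List<SubMeshRange>();
        int indexCursor = 0;

        foreach (DrawItem item in items)
        {
            int last = result.Count - 1;
            if (last >= 0 && result[last].MaterialId == item.MaterialId)
            {
                // Same material as the previous item: grow the current range.
                SubMeshRange range = result[last];
                range.IndexCount += item.IndexCount;
                result[last] = range;
            }
            else
            {
                result.Add(new SubMeshRange
                {
                    MaterialId = item.MaterialId,
                    IndexStart = indexCursor,
                    IndexCount = item.IndexCount
                });
            }
            indexCursor += item.IndexCount;
        }
        return result;
    }
}
```

    Note that draw order is preserved, so a material that reappears after a different one gets a second submesh instead of being merged with the first.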

    Render system

    This is the last stage of the rendering process. Render system builds unity meshes and CommandBuffer from previously batched vertices. I can create one command buffer for all canvases or one CommandBuffer per canvas. Multiple command buffers are faster if we have static and dynamic canvases (frequently updated rect transforms), but the single command buffer is better for debugging purposes and testing.

    Canvases are sorted according to their sorting IDs. Vertices are copied from dynamic buffers to managed Lists and pushed to the GPU. I'm using hacks with NoAllocHelpers to avoid GC spikes. After that, I build command buffers. This process is quite simple. It is just an iteration over submeshes and calls to DrawMesh with material and MaterialPropertyBlock. Since MaterialPropertyBlock is copied into CommandBuffer, I can reuse one instance for all submeshes (no GC alloc).

    Render system is single threaded, because of Mesh and CommandBuffer. It is very hard to improve this system further.

    Input system

    Mouse and Touch

    I parse mouse clicks and touches from the Input class and translate them to a DOTS-friendly array of structs. Mouse clicks are stored as touches with negative finger ID (left click and first touch share the same ID). I run hierarchy tree traversal in a job (again one thread per canvas). Results are stored in a NativeArray and then sorted in another job (by canvas sorting ID). These jobs require .Complete(), because I have to know which entity is focused (for keyboard input).

    All inputs from mouse and touches are stored in DynamicBuffer. I also add an empty component to flag entities with filled buffers.


    Keyboard events are gathered from Event.PopEvent() method. I tried to use the new Unity Input System, but I gave up (it's still in active development and lacks low-level documentation). Keyboard inputs are added to DynamicBuffer of the focused entity.

    Event system

    This one is tricky. I couldn't find a satisfactory solution for user-defined events. The natural "ECS style" is adding flag components + entity queries, but a common button system doesn't know about user-defined types. I can attach my own flags permanently, but that will create many chunks for basically the same archetypes. I'm still thinking about a pure ECS solution to this problem. For testing purposes I implemented a simple delegate-based system. It keeps a mapping from entity to delegate. In OnUpdate I query for all buttons with a "click" component and call the delegate with the entity as argument. Since the old UnityEngine.UI is heavily based on delegates, it is easier to switch from the old UI to DOTS with this approach. Example usage:

    Code (CSharp):
    World.GetOrCreateSystem<ButtonEventSystem>().OnClick(entity, (ent) => {
        Debug.Log($"Click: {ent}");
    });
    World.GetOrCreateSystem<InputFieldSystem>().OnSubmit(entity, (ent) => {
        // Read the text buffer of the entity that raised the event.
        Debug.Log(TextData.ToManagedString(EntityManager.GetBuffer<TextData>(ent)));
    });
    I can still manually filter "click" events in other systems. All mouse and keyboard events are cleared at the end of the frame (concurrent EntityCommandBuffer).
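    For readers wondering what an entity-to-delegate mapping might look like, here is a minimal sketch in plain C#. It is not the ButtonEventSystem from the project: entities are represented by plain int IDs, the list of clicked entities stands in for the result of a query over the "click" component, and ClickRegistry is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry mapping an entity ID to its click handlers.
public sealed class ClickRegistry
{
    private readonly Dictionary<int, Action<int>> handlers = new Dictionary<int, Action<int>>();

    // Registers a handler; several handlers on one entity are combined into one delegate.
    public void OnClick(int entity, Action<int> handler)
    {
        if (handlers.TryGetValue(entity, out Action<int> existing))
            handlers[entity] = existing + handler;
        else
            handlers[entity] = handler;
    }

    // Calls the delegate of every clicked entity that has one and
    // returns how many entities were dispatched.
    public int Dispatch(IEnumerable<int> clickedEntities)
    {
        int dispatched = 0;
        foreach (int entity in clickedEntities)
        {
            if (handlers.TryGetValue(entity, out Action<int> handler))
            {
                handler(entity);
                dispatched++;
            }
        }
        return dispatched;
    }
}
```

    The trade-off is the one described above: delegates are managed objects, so dispatch has to run on the main thread outside of Burst-compiled jobs.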

    Supported controls

    Right now I made support for:
    • Sprites (with 9-slice)
    • Texts
    • Rect masks
    • Input fields (very primitive implementation, keyboard only, no selection, no shortcuts, no mobile keyboard)
    • Buttons

    Rebuilding complex layout is very efficient with the job system. In some cases, my system is about 20x faster than UnityEngine.UI. I'm sure performance will drop with new features, but there is still room for more optimization. I'll compare only layout rebuild because rendering performance is almost the same (UnityEngine.UI makes better batching in some cases, but I'll improve it soon). Here is a comparison of similar UI layouts:

    Complex panel hierarchy + text (one canvas)

    300 rect transforms + heavy text (4496 words, 30314 characters)


    DOTS UI (you can easily distinguish it by missing italic font style):

    DOTS vs UGUI comparison:

    Zoomed DOTS:

    Results are clear. Single canvas has poor CPU utilization, but still beats UGUI.

    Complex panel hierarchy + text (multiple canvases)

    Same canvas as above, but duplicated 7 times

    DOTS vs UGUI rebuild comparison:

    Zoomed DOTS:

    This screenshot is very interesting. While layout rebuilding greatly benefits from multithreading, updating unity Mesh class and CommandBuffer is still single threaded and... slow. There is probably nothing I can do about it. But I'm very happy with results anyway. 500k vertices batched in about 100ms is still very promising. I'll try to test new Mesh API once it's available in 2019.3a and post results.

    Runtime instantiation (20000 sprites)

    This profiler snapshot shows instantiation of a new canvas with 20k very small sprites (all in the viewport):

    It's fast. Instantiation took about 30ms, parent system less than 50ms, and render system another 120ms. The whole player loop took less than 200ms. Such performance level is unachievable in MonoBehaviour world. Even with pooling, it would take a few seconds for activation and layout rebuild.


    I started this project as an ECS rookie. I had years of experience with Unity Engine, but absolutely no background in data-oriented design. After a few days of playing with ECS examples, I felt very confident with this new API and design. I have to admit - Unity did a great job with DOTS. Despite poor documentation, API is clean and simple, and very powerful.

    This UI system was written in 5 weeks. I think it's not bad, considering this was my first attempt with pure ECS.

    Is ECS suitable for a UI system? Yes. Actually, I think it fits better than object-oriented design. UI controls are usually a set of components. Add a sprite - it's an image. Add a selectable component - it's an interactive control. Add an event listener - it's a button. The only problem I see right now is event handling (as I mentioned in the Event system description).

    What's next

    Currently, I'm finishing core and making the code cleaner. Once I finish, I'll release the source code on github. I'm also working on WYSIWYG editor for this UI (with code generation and easier event handling).


    My DOTS feedback

    This post was quite long :). I dived deep into details, and now it's time to share my thoughts.


    I love the Burst Inspector. As someone with good assembly background, I can easily check results and find bottlenecks in the code (usually my mistakes). Actually, I was surprised how good Burst is in its current stage. I didn't find any case where burst missed an opportunity for optimization. And vectorization working out of the box... I've been working on path-tracer in C++ and I know how hard it is to write code with SIMD support. Being able to write it in C# is just incredibly easy. Typical job execution is about 20-100x faster with burst compared to managed C# code. I was thinking about further improvements to the burst and here are my thoughts:

    1. NativeList<> performance. Right now Burst can compile direct pointer access for NativeArray, but struggles with NativeList. The same goes for any user-written native container. Burst should compile similar code for lists, arrays, and all other containers with linear memory layout.

    2. Non-jobified code support. This is a must-have feature. Burst is just too powerful. I have a lot of main thread code without managed types for potential burst optimization. Even without vectorization, burst compiled code is **at least** 2-10x faster than managed C#.

    3. Static read-only/const arrays. They are great candidates for Burst, because:

    * They are immutable (thread safe)
    * They can be easily optimized to constant values at compile time (no memory access at all)
    * We don't care about their allocation/deallocation, they just exist without initialization from code

    Example use cases are MD5 and LZ4 algorithms. I already made jobified implementation of these, and they are way faster compared to C# implementations. However, I was forced to make hacks for const arrays (persistent NativeArrays with [ReadOnly] attributes).

    4. Burst inspector readability. It's a great tool, but it could be easily improved:

    • Instructions coloring. I already made a modification to burst inspector code, just to test things, and it's way easier to read:

    • Ability to filter/gray out engine code. Since we can detect source file and line, it could be useful to focus on actual job code. Currently, there is a lot of code from chunk management, component data access, and native containers. It makes it harder to inspect the user-written code.
    • Filter functions. Currently, all compiled functions are combined in one plain text wall. There could be a combo box with compiled functions to select.
    • Make selectable lines with Ctrl+C support.
    • Clickable jumps. Usually jumps lead to labels within a function. It shouldn't be hard to do.
    • Open burst internal types for community development. We could implement all of these features in a few days, but we don't have access to internal types from Burst.Runtime. Some unity packages are explicitly defined to see burst internal types. Why not make these types public? I think we miss an opportunity to make good open source tools by the community.

    5. [DeallocateOnJobCompletion] for NativeHashmap (and other custom containers).
    6. SetSharedComponentData<T>(NativeArray<Entity>, T). Since setting shared component data affects chunk layout, setting one shared component for many entities at once (especially for the same archetype) looks like a possible performance improvement.

    Other improvements

    1. Better memory layout visualization. I'd like to see where are my entities, components, and dynamic buffers, just like in the new Memory Profiler package. Something similar for entities could be very useful (chunks as zoomable bricks with components, sizes, and empty spaces).
    2. Transform-independent hierarchy system. Parent-children dependency is not always tied to matrix representation. Current Parent System requires LocalToWorld and LocalToParent components. For this reason, I had to write my own hierarchy system which is basically copy-pasted Unity's implementation with my UI components. I'd like to see simple and fast hierarchy with just Parent component and Child buffer (+all necessary "hidden" components like PreviousParent).
    3. NativeArray/NativeSlice APIs in UnityEngine (Mesh, CommandBuffers etc.). *I know this is being worked on as we speak (2019.3). I just wanted to say how badly we need that feature.*
    4. Value-type textures, meshes, etc. They are wrappers over native IDs and buffers anyway, so it could be possible to keep IDs in ComponentData instead of references in SCD.
    5. Visualize jobs dependencies. Sometimes my jobs are waiting for each other, and there is no easy way to find which resource is blocking them from running in parallel. I'd like to see some kind of "debug mode" where all dependencies are tracked and displayed in the editor.
    6. Ability to free all leaked allocations. Sometimes my TempJob allocations are lost (for example, when an exception is thrown after allocation but before the job is scheduled) and the editor starts throwing warnings. The only way to clear these warnings is to restart the editor.
    7. Make adding/removing empty components cheaper. In current design adding/removing empty components requires relocation. Example:

    The only difference between these two chunks is RectMask. It's an empty component. Can anyone explain it to me, why do they have to be stored in different chunks? I do not expect a simple "out of the box solution". I can accept more boilerplate code just to make these operations faster.

    That's all. Thanks for reading. I hope you enjoyed it :).
    Last edited: Aug 7, 2019
  2. Lars-Steenhoff


    Aug 7, 2007
    I don't understand any of it but the performance increase is amazing!
    Good work!
  3. starikcetin


    Dec 7, 2017
    Wow. Amazing.
  4. Xerioz


    Aug 13, 2013
    This is pretty amazing, it makes me wonder if UIElements ( for runtime ) will be ECS-based in some way.
  5. francois85


    Aug 11, 2015
    Awesome work .

    Can chunks contain different archetypes? I don't know why, but I was under the impression that there is only one archetype per chunk, which is why I thought removing a tag would trigger reallocation.
  6. sngdan


    Feb 7, 2014
  7. supron


    Aug 24, 2013
    It looks like this is true. That's why I'm asking if there is any reason in deep ECS core design that prevents implementing zero-cost tagging components.
  8. Brendon_Smuts


    Jun 12, 2017
    Entity queries are done at the archetype level. If you have the ability for entities inside an archetype to have component tags that aren’t described in the archetype you now need to filter at the entity level. Essentially you’re throwing away query efficiency to prevent moving an entity, while the entire point of component tags is to query efficiently. It’s probably safe to say the benefit of fast efficient queries is greater than saving on the memcopy.

    You can already do “entity tagging” using some sort of flag component that exists in all interested entities and has its flag value modified/inspected against instead of adding or removing the component. This is pretty much what “zero-cost” tagging would have to do under the hood anyway.
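    A minimal sketch of that flag-component idea, written in plain C# so it runs outside Unity: the state lives in a value that every interested entity always carries, so toggling it never changes the archetype, but consumers have to filter per entity. UiFlags, UiState and UiFlagOps are hypothetical names, and an array stands in for the chunk's component data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-entity flags stored inside an always-present component.
[Flags]
public enum UiFlags : byte
{
    None = 0,
    Clicked = 1,
    Hovered = 2
}

// Hypothetical component; in Entities this would be an IComponentData struct.
public struct UiState
{
    public UiFlags Flags;
}

public static class UiFlagOps
{
    // Sets or clears a flag in place; no component is added or removed.
    public static void Set(ref UiState state, UiFlags flag, bool on)
    {
        state.Flags = on ? (state.Flags | flag) : (state.Flags & ~flag);
    }

    public static bool Has(UiState state, UiFlags flag)
    {
        return (state.Flags & flag) != 0;
    }

    // Per-entity filtering: every element is inspected, which is the query cost
    // traded for avoiding the move between chunks.
    public static List<int> IndicesWith(UiState[] states, UiFlags flag)
    {
        var matches = new List<int>();
        for (int i = 0; i < states.Length; i++)
        {
            if (Has(states[i], flag))
                matches.Add(i);
        }
        return matches;
    }
}
```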
  9. supron


    Aug 24, 2013
    That was exactly what I was thinking about, but more implicit on the core level.
  10. rsodre


    May 9, 2012
    As I understand it, tags don't affect chunks, so adding a tag to an entity changes its archetype, but not its chunk.
  11. felipin


    Nov 18, 2015
    each chunk has only one archetype, each archetype can have multiple chunks, if you change the entity archetype then you change its chunk.
  12. psuong


    Jun 11, 2014
    Very nice work! :)
  13. 5argon


    Jun 10, 2013
    This is insane!

    But how would you animate your UI without the Animation/Animator/Timeline/PlayableDirector tooling? I have been thinking of making something like this before, but I just can't let those go even if there is potential in an ECS-based layout : (
  14. Joachim_Ante


    Unity Technologies

    Mar 16, 2005
    Great post. Really cool stuff.

    Burst #1: have you seen NativeList.AsNativeArray()? There is also NativeList.AsDeferredArray() to have the job system do the conversion to array at execute time.

    A list has an extra indirection, due to resizability. So there will always be perf difference. But there is probably improvements we can make.

    Burst #3 should be supported in Burst 1.0. Is there specific static readonly data that didn't work for you?
  15. GilCat


    Sep 21, 2013
    Glad you took the time to go forward on this. UI is my main bottleneck so I'm really looking forward to try what you have done.
    Great work, thanks :)
  16. Sarkahn


    Jan 9, 2013
    Excited to see where this goes, I feel like I could learn so much from this. Thanks for your hard work!
  17. supron


    Aug 24, 2013
    It's not on my priority list. I'm focused on coding a fast and stable core. Animation is a high level feature which can be built on top of my UI. I'll release code with MIT license, so everyone can contribute to this project.

    I checked it in earlier versions and wrongly assumed that it was not on your road map. Now it works. Great job!

    I made a new input system from scratch. The old was more like oop than dots style and had a lot of issues. In the new system, mouse/keyboard events are entities with a component containing target entity and DynamicBuffer with events. This design avoids archetype changing and unnecessary relocations. It requires another job to filter event entities but it's still faster than the old design.

    Today I added child->parent event propagation. If a child receives a pointer enter event, a parent gets it too. It matches UnityEngine.EventSystem behavior. Example:

    There are still missing core features, and there is no easy way to create DotsUI. I have an API to convert the entire canvas hierarchy to DOTS, but it's messy and has some bugs. I'll post the source code as soon as I finish the new input system and the UnityEngine.UI -> DOTS conversion. Not sure when.
    Last edited: Jun 16, 2019
  18. Lucas-Meijer


    Unity Technologies

    Nov 26, 2012
    Hey Supron,

    Posts like this are incredibly valuable for our teams. A single "I played around with this for real for a few weeks, and this is my writeup", is worth a 100 "this button doesn't work" :). (we do want to know if a button doesn't work tho :p).

    There's a lot to sink in. I just wanted to respond with a thanks, and a comment on main thread Burst support. While not as easy as slapping a [BurstCompile] onto any random method, what you can do is make an IJob struct and invoke .Run() on it on the main thread. That will get you Burst speed for your main thread code, although it is still subject to the same constraints Burst usually has.

    Good luck, looking forward to see where you're taking this,

  19. tim_jones


    Unity Technologies

    May 2, 2019
    Thanks for this feedback, it's really useful. Specifically regarding the feedback on Burst: @Joachim_Ante and @Lucas-Meijer have addressed (1), (2), and (3), and we on the Burst team will discuss the suggestions in (4).
  20. johnnyt


    Jun 20, 2013
    Maybe this work will motivate the Unity team to come up with a UI implementation based on DOTS.
  21. 5argon


    Jun 10, 2013
    I hope that the new runtime UIElements feature could end up using a DOTS backend automatically, maybe FB Yoga layout implemented on DOTS, UQuery that's fast because of DOTS, etc.
  22. benoitd_unity


    Unity Technologies

    Jan 2, 2018
    Wow, impressive indeed!

    In order for UIElements to be used to build Editor UI, it has to be part of the Unity engine, and because of that, we can't leverage all the awesomeness of ECS and Burst.

    That being said, once we deliver runtime support, we will look at extracting UIElements so we can improve performance in key areas using DOTS.
  23. brunocoimbra


    Sep 2, 2015
    Considering that Unity wants to give us "Performance by default", wouldn't it make sense to make ECS and Burst part of the Unity engine then? I am not exactly making a feature request here (but yeah, that would be great!), just trying to understand why you chose to block Unity features like UI from using the awesomeness of ECS and Burst instead of just making the Unity engine awesome by default.
  24. Joachim_Ante


    Unity Technologies

    Mar 16, 2005
    That would prevent us from releasing on our own schedule while we are still in preview, hence we are not doing that... Instead we move the parts that we want to build on top of dots out and into packages, which is in any case a better approach to developing our features.
  25. starikcetin


    Dec 7, 2017
    But you are implying it is possible in the future, right?
  26. Joachim_Ante


    Unity Technologies

    Mar 16, 2005
    No. What I am saying is that in the future Unity itself will become a smaller and smaller core, with a bunch of packages representing the features, and everything that we want to have fast C# code will become a package.

    As a user not much really changes. You create a project and by default it just includes a bunch of default packages...

    Hence there is no need for Entities or burst to ever be built into Unity directly.