200k dynamic animated sprites at 80fps

Discussion in 'Data Oriented Technology Stack' started by FabrizioSpadaro, Jun 16, 2019.

  1. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Hi guys, today I finished my sprite sheet system. It uses a custom shader, compute buffers and DrawMeshInstancedIndirect to render a lot of objects quickly with a single draw call.
    I would like to know what approaches you are using, so we can share some knowledge on this topic, since there is no "official" implementation of a sprite sheet system with ECS yet.
    I'll post my source code later this afternoon :D


    Features:
    • Each entity can have a different Sprite, Color, Position, Scale, Rotation, Material
    • Each entity can have an animation: Loop, PingPong, Once
    • Only entities inside the camera bounds get drawn (W.I.P.)
    The Burst compiler must be toggled ON
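
A minimal sketch of how the three animation modes listed above (Loop, PingPong, Once) could map a running tick counter to a frame index. This is not taken from the repository: the class name, the method name and the enum ordering are hypothetical, and `tick` is assumed to be non-negative.

```csharp
using System;

public enum RepetitionType { Once, Loop, PingPong }

public static class SpriteAnimationSketch
{
    // Maps an ever-increasing, non-negative tick counter to a frame index
    // in the range [0, frameCount).
    public static int FrameIndex(RepetitionType mode, int tick, int frameCount)
    {
        if (frameCount <= 1) return 0;
        switch (mode)
        {
            case RepetitionType.Once:
                // Play forward and hold the last frame.
                return Math.Min(tick, frameCount - 1);
            case RepetitionType.Loop:
                // Wrap around to the first frame.
                return tick % frameCount;
            case RepetitionType.PingPong:
                // Forward then backward, without repeating the end frames.
                int period = 2 * (frameCount - 1);
                int t = tick % period;
                return t < frameCount ? t : period - t;
            default:
                throw new ArgumentOutOfRangeException(nameof(mode));
        }
    }
}
```

With 4 frames, PingPong visits 0, 1, 2, 3, 2, 1, 0, ... as the tick increases.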
     
    Last edited: Jun 20, 2019
  2. GilCat

    GilCat

    Joined:
    Sep 21, 2013
    Posts:
    433
    Are those 200k instances of the same mesh (quad)?
     
  3. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Yeah, the mesh is the same among all the sprites, but they have different buffer arguments.
     
    foxnne and GilCat like this.
  4. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    yuliyF, starikcetin, siggigg and 4 others like this.
  5. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,714
    Looks good. What is missing is integrating into the conversion pipeline.

    Check out MeshRendererConversion.cs for an example. This way you could just have SpriteRenderers as game objects for editing, and everything gets converted to an optimised baked format for loading in the game.
     
    FabrizioSpadaro and foxnne like this.
  6. GilCat

    GilCat

    Joined:
    Sep 21, 2013
    Posts:
    433
    Looks pretty nice.
    I wonder if it is possible to use a similar approach to the one taken by SpriteRenderer where with a given atlas it can (apparently) render multiple meshes with just one draw call.
     
  7. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Not yet, but you can achieve the same result using a different approach:

    Given this sprite atlas, for example, you can set the SpriteSheet component as follows:
    Code (CSharp):
    new SpriteSheet { spriteIndex = 1, cell = new int2(9, 5) }
    and just by changing the spriteIndex you can select any sprite you want.

    The limit of this system is that you need sprites of the same size, so that the sheet can be divided into equal cells.
    I'll soon be implementing a system that supports actual atlases where sprites don't always have the same size.
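
To illustrate the equal-cell idea, here is a rough sketch of how a sprite index and a grid size could be turned into UV tiling and offset values. It assumes the index runs row by row starting from the top-left cell and that UV (0, 0) is the bottom-left corner of the texture; neither assumption is confirmed by the thread, and `GridUvSketch` and `CellUv` are made-up names.

```csharp
using System;

public static class GridUvSketch
{
    // Returns (tilingX, tilingY, offsetX, offsetY) for one cell of a uniform grid.
    // Assumption: spriteIndex counts row-major from the top-left cell,
    // while UV coordinates start at the bottom-left of the texture.
    public static (float tx, float ty, float ox, float oy) CellUv(
        int spriteIndex, int columns, int rows)
    {
        if (columns <= 0) throw new ArgumentOutOfRangeException(nameof(columns));
        if (rows <= 0) throw new ArgumentOutOfRangeException(nameof(rows));
        if (spriteIndex < 0 || spriteIndex >= columns * rows)
            throw new ArgumentOutOfRangeException(nameof(spriteIndex));

        float tx = 1f / columns;           // width of one cell in UV space
        float ty = 1f / rows;              // height of one cell in UV space
        int col = spriteIndex % columns;
        int rowFromTop = spriteIndex / columns;
        float ox = col * tx;
        float oy = 1f - (rowFromTop + 1) * ty;  // flip because V grows upward
        return (tx, ty, ox, oy);
    }
}
```

For the 9 x 5 sheet from the post, `GridUvSketch.CellUv(1, 9, 5)` selects the second cell of the top row under these assumptions.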
     
    RaL, foxnne and GilCat like this.
  8. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Right, thanks. After I implement some more features I'll work on including this in the conversion pipeline :D
     
  9. GilCat

    GilCat

    Joined:
    Sep 21, 2013
    Posts:
    433
    Yes, I've tried it and understood.

    I'm taking a similar approach but using DrawMeshInstanced with MaterialPropertyBlocks and a custom shader with instanced properties (e.g. color, tile, offset), and achieving the same as your DrawMeshInstancedIndirect/ComputeBuffer approach.
    The performance is not as good as using DrawMeshInstancedIndirect because of the 1023-instance limit imposed on DrawMeshInstanced.
    That is pretty straightforward. Have a look at MeshRendererConversion inside the Unity.Rendering.Hybrid package, as it should be similar in some ways.
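
To make the cost of the 1023-instance limit concrete, here is a small sketch of how an instance list would have to be split into separate draw calls. Only the limit itself comes from the post; the helper name and the tuple layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class InstanceBatchingSketch
{
    // Limit per DrawMeshInstanced call, as mentioned in the post.
    public const int MaxInstancesPerCall = 1023;

    // Splits [0, instanceCount) into consecutive (start, count) ranges,
    // each small enough for a single instanced draw call.
    public static List<(int start, int count)> SplitIntoBatches(
        int instanceCount, int maxPerBatch = MaxInstancesPerCall)
    {
        if (instanceCount < 0) throw new ArgumentOutOfRangeException(nameof(instanceCount));
        if (maxPerBatch <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerBatch));

        var batches = new List<(int start, int count)>();
        for (int start = 0; start < instanceCount; start += maxPerBatch)
        {
            batches.Add((start, Math.Min(maxPerBatch, instanceCount - start)));
        }
        return batches;
    }
}
```

For 200,000 sprites this yields 196 batches (195 full ones plus a final batch of 515), compared with a single call in the indirect approach.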
     
  10. foxnne

    foxnne

    Joined:
    Apr 18, 2016
    Posts:
    59
    Fantastic, this is almost exactly what I was going to attempt very soon. Thank you for sharing this!
     
  11. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Yeah, I tried every approach possible... the only one I didn't try was geometry shaders.
    And out of everything I tried, DrawMeshInstancedIndirect was the fastest.
     
  12. LennartJohansen

    LennartJohansen

    Joined:
    Dec 1, 2014
    Posts:
    2,309
    You could probably speed up even a bit more by moving the culling to a compute shader.
    Add all the instances to a compute buffer as you do. Then use a compute shader to frustum cull and add the visible instances to an append buffer you render with DrawMeshInstancedIndirect.
    This way it would not need any data copied from CPU to GPU every frame.
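
For context on the suggestion above: DrawMeshInstancedIndirect reads its draw parameters from an arguments buffer of five unsigned integers, and the second one is the instance count. When visible instances are written to an append buffer on the GPU, that count can be copied into the arguments buffer on the GPU as well (Unity provides ComputeBuffer.CopyCount for this), so it never has to come back to the CPU. The sketch below only builds the CPU-side initial values in plain C#; the class and member names are hypothetical.

```csharp
using System;

public static class IndirectArgsSketch
{
    // Byte offset of the instance count inside the arguments buffer,
    // i.e. where a GPU-side counter copy would have to land.
    public const int InstanceCountByteOffset = sizeof(uint) * 1;

    // Layout expected by DrawMeshInstancedIndirect:
    // [0] index count per instance, [1] instance count,
    // [2] start index location, [3] base vertex location,
    // [4] start instance location.
    public static uint[] Build(uint indexCountPerInstance, uint instanceCount)
    {
        return new uint[] { indexCountPerInstance, instanceCount, 0, 0, 0 };
    }
}
```

A quad made of two triangles has 6 indices, so `IndirectArgsSketch.Build(6, 0)` would be a reasonable starting value before the GPU fills in the visible count.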
     
    Seb-1814, GilCat and foxnne like this.
  13. foxnne

    foxnne

    Joined:
    Apr 18, 2016
    Posts:
    59
    Are you handling sorting in any way? Would it be possible to modify this to draw sprites with a lower Y on top?

    Also, can you supply additional data to the compute buffer to access in the shader?
     
    Last edited: Jun 18, 2019
  14. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    You are right, I already knew that my occlusion culling system was not perfect, but it's not even taking 1% to run, so I didn't want to complicate things right at the beginning.
    I'll surely be revisiting this part in a future version.
     
  15. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    I already wrote sorting based on the lower Y; it's going to be available in the next version (rotations are also supported).
    To supply additional data to the shader you need to create a ComputeBuffer just like I do, and then pass the values to the linked variable inside the shader.
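
One possible way to get "lower Y on top" is to order the instances back to front before filling the buffers, so the sprite with the lowest Y is drawn last. This is only a guess at the idea, not the method used in the repository (which may sort by depth instead), and `YSortSketch` is a hypothetical name.

```csharp
using System;

public static class YSortSketch
{
    // Returns instance indices ordered back to front:
    // highest Y first, lowest Y last (so it ends up on top).
    public static int[] DrawOrder(float[] yPositions)
    {
        if (yPositions == null) throw new ArgumentNullException(nameof(yPositions));

        var order = new int[yPositions.Length];
        for (int i = 0; i < order.Length; i++) order[i] = i;

        // Descending comparison on Y. Array.Sort is not stable, so sprites
        // with identical Y may come out in any relative order.
        Array.Sort(order, (a, b) => yPositions[b].CompareTo(yPositions[a]));
        return order;
    }
}
```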
     
    Last edited: Jun 18, 2019
    foxnne likes this.
  16. GilCat

    GilCat

    Joined:
    Sep 21, 2013
    Posts:
    433
    Another suggestion would be to use Unity's built-in SpriteAtlas instead of your own.
    In my renderer I do it by forcing the atlas to use FullRect sprites; then via instanced properties/ComputeBuffer you can set the offset/tiling and scale. Works just fine.
     
    Peter77 and foxnne like this.
  17. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Little update: pushed a new version with z-sorting and rotation support!
    I am currently working on a system to calculate tiling and offset without requiring equally-sized sprites.
     
    foxnne likes this.
  18. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Another update: I rewrote the UV system and now it is way easier to configure a sprite sheet.
    All you need is a material with a texture. If that texture is a sprite with SpriteMode: Multiple, the system automatically bakes the UVs at the beginning and accesses them through a dynamic buffer inside the RenderDataSystem.
    Code (CSharp):
    // Requires: using Unity.Collections; using Unity.Entities; using Unity.Mathematics; using UnityEngine;
    // Archetype holding every component the sprite sheet systems read.
    EntityManager entityManager = World.Active.EntityManager;
    spriteSheetArchetype = entityManager.CreateArchetype(
      typeof(Position2D),
      typeof(Rotation2D),
      typeof(Scale),
      typeof(Bound2D),
      typeof(SpriteSheet),
      typeof(SpriteSheetAnimation),
      typeof(SpriteSheetMaterial),
      typeof(UvBuffer)
    );
    // Create all 200k entities in one call.
    NativeArray<Entity> entities = new NativeArray<Entity>(200000, Allocator.Temp);
    entityManager.CreateEntity(spriteSheetArchetype, entities);
    float2[] cameraBound = Bound2DExtension.BoundValuesFromCamera(Camera.main);
    // Bake the UVs of every sprite found in the material's texture.
    float4[] uvs = SpriteSheetCache.BakeUv(material);
    for(int i = 0; i < entities.Length; i++) {
      // Random position inside the camera bounds.
      float2 position = cameraBound[0] + new float2(UnityEngine.Random.Range(-cameraBound[1].x / 2, cameraBound[1].x / 2), UnityEngine.Random.Range(-cameraBound[1].y / 2, cameraBound[1].y / 2));
      entityManager.SetComponentData(entities[i], new Position2D { Value = position });
      entityManager.SetComponentData(entities[i], new Scale { Value = 1 });
      entityManager.SetComponentData(entities[i], new SpriteSheet { spriteIndex = UnityEngine.Random.Range(0, 16), maxSprites = uvs.Length });
      entityManager.SetComponentData(entities[i], new SpriteSheetAnimation { play = true, repetition = SpriteSheetAnimation.RepetitionType.Loop, samples = 10 });
      entityManager.SetSharedComponentData(entities[i], new SpriteSheetMaterial { material = material });
      // Copy the baked UVs into this entity's dynamic buffer.
      var lookup = entityManager.GetBuffer<UvBuffer>(entities[i]);
      for(int j = 0; j < uvs.Length; j++)
        lookup.Add(new UvBuffer { uv = uvs[j] });
    }
    entities.Dispose();
    Thanks to @GilCat for the good feedback
     
    tarahugger, GilCat and foxnne like this.
  19. foxnne

    foxnne

    Joined:
    Apr 18, 2016
    Posts:
    59

    Thanks for the updates!

    In the bounds system, you set the bounds scale to the Scale from entities, but your bounds scale is a float2.

    Does this mean bounds scale could be a float?

    Edit: I now see that the scale float is copied to both x and y, so they are always equal and both axes scale together
     
    Last edited: Jun 19, 2019
  20. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Yeah, I decided to support only a single scale value for both axes to make everything fit inside a float4: positionX, positionY, rotationAngle, scale

    I might change it later, I think!

    Edit:
    In Bound2D the scale is a float2 because I test the intersection between camera and entity.
    Even if an entity always has the same scale on X and Y, the camera does not.
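
A rough sketch of the kind of camera/entity intersection test described here, assuming each bound is stored as a center plus a full size on each axis. That layout is an assumption, not something the thread confirms, and `Bound2DSketch` is a hypothetical stand-in for the real Bound2D component.

```csharp
using System;

public readonly struct Bound2DSketch
{
    public readonly float CenterX, CenterY, SizeX, SizeY;

    public Bound2DSketch(float centerX, float centerY, float sizeX, float sizeY)
    {
        CenterX = centerX;
        CenterY = centerY;
        SizeX = sizeX;
        SizeY = sizeY;
    }

    // Axis-aligned overlap test: two boxes intersect when, on both axes,
    // the distance between centers is at most the sum of the half sizes.
    // Both sides are multiplied by 2 to avoid the divisions.
    public bool Intersects(Bound2DSketch other)
    {
        return Math.Abs(CenterX - other.CenterX) * 2f <= SizeX + other.SizeX
            && Math.Abs(CenterY - other.CenterY) * 2f <= SizeY + other.SizeY;
    }
}
```

The entity side would use the same value for SizeX and SizeY, while the camera bound would typically be wider than it is tall, which is why two components are needed.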
     
    Last edited: Jun 19, 2019
  21. foxnne

    foxnne

    Joined:
    Apr 18, 2016
    Posts:
    59
    Thanks for the info! That makes sense.

    Wouldn't your Intersect method need some reference to a mesh size though? I suppose it doesn't matter as long as the quads are 1x1, but if using 1x2 for a character sprite wouldn't scale have to be 1,2 for it to calculate intersection correctly as is?
     
    Last edited: Jun 19, 2019
  22. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Yeah, it's only going to work for 1x1. I am moving the occlusion culling system into a compute shader, and I will do the calculations much better there.
    It might take a while to implement the new occlusion culling using the compute shader, but I'm working on it :D
     
  23. foxnne

    foxnne

    Joined:
    Apr 18, 2016
    Posts:
    59

    No worries :) I'm just learning and trying to wrap my head around how you're doing all this. Thanks again!
     
  24. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    Great work! I created a pull request with a simple example scene:
    54FD73EuaW.gif
    If you're not already working on it I may try to set up instance colors. Regarding performance - your repo says:
    200k entities were rendered at 80 fps on a Mid-2015 MacBook Pro

    Was that in the editor? I'm definitely not getting that kind of performance.
     
    foxnne likes this.
  25. GilCat

    GilCat

    Joined:
    Sep 21, 2013
    Posts:
    433
    I can confirm that performance in the editor on the initial commit. Haven't tested the latest.
     
  26. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    Regarding this change - it's a great idea and works well, but atm you're using it in runtime code so it's causing build errors. I guess the shared state could be written into a scriptable object or something so we can access it at runtime.
     
    Last edited: Jun 19, 2019
  27. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    Made another pull request - added per instance color data:
    teTT9KPxfC.gif
     
    Lars-Steenhoff likes this.
  28. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Yeah I know about that, that's why I will change to sprite atlas in the next update to get sprites information
     
  29. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Thanks for that :D , I will merge it soon
     
  30. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Ok so, the fps is linked to the texture size and to how many sprites the animation has. In my example it was a 512x512 texture with 16 sprites; a bigger texture and a lot of sprites might slow things down a bit.
    Regarding the slow fps in the editor, I noticed something weird: after switching to DynamicBuffer, the fps got lower when I was in the scene view, but it was normal when I was in the game view
     
    Last edited: Jun 20, 2019
  31. tylo

    tylo

    Joined:
    Dec 7, 2009
    Posts:
    130
    I have a 4k sprite sheet, I am guessing that would bring it to a crawl? ;)
     
    FabrizioSpadaro likes this.
  32. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Yeah, it would be a good stress test :D
     
  33. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    I just refactored your code a bit and merged it into master. After some small optimizations, I managed to make it run at 78-84 fps with 200k sprites.
    Thanks for the PR, appreciate it!

     
    Last edited: Jun 20, 2019
    florianhanke, jkuit and Sarkahn like this.
  34. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    Weird, are you getting that from my example scene or from your own? Mine still chugs like crazy with ~200K visible entities, and I'm on a pretty beefy machine:
    WdpVpIgEpx.png
    Apparently most of it is just from copying the data. Looking at the RenderSystem it seems like there's room there to do the work parallel over time, I will poke at it some more tomorrow.
     
  35. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Yeah, most of it is just accessing the NativeArray, and there is not much I can do there.
    I know it might sound stupid, but... Do you have the burst compiler on?
    I posted a screenshot and it's 80 fps on your scene.
    Disabling the occlusion system gives 85+ fps, without going below 80 fps

    without occlusion ^
     
  36. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Update:

    changed:
    Code (CSharp):
    matrices[i] = new float4x2(renderData.transform, renderData.uv);
    to
    Code (CSharp):
    matrices[i] = renderData.matrix;

    Gained about 4-6 fps, just by not creating a float4x2 every frame.
    Disabling the occlusion culling system gives 10 fps more (yeah, I do need a compute shader now...)

    Now 200k sprites render at 89-104 fps
     
    Last edited: Jun 20, 2019
    Xerioz likes this.
  37. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    I'm the stupid one - that was indeed the problem. The whole time since I started messing with DOTS, I had no idea that you had to manually enable Burst in the editor. Since it still spits out BurstCompile errors at you, I guess I assumed Burst was enabled. Good to know!

    Now with Burst enabled, and in a worst-case scenario with 200K sprites where nothing is being culled, I get ~50 FPS in the editor. Much better! Still not seeing your level of performance though, for some reason.

    Unity_s5cWyfNAuJ.png
     
    Last edited: Jun 20, 2019
    GilCat likes this.
  38. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    That's weird... Take a screenshot of the profiler if you can, and remember to pull the latest version!
     
  39. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    928
    Nice. Was about to migrate my outdated meshinstanced but you already did it ;)

    Just checking the source from mobile; it seems you could make use of change filters (i.e. if colors don’t change between frames, the job can just skip those chunks; this might make sense for other things like scale too)

    Edit: You would have to pull color into a separate component though...

    Edit 2: you might also want to think about stuffing various things into your float4 matrix; it is likely better to have dedicated ICDs (IComponentData)

    Edit 3: It should also be better to loop through your render buffer IJobs and schedule them in parallel, combine the handles and call Complete. Then loop through again and call DrawMeshInstanced
     
    Last edited: Jun 20, 2019
  40. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Right, I added them, thanks.

    You mean another job?

    Yeah I was thinking the same thing, will do it soon.
     
  41. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    928
    No, I did not mean another job. I think you had a component like this:
    struct SomeComponent : IComponentData { // name and field types are placeholders
        float4 Something; // changes every frame
        float4 Color;     // rarely changes
    }

    Something was likely to change every frame, so the version number is bumped even if Color does not change. If you created two separate IComponentData structs, you could filter on DidChange at a finer granularity
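
The effect of splitting the component can be illustrated with a toy model of per-component change versions. This is not Unity's implementation (the real one tracks versions per chunk and handles wrap-around); `VersionedColumn` and all of its members are invented for the example.

```csharp
using System;

// Toy stand-in for one component type's data plus its change version.
public sealed class VersionedColumn<T>
{
    private readonly T[] _values;

    // Version of the most recent write to this column.
    public uint ChangeVersion { get; private set; }

    public VersionedColumn(int length)
    {
        _values = new T[length];
    }

    public T Read(int index) => _values[index];

    // Any write bumps the version of the whole column,
    // even if only one element was touched.
    public void Write(int index, T value, uint globalVersion)
    {
        _values[index] = value;
        ChangeVersion = globalVersion;
    }

    // A consumer that last ran at lastSystemVersion can skip the column
    // when nothing has been written to it since then.
    public bool DidChange(uint lastSystemVersion) => ChangeVersion > lastSystemVersion;
}
```

With transform and color stored in separate columns, a per-frame transform write leaves the color column's version untouched, so a color upload job can skip it. With both in one struct, every transform write would mark the colors as changed too.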
     
  42. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    This is using the latest version, and I switched my unity version to 2019.1.2 since it looks like that's what you're using in the screenshot. Oddly enough downgrading my unity version seems to have given me ~10 fps gain.

    I'll admit I'm pretty new to threading so I'm not entirely sure how to parse this:
    Unity_meRWST2LPS.png
    PY8XVoXS5E.png
    eRRH17uQR6.png
    Should there be that much idle time on the worker jobs?
     
  43. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    I wanted to see if I could get my performance closer to what other people seemed to be having. I ended up rewriting pretty much everything but it works - I can get around 75fps with 250k animated sprites in the editor:
    VDQ9dtoWIK.gif
    And judging from the profiler it seems like there's still a lot of room for improvement.

    I moved all renderdata related stuff into dynamic buffers so you can just re-interpret them and copy directly into the compute buffers. Right now all the dynamic buffers are on a single entity though, I think if I spread them out like sngdan is suggesting the performance could skyrocket. Dots continues to impress.

    I also embedded the UVs into a render buffer so they aren't duplicated across entities. All they need is an index and they can access any UV from inside the shader.

    Some dots feedback - shared component data continues to be super awkward. For a lot of my jobs I needed to split work out based on SCD - I had to write a whole lot of boilerplate - even worse by needing to pass BufferFromEntity and Entities around. I was able to lessen it a bit by writing a generic system but still - highly annoying. And I think I saw somewhere that eventually we will be able to access SCD from inside a job as long as it's only using blittable types. That would be VERY nice.

    Edit: Code for anyone who is interested. The code is a mess right now though so I doubt you'd want to merge it back into master as is:
    https://github.com/sarkahn/SpriteSheetRenderer/tree/Rewrite
     
    Last edited: Jun 21, 2019
    Antypodish, 5argon and foxnne like this.
  44. foxnne

    foxnne

    Joined:
    Apr 18, 2016
    Posts:
    59

    Great job, would love to take a look at what all you changed. I'm working through this with far less experience with ECS... But I find this fascinating.

    Not sure if it's beneficial or not, but I just saw that in the latest 2019.3.b06 build they added Graphics.DrawMeshInstancedProcedural, which is described as being like Graphics.DrawMeshInstancedIndirect but without the compute buffer requirement?
     
  45. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    From what I can see here: https://docs.unity3d.com/2019.3/Doc...nce/Graphics.DrawMeshInstancedProcedural.html it seems like it just lets you specify instance count from script. You would still need to pass any other data through Property Blocks or Compute Buffers I think. But I haven't messed with it, maybe I'm missing something there.

    It seems like copying the dynamic buffers to NativeArrays to pass into the compute buffers is the biggest bottleneck atm. Right now I'm just copying all the data every frame - there's a lot of room for improvement by keeping compute buffers around and only copying when the data changes.
     
    Last edited: Jun 21, 2019
  46. davenirline

    davenirline

    Joined:
    Jul 7, 2010
    Posts:
    497
    Instead of using the SpriteSheet component, is there a way to manually set a UV and sprite size? My use case is that I create an atlas at runtime so I want to set the UV at runtime, too.
     
  47. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    928
    on mobile again...just had a quick look at @Sarkahn update...difficult to navigate.... many optimization opportunities...

    I also like to work with dynamic buffers, any reason you copy to native array? just .Reinterpret<float4>().AsNativeArray should work, no need to allocate & dispose new memory
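
The point about reinterpreting instead of copying can be shown outside Unity with plain .NET spans. This is only an analogy: it uses `MemoryMarshal.Cast` and `System.Numerics.Vector4` rather than the DynamicBuffer API, and `TransformElement` is a hypothetical wrapper struct standing in for a buffer element.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical buffer element that wraps a single Vector4.
public struct TransformElement
{
    public Vector4 Value;
}

public static class ReinterpretSketch
{
    // Views the same memory as Vector4 values. No allocation and no copy:
    // writes through the returned span are visible in the original data.
    public static Span<Vector4> AsVector4(Span<TransformElement> elements)
    {
        return MemoryMarshal.Cast<TransformElement, Vector4>(elements);
    }
}
```

Because both types have the same size and layout, the returned span has the same length as the input and there is nothing to dispose afterwards.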
     
    Sarkahn likes this.
  48. FabrizioSpadaro

    FabrizioSpadaro

    Joined:
    Jul 10, 2012
    Posts:
    245
    Well done, I first introduced Uvs as DynamicBuffers, and after @sngdan 's comment I was already working on splitting RenderData into DynamicBuffers.
    Thanks for the example you provided me, it will be much easier to integrate them now :D
     
    Sarkahn likes this.
  49. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    95
    I thought I was doing that, it's the whole reason I switched to dynamic buffers in the first place - apparently I was pretty tired last night, hahah. Fixing that let me get up to 350k entities while still getting around 60 FPS.
     
    foxnne likes this.
  50. foxnne

    foxnne

    Joined:
    Apr 18, 2016
    Posts:
    59
    Alright I see.

    I was working on something similar to this on the non-ECS side, and I was working with only using integers as shader graph has a node for calculating UV called flipbook, which accepts just a single number as the index of the slice you want to display.

    Would this work here? Calculate the sprite rectangle sizes and number of sprites through shader and then only have to pass a single int per sprite rather than a float4?
     
    Last edited: Jun 21, 2019