GPU Instancing with different textures?

Discussion in 'Shaders' started by SeriouslyNot, Jun 24, 2020.

  1. SeriouslyNot

    SeriouslyNot

    Joined:
    Nov 24, 2017
    Posts:
    121
    I'm making use of GPU Instancing in my 2D game and it really reduces draw calls significantly when using the same Material and same sprite (texture) on all batched gameObjects.

    I'm changing colors using materialPropertyBlock and everything works fine.

    But is there a way to do that with different sprites (textures) on batched gameObjects' material?


    What if I used a Sprite atlas to gather all sprites together, will that make GPU Instancing possible with different sprites?
     
  2. flyer19

    flyer19

    Joined:
    Aug 26, 2016
    Posts:
    72
Try adding the sprite UVs via MaterialPropertyBlock.
     
  3. SeriouslyNot

    SeriouslyNot

    Joined:
    Nov 24, 2017
    Posts:
    121
You mean _BaseColor? I already added color as a per-instance property; I'm talking about the texture. You can't add a per-instance texture property, because GPU instancing only batches objects that use the same material and the same texture.
     
  4. Samuel_Herb

    Samuel_Herb

    Joined:
    Jun 23, 2020
    Posts:
    8
    If your sprites are all packed into an atlas, you should be able to have a per instance float4 that defines what part of the atlas each sprite uses. I think that's what flyer19 was getting at.
     
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    11,904
    There are kind of a few questions being asked here:

    Can you have instanced Sprite Renderers that use different sprites within an atlas and still instance together?
    No. Unity's Sprite Renderer only supports sprite color and flipping to be instanced properties, as that's all that's built into the default sprite shader.

    Can you have instanced meshes that use different textures and still instance together?
    No. This isn't something GPUs can do. You can set up numerical values to be instanced, because instanced values work by packing them all into an array, and GPUs only support arrays of numerical values. They cannot do arrays of textures.*

    Can you have instanced meshes that use different sprites within an atlas and still instance together?
    Yes!
    With the caveats that the sprite atlas must have rectangular bounds for each sprite (so no tight packing), and that you'll need to write your own custom instanced shader and C# scripts to handle the UV offset & scale, as well as potentially individual mesh transform scale if you have a variety of sprite sizes.

    To do this you can either render them manually using Graphics.DrawMeshInstanced and set up the relevant data arrays yourself, or use regular MeshRenderer components with a script that modifies material properties via a MaterialPropertyBlock, setting the texture offset and scale to adjust the UVs to the specific sprite you want that quad to display. You'll also want to expose the render order and render layer properties on the MeshRenderer component (which it has, but doesn't expose or serialize the way Sprite Renderer does).
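
    For illustration, a minimal C# sketch of the MaterialPropertyBlock route. The `_MainTex_ST` per-instance property and the rect values are assumptions about your own custom instanced shader, not Unity's default sprite shader:

    ```csharp
    using UnityEngine;

    // Sketch: point a quad's MeshRenderer at one sprite inside an atlas by
    // setting the texture scale/offset per instance. Assumes your custom
    // instanced shader declares _MainTex_ST as a per-instance property.
    public class AtlasSpriteQuad : MonoBehaviour
    {
        public Vector2 uvOffset; // bottom-left of the sprite rect in 0..1 UVs
        public Vector2 uvScale;  // sprite rect size in 0..1 UVs

        void Start()
        {
            var rend = GetComponent<MeshRenderer>();
            var block = new MaterialPropertyBlock();
            rend.GetPropertyBlock(block);
            // xy = tiling (scale), zw = offset, matching Unity's _ST convention
            block.SetVector("_MainTex_ST",
                new Vector4(uvScale.x, uvScale.y, uvOffset.x, uvOffset.y));
            rend.SetPropertyBlock(block);
        }
    }
    ```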


    * DirectX 12 and Vulkan have support for bindless textures, which effectively allow for "arrays" of textures, but I don't think you can use those with Unity at the moment. There is also the Texture2DArray texture type, which is a "single texture" as far as the GPU is concerned, but it has multiple layers. It's effectively a single 3D texture without support for blending between layers. This is different from an array of textures.
     
  6. Binary42

    Binary42

    Joined:
    Aug 15, 2013
    Posts:
    199
    Sorry that I still have to ask in spite of your very good answer, but I've been struggling with this problem for quite a few days now, and atlases come with some drawbacks.
    Is instancing with a Texture2DArray supposed to work?
     
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    11,904
    Instancing absolutely works with a Texture2DArray, but the same limitations exist. Every instance needs to be using the same texture (again, a Texture2DArray is a single texture as far as the GPU is concerned) to instance together, but you can use an instanced property to select the layer index so each instance can visually be a "different texture". It also means you need a custom shader that supports this.
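
    As a sketch, here is what the per-instance layer index can look like using the built-in pipeline's instancing macros; `_MainTexArray` and `_TexIndex` are placeholder names that your own shader would declare:

    ```hlsl
    // Sketch (built-in render pipeline macros): sample a Texture2DArray with
    // a per-instance layer index so instances can show "different textures"
    // while still batching together.
    UNITY_DECLARE_TEX2DARRAY(_MainTexArray);

    UNITY_INSTANCING_BUFFER_START(Props)
        UNITY_DEFINE_INSTANCED_PROP(float, _TexIndex) // layer chosen per instance
    UNITY_INSTANCING_BUFFER_END(Props)

    fixed4 frag (v2f i) : SV_Target
    {
        UNITY_SETUP_INSTANCE_ID(i);
        float index = UNITY_ACCESS_INSTANCED_PROP(Props, _TexIndex);
        return UNITY_SAMPLE_TEX2DARRAY(_MainTexArray, float3(i.uv, index));
    }
    ```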
     
  8. Binary42

    Binary42

    Joined:
    Aug 15, 2013
    Posts:
    199
    @bgolus Thanks a lot for your great support across the board!
     
  9. Arthur-LVGameDev

    Arthur-LVGameDev

    Joined:
    Mar 14, 2016
    Posts:
    217
    Does Metal support bindless textures? And is Unity support for them coming in the future, any idea?

    Sorry to hijack this thread -- I've stumbled on it multiple times today. The data structure I'm really looking for is Texture2DArray[] -- that is, an array of T2DArrays. I've got some [lots of] "TV screens" that play back video, and I want to be able to instance them (via InstancedIndirect), keep all of the animation/frame selection on the GPU, and be able to instance them together even if they're playing different "videos".

    Something like this -- pseudo-ish code:
    Code (CSharp):
        StructuredBuffer<float> VideoIDPerTV;
        UNITY_DECLARE_TEX2DARRAY(_Videos[]);

    vert() {
        int VideoID = VideoIDPerTV[unity_InstanceID];
        UNITY_SAMPLE_TEX2DARRAY(_Videos[VideoID], float3(uv, _Time % 100));
    }
    Any thoughts/ideas on how might be best to achieve something like this?

    I could use a bunch of if/else-if statements and check against an ID hard-coded to one of the multiple T2DArrays, but that's awful & ugly -- though it does work, at least until I hit the sampler/texture limits.

    Next best idea I've got is 1 DrawMeshInstancedIndirect call per "video" -- but that's not ideal for scaling with this load profile, either. WTB a T2DArray[] structure, badly.
     
  10. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    2,651
  11. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    11,904
    Yeah, that. You already have an ID to select a specific texture array; instead, have that be the first frame index within the array for a specific frame set. If all animations are 100 frames (or some other fixed count), then you're done. Otherwise pass in a second value that's the frame count to % against. Most modern GPUs support 2048 or more layers, including any that support Metal. Worst case, you could use an array atlas.
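
    A shader-side sketch of that scheme (placeholder names; assumes all videos are packed into one Texture2DArray, `_FirstFrame` and `_FrameCount` are declared as instanced properties, and `_FPS` is a uniform):

    ```hlsl
    // Sketch: every video lives in the same Texture2DArray.
    // Per instance: _FirstFrame = layer of this video's first frame,
    //               _FrameCount = number of frames in this video.
    float first = UNITY_ACCESS_INSTANCED_PROP(Props, _FirstFrame);
    float count = UNITY_ACCESS_INSTANCED_PROP(Props, _FrameCount);

    // advance at _FPS frames per second, wrapping within this video's range
    float layer = first + fmod(floor(_Time.y * _FPS), count);
    fixed4 col = UNITY_SAMPLE_TEX2DARRAY(_Videos, float3(i.uv, layer));
    ```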
     
  12. Arthur-LVGameDev

    Arthur-LVGameDev

    Joined:
    Mar 14, 2016
    Posts:
    217
    Yeah, that's probably the best solution, though I was hoping to be able to load/unload them a bit "smarter" to save on VRAM; I'm not sure that's realistically doable while also achieving the goal of absolute-minimal CPU load. It may require downscaling the frames a bit, and I've got a few videos at a different resolution from the rest, so I'd need to fix that as well, or perhaps go with two "combined" T2DArrays, one for each distinct resolution.

    Context: I've got ~5k total frames of animation across approx. 30 "videos", which are variable-length but ~100 frames median each -- the video frames are mostly 512x864, though a handful are a lower 16:9 resolution. Long term, our goal is to add more of these "videos", heh.

    An example video at the 512x864 resolution has 109 slices and weighs in at 30.7MB with DXT1 compression; that's a little steep, but perhaps we downscale them a bit. We can probably afford ~500MB of VRAM for this aspect of the game/feature.

    Pardon my ignorance, but what is an "array atlas"?

    I'm going to go ahead and try combining them today and see how that performs. Really, I know it'll perform well; the real question is how much budget/headroom it leaves us on VRAM. It's probably about the best case though, and if needed we could "intelligently" load/unload data into the T2DArray, which is something we already do for other areas of the game's rendering -- though that's likely less palatable here due to the high number of textures to move in/out (vs. single textures coming in/out in the existing "intelligent load" setups).

    Thank you -- sincerely appreciate the ideas & guidance a ton.
     
  13. Arthur-LVGameDev

    Arthur-LVGameDev

    Joined:
    Mar 14, 2016
    Posts:
    217
    Using a single T2DArray indeed seems the best solution and I was able to get it PoC-level implemented and pretty much fully functional today. I've got a bit more work to do, resizing a few of the individual T2DArrays that were at different resolutions, but concept-wise this definitely seems to be the best solution and keeps the CPU side extremely light. The only real downside is initial startup time, but we can amortize it and/or pre-build the composite T2DArray or similar to mitigate that if needed.

    Honestly, I feel a little bit dense, because I thought this problem through for quite some time yesterday -- definitely longer than it took me to actually implement the combined concept/solution today. Alas, very much appreciate the second set of eyes, thank you! :)

    I'd still be curious what "array atlas" is though, btw! :)

    Ty!!
     
  14. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    11,904
    A texture atlas is just a single regular 2D texture with multiple images in it -- sometimes in a grid, but it could be any kind of non-overlapping layout. Light maps, sprites, or just basic UV mapping are all common use cases of texture atlases. Another common use, for VFX, is storing animations in a single texture, like multiple sprites, but usually on a fixed grid so that the shader only needs to know the width & height, number of rows & columns, and total number of frames, and can figure out the proper scale and offset for any frame number using basic math alone.
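
    The fixed-grid math can be sketched as plain code; this hypothetical helper assumes frames ordered left-to-right, top-to-bottom, with Unity's bottom-left UV origin:

    ```csharp
    // Plain-math sketch: scale/offset (the "_ST" values) for frame n of a
    // fixed-grid atlas. Frames run left-to-right, top-to-bottom; UVs have
    // their origin at the bottom-left (Unity's convention).
    public static class AtlasMath
    {
        public static (float sx, float sy, float ox, float oy) FrameToST(
            int frame, int columns, int rows)
        {
            float sx = 1f / columns;          // one cell's width in UV space
            float sy = 1f / rows;             // one cell's height in UV space
            int col = frame % columns;
            int row = frame / columns;        // 0 = top row
            float ox = col * sx;
            float oy = 1f - (row + 1) * sy;   // flip for bottom-left origin
            return (sx, sy, ox, oy);
        }
    }
    // e.g. FrameToST(5, 4, 4) → (0.25, 0.25, 0.25, 0.5)
    ```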

    For an atlas array, you just use both the texture array and an atlas. You could have each layer be an atlas, or you could pack individual frames across multiple layers at a specific position in the array. The latter is a little more work to set up, but easier for the shader.

    The real issue you have, though, is the sheer number of frames you want. It's going to be a lot of memory no matter what you do. Once you get to the number of frames you're talking about, it almost always comes down to needing some kind of compromise: either downscaling the videos significantly (like max 256x256 or lower), or lowering the frame rate and using frame blending techniques to smooth it out.

    The other thought I have is if you actually have enough screens for this to even matter. Do you have tens, hundreds, thousands, or more than tens of thousands of screens?

    If you have tens or even low hundreds, instancing might not actually be any faster. If you have many thousands or more than tens of thousands, splitting it up so each individual video is instanced separately might end up being faster and way easier to implement. It’s a very narrow window where what you’re trying to do actually makes sense, and the window changes depending on the hardware.
     
  15. Arthur-LVGameDev

    Arthur-LVGameDev

    Joined:
    Mar 14, 2016
    Posts:
    217
    Gotcha -- yeah, definitely no stranger to standard atlases. I wasn't totally sure if you were talking about something specific / more advanced, or if it was a misunderstanding of terminology on my end.

    The instance counts question is a good one; what we're seeing is that the built-in instancing "breaks" once lights get involved, and the apparent outcome is that batch counts go through the roof. More generally, once lights are added our render times skyrocket and batching+instancing both seem to break down.

    By contrast, if we draw with DrawMeshInstancedIndirect it seems our batch counts are precisely what we'd expect based on the total distinct geometry shown on the scene, and [almost] regardless of lighting & whether we're feeding in extra data via buffers for distinct colorization/texture changes/etc.

    On our prior title we encountered similar issues, where SpriteRenderers were just too expensive to use when compared to "manually batching" a big mesh ourselves -- this is slightly different, as we're in 3D, but it seems to be a very similar "issue" where the automatic/default batching & instancing stuff quickly becomes [prohibitively] slower than a purpose-built "render stack" once the most moderate amount of scale and/or complexity comes into play.

    It's also worth noting that our game is completely procedural -- the player can build & place items at their whim, and we basically can't take advantage of any baking or similar -- which may just mean we're in a unique spot. Regardless (and perhaps there's something we're just missing here), it definitely appears that it's awfully easy to rapidly "outgrow" the built-in stuff once you start adding moderate scale/complexity, though that may be especially true when working on completely procedurally-driven games; I really don't know for sure.

    FWIW, the "combined T2DArray" approach does seem like the best avenue -- the only other issue I've hit with it thus far is the texture limit, which my main dev hardware hits at 2048 depth on the T2DArray. We'll workaround that by breaking the videos up or similar, as I think it's the least-problematic / most viable approach overall.
     
  16. Tony_Max

    Tony_Max

    Joined:
    Feb 7, 2017
    Posts:
    277
    So Sprite Renderer breaks SRP batching, but a quad mesh with the same texture (which is the same thing, I guess) doesn't? I thought that Sprite Renderer uses the same Graphics.DrawMeshInstanced. What's the difference? I mean, for example, some developers use Graphics.DrawMeshInstanced to write a custom SpriteRenderSystem in DOTS. Is this approach worse than using the built-in Sprite Renderer?
     
  17. Arthur-LVGameDev

    Arthur-LVGameDev

    Joined:
    Mar 14, 2016
    Posts:
    217
    DrawMeshInstanced[Indirect] with lots of quads will out-perform SpriteRenderer, yes -- by orders of magnitude, even.

    Especially if you also use a custom shader & Texture2DArrays -- throw in [one or more] computeBuffers, and you can get extreme performance improvements vs SpriteRenderer, including *LOTS* of sprites rendered in a single very fast draw call. :)

    SpriteRenderer simply doesn't scale. Nor do GameObjects themselves; even at scales where SpriteRenderer *can* keep up, put all of them in motion (ie move each SR transform every frame), and you'll see just how much CPU overhead the GO/transform part adds to it all, too.

    If you want to render anything over ~1k sprites (last I checked, at least) and you need them to be moving at all, then you'll want to roll your own for sure.

    You can near-trivially render >500k "sprites" that are *moving* with [one or more] meshes and DrawMeshInstanced[Indirect] and get *very* good performance -- you won't come anywhere near that number with SpriteRenderer.

    This is all pretty easy to test in a "blank slate" project, too -- and the info that I'm relaying to you above is literally the fruit of doing precisely that type of 'blank slate project' benchmark testing (albeit that testing was done a few years ago & on 2017x Unity versions, but the situation hasn't changed much from my more-recent but lower-scale hands-on experience).
     
    Last edited: Nov 16, 2021
    SeriouslyNot and Tony_Max like this.
  18. Tony_Max

    Tony_Max

    Joined:
    Feb 7, 2017
    Posts:
    277
    Whoa! Thanks for this answer -- really what I wanted to hear :). Do you have any advice on how to implement SortingGroups?
     
    SeriouslyNot likes this.
  19. Arthur-LVGameDev

    Arthur-LVGameDev

    Joined:
    Mar 14, 2016
    Posts:
    217
    In *most* cases, for sorting order [not groups], you'll just use your "up" axis for sorting [that axis of the vert positions] and be OK with some overdraw; the majority of the time it's not worth worrying about given GPU power these days.

    For sorting groups -- just "group" your sorting order ['up'-axis vertex value]. This is easier in some cases than others, but generally you'll do something like "round" each unit [aka sorting group] to the nearest integer value, and then position the sprites within that group, relative to each other, by moving them +/- 0.1f along your up axis.

    Depending on how many sprites and how much precision you need, this could start to get expensive on the CPU side, but in most cases you don't need that level of precision and won't need to do actual ranking/sorting -- you can just "let it fly" as-is. If you did need precision and ran into CPU issues, you could conceivably offload that to a compute shader -- but we've never had to go down that path, and we have production games that render seriously large numbers of sprites at once without running into any issues.
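
    A tiny sketch of that grouping scheme (hypothetical helper; the 0.1f step is the spacing suggested above):

    ```csharp
    // Hypothetical helper: the integer part sorts the group (the "unit"),
    // the fractional part orders sprites within it. With a 0.1f step,
    // up to ~9 sprites per group stay inside the group's integer slot.
    public static class SpriteSorting
    {
        public static float SortDepth(int group, int orderInGroup)
        {
            return group + orderInGroup * 0.1f;
        }
    }
    ```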

    Another way to do sorting groups -- and what we use in production for one of our games (specifically for characters/agents, which are made up of ~6 sprites each in that title) -- is:
    In your shader, have N [i.e. 6] texture [or texture array] samplers. Each draw call can then draw N 'sprites' *onto the same quad*, simply by sampling all 6 textures at once and combining the result in the shader. This works well for us, but in most cases it means you're [ab]using things a bit [transparency / discard], and you will see some GPU overhead from the overdraw -- in our case it was the fastest way to get sorting groups, though.

    Concrete example, our "character" shader from the game above, it has 6 samplers along these lines: T2DArray_Head, T2DArray_Body, T2DArray_Arms, T2DArray_Legs, T2DArray_Torso, T2DArray_Mouth

    The result fragment is then (alter to your needs):
    final_color += T2DArray_Head.rgb * T2DArray_Head.a;
    final_color += T2DArray_Arms.rgb * T2DArray_Arms.a;
    final_color += T2DArray_Legs.rgb * T2DArray_Legs.a;
    ..etc..
    final_color.rgb /= 6.0;

    You can customize that to behave however you want, so that your shader is combining the colors to your liking (ie deciding which 'sprite' takes precedence over which, etc). This obviously works best when you're rendering *lots* of a single "type" of like-things. Though if you're only rendering 300 of something, you're reading the wrong thread anyways/optimizing too early. ;)

    FWIW, not saying these are the only ways or even the best ways -- just ways that I *know* will work and that are capable of production-caliber results, based on our experience shipping games in production environments. :)

    Hope that helps!
     
    Tony_Max likes this.
  20. Tony_Max

    Tony_Max

    Joined:
    Feb 7, 2017
    Posts:
    277
    I have started to implement such a system and have hit a problem: I still want to use a sprite atlas, but now, without SpriteRenderer, I need to pass the UVs for _MainTex myself. I see that the SampleTexture2D node's UV input is set to the UV0 channel (which I can't figure out how to access), and I'm a bit confused about how to do it manually. Is there any way to access those UV0/1/2/3 channels, or do I need to do some math to cut my texture out of the atlas?

    Solved: a mesh has up to 8 UV channels (Mesh.uv), which can be set via Mesh.SetUVs, so we can pass Sprite.uv to Mesh.uv, assuming they have the same vertex order (which is left_up -> right_up -> left_down -> right_down).
    It is also common to pass Vector4 data for Tiling and Offset (a node in Shader Graph), which in default shaders is _MainTex_ST; as I understand it, in your custom shader you should define your own _ST properties. So using Tiling and Offset you can handle atlas packing.
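
    A sketch of that solved approach (hypothetical helper; assumes the sprite's vertex and UV arrays share the same ordering, as noted above):

    ```csharp
    using System.Collections.Generic;
    using UnityEngine;

    public static class SpriteMeshUtil
    {
        // Build a mesh from a Sprite so a custom shader receives the sprite's
        // atlas-space UVs in channel UV0 (what SampleTexture2D reads by default).
        public static Mesh QuadFromSprite(Sprite sprite)
        {
            var mesh = new Mesh();
            mesh.vertices = System.Array.ConvertAll(sprite.vertices, v => (Vector3)v);
            mesh.triangles = System.Array.ConvertAll(sprite.triangles, t => (int)t);
            mesh.SetUVs(0, new List<Vector2>(sprite.uv)); // Sprite.uv -> Mesh UV0
            mesh.RecalculateBounds();
            return mesh;
        }
    }
    ```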
     
    Last edited: Nov 18, 2021