
Graphics.DrawMeshInstanced

Discussion in 'Graphics for ECS' started by Arathorn_J, Jun 26, 2018.

  1. Arnold_2013

    Arnold_2013

    Joined:
    Nov 24, 2013
    Posts:
    284
    I fixed some errors by opening all Shader Graph / VFX Graph assets and re-saving them. Even shaders that I did not make in Shader Graph may open in Shader Graph when you click Edit - I had this with URP/Shaders/AutodeskInteractive... just open it and save. The only catch is that the error is only triggered when the shader is used... so I hope I don't miss too many of them :)

    The URP "convert all materials" option has been moved to Window -> Rendering -> Render Pipeline Converter, but so far it has not been the magic bullet for me (it was my go-to solution for anything pink-shader related).

    In the editor I can run a game (have not tried to build anything).

    Unity 2021.3.5f1
    Entities+HR+... 0.51.0-preview.32
    URP/ShaderGraph/VFXgraph 12.1.7...
     
  2. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The Render Pipeline Converter tools weren't much help. I'm not a Shader Graph expert, and I can't figure out how to open the shader in Shader Graph, since the original shader was created manually in code. From what I'm reading here: https://gamedev.stackexchange.com/questions/183836/convert-from-shader-to-shadergraph it sounds like there is no easy way to just open the .shader file - I either have to rebuild it in Shader Graph, or make a new Shader Graph file and integrate the existing functions as a black box. The first method might not even be possible, and the second seems like it would still give me the same errors, since the shader code won't change.
     
  3. thelebaron

    thelebaron

    Joined:
    Jun 2, 2013
    Posts:
    857
    @lclemens
    I had to change the UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO underscores from two (__) to one (_), which could be the source of your issues.
     
  4. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I think I figured it out...

    I had to change these lines:

    Code (CSharp):
    #define _Color UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float4, Metadata__Color)
    #define _SpeedInst UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float, Metadata__SpeedInst)
    #define _CurTime UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float, Metadata__CurTime)
    #define _ClipIdx UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float, Metadata__ClipIdx)
    to these:

    Code (CSharp):
    #define _Color UNITY_ACCESS_DOTS_INSTANCED_PROP(float4, _Color)
    #define _SpeedInst UNITY_ACCESS_DOTS_INSTANCED_PROP(float, _SpeedInst)
    #define _CurTime UNITY_ACCESS_DOTS_INSTANCED_PROP(float, _CurTime)
    #define _ClipIdx UNITY_ACCESS_DOTS_INSTANCED_PROP(float, _ClipIdx)
    And now it works in 2021 with URP 12.1.7 and HRV2 0.51.
     
  5. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I have a question about this baked-vertex-animation-in-a-texture method... what is the point of baking the normals? All this time I have been baking a vertex-position map and a normal map, because zulfajuniadi did it in his github project and other people were doing the same in their vertex-animation projects. Every time I bake, I've been generating two textures as output.

    upload_2022-8-24_19-55-14.png

    But today I decided to make a lit version. I took the SimpleLit URP shader and modified it so that it used the encoded vertex-position texture, and for the normal texture I just used the regular normal map (not the generated one).

    upload_2022-8-24_20-1-43.png

    And as far as I can tell, it looks great! The shadows, lighting, and bump mapping are behaving just as I would have expected.
    ezgif-4-3f1a816974.gif

    And here is the plain unencoded normal map on the baked texture (this model also has alpha clipping).

    upload_2022-8-24_20-9-55.png

    So.... this is probably a stupid question, but why have I been generating vertex-encoded normal maps all this time??
     
  6. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    If you don't bake the normals and transform them based on that output, you will get some really odd light reflections - surfaces that should be obscured will receive direct-looking lighting or reflect in the wrong direction. For example, take a model in T-pose where a vertex on top of the arm has a normal of, say, (0,1,0), straight up; several frames later the arm is at the model's side, where that vertex should face outward, like (1,0,0), and it won't reflect light properly. In your example above you added a bump/normal texture to overlay lighting detail on the surface, but that doesn't change the actual vertex normals, which you still need to update per frame.

    This is easier to visualize in Shader Graph, where you will see the "Normal" input for a bump/normal map and can also modify the Vertex Normal and Position directly.

    In the screenshot below I've got calculations that read a vertex animation texture and compute the normal and position, and I'm not populating the normal map at all (though I could if I needed to).

    upload_2022-8-24_21-53-30.png
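
    In plain HLSL, the same idea looks roughly like this - a minimal sketch, assuming a baked normal texture laid out like the position texture (names such as _PosTex, _NmlTex, and _FrameRow are illustrative, not code from the project above):

    Code (CSharp):
    Varyings vert(Attributes IN)
    {
        Varyings o = (Varyings)0;

        // One column per vertex, one row per frame (illustrative layout).
        float2 uv = float2((IN.vid + 0.5) * _PosTex_TexelSize.x,
                           (_FrameRow + 0.5) * _PosTex_TexelSize.y);

        // Fetch the baked object-space position for this vertex and frame.
        float3 positionOS = SAMPLE_TEXTURE2D_LOD(_PosTex, sampler_PosTex, uv, 0).xyz;

        // Fetch the baked normal too; stored in [0..1], so remap to [-1..1].
        float3 normalOS = SAMPLE_TEXTURE2D_LOD(_NmlTex, sampler_NmlTex, uv, 0).xyz * 2.0 - 1.0;

        o.positionCS = TransformObjectToHClip(positionOS);
        o.normalWS = TransformObjectToWorldNormal(normalOS);
        return o;
    }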
     
    lclemens likes this.
  7. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Thanks - that makes perfect sense!

    What about tangents? I don't know a whole lot about what tangents are used for, but several of the texture-baked vertex-animation libraries I've seen ignore them. I think they're used for bump mapping along with the normals? Will I need them if I want that bump/normal texture to work?

    Hopefully tangents don't require too much precision, because the most I can squeeze out is 10 bits per axis before I would have to use a really large floating-point texture instead.
     
  8. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    I'm not sure about the tangent issue, or whether tangents would actually need to be modified along with the vertex positions. What I've done when developing my shader is run the original Mecanim animator in parallel, pause frame by frame, and compare the surfaces side by side to see if there are differences. Sorry I can't help more on the tangent calculation - maybe someone else can weigh in.
     
    lclemens likes this.
  9. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Some lighting models use tangents, but not all. I know it is pretty hard to do brushed metals without it. You are much more likely to need it for an HDRP project I believe, but it has been a while since I have looked at all the shaders Unity uses across pipelines.
     
    lclemens likes this.
  10. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Thanks for the info. I went ahead and implemented the tangent to see if it mattered in the modified Simple Lit shader I am using. I could not find a visible difference at all.

    To be honest, I think the models I'm using actually look better without the encoded normals texture. I'm not sure if it's a precision problem or what, but the bump mapping compares better to the original without it. On a practical level, none of my character movements are very drastic (run, die, attack, stagger, idle), so it's not really noticeable that the encoded normals texture isn't being used - plus it's one less texture needed on mobile platforms. For the moment I will run without the encoded normal texture unless I see an obvious visual artifact.
     
  11. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Something is perplexing me...

    I built an executable of my Animation Cooker, which uses vertex animation, and sent it to a friend of mine. We both benchmarked the same high-vertex-count model with 100k instances (all on-screen). It's definitely GPU-bound, because the CPU is hardly touched at all (around 4%).

    On my laptop with an RTX 2060, I got 9fps.
    On his laptop with an RTX 3060, he got 9fps.

    We retested with 10k instances and again got identical numbers.

    This was unexpected to say the least.... any explanations for this odd phenomenon?
     
  12. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Not all mobile GPUs are integrated equally. Even if they have the same model number, they could have drastically different power schemes. It could also be that there was minimal generational improvement in whatever is the specific bottleneck for this algorithm.
     
  13. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    At a certain point, with however much data per frame is being sent to the GPU, you hit a throughput limit, and that can cause them to be pretty close. I gather you don't have any sort of frame syncing enabled in the quality settings?
     
  14. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The app is set to run as fast as possible (before spawning anything it's in the 300fps+ range). But in the player it defaulted with VSync Count set to "Every V Blank". Maybe that's it?

    Another thing is that the build is set for full-screen native resolution. My friend's resolution is 2k and mine is 4k, and he still got the same number. Today I built a player to be 1920x1080 and ran it and it gets the same numbers as with 4k, so resolution doesn't make a difference.

    So it sounds like some sort of throughput bottleneck that the mobile RTX 3060 didn't improve on over the mobile RTX 2060. When 100k entities are going, I can see in Task Manager that the GPU is pegged at 100% on my 2060. I know that if I choose a different model with fewer vertices, the frame rate improves a lot, so I figured it was triangle throughput - but I'm pretty sure a 3060 has higher triangle throughput than a 2060, so it must be something else.

    It's not super important that I discover the exact root cause or anything... I was just curious.

    Any chance one of you could try it quick on a desktop? Just set the spawn count to 100,000 and then hit the Spawn button and note the FPS. Alt-F4 to exit. https://drive.google.com/file/d/1So-ellBcqjl43GZbH_60u-b-5RcH48fn/view?usp=drive_link
     
  15. Rukhanka

    Rukhanka

    Joined:
    Dec 14, 2022
    Posts:
    204
    lclemens likes this.
  16. inSight01

    inSight01

    Joined:
    Apr 18, 2017
    Posts:
    90
    lclemens likes this.
  17. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Thanks for running the test. That confirms our suspicion of some sort of bottleneck that doesn't scale with card performance... even the 3070 Ti only barely pulls ahead, while the 3060 desktop card and the 2060 laptop version perform the same.

    Using models with fewer vertices (around 600 to 700), I can keep it above 40fps with 100k instances. Because the vertex count has such a huge impact, my theory is that if I started using LODs it could go way higher. I'm not quite sure how to do that in DOTS yet (I haven't really investigated it). It's on the todo list.
     
  18. inSight01

    inSight01

    Joined:
    Apr 18, 2017
    Posts:
    90
  19. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    Yes, you definitely need to make sure VSync is disabled, otherwise it won't matter what you do. As for any other issues, it can just be throughput bottlenecks, but the profiler should show you spikes for graphics calls when it's consistently trying to push through too much and the CPU is waiting on those bottlenecks.
     
    lclemens likes this.
  20. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I set the "VSync Count" to "Don't Sync" and tested on my 2060 laptop and it didn't make any difference at all, regardless of the entity count. I don't know why.

    Lol! --- yeah, I think it's safe to say that there are spikes for graphics calls. :)
    upload_2023-5-26_11-30-15.png
     
  21. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The technique is similar in that it bakes some things into textures, but it's different because the Crowd Animations asset uses GPU skinning - it records only the bone transforms into textures, whereas I'm recording vertex positions (the vertex-animation technique). Vertex animation makes larger textures and is less flexible, but it's less work for the GPU, so it can handle more agents at a time. I put it on GitLab - https://gitlab.com/lclemens/animationcooker .
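
    For contrast, here is a minimal sketch of what the bone-texture (GPU skinning) variant roughly looks like in a shader. This is purely illustrative, not Crowd Animations' actual code - it assumes a layout where three texels per bone hold the rows of a 3x4 object-space bone matrix:

    Code (CSharp):
    // Illustrative bone-texture skinning: load a 3x4 bone matrix from three
    // adjacent texels, then blend the influencing bones per vertex by weight.
    float3x4 LoadBone(uint boneIdx, float frameV)
    {
        float u = (boneIdx * 3 + 0.5) * _BoneTex_TexelSize.x;
        float4 r0 = SAMPLE_TEXTURE2D_LOD(_BoneTex, sampler_BoneTex, float2(u, frameV), 0);
        float4 r1 = SAMPLE_TEXTURE2D_LOD(_BoneTex, sampler_BoneTex, float2(u + _BoneTex_TexelSize.x, frameV), 0);
        float4 r2 = SAMPLE_TEXTURE2D_LOD(_BoneTex, sampler_BoneTex, float2(u + 2.0 * _BoneTex_TexelSize.x, frameV), 0);
        return float3x4(r0, r1, r2);
    }

    float3 SkinPosition(float3 positionOS, uint4 indices, float4 weights, float frameV)
    {
        float3 p = 0;
        [unroll] for (int i = 0; i < 4; i++)
            p += weights[i] * mul(LoadBone(indices[i], frameV), float4(positionOS, 1.0));
        return p;
    }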
     
  22. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Hey @lclemens , awesome job with your asset! Exploring it now and have a couple of questions:
    - any particular reason to go for manual texture decompression versus the hardware built-in ones like BC6H for big devices or ASTC for mobiles?
    - packing the metadata into a texture header rather than a separate buffer in AnimDB, what are the advantages?
     
  23. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Excellent questions!

    I guess the only reason I packed metadata into the header instead of AnimDB is that when I first started the project, the shader was ticking the animation per frame instead of it being done in a system, so that was the only way to get the information into the shader. It turned out that with that method it wasn't possible to know when animations ended, and it had a few other downsides, so later I moved that functionality into AnimationSystem. Now that ticking happens on the CPU side, I think I could store that info in AnimDB and pass it from AnimationSystem to the shader via material properties (as you suggested). It's a lot of values, so maybe the material property could be a float4x4 matrix or something, to avoid using 15 different material properties. I've never done a matrix material property before, but I'm guessing it's possible? I'm not sure how performance would be affected by sending that matrix to the graphics card every frame, but hopefully it wouldn't make a huge difference.

    As for ASTC/BC compression, I would obviously need to ditch the header, as mentioned above. By doing my own compression, I was able to jam the data into an RGBA buffer and avoid floating-point buffers. Additionally, I was able to make use of pre-calculated min/max ranges for the animation positions, which a generic compression technique wouldn't have access to. I squeezed out every last bit of data on a per-bit basis (only 2048 values for X and Z and 1024 for Y). Another thing I did for compression was interpolation between frames, which was a huge savings. The resulting images look like this:

    upload_2023-10-4_13-44-8.png upload_2023-10-4_13-44-55.png

    In a previous job I did a lot of work with H.264 and other compression schemes, and I learned that compression isn't very effective on really noisy images. I believe ASTC and the BC variants are lossy, which would further reduce the already limited precision, but perhaps dropping a small amount of precision might not be too bad. Another concern is bleed-over: if a section of really low values sits right next to some really high values, the boundary between them can get blended/blurred. For images that's fine, but for positional values it could make the animation look jittery, because those particular positions would be way off from where they're supposed to be. One thing that would compress really well is the thick black line at the top, which is just filler zeros to make the dimensions line up to a power of 2.

    Because you got me thinking... I did a quick test using the smaller of the above images.

    TextureCompressionTest.gif

    Low and High quality both produced an 8KB image, so obviously it would be pointless to use low quality, if you trust the KB number Unity reports. The low-quality one also looked horrible, and I have no doubt it would be worthless for animations. However, the High quality setting looks decent and compressed by a factor of 4. We can definitely see that some of the colors changed (especially the whites, which became more yellowish). It's hard to tell how much that would affect the resulting animation, but I have no doubt it would have an effect. I tested the larger image, and with the high-quality setting it went from 341.3KB to 85.4KB -- also a factor of 4. BC7 looks decent. DXT5|BC3 (the default for Win/Mac/Linux builds) looks awful. BC6H looks decent if it's set to "Best".

    I didn't see any options for ASTC, so I'd have to get the C code, and integrating that would take weeks. I think mobile in Unity only supports ETC. I didn't test ETC just now, because I think I'd have to switch the project to the Android build target to test its compression options.

    One thing to note is that a secondary compression wouldn't increase the number of vertices a model could use - that is still determined by the max texture width/height a person is willing to accept (1024 for old mobile devices, 2048 for new mobile devices, and 16384 for desktop). The only way I could think of to increase the max vertex count would be to try to stuff two vertices into a single pixel, which would only leave 512 positions for the Y axis and 1024 positions for the X and Z axes. Hmmm... I should try that sometime.

    I could probably convert the header section to material properties in a day or two. I would also need to save the texture assets as PNG or EXR instead of a ".asset" - and I already have code for that so it should be pretty easy. After that I think it would just be a matter of testing it with some different algorithms.

    So in summary... it would be great to cut the memory size to a quarter. The one bad thing is that if the experiment failed, I would have wasted a couple of days converting everything for nothing. I wonder if there is a lossless mode for any of these codecs? My gut instinct is that the yellowish tint and color shifts caused by the lossy compression will cause significant problems, but who knows...
     
  24. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    AnimationKitchen unfortunately hard crashes the editor for me (2022.3.10f) so I wasn't able to do all the tweaks. But here are my thoughts:

    - Usually, GPU animation is a way to render thousands of entities, and naturally, it's not about animation fidelity but performance. Though use cases may vary for sure, having options is always good.
    - An explicit texture format lets you choose the format per object, per LOD (!), and make an explicit precision-vs-size tradeoff.
    - Hardware encoding does all the tricks you've described, like min/max, but per block (ASTC) rather than just per texture, so it's more efficient. And zero code on the plugin side - it's all handled under the hood.
    - Hardware sampling is free on any non-antique GPU, and it provides a free blend between frames in the form of bilinear filtering. You're currently doing that blending manually in the shader. The guy behind https://www.headfirststudios.com/theorangecoder explained it nicely in the docs.
    - You're baking final positions, not deltas from the origin. Deltas would have zero mean and, depending on the animation range, could compress WAY better, be it RGB9e5 with offset, ASTC, or fancier formats (see the sketch after this list).
    - Making a baked mesh also an importable mesh, compared to the current .asset storage, would allow for more optimizations like FP16 storage of non-position channels.
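
    A rough sketch of the delta idea (names like _DeltaTex, _DeltaScale, and _DeltaOffset are illustrative, and the exact decode depends on the chosen format):

    Code (CSharp):
    // Bake (framePosition - bindPosePosition) per vertex instead of the
    // absolute position. Deltas cluster around zero, so a per-clip
    // scale/offset pair recovers them from a low-precision texture, and
    // the mesh's own vertex position supplies the base at runtime.
    float3 delta = SAMPLE_TEXTURE2D_LOD(_DeltaTex, sampler_DeltaTex, uv, 0).xyz
                   * _DeltaScale + _DeltaOffset;
    float3 positionOS = IN.positionOS.xyz + delta;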

    But anyway, I adapted your code to the ShaderGraph subnode and it plugged in nicely!

    As for the animation FSM, it's a matter of project requirements, I guess. Full CPU, full GPU, or something in between.
     
  25. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    HR 0.51 should be compatible with the 2021 series. These error messages suggest that there might be an #include problem, possibly related to the order in which header files are included (which could have been caused by something changing in the URP headers). One way to debug this is to use the "Preprocess only" shader compilation option in the Inspector, then look at the site of the error and compare it with the original source.
     
  26. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    - Yeah, AnimationCooker crashes the editor for me too - about once every 4 or 5 bakes. I made a Unity bug report for it, but they keep closing it as "not qualified" and saying they can't reproduce it. I don't think they actually tried, because it's pretty easy to reproduce and they have the full source code. https://unity3d.atlassian.net/servicedesk/customer/portal/2/IN-53065

    - The way AnimationCooker currently works, each LOD can use different texture dimensions and frame rates. It works pretty well.

    - So you're saying I could give the hardware encoder a 3-channel floating-point image and it would pack it into an R11G10B11 image? I haven't run across that feature yet. I searched for quite a while and couldn't find anyone using texture compression with vertex animation - in fact, most people use uncompressed floating-point textures. I don't know much about these gaming compression algorithms other than that they use blocks, but lossy compression algorithms like JPEG and H.264 were developed to toss out visual data that the human eye won't notice.

    - I tried bilinear filtering for frame interpolation once, but it didn't work: it does funky stuff at the image edges, and it was messing things up because I write multiple animation clips to the same file and wrap them, so each column does not correspond to a vertex sample. The GPU ECS Animation Baker from Orange Coder doesn't have to worry about wrapping, compression, or frame rates below 30fps, because it bakes bone animation instead of vertex-position animation - the bone information is much smaller (there are far fewer bones than vertices). The downside of the bone-baking approach is that it requires multiplying out transform hierarchies, so it uses more CPU (though it's still fast enough for a huge horde of entities). I think most people use bone baking because it has no limit on vertex count, uses less texture memory, and can support weapon/clothing attachments and animation blending.

    - I was baking positions instead of deltas because that's what everyone else was doing, but I concur that deltas could compress better in most animations.

    - Saving the mesh as an importable asset seems like a cool idea. I'm not really sure how to programmatically save an FBX, but I think the FBX Exporter package has an API for it.

    - I found that the animation FSM in a system running a job with ScheduleParallel() is barely noticeable. Even with over a million entities it's in the 2ms range. I prefer it on the CPU just because it's easier to tweak and modify and to get/send events.
     
    WildMaN likes this.
  27. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    It's 100% for me. OK, will take a look as well.

    Yes, but not a different format/compression. For mid-to-far LODs it might make a small difference.

    No. On the encoding side I frankly don't care - it's offline baking. At runtime the GPU's texel unit is designed to handle decompression blazingly fast, multiple filtered texels per cycle. As for the format, RGB32 isn't something supported in silicon and would take a shader keyword and manual decompression, sure. RGB24, RGB9e5, and a bunch of lossy formats are supported. It's up to the project requirements, I think, but I'd expect deltas to compress nicely. Have to test.

    Manual decompression gets in the way of filtering, yes. Wrapping etc. shouldn't, with proper offset control - like never getting closer than 0.5 texels to the border. On the frame rates, I didn't get your point. He shows that with "bilinear blending" an 11fps anim can look almost the same as 30fps.
     
  28. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Cool - if you test the compression, I'd love to know the results. I may do that at some point, but I'm in the middle of a lot of things right now, so I probably won't mess with animation again for a couple of months.

    The Orange Coder documentation says: "Frames are sampled & baked at 30 FPS." I suspect he really doesn't need to go lower than that, because there's no need to shrink texture sizes when bone data is already so small.

    By wrapping I meant wrapping the animation clips. For example, the walk animation might start at row 1, col 0 and end at row 50, col 19; then the attack animation would start at row 50, col 20 and end at row 90, col 77. Wrapping the animations like that breaks bilinear interpolation. The Orange Coder doesn't need fancy wrapping, since bone info is small enough that he can fit all the animations without it. I benchmarked with and without my own interpolation in the shader, and it's so fast that I couldn't even devise a test that showed a difference in speed.
     
  29. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Sure. I fixed the crashes by adding AssetDatabase.Refresh() after the texture creation. I also removed the dependency on Entities.Graphics (and plan to separate out Entities entirely) and added an additional branch that processes the textures the way I described, preserving your way of packing as well. In the process, I noticed that because your code uses an ARGB32 render texture, the source data is 8 bits per channel, so expanding it afterward to 10-11 bits does not increase the actual precision.

    Unity_EgKBDxLRVG.gif

    A DrawMeshInstancedProcedural horse with vertex animation: a 4fps texture capture of a 0.8sec clip, smoothed by hardware bilinear filtering. The discrepancy between the original GameObject (black) and the instanced one is the price paid for compression. This is more than enough for my particular needs, where the screen scale is something like this:

    Unity_W0ibjaSHOo.gif

    The example above is an uncompressed 749x4 "RGBA32 Float" texture asset, 46.8kb. The texture stuff turned out to be a bit more complicated than I thought - still trying to get my head around what the hell UNorm is and why I can't get regular floats...
     
  30. Rukhanka

    Rukhanka

    Joined:
    Dec 14, 2022
    Posts:
    204
    RGBA32 is 4 bytes per texel (8+8+8+8 = 32 bits). There is no such format as "RGBA32 Float". I believe you meant RGBAFloat (16 bytes per texel: 32+32+32+32 = 128 bits), which the size calculation confirms: 16 * 749 * 4 = 47936 bytes. This is simply the most greedy but most precise texture format that can be used (with 4 wasted bytes per texel). UNorm is deciphered as "Unsigned Normalized" (SNorm - "Signed Normalized"). Such formats mean that the GPU hardware reinterprets the internal representation of the texture data (texels) as a float value in the range [0..1] after sampling. I.e. R8G8B8A8_UNorm internally contains values in the range [0..255] in each byte (R, G, B, A), and during sampling the GPU returns the value modified by the formula 'rawValue / 255.0f'. For R8G8B8A8_SNorm the stored bytes are signed ([-128..127]) and the formula is 'max(rawValue / 127.0f, -1.0f)', giving a [-1..1] range.
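
    Written out as shader-style code, the two decode rules look like this (a sketch for illustration only):

    Code (CSharp):
    // UNorm8: stored byte [0..255] -> float [0..1]
    float DecodeUNorm8(uint raw)
    {
        return raw / 255.0;
    }

    // SNorm8: stored signed byte [-128..127] -> float [-1..1]
    // (-128 and -127 both decode to -1.0)
    float DecodeSNorm8(int raw)
    {
        return max(raw / 127.0, -1.0);
    }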
     
    lclemens likes this.
  31. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Unity_O2u8OuJPad.gif

    From left to right:
    24fps ASTC4x4 17.6kb
    10fps ASTC4x4 8.8kb
    4fps ASTC4x4 2.9kb
    24fps RGB24 52.7kb
    10fps RGB16 14.6kb
    4fps RGB16 5.9kb

    I've tinkered with some other lossy compressions and they all look horrible up close, inevitably mixing up the vertices. On the other hand, when zoomed out, why pay more?

    Unity_3NVYcoyvWA.gif

    24fps RGB24 at 52kb looks good (the GIF above adds some stutter due to its own compression). Pretty much all the others are horrible for close-ups, but on the other hand, for an RTS kind of view they look surprisingly efficient.

    Strangely enough, in RGB24 mode the 4fps one looks better than the others:

    Unity_AuZegIiZKX.gif

    Edit:
    Sizes in RGB24:
    10fps 21.9kb
    4fps 8.8kb
     
    Last edited: Oct 8, 2023
  32. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I am using all 32 bits of the ARGB32 image - the 3 values (XYZ) are spread across the 4 bytes (RGBA). X is stored in R and partially in G (the first 11 bits), Y is stored partially in G and partially in B (the next 10 bits), and Z is stored partially in B and in A (the last 11 bits). The process is reversed when the shader extracts the values, so that the full 11 bits are used for X and Z (2048 values) and 10 bits for Y (1024 values). That's what the PackThree10BitFloatsToARGB() and UnpackRGBAToThree10BitFloats() functions do.
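
    For reference, the unpack side might look something like this - a sketch following the layout described above (the real UnpackRGBAToThree10BitFloats() may differ in detail):

    Code (CSharp):
    float3 UnpackRGBAToThree10BitFloats(float4 c, float3 boundsMin, float3 boundsMax)
    {
        // Rebuild the raw 32-bit word from the four 8-bit UNorm channels.
        uint4 b = (uint4)round(c * 255.0);
        uint raw = (b.r << 24) | (b.g << 16) | (b.b << 8) | b.a;

        // Slice back into 11 bits (X), 10 bits (Y), 11 bits (Z), normalized to [0..1].
        float3 n = float3((raw >> 21) / 2047.0,
                          ((raw >> 11) & 0x3FF) / 1023.0,
                          (raw & 0x7FF) / 2047.0);

        // Expand into the precomputed min/max range of the animation.
        return lerp(boundsMin, boundsMax, n);
    }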

    I originally started out with an RGB24 half image, which was HUGE on disk - several megabytes. I read somewhere that internally, graphics cards and OpenGL convert RGB24 (888) to RGBA, so it uses the same amount of memory as RGBA - which is one of the reasons I decided to use RGBA (the other being that I wanted 8 more bits, because I didn't think 255 values per axis was enough).

    Thanks for doing the tests! Those sizes are definitely smaller than what my compression produces for the horse model at 10fps, though the visual quality definitely takes a hit. Did you bake all 3 horse animations, or just the walk animation?

    So to fix the crash, you put AssetDatabase.Refresh() after these lines in AnimationKitchenWindow.cs?
    Code (CSharp):
    // Bake each skin and save its results to disk.
    // 1x material, 1x skin mesh, 1x log, 1x pos tex, and an optional normal texture.
    for (int i = 0; i < m_opts.SkinOpts.Count; i++) {
        results.Add(BakeAndSave(m_opts.SkinOpts[i], subFolderPath, m_computeShaders, m_opts.EnableCopyTexturesToOutput, texDict));
    }
    I just looked into removing the header and moving it into AnimDb. I found that there is a MaterialPropertyFormat.Float4x4, which would be useful for transferring a lot of header info in a single property instead of a bunch of float4 properties. Supposedly it generates a warning, but it works.

    I also found a bug - the AnimDb only stores information for the very last skin/LOD index, so if the first LOD is set to 24fps and the last one is set to 1fps, it gets confused and tries to interpret all 3 LODs as if they had the same frame rate, which causes skipping for the first two LODs. I am going to try to fix that over the next couple of days, and if I can get rid of the header information at the same time, I'll do that too. Ditching the header is a little tricky, because I'm storing multiple animation clips in a single texture and the shader needs to know a little information about each clip in order to find the correct vertex locations.
     
  33. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    You're totally right, don't know why I called that a mistake. My apologies.

    Yes, and that's why I mentioned RGB9e5 - hardware-supported, 32-bit-wide, packed but not overly compressed, supposedly the best fit for storing deltas. I gave it a try but failed to encode/decode it properly. Maybe I'll get back to it after I finish the current pass on the whole render system.

    With the caveat that I didn't look into your custom compression: my current shader uses the standard TexelSize plus one per-instance float that encodes the Y-axis animation shift. Nothing to pack into a float4x4. On the CPU side there's some data preparation with clip length/frame count, but that's it.

    Since I heavily patched your code, the line numbers may not be exact.

    Code (CSharp):
    result.PosTexPath = Path.Combine(subFolderPath, $"{prefabName}.posTex.asset");

    AssetDatabase.CreateAsset(posTex, result.PosTexPath);

    System.IO.File.WriteAllBytes(result.PosTexPath, posTex.EncodeToTGA());

    AssetDatabase.Refresh();
     
  34. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    So you're only handling one animation clip per texture? Are you using texture arrays or something?
     
  35. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    396
    @WildMaN In your example above with 4/10/24fps, you should make sure the numbers are divisible, otherwise you get a skipped frame/distortion. You said somewhere it's a 4fps, 0.8 sec clip, but I assume you might mean 30fps x 0.8, meaning 24 frames. Therefore you should test 24/12/6/3 (which may not be so bit-compatible) or 24/8/4.
     
  36. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    The animation manager holds all the texture and clip settings:
    Unity_dTsaQ4aQBR.png

    Which translates to some burst-able structs for the animation system:
    devenv_PZtXttqf7z.png

    Per entity there are two components - one stores the current animation state and changes only when a new clip command arrives, and the other is updated every tick: the clip's current timestamp, adjusted for clip start time, clip length, and texel height.

    devenv_1c2OC8h9AK.png

    So f_currentShift goes straight into UV sampling in the shader:
    Unity_mSJc0tNsY8.png

    Or in plain HLSL:
    Code (CSharp):
    Varyings vert(Attributes IN)
    {
        Varyings o = (Varyings)0;

        UNITY_SETUP_INSTANCE_ID(IN);

        // Column = vertex id, row = current animation frame.
        float x = (IN.vid + 0.5) * _BakedAnimationsTexture_TexelSize.x;
        float y = (f_currentShift + 0.5) * _BakedAnimationsTexture_TexelSize.y;

        half3 bakedPosition = SAMPLE_TEXTURE2D_LOD(
            _BakedAnimationsTexture,
            sampler_BakedAnimationsTexture,
            float2(x, y),
            0).xyz;

        bakedPosition = bakedPosition * _BakedAnimationsScale + _BakedAnimationsOffset;

        VertexPositionInputs vertexInput = GetVertexPositionInputs(bakedPosition);

        o.positionCS = vertexInput.positionCS;
        return o;
    }
     
  37. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Yes, that's the case. However, in practice, with that level of vertex-compression artifacts, I doubt the frame-rate inconsistency would be visible at all. The frame rates were set by the original code from @lclemens and I didn't bother to adjust them yet.
     
  38. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    396
    You would be surprised how visible slight things are - it's why 4 frames looked better than 10.
     
  39. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Hope you're doing well! I finally got around to messing with AnimationCooker again. I don't see how hardware bilinear interpolation would work when the pixels are organized with a wrapping method like this...

    upload_2023-12-17_21-51-56.png

    The previous and next frames are not adjacent pixels, but a hardware interpolator has no way of knowing that. I could organize the pixels so that each row corresponds to a vertex, but if the vertex count is not exactly a power of 2, there will be wasted space on the right side of the texture. The main issue is that texture width would become a function of vertex count, which would force meshes to have fewer than 2048 vertices on mobile platforms, since that's the max texture width for most phones. Also, if there are a lot of vertices and only a couple of short animation clips, I would end up with goofy texture sizes like 2048x16 instead of something sensible like 256x128. So for now I'm leaning towards keeping the wrapping and using my own interpolation, which was just as fast as hardware filtering in all my benchmarks. It's done in the shader, so it's still on the GPU, and I eliminated all division operations.
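
    Roughly, the manual blend looks like this - a sketch using the unpack helper sketched earlier (_WidthMask, _WidthShift, _Frame0Start, _Frame1Start, and _FrameBlend are illustrative uint/float values; my real code may name things differently):

    Code (CSharp):
    // With a wrapped layout, a frame's texels run linearly through the
    // texture: texel index = frameStart + vertexId. With power-of-two
    // widths the row/column split is a mask and a shift - no division.
    float2 TexelToUV(uint texel)
    {
        return (float2(texel & _WidthMask, texel >> _WidthShift) + 0.5)
               * _PosTex_TexelSize.xy;
    }

    float3 SampleFrame(uint frameStart, uint vid)
    {
        float4 c = SAMPLE_TEXTURE2D_LOD(_PosTex, sampler_PosTex,
                                        TexelToUV(frameStart + vid), 0);
        return UnpackRGBAToThree10BitFloats(c, _BoundsMin, _BoundsMax);
    }

    // In the vertex shader: sample the two frames and blend between them.
    float3 positionOS = lerp(SampleFrame(_Frame0Start, IN.vid),
                             SampleFrame(_Frame1Start, IN.vid),
                             _FrameBlend);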

    I converted the auto-generated AnimDb.cs to a scriptable object, which is much cleaner, and I fixed a bug where the frame rates got reduced for LOD0 and LOD1 when using LODs. I also improved and simplified the blob asset and some code in the static utils class.

    I also removed the header line at the top of the texture and moved most of those parameters to material properties. Unfortunately, it turns out Unity doesn't support float-array serialization for shader/material properties, so I had to store the begin/end frame values in the animation database and pass them to the shader every frame.

    One advantage of removing the header is that I can mess around with some lossy hardware compression, but as you found out, it looks pretty crappy. I could do RGB24 lossless, but almost no GPUs support it natively, so it ends up getting converted to RGBA32 anyway - saving space only on disk while having around 1/8 the precision of my 11/10/11 RGBA packing.

    So I think I'll experiment with some different compression schemes, and if I can't get anything decent I might just put the header line back in and call it a day.
     
  40. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Great job!

    But why do you need PoT textures? I don't see any requirement or advantage here. Without PoT, the "one line - one frame" concept allows for hardware filtering.

    I did the same for the parameters. Once per frame doesn't sound too scary performance-wise.

    Depends on the use case. I'm currently using the strongest compression available for LODs 1-2 and it looks fine, and for LOD0 I managed to get some success with RGB9e5. Same bit width as yours, but again, it allows for hardware filtering.
     
  41. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I try to use power-of-two texture dimensions for these reasons:
    1. According to shader legend bgolus, mobile and older GPUs only support power-of-2 textures.
    2. bgolus also said most GPUs allocate a power-of-two size anyway, so a non-power-of-two texture doesn't really buy much other than disk-space savings.
    3. Unity recommends power-of-two in its documentation: https://docs.unity3d.com/Manual/ImportingTextures.html . It says non-PoT textures are a bit slower to sample.
    4. By using power-of-2 sizes, my shader can do the pixel-position lookup without a division operation (see the interpolation sketch in my earlier post).
    I did a bit of testing with compression. But before I post the results... one thing that confuses me is that what is shown in the Inspector is NOT necessarily what happens on the GPU. For example, the Inspector shows RGB24 (RGB8 UNorm) as smaller than RGBA32; however, I know from reading various sources that nearly all GPUs treat them the same. I tried looking for a tool that would show actual texture sizes in GPU memory, but I couldn't find anything that did that easily. Resource Checker just mirrors the Inspector. I messed around with some NVIDIA Nsight stuff for a few minutes, but couldn't find anything that would drill down to that level. So I have to take the Inspector values with a grain of salt and apply extra knowledge at times.

    One other thing I learned: saving in PNG format breaks the RGB Float modes - it truncates values. So to test those modes I had to save the files as EXR instead.

    Test Setup: The model is the horse, LOD0 (1890 vertices).

    Perfect Category contains formats that have perfect animation - zero distortion.
    • 16 bit float (RGBA Half) - 512KB
    • R11G10B11 32 Bit - 256KB
    Middle Category contains formats that have slight distortion. Overall they look good in a zoomed-out view of the walk animation; however, zooming in on the nose of the horse during its idle animation, there is some wobbly waviness due to compression inaccuracies.
    • RGB 24 bit (RGB8) - 192KB; however, few GPUs support this natively, so it's likely stored as 256KB
    • RGBA 32 bit (RGBA8) - 256KB
    • RGB9e5 - 256KB
    Crappy Category contains formats that look horrible. Personally I wouldn't use them for LOD0 or LOD1, but they do cut the texture memory by half or more.
    • RGB 16 bit (B5G6R5) - 128KB. This one looks the best in this category (see gif below)
    • RGBA BC7 best - 64KB. This one looks second best in this category (see gif below)
    • DXT5|BC3, DXT1|BC1, BC6H - 64KB. You can't even tell the model is a horse!
    • ASTC - I think this requires Android, iOS, or WebGL, and I didn't have time to build for those platforms.
    CONCLUSION: Currently I don't see any reason to use the middle category - they are all the same size as the smallest format in the perfect category. I also don't see any reason to use 16-bit float, since it's twice the size of its competitor in the same category. As for the crappy category... personally I doubt I'll use those formats other than maybe for the very last LOD level. From the benchmarks I ran, the "hardware filtering" is no faster than the trivial filtering I built into the shader - both methods do everything in hardware (on the GPU) anyway.

    Sidenote: the albedo main texture for the horse is 512x512, compressed with the default DXT5|BC3. Its size is 341.4KB, so it's quite a bit larger than all of the animation formats tested except the floating-point one.

    ................Appendix................

    Horse's nose in the Perfect Category:
    r11g10b11 - nose - 256KB.gif

    Horse's nose wobble in the Middle Category:
    rgb9ef - nose - 256KB.gif
    Walk Animation RGB 16 bit (B5G6R5):
    rgb16bit - b5g6b5.gif

    Walk Animation BC7 best:
    rgba-bc7.gif
     
  42. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Does anyone know a way to read a material property in a bursted job?

    I want to store the "skin index" (aka "LOD index") as a property in the material, so that my animation system/job knows which particular material it is dealing with. The catch is that the LOD system/job can swap out materials and meshes at any time via MaterialMeshInfo, so it seems useful to permanently store the skin/LOD index as a hidden property in the material itself when the material is baked. I tried making a [MaterialProperty] IComponentData for a _SkinIndex material property, but the job doesn't seem to read the value, and it doesn't change as the LOD system swaps out materials - it's just always stuck at zero. I can write to it just fine... I just can't read it, and I have no idea why. I also tried mesh.sharedMaterial.GetProperty(), but that can't be used in Burst jobs + ISystem.

    In the worst case, I could have my LOD system change a skin-index component... but I'd rather not have the animation system depend on the LOD system... I'd prefer that they stay independent, with the animation system reading the index directly from the material.

    Any ideas?
     
  43. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    In Entities Graphics, Materials and MaterialProperty components are completely independent of each other. The MaterialProperty components are essentially value overrides, not at all dissimilar from MaterialPropertyBlocks. You need to either keep the skin index in a separate component, or store it in a hashmap indexed by the material's ID.
     
    lclemens likes this.
  44. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Okay, thanks so much for the advice. It's unfortunate and strange that the only way to read a material property is with Burst disabled, because reading a material property in a job seems like it could be useful in a few scenarios. I guess I can hack around it for now.
     
  45. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Why is it strange that reading from a managed type like Material doesn't work in Burst? There's a lot of classical Unity things that don't work with jobs and Burst.
     
  46. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Well, it's strange that writing to a property works in jobs and Burst just fine, yet reading that same property isn't possible. I would have guessed it would be the other way around. Anyway, I got something working - it's hackier than I'd like, but it gets the job done and performance is good. Thanks again for the info!
     
  47. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    It's not the same property. You can read material-property components in jobs fine; they'll be set to whatever you last left them at. It is the shader that combines the ECS components with the actual material.
     
  48. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The terminology is confusing ;). I can read a material property override component in Burst (and write to it) using an IComponentData with [MaterialProperty("_MyProp")], but as you said, it only reads the last state I set it to - it doesn't give me the actual value in the current material, which is what I want. Writing to it definitely saves it in the actual material, because after running I can look at the material and see that its value changed in the inspector. The only way I know to get the actual saved default property value from an entity's current material is via sharedMaterial.GetFloat("_MyProp"), but of course that isn't Burst/job friendly. So the net result is that in Burst jobs I can write to a material property at runtime, but I can't read its actual, non-overridden value.
     
  49. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Writing to the IComponentData changes the property in the actual material? That shouldn't happen. There is no mechanism for that to happen. So there is something else you are probably doing that is causing that.
     
  50. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Yeah you are right. I thought it was happening, but I just realized that it was not.

    As a sidenote... one thing that would solve my problem is a way to store a persistent array property in a material. Before, when I was using a special header line in the texture, I put the array there; now that I've removed it, I have to store that info in a blob asset and send it each time. I found Material.SetFloatArray(), but it isn't serialized with the material asset, so it's rather pointless - I'd end up storing the array in a blob asset anyway. Aside from using _ArraySlot1, _ArraySlot2, _ArraySlot3... any ideas?
     
    Last edited: Jan 6, 2024