
Graphics.DrawMeshInstanced

Discussion in 'Graphics for ECS' started by Arathorn_J, Jun 26, 2018.

  1. Arnold_2013

    Arnold_2013

    Joined:
    Nov 24, 2013
    Posts:
    284
    I fixed some errors by opening all Shader Graph / VFX Graph assets and re-saving them. Even shaders that I did not make in Shader Graph may open in Shader Graph when you click Edit - I had this with URP/Shaders/AutodeskInteractive... just open it and save. The only catch is that the error is only triggered when the shader is used... so I hope I don't miss too many of them :)

    The URP "convert all materials" option has been moved to Window -> Rendering -> Render Pipeline Converter, but so far it has not been the magic bullet for me (it was my go-to solution for anything pink-shader related).

    In the editor I can run a game (have not tried to build anything).

    Unity 2021.3.5f1
    Entities+HR+... 0.51.0-preview.32
    URP/ShaderGraph/VFXgraph 12.1.7...
     
  2. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The Render Pipeline Converter tools weren't much help. I'm not a Shader Graph expert, and I can't figure out how to open the shader in Shader Graph, since the original shader was created manually in code. From what I'm reading here: https://gamedev.stackexchange.com/questions/183836/convert-from-shader-to-shadergraph it sounds like there is no easy way to just open the .shader file - I either have to rebuild it in Shader Graph, or make a new Shader Graph file and integrate the existing functions as a black box. The first method might not even be possible, and the second seems like it would still give me the same errors, since the shader code won't change.
     
  3. thelebaron

    thelebaron

    Joined:
    Jun 2, 2013
    Posts:
    857
    @lclemens
    I had to change the UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO underscores from two (__) to one (_), which could be the source of your issues.
     
  4. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I think I figured it out...

    I had to change these lines:

    Code (CSharp):
    #define _Color UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float4, Metadata__Color)
    #define _SpeedInst UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float, Metadata__SpeedInst)
    #define _CurTime UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float, Metadata__CurTime)
    #define _ClipIdx UNITY_ACCESS_DOTS_INSTANCED_PROP_FROM_MACRO(float, Metadata__ClipIdx)
    to these:

    Code (CSharp):
    #define _Color UNITY_ACCESS_DOTS_INSTANCED_PROP(float4, _Color)
    #define _SpeedInst UNITY_ACCESS_DOTS_INSTANCED_PROP(float, _SpeedInst)
    #define _CurTime UNITY_ACCESS_DOTS_INSTANCED_PROP(float, _CurTime)
    #define _ClipIdx UNITY_ACCESS_DOTS_INSTANCED_PROP(float, _ClipIdx)
    And now it works in 2021 with URP 12.1.7 and HRV2 0.51.
     
  5. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I have a question about this baked-vertex-animation-in-a-texture method... what is the point of baking the normals? All this time I have been baking a vertex-position map and a normal map, because zulfajuniadi did it in his github project and other people were doing the same in their vertex-animation projects. Every time I bake, I've been generating two textures as output.

    upload_2022-8-24_19-55-14.png

    But today I decided to make a lit version. I took the SimpleLit URP shader and modified it so that it used the encoded vertex-position texture, and for the normal texture I just used the regular normal map (not the generated one).

    upload_2022-8-24_20-1-43.png

    And as far as I can tell, it looks great! The shadows, lighting, and bump mapping are behaving just as I would have expected.
    ezgif-4-3f1a816974.gif

    And here is the plain unencoded normal map on the baked texture (this model also has alpha clipping).

    upload_2022-8-24_20-9-55.png

    So.... this is probably a stupid question, but why have I been generating vertex-encoded normal maps all this time??
     
  6. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    If you don't bake the normals and transform them based on that output, you will get some really odd light reflections - surfaces that should be obscured will receive direct-looking lighting or reflect in the wrong direction. For example, take a model in T-pose where a vertex on top of the arm has a normal of, say, (0,1,0), straight up; several frames later the arm is at the model's side, where that vertex should face outward, like (1,0,0), and it won't reflect light properly. In your example above you added a bump/normal texture to overlay lighting detail on the surface, but that doesn't change the actual vertex normals, which you still need to update per frame.

    This is easier to visualize in Shader Graph, where you will see the "Normal" input for a bump/normal map and can also modify the Vertex Normal and Position directly.

    In the screenshot below I've got calculations that read a vertex animation texture and compute the normal and position, and I'm not populating the normal map at all (though I could if I needed to).

    upload_2022-8-24_21-53-30.png
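
    In plain HLSL, the same idea looks roughly like this - a minimal sketch, assuming a baked normal texture laid out like the position texture (names such as _PosTex, _NmlTex, and _FrameRow are illustrative, not code from the project above):

    Code (CSharp):
    Varyings vert(Attributes IN)
    {
        Varyings o = (Varyings)0;

        // One column per vertex, one row per frame (illustrative layout).
        float2 uv = float2((IN.vid + 0.5) * _PosTex_TexelSize.x,
                           (_FrameRow + 0.5) * _PosTex_TexelSize.y);

        // Fetch the baked object-space position for this vertex and frame.
        float3 positionOS = SAMPLE_TEXTURE2D_LOD(_PosTex, sampler_PosTex, uv, 0).xyz;

        // Fetch the baked normal too; stored in [0..1], so remap to [-1..1].
        float3 normalOS = SAMPLE_TEXTURE2D_LOD(_NmlTex, sampler_NmlTex, uv, 0).xyz * 2.0 - 1.0;

        o.positionCS = TransformObjectToHClip(positionOS);
        o.normalWS = TransformObjectToWorldNormal(normalOS);
        return o;
    }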
     
    lclemens likes this.
  7. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Thanks - that makes perfect sense!

    What about tangents? I don't know a whole lot about what tangents are used for, but several of the texture-baked vertex-animation libraries I've seen ignore them. I think they're used for bump mapping along with the normals? Will I need them if I want that bump/normal texture to work?

    Hopefully tangents don't require too much precision, because the most I can squeeze out is 10 bits per axis before I would have to use a really large floating-point texture instead.
     
  8. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    I'm not sure about the tangent issue, or whether tangents would actually need to be modified along with the vertex positions. What I've done when developing my shader is run the original Mecanim animator in parallel, pause frame by frame, and compare the surfaces side by side to see if there are differences. Sorry I can't help more on the tangent calculation - maybe someone else can weigh in.
     
    lclemens likes this.
  9. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Some lighting models use tangents, but not all. I know it is pretty hard to do brushed metals without it. You are much more likely to need it for an HDRP project I believe, but it has been a while since I have looked at all the shaders Unity uses across pipelines.
     
    lclemens likes this.
  10. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Thanks for the info. I went ahead and implemented the tangent to see if it mattered in the modified Simple Lit shader I am using. I could not find a visible difference at all.

    To be honest, I think the models I'm using actually look better without the encoded normals texture. I'm not sure if it's a precision problem or what, but the bump mapping compares better to the original without it. On a practical level, none of my character movements are very drastic (run, die, attack, stagger, idle), so it's not really noticeable that the encoded normals texture isn't being used - plus it's one less texture needed on mobile platforms. For the moment I will run without the encoded normal texture unless I see an obvious visual artifact.
     
  11. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Something is perplexing me...

    I built an executable of my Animation Cooker, which uses vertex animation, and sent it to a friend of mine. We both benchmarked the same high-vertex-count model with 100k instances (all on-screen). It's definitely GPU-bound, because the CPU is hardly touched at all (around 4%).

    On my laptop with an RTX 2060, I got 9fps.
    On his laptop with an RTX 3060, he got 9fps.

    We retested with 10k instances and again got identical numbers.

    This was unexpected to say the least.... any explanations for this odd phenomenon?
     
  12. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Not all mobile GPUs are integrated equally. Even if they have the same model number, they could have drastically different power schemes. It could also be that there was minimal generational improvement in whatever is the specific bottleneck for this algorithm.
     
  13. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    At a certain point, with however much data per frame is being sent to the GPU, you hit a throughput limit, and that can cause them to be pretty close. I gather you don't have any sort of frame syncing enabled in the quality settings?
     
  14. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The app is set to run as fast as possible (before spawning anything it's in the 300fps+ range). But in the player it defaulted with VSync Count set to "Every V Blank". Maybe that's it?

    Another thing is that the build is set for full-screen native resolution. My friend's resolution is 2k and mine is 4k, and he still got the same number. Today I built a player to be 1920x1080 and ran it and it gets the same numbers as with 4k, so resolution doesn't make a difference.

    So it sounds like some sort of throughput bottleneck that the mobile RTX 3060 didn't improve on over the mobile RTX 2060. When 100k entities are going, I can see in Task Manager that the GPU is pegged at 100% on my 2060. I know that if I choose a different model with fewer vertices, the frame rate improves a lot, so I figured it was triangle throughput - but I'm pretty sure a 3060 has higher triangle throughput than a 2060, so it must be something else.

    It's not super important that I discover the exact root cause or anything... I was just curious.

    Any chance one of you could try it quick on a desktop? Just set the spawn count to 100,000 and then hit the Spawn button and note the FPS. Alt-F4 to exit. https://drive.google.com/file/d/1So-ellBcqjl43GZbH_60u-b-5RcH48fn/view?usp=drive_link
     
  15. Rukhanka

    Rukhanka

    Joined:
    Dec 14, 2022
    Posts:
    204
    lclemens likes this.
  16. inSight01

    inSight01

    Joined:
    Apr 18, 2017
    Posts:
    90
    lclemens likes this.
  17. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Thanks for running the test. That confirms our suspicion of some sort of bottleneck that doesn't scale with card performance... even the 3070 Ti only barely pulls ahead, while the 3060 desktop card and the 2060 laptop version perform the same.

    Using models with fewer vertices (around 600 to 700), I can keep it above 40fps with 100k instances. Because the vertex count has such a huge impact, my theory is that if I started using LODs it could go way higher. I'm not quite sure how to do that in DOTS yet (I haven't really investigated it). It's on the todo list.
     
  18. inSight01

    inSight01

    Joined:
    Apr 18, 2017
    Posts:
    90
  19. Arathorn_J

    Arathorn_J

    Joined:
    Jan 13, 2018
    Posts:
    51
    Yes, you definitely need to make sure VSync is disabled, otherwise it won't matter what you do. As for any other issues, it can just be throughput bottlenecks, but the profiler should show you spikes for graphics calls when it's consistently trying to push through too much and the CPU is waiting on those bottlenecks.
     
    lclemens likes this.
  20. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I set the "VSync Count" to "Don't Sync" and tested on my 2060 laptop and it didn't make any difference at all, regardless of the entity count. I don't know why.

    Lol! --- yeah, I think it's safe to say that there are spikes for graphics calls. :)
    upload_2023-5-26_11-30-15.png
     
  21. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The technique is similar in that it bakes some things into textures, but it's different because the Crowd Animations asset uses GPU skinning - it records only the bone transforms into textures, whereas I'm recording vertex positions (the vertex-animation technique). Vertex animation makes larger textures and is less flexible, but it's less work for the GPU, so it can handle more agents at a time. I put it on GitLab - https://gitlab.com/lclemens/animationcooker .
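
    For contrast, here is a minimal sketch of what the bone-texture (GPU skinning) variant roughly looks like in a shader. This is purely illustrative, not Crowd Animations' actual code - it assumes a layout where three texels per bone hold the rows of a 3x4 object-space bone matrix:

    Code (CSharp):
    // Illustrative bone-texture skinning: load a 3x4 bone matrix from three
    // adjacent texels, then blend the influencing bones per vertex by weight.
    float3x4 LoadBone(uint boneIdx, float frameV)
    {
        float u = (boneIdx * 3 + 0.5) * _BoneTex_TexelSize.x;
        float4 r0 = SAMPLE_TEXTURE2D_LOD(_BoneTex, sampler_BoneTex, float2(u, frameV), 0);
        float4 r1 = SAMPLE_TEXTURE2D_LOD(_BoneTex, sampler_BoneTex, float2(u + _BoneTex_TexelSize.x, frameV), 0);
        float4 r2 = SAMPLE_TEXTURE2D_LOD(_BoneTex, sampler_BoneTex, float2(u + 2.0 * _BoneTex_TexelSize.x, frameV), 0);
        return float3x4(r0, r1, r2);
    }

    float3 SkinPosition(float3 positionOS, uint4 indices, float4 weights, float frameV)
    {
        float3 p = 0;
        [unroll] for (int i = 0; i < 4; i++)
            p += weights[i] * mul(LoadBone(indices[i], frameV), float4(positionOS, 1.0));
        return p;
    }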
     
  22. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Hey @lclemens , awesome job with your asset! Exploring it now and have a couple of questions:
    - any particular reason to go for manual texture decompression versus the hardware built-in ones like BC6H for big devices or ASTC for mobiles?
    - packing the metadata into a texture header rather than a separate buffer in AnimDB, what are the advantages?
     
  23. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Excellent questions!

    I guess the only reason I packed metadata into the header instead of AnimDB is that when I first started the project, the shader was ticking the animation per frame instead of it being done in a system, so that was the only way to get the information into the shader. It turned out that with that method it wasn't possible to know when animations ended, and it had a few other downsides, so later I moved that functionality into AnimationSystem. Now that ticking happens on the CPU side, I think I could store that info in AnimDB and pass it from AnimationSystem to the shader via material properties (as you suggested). It's a lot of values, so maybe the material property could be a float4x4 matrix or something, to avoid using 15 different material properties. I've never done a matrix material property before, but I'm guessing it's possible? I'm not sure how performance would be affected by sending that matrix to the graphics card every frame, but hopefully it wouldn't make a huge difference.

    As for ASTC/BC compression, I would obviously need to ditch the header, as mentioned above. By doing my own compression, I was able to jam the data into an RGBA buffer and avoid floating-point buffers. Additionally, I was able to make use of pre-calculated min/max ranges for the animation positions, which a generic compression technique wouldn't have access to. I squeezed out every last bit of data on a per-bit basis (only 2048 values for X and Z and 1024 for Y). Another thing I did for compression was interpolation between frames, which was a huge savings. The resulting images look like this:

    upload_2023-10-4_13-44-8.png upload_2023-10-4_13-44-55.png

    In a previous job I did a lot of work with H.264 and other compression schemes, and I learned that compression isn't very effective on really noisy images. I believe ASTC and the BC variants are lossy, which would further reduce the already limited precision, but perhaps dropping a small amount of precision might not be too bad. Another concern is bleed-over: if a section of really low values sits right next to some really high values, the boundary between them can get blended/blurred. For images that's fine, but for positional values it could make the animation look jittery, because those particular positions would be way off from where they're supposed to be. One thing that would compress really well is the thick black line at the top, which is just filler zeros to make the dimensions line up to a power of 2.

    Because you got me thinking... I did a quick test using the smaller of the above images.

    TextureCompressionTest.gif

    Low and High quality both produced an 8KB image, so obviously it would be pointless to use low quality, if you trust the KB number Unity reports. The low-quality one also looked horrible, and I have no doubt it would be worthless for animations. However, the High quality setting looks decent and compressed by a factor of 4. We can definitely see that some of the colors changed (especially the whites, which became more yellowish). It's hard to tell how much that would affect the resulting animation, but I have no doubt it would have an effect. I tested the larger image, and with the high-quality setting it went from 341.3KB to 85.4KB -- also a factor of 4. BC7 looks decent. DXT5|BC3 (the default for Win/Mac/Linux builds) looks awful. BC6H looks decent if it's set to "Best".

    I didn't see any options for ASTC, so I'd have to get the C code, and integrating that would take weeks. I think mobile in Unity only supports ETC. I didn't test ETC just now, because I think I'd have to switch the project to the Android build target to test its compression options.

    One thing to note is that a secondary compression wouldn't increase the number of vertices a model could use - that is still determined by the max texture width/height a person is willing to accept (1024 for old mobile devices, 2048 for new mobile devices, and 16384 for desktop). The only way I could think of to increase the max vertex count would be to try to stuff two vertices into a single pixel, which would only leave 512 positions for the Y axis and 1024 positions for the X and Z axes. Hmmm... I should try that sometime.

    I could probably convert the header section to material properties in a day or two. I would also need to save the texture assets as PNG or EXR instead of a ".asset" - and I already have code for that so it should be pretty easy. After that I think it would just be a matter of testing it with some different algorithms.

    So in summary... it would be great to cut the memory size to a quarter. The one bad thing is that if the experiment failed, I would have wasted a couple of days converting everything for nothing. I wonder if there is a lossless mode for any of these codecs? My gut instinct is that the yellowish tint and color shifts caused by the lossy compression will cause significant problems, but who knows...
     
  24. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    AnimationKitchen unfortunately hard crashes the editor for me (2022.3.10f) so I wasn't able to do all the tweaks. But here are my thoughts:

    - Usually, GPU animation is a way to render thousands of entities, and naturally, it's not about animation fidelity but performance. Though use cases may vary for sure, having options is always good.
    - An explicit texture format lets you choose the format per object, per LOD (!), and make an explicit precision-vs-size tradeoff.
    - Hardware encoding does all the tricks you've described, like min/max, but per block (ASTC) rather than just per texture, so it's more efficient. And zero code on the plugin side - it's all handled under the hood.
    - Hardware sampling is free on any non-antique GPU, and it provides a free blend between frames in the form of bilinear filtering. You're currently doing that blending manually in the shader. The guy behind https://www.headfirststudios.com/theorangecoder explained it nicely in the docs.
    - You're baking final positions, not deltas from the origin. Deltas would have zero mean and, depending on the animation range, could compress WAY better, be it RGB9e5 with offset, ASTC, or fancier formats (see the sketch after this list).
    - Making a baked mesh also an importable mesh, compared to the current .asset storage, would allow for more optimizations like FP16 storage of non-position channels.
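
    A rough sketch of the delta idea (names like _DeltaTex, _DeltaScale, and _DeltaOffset are illustrative, and the exact decode depends on the chosen format):

    Code (CSharp):
    // Bake (framePosition - bindPosePosition) per vertex instead of the
    // absolute position. Deltas cluster around zero, so a per-clip
    // scale/offset pair recovers them from a low-precision texture, and
    // the mesh's own vertex position supplies the base at runtime.
    float3 delta = SAMPLE_TEXTURE2D_LOD(_DeltaTex, sampler_DeltaTex, uv, 0).xyz
                   * _DeltaScale + _DeltaOffset;
    float3 positionOS = IN.positionOS.xyz + delta;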

    But anyway, I adapted your code to the ShaderGraph subnode and it plugged in nicely!

    As for the animation FSM, it's a matter of project requirements, I guess. Full CPU, full GPU, or something in between.
     
  25. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    HR 0.51 should be compatible with the 2021 series. These error messages suggest that there might be an #include problem, possibly related to the order in which header files are included (which could have been caused by something changing in the URP headers). One way to debug this is to use the "Preprocess only" shader compilation option in the Inspector, then look at the site of the error and compare it with the original source.
     
  26. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    - Yeah, AnimationCooker crashes the editor for me too - about once every 4 or 5 bakes. I made a Unity bug report for it, but they keep closing it as "not qualified" and saying they can't reproduce it. I don't think they actually tried, because it's pretty easy to reproduce and they have the full source code. https://unity3d.atlassian.net/servicedesk/customer/portal/2/IN-53065

    - The way AnimationCooker currently works, each LOD can use different texture dimensions and frame rates. It works pretty well.

    - So you're saying I could give the hardware encoder a 3-channel floating-point image and it would pack it into an R11G10B11 image? I haven't run across that feature yet. I searched for quite a while and couldn't find anyone using texture compression with vertex animation - in fact, most people use uncompressed floating-point textures. I don't know much about these gaming compression algorithms other than that they use blocks, but lossy compression algorithms like JPEG and H.264 were developed to toss out visual data that the human eye won't notice.

    - I tried bilinear filtering for frame interpolation once, but it didn't work: it does funky stuff at the image edges, and it was messing things up because I write multiple animation clips to the same file and wrap them, so each column does not correspond to a vertex sample. The GPU ECS Animation Baker from Orange Coder doesn't have to worry about wrapping, compression, or frame rates below 30fps, because it bakes bone animation instead of vertex-position animation - the bone information is much smaller (there are far fewer bones than vertices). The downside of the bone-baking approach is that it requires multiplying out transform hierarchies, so it uses more CPU (though it's still fast enough for a huge horde of entities). I think most people use bone baking because it has no limit on vertex count, uses less texture memory, and can support weapon/clothing attachments and animation blending.

    - I was baking positions instead of deltas because that's what everyone else was doing, but I concur that deltas could compress better in most animations.

    - Saving the mesh as an importable asset seems like a cool idea. I'm not really sure how to programmatically save an FBX, but I think the FBX Exporter package has an API for it.

    - I found that the animation FSM in a system running a job with ScheduleParallel() is barely noticeable. Even with over a million entities it's in the 2ms range. I prefer it on the CPU just because it's easier to tweak and modify and to get/send events.
     
    WildMaN likes this.
  27. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    It's 100% for me. OK, will take a look as well.

    Yes, but not a different format/compression. For mid-to-far LODs it might make a small difference.

    No. On the encoding side I frankly don't care - it's offline baking. At runtime the GPU's texel unit is designed to handle decompression blazingly fast, multiple filtered texels per cycle. As for the format, RGB32 isn't something supported in silicon and would take a shader keyword and manual decompression, sure. RGB24, RGB9e5, and a bunch of lossy formats are supported. It's up to the project requirements, I think, but I'd expect deltas to compress nicely. Have to test.

    Manual decompression gets in the way of filtering, yes. Wrapping etc. shouldn't, with proper offset control - like never getting closer than 0.5 texels to the border. On the frame rates, I didn't get your point. He shows that with "bilinear blending" an 11fps anim can look almost the same as 30fps.
     
  28. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Cool - if you test the compression, I'd love to know the results. I may do that at some point, but I'm in the middle of a lot of things right now, so I probably won't mess with animation again for a couple of months.

    The Orange Coder documentation says: "Frames are sampled & baked at 30 FPS." I suspect he really doesn't need to go lower than that, because there's no need to shrink texture sizes when bone data is already so small.

    By wrapping I meant wrapping the animation clips. For example, the walk animation might start at row 1, col 0 and end at row 50, col 19; then the attack animation would start at row 50, col 20 and end at row 90, col 77. Wrapping the animations like that breaks bilinear interpolation. The Orange Coder doesn't need fancy wrapping, since bone info is small enough that he can fit all the animations without it. I benchmarked with and without my own interpolation in the shader, and it's so fast that I couldn't even devise a test that showed a difference in speed.
     
  29. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Sure. I fixed the crashes by adding AssetDatabase.Refresh() after the texture creation. I also removed the dependency on Entities.Graphics (and plan to separate out Entities entirely) and added an additional branch that processes the textures the way I described, preserving your way of packing as well. In the process, I noticed that because your code uses an ARGB32 render texture, the source data is 8 bits per channel, so expanding it afterward to 10-11 bits does not increase the actual precision.

    Unity_EgKBDxLRVG.gif

    A DrawMeshInstancedProcedural horse with vertex animation: a 4fps texture capture of a 0.8sec clip, smoothed by hardware bilinear filtering. The discrepancy between the original GameObject (black) and the instanced one is the price paid for compression. This is more than enough for my particular needs, where the screen scale is something like this:

    Unity_W0ibjaSHOo.gif

    The example above is an uncompressed 749x4 "RGBA32 Float" texture asset, 46.8kb. The texture stuff turned out to be a bit more complicated than I thought - still trying to get my head around what the hell UNorm is and why I can't get regular floats...
     
  30. Rukhanka

    Rukhanka

    Joined:
    Dec 14, 2022
    Posts:
    204
    RGBA32 is 4 bytes per texel (8+8+8+8 = 32 bits). There is no such format as "RGBA32 Float". I believe you meant RGBAFloat (16 bytes per texel: 32+32+32+32 = 128 bits), which the size calculation confirms: 16 * 749 * 4 = 47936 bytes. This is simply the most greedy but most precise texture format that can be used (with 4 wasted bytes per texel). UNorm is deciphered as "Unsigned Normalized" (SNorm - "Signed Normalized"). Such formats mean that the GPU hardware reinterprets the internal representation of the texture data (texels) as a float value in the range [0..1] after sampling. I.e. R8G8B8A8_UNorm internally contains values in the range [0..255] in each byte (R, G, B, A), and during sampling the GPU returns the value modified by the formula 'rawValue / 255.0f'. For R8G8B8A8_SNorm the stored bytes are signed ([-128..127]) and the formula is 'max(rawValue / 127.0f, -1.0f)', giving a [-1..1] range.
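
    Written out as shader-style code, the two decode rules look like this (a sketch for illustration only):

    Code (CSharp):
    // UNorm8: stored byte [0..255] -> float [0..1]
    float DecodeUNorm8(uint raw)
    {
        return raw / 255.0;
    }

    // SNorm8: stored signed byte [-128..127] -> float [-1..1]
    // (-128 and -127 both decode to -1.0)
    float DecodeSNorm8(int raw)
    {
        return max(raw / 127.0, -1.0);
    }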
     
    lclemens likes this.
  31. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Unity_O2u8OuJPad.gif

    From left to right:
    24fps ASTC4x4 17.6kb
    10fps ASTC4x4 8.8kb
    4fps ASTC4x4 2.9kb
    24fps RGB24 52.7kb
    10fps RGB16 14.6kb
    4fps RGB16 5.9kb

    I've tinkered with some other lossy compressions and they all look horrible up close, inevitably mixing up the vertices. On the other hand, when zoomed out, why pay more?

    Unity_3NVYcoyvWA.gif

    24fps RGB24 at 52kb looks good (the GIF above adds some stutter due to its own compression). Pretty much all the others are horrible for close-ups, but on the other hand, for an RTS kind of view they look surprisingly efficient.

    Strangely enough, in RGB24 mode the 4fps one looks better than the others:

    Unity_AuZegIiZKX.gif

    Edit:
    Sizes in RGB24:
    10fps 21.9kb
    4fps 8.8kb
     
    Last edited: Oct 8, 2023
  32. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I am using all 32 bits of the ARGB32 image - the 3 values (XYZ) are spread across the 4 bytes (RGBA). X is stored in R and partially in G (the first 11 bits), Y is stored partially in G and partially in B (the next 10 bits), and Z is stored partially in B and in A (the last 11 bits). The process is reversed when the shader extracts the values, so that the full 11 bits are used for X and Z (2048 values) and 10 bits for Y (1024 values). That's what the PackThree10BitFloatsToARGB() and UnpackRGBAToThree10BitFloats() functions do.
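
    For reference, the unpack side might look something like this - a sketch following the layout described above (the real UnpackRGBAToThree10BitFloats() may differ in detail):

    Code (CSharp):
    float3 UnpackRGBAToThree10BitFloats(float4 c, float3 boundsMin, float3 boundsMax)
    {
        // Rebuild the raw 32-bit word from the four 8-bit UNorm channels.
        uint4 b = (uint4)round(c * 255.0);
        uint raw = (b.r << 24) | (b.g << 16) | (b.b << 8) | b.a;

        // Slice back into 11 bits (X), 10 bits (Y), 11 bits (Z), normalized to [0..1].
        float3 n = float3((raw >> 21) / 2047.0,
                          ((raw >> 11) & 0x3FF) / 1023.0,
                          (raw & 0x7FF) / 2047.0);

        // Expand into the precomputed min/max range of the animation.
        return lerp(boundsMin, boundsMax, n);
    }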

    I originally started out with an RGB24 half image, which was HUGE on disk - several megabytes. I read somewhere that internally, graphics cards and OpenGL convert RGB24 (888) to RGBA, so it uses the same amount of memory as RGBA - which is one of the reasons I decided to use RGBA (the other being that I wanted 8 more bits, because I didn't think 255 values per axis was enough).

    Thanks for doing the tests! Those sizes are definitely smaller than what my compression produces for the horse model at 10fps, though the visual quality definitely takes a hit. Did you bake all 3 horse animations, or just the walk animation?

    So to fix the crash, you put AssetDatabase.Refresh() after these lines in AnimationKitchenWindow.cs?
    Code (CSharp):
    // Bake each skin and save its results to disk.
    // 1x material, 1x skin mesh, 1x log, 1x pos tex, and an optional normal texture.
    for (int i = 0; i < m_opts.SkinOpts.Count; i++) {
        results.Add(BakeAndSave(m_opts.SkinOpts[i], subFolderPath, m_computeShaders, m_opts.EnableCopyTexturesToOutput, texDict));
    }
    I just looked into removing the header and moving it into AnimDb. I found that there is a MaterialPropertyFormat.Float4x4, which would be useful for transferring a lot of header info in a single property instead of a bunch of float4 properties. Supposedly it generates a warning, but it works.

    I also found a bug - the AnimDb only stores information for the very last skin/LOD index, so if the first LOD is set to 24fps and the last one is set to 1fps, it gets confused and tries to interpret all 3 LODs as if they had the same frame rate, which causes skipping for the first two LODs. I am going to try to fix that over the next couple of days, and if I can get rid of the header information at the same time, I'll do that too. Ditching the header is a little tricky, because I'm storing multiple animation clips in a single texture and the shader needs to know a little information about each clip in order to find the correct vertex locations.
     
  33. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    You're totally right, don't know why I called that a mistake. My apologies.

    Yes, and that's why I mentioned RGB9e5 - hardware-supported, 32-bit-wide, packed but not overly compressed, supposedly the best fit for storing deltas. I gave it a try but failed to encode/decode it properly. Maybe I'll get back to it after I finish the current pass on the whole render system.

    With the caveat that I didn't look into your custom compression: my current shader uses the standard TexelSize plus one per-instance float that encodes the Y-axis animation shift. Nothing to pack into a float4x4. On the CPU side there's some data preparation with clip length/frame count, but that's it.

    Since I heavily patched your code, the line numbers may not be exact.

    Code (CSharp):
    result.PosTexPath = Path.Combine(subFolderPath, $"{prefabName}.posTex.asset");

    AssetDatabase.CreateAsset(posTex, result.PosTexPath);

    System.IO.File.WriteAllBytes(result.PosTexPath, posTex.EncodeToTGA());

    AssetDatabase.Refresh();
     
  34. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    So you're only handling one animation clip per texture? Are you using texture arrays or something?
     
  35. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    396
    @WildMaN In your example above with 4/10/24fps, you should make sure the numbers are divisible, otherwise you get a skipped frame/distortion. You said somewhere it's a 4fps, 0.8 sec clip, but I assume you might mean 30fps x 0.8, meaning 24 frames. Therefore you should test 24/12/6/3 (which may not be so bit-compatible) or 24/8/4.
     
  36. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    The animation manager holds all the texture and clip settings:
    Unity_dTsaQ4aQBR.png

    Which translates to some burst-able structs for the animation system:
    devenv_PZtXttqf7z.png

    Per entity there are two components - one stores the current animation state and changes only when a new clip command arrives, and the other is updated every tick: the clip's current timestamp, adjusted for clip start time, clip length, and texel height.

    devenv_1c2OC8h9AK.png

    So f_currentShift goes straight into UV sampling in the shader:
    Unity_mSJc0tNsY8.png

    Or in plain HLSL:
    Code (CSharp):
    Varyings vert(Attributes IN)
    {
        Varyings o = (Varyings)0;

        UNITY_SETUP_INSTANCE_ID(IN);

        // Column = vertex id, row = current animation frame.
        float x = (IN.vid + 0.5) * _BakedAnimationsTexture_TexelSize.x;
        float y = (f_currentShift + 0.5) * _BakedAnimationsTexture_TexelSize.y;

        half3 bakedPosition = SAMPLE_TEXTURE2D_LOD(
            _BakedAnimationsTexture,
            sampler_BakedAnimationsTexture,
            float2(x, y),
            0).xyz;

        bakedPosition = bakedPosition * _BakedAnimationsScale + _BakedAnimationsOffset;

        VertexPositionInputs vertexInput = GetVertexPositionInputs(bakedPosition);

        o.positionCS = vertexInput.positionCS;
        return o;
    }
     
  37. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Yes, that's the case. However, in practice, with that level of vertex-compression artifacts, I doubt the frame-rate inconsistency would be visible at all. The frame rates were set by the original code from @lclemens and I didn't bother to adjust them yet.
     
  38. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    396
    You would be surprised how visible slight things are - it's why 4 frames looked better than 10.
     
  39. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Hope you're doing well! I finally got around to messing with AnimationCooker again. I don't see how hardware bilinear interpolation would work when the pixels are organized with a wrapping method like this...

    upload_2023-12-17_21-51-56.png

    The previous and next frames are not adjacent pixels, but a hardware interpolator has no way of knowing that. I could organize the pixels so that each row corresponds to a vertex, but if the vertex count is not exactly a power of 2, there will be wasted space on the right side of the texture. The main issue is that texture width would become a function of vertex count, which would force meshes to have fewer than 2048 vertices on mobile platforms, since that's the max texture width for most phones. Also, if there are a lot of vertices and only a couple of short animation clips, I would end up with goofy texture sizes like 2048x16 instead of something sensible like 256x128. So for now I'm leaning towards keeping the wrapping and using my own interpolation, which was just as fast as hardware filtering in all my benchmarks. It's done in the shader, so it's still on the GPU, and I eliminated all division operations.
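
    Roughly, the manual blend looks like this - a sketch using the unpack helper sketched earlier (_WidthMask, _WidthShift, _Frame0Start, _Frame1Start, and _FrameBlend are illustrative uint/float values; my real code may name things differently):

    Code (CSharp):
    // With a wrapped layout, a frame's texels run linearly through the
    // texture: texel index = frameStart + vertexId. With power-of-two
    // widths the row/column split is a mask and a shift - no division.
    float2 TexelToUV(uint texel)
    {
        return (float2(texel & _WidthMask, texel >> _WidthShift) + 0.5)
               * _PosTex_TexelSize.xy;
    }

    float3 SampleFrame(uint frameStart, uint vid)
    {
        float4 c = SAMPLE_TEXTURE2D_LOD(_PosTex, sampler_PosTex,
                                        TexelToUV(frameStart + vid), 0);
        return UnpackRGBAToThree10BitFloats(c, _BoundsMin, _BoundsMax);
    }

    // In the vertex shader: sample the two frames and blend between them.
    float3 positionOS = lerp(SampleFrame(_Frame0Start, IN.vid),
                             SampleFrame(_Frame1Start, IN.vid),
                             _FrameBlend);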

    I converted the auto-generated AnimDb.cs to a scriptable object, which is much cleaner, and I fixed a bug where the frame rates got reduced for LOD0 and LOD1 when using LODs. I also improved and simplified the blob asset and some code in the static utils class.

    I also removed the header line at the top of the texture and moved most of those parameters to material properties. Unfortunately, it turns out Unity doesn't support float-array serialization for shader/material properties, so I had to store the begin/end frame values in the animation database and pass them to the shader every frame.

    One advantage of removing the header is that I can mess around with some lossy hardware compression, but as you found out, it looks pretty crappy. I could do RGB24 lossless, but almost no GPUs support it natively, so it ends up getting converted to RGBA32 anyway - saving space only on disk while having around 1/8 the precision of my 11/10/11 RGBA packing.

    So I think I'll experiment with some different compression schemes, and if I can't get anything decent I might just put the header line back in and call it a day.
     
  40. WildMaN

    WildMaN

    Joined:
    Jan 24, 2013
    Posts:
    128
    Great job!

    But why do you need PoT textures? I don't see any requirement or advantage here. Without PoT, the "one line - one frame" concept allows for hardware filtering.

    I did the same for the parameters. Once per frame doesn't sound too scary performance-wise.

    Depends on the use case. I'm currently using the strongest compression available for LODs 1-2 and it looks fine, and for LOD0 I managed to get some success with RGB9e5. Same bit width as yours, but again, it allows for hardware filtering.
     
  41. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    I try to use power-of-two texture dimensions for these reasons:
    1. According to shader legend bgolus, mobile and older GPUs only support power-of-2 textures.
    2. bgolus also said most GPUs allocate a power-of-two size anyway, so a non-power-of-two texture doesn't really buy much other than disk-space savings.
    3. Unity recommends power-of-two in its documentation: https://docs.unity3d.com/Manual/ImportingTextures.html . It says non-PoT textures are a bit slower to sample.
    4. By using power-of-2 sizes, my shader can do the pixel-position lookup without a division operation (see the interpolation sketch in my earlier post).
    I did a bit of testing with compression. But before I post the results... one thing that confuses me is that what is shown in the Inspector is NOT necessarily what happens on the GPU. For example, the Inspector shows RGB24 (RGB8 UNorm) as smaller than RGBA32; however, I know from reading various sources that nearly all GPUs treat them the same. I tried looking for a tool that would show actual texture sizes in GPU memory, but I couldn't find anything that did that easily. Resource Checker just mirrors the Inspector. I messed around with some NVIDIA Nsight stuff for a few minutes, but couldn't find anything that would drill down to that level. So I have to take the Inspector values with a grain of salt and apply extra knowledge at times.

    One other thing I learned: saving in PNG format breaks the RGB Float modes - it truncates values. So to test those modes I had to save the files as EXR instead.

    Test Setup: The model is the horse, LOD0 (1890 vertices).

    Perfect Category contains formats that have perfect animation - zero distortion.
    • 16 bit float (RGBA Half) - 512KB
    • R11G10B11 32 Bit - 256KB
    Middle Category contains formats that have slight distortion. Overall they look good in a zoomed-out view of the walk animation; however, zooming in on the nose of the horse during its idle animation, there is some wobbly waviness due to compression inaccuracies.
    • RGB 24 bit (RGB8) - 192KB; however, few GPUs support this natively, so it's likely stored as 256KB
    • RGBA 32 bit (RGBA8) - 256KB
    • RGB9e5 - 256KB
    Crappy Category contains formats that look horrible. Personally I wouldn't use them for LOD0 or LOD1, but they do cut the texture memory by half or more.
    • RGB 16 bit (B5G6R5) - 128KB. This one looks the best in this category (see gif below)
    • RGBA BC7 best - 64KB. This one looks second best in this category (see gif below)
    • DXT5|BC3, DXT1|BC1, BC6H - 64KB. You can't even tell the model is a horse!
    • ASTC - I think this requires Android, iOS, or WebGL, and I didn't have time to build for those platforms.
    CONCLUSION: Currently I don't see any reason to use the middle category - they are all the same size as the smallest format in the perfect category. I also don't see any reason to use 16-bit float, since it's twice the size of its competitor in the same category. As for the crappy category... personally I doubt I'll use those formats other than maybe for the very last LOD level. From the benchmarks I ran, the "hardware filtering" is no faster than the trivial filtering I built into the shader - both methods do everything in hardware (on the GPU) anyway.

    Sidenote: the albedo main texture for the horse is 512x512, compressed with the default DXT5|BC3. Its size is 341.4KB, so it's quite a bit larger than all of the animation formats tested except the floating-point one.

    ................Appendix................

    Horse's nose in the Perfect Category:
    r11g10b11 - nose - 256KB.gif

    Horse's nose wobble in the Middle Category:
    rgb9ef - nose - 256KB.gif
    Walk Animation RGB 16 bit (B5G6R5):
    rgb16bit - b5g6b5.gif

    Walk Animation BC7 best:
    rgba-bc7.gif
     
  42. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Does anyone know a way to read a material property in a bursted job?

    I want to store the "skin index" (aka "LOD index") as a property in the material, so that my animation system/job knows which particular material it is dealing with. The catch is that the LOD system/job can swap out materials and meshes at any time via MaterialMeshInfo, so it seems useful to permanently store the skin/LOD index as a hidden property in the material itself when the material is baked. I tried making a [MaterialProperty] IComponentData for a _SkinIndex material property, but the job doesn't seem to read the value, and it doesn't change as the LOD system swaps out materials - it's just always stuck at zero. I can write to it just fine... I just can't read it, and I have no idea why. I also tried mesh.sharedMaterial.GetProperty(), but that can't be used in Burst jobs + ISystem.

    In the worst case, I could have my LOD system change a skin-index component... but I'd rather not have the animation system depend on the LOD system... I'd prefer that they stay independent, with the animation system reading the index directly from the material.

    Any ideas?
     
  43. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    In Entities Graphics, Materials and MaterialProperty components are completely independent of each other. The MaterialProperty components are essentially value overrides, not at all dissimilar from MaterialPropertyBlocks. You need to either keep the skin index in a separate component, or store it in a hashmap indexed by the material's ID.
     
    lclemens likes this.
  44. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Okay, thanks so much for the advice. It's unfortunate and strange that the only way to read a material property is with Burst disabled, because reading a material property in a job seems like it could be useful in a few scenarios. I guess I can hack around it for now.
     
  45. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Why is it strange that reading from a managed type like Material doesn't work in Burst? There's a lot of classical Unity things that don't work with jobs and Burst.
     
  46. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Well, it's strange that writing to a property works in jobs and Burst just fine, yet reading that same property isn't possible. I would have guessed it would be the other way around. Anyway, I got something working - it's hackier than I'd like, but it gets the job done and performance is good. Thanks again for the info!
     
  47. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    It's not the same property. You can read material-property components in jobs fine; they'll be set to whatever you last left them at. It is the shader that combines the ECS components with the actual material.
     
  48. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    The terminology is confusing ;). I can read a material property override component in Burst (and write to it) using an IComponentData with [MaterialProperty("_MyProp")], but as you said, it only reads the last state I set it to - it doesn't give me the actual value in the current material, which is what I want. Writing to it definitely saves it in the actual material, because after running I can look at the material and see that its value changed in the inspector. The only way I know to get the actual saved default property value from an entity's current material is via sharedMaterial.GetFloat("_MyProp"), but of course that isn't Burst/job friendly. So the net result is that in Burst jobs I can write to a material property at runtime, but I can't read its actual, non-overridden value.
     
  49. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Writing to the IComponentData changes the property in the actual material? That shouldn't happen. There is no mechanism for that to happen. So there is something else you are probably doing that is causing that.
     
  50. lclemens

    lclemens

    Joined:
    Feb 15, 2020
    Posts:
    761
    Yeah you are right. I thought it was happening, but I just realized that it was not.

    As a sidenote... one thing that would solve my problem is a way to store a persistent array property in a material. Before, when I was using a special header line in the texture, I put the array there; now that I've removed it, I have to store that info in a blob asset and send it each time. I found Material.SetFloatArray(), but it isn't serialized with the material asset, so it's rather pointless - I'd end up storing the array in a blob asset anyway. Aside from using _ArraySlot1, _ArraySlot2, _ArraySlot3... any ideas?
     
    Last edited: Jan 6, 2024