Burst-efficient way of storing bytes or shorts in floats?

Discussion in 'Burst' started by DragonCoder, Mar 13, 2022.

  1. DragonCoder

    Joined:
    Jul 3, 2015
    Posts:
    1,459
    Hello Community,

    I am working on some Burst-accelerated smart particle magic where I need to store a small amount of data alongside each particle. The "custom data" system seems predestined for that, but 8 values in total is a bit tight.
    I would rather not introduce a parallel data structure for this data, so that the system stays self-contained.

    What would be a good way to store 4 bytes or 2 shorts in each float of a Vector4 (or rather, in each of its components) when the goal is efficient Burst compilation?
    There's "BitConverter" ( https://docs.microsoft.com/en-us/dotnet/api/system.bitconverter.getbytes?view=net-6.0 ), but that does not seem like a very Unity-friendly solution.
    Are there low-level bit operations?

    Huge thanks for any input!
     
  2. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    The Unity.Mathematics package has bitwise reinterpretations between float and uint. You can easily pack bytes and shorts into uints using bitwise operations and shifts.
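
    A minimal sketch of that idea, assuming the math.asuint / math.asfloat overloads from Unity.Mathematics. The class and method names (FloatPacking, PackShorts and so on) are hypothetical helpers made up for illustration, not part of any Unity API:

```csharp
using Unity.Mathematics;

// Hypothetical helper class; only math.asuint and math.asfloat come from Unity.Mathematics.
public static class FloatPacking
{
    // Stores two 16-bit values in the 32 bits of one float.
    public static float PackShorts(ushort lo, ushort hi)
    {
        uint bits = (uint)lo | ((uint)hi << 16);
        return math.asfloat(bits); // bitwise reinterpretation, no numeric conversion
    }

    // Recovers the two 16-bit values from a packed float.
    public static void UnpackShorts(float packed, out ushort lo, out ushort hi)
    {
        uint bits = math.asuint(packed);
        lo = (ushort)(bits & 0xFFFFu);
        hi = (ushort)(bits >> 16);
    }

    // Stores four bytes in one float; b0 ends up in the lowest 8 bits.
    public static float PackBytes(byte b0, byte b1, byte b2, byte b3)
    {
        uint bits = (uint)b0 | ((uint)b1 << 8) | ((uint)b2 << 16) | ((uint)b3 << 24);
        return math.asfloat(bits);
    }

    // Reads byte number index (0 to 3) back out of a packed float.
    public static byte UnpackByte(float packed, int index)
    {
        return (byte)((math.asuint(packed) >> (index * 8)) & 0xFFu);
    }
}
```

    The packed float should only ever be copied around, never used in float arithmetic or comparisons: many bit patterns (for example a high short between 0x7F80 and 0x7FFF) are infinities or NaNs when read as a float.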
     
  3. Mortuus17

    Joined:
    Jan 6, 2020
    Posts:
    105
    Why "in floats"?
    Do you mean "float vectors" and thus actually SIMD registers?

    Well, that's my assumption anyway.
    Two approaches I like:
    1: Use my SIMD math library (see signature). It has vectors up to (s)byte32, as well as (u)short and (u)long vectors (and matrices), and is built on the hardware instructions that Burst exposes as compiler intrinsics.
    2: Just use unsafe code.

    Code (CSharp):
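    unsafe
    {
        // The array must hold at least 32 bytes (two float4 values).
        byte[] myArray = GetItFromSomewhere();
        fixed (byte* ptr = myArray)
        {
            // A fixed pointer cannot be reassigned, so index through a typed copy instead.
            float4* vectors = (float4*)ptr;
            float4 firstFloat4 = vectors[0];
            float4 scndFloat4 = vectors[1];
        }
    }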
     
  4. DragonCoder

    Joined:
    Jul 3, 2015
    Posts:
    1,459
    Oh, math.asint / math.asfloat! I see, thank you, I think that will work.

    No, they are Vector4, although maybe the Burst compiler uses SIMD registers in the background(?).
    I am working with particles and Burst jobs through the IJobParticleSystemParallelFor interface. Right now the Execute function looks like this:
    Code (CSharp):
    public void Execute(ParticleSystemJobData particle_data, int i)
    {
        float a = particle_data.customData1.x[i];
        // now read two shorts out of that float
        // ...
        // and write two different shorts to it
        particle_data.customData1.x[i] = new_value;
    }
    I do not need interaction between different particles, so I do not loop over a single array in one go; instead, each particle is handled in a separate call (managed by the job scheduler). Therefore I am not sure how I can make use of your suggestions.

    Your library certainly looks impressive though for true low-level programming in Unity. Amazing what's possible.

    EDIT: I guess I can use your library if I convert the Vector4 into a float4 (and convert back into a Vector4 when writing). Do you know whether those conversions are cheap enough not to negate the advantages of a SIMD library over DreamingImLatios's suggestion?
    It is a bit unfortunate that Unity's particle system does not use the datatypes from Mathematics yet.
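
    For reference, the Execute body above could be filled in roughly like this with the math.asuint / math.asfloat reinterpretation. This is only a sketch: the job name PackedCustomDataJob, the Step helper, and the per-particle logic (a wrapping counter in the low short, an untouched value in the high short) are hypothetical placeholders for whatever the real job needs.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine.ParticleSystemJobs;

[BurstCompile]
public struct PackedCustomDataJob : IJobParticleSystemParallelFor
{
    public void Execute(ParticleSystemJobData particleData, int i)
    {
        // NativeArray is a handle to the particle data, so writing through this copy is fine.
        NativeArray<float> xs = particleData.customData1.x;
        xs[i] = Step(xs[i]);
    }

    // Hypothetical per-particle update on one packed float.
    public static float Step(float packed)
    {
        uint bits = math.asuint(packed);   // reinterpret the float's 32 bits
        uint lo = bits & 0xFFFFu;          // first short
        uint hi = bits >> 16;              // second short

        lo = (lo + 1u) & 0xFFFFu;          // placeholder logic: counter that wraps at 65535
        // hi is left unchanged in this sketch

        return math.asfloat(lo | (hi << 16));
    }
}
```

    Everything here is constant shifts and masks on a single 32-bit value per particle, with no branches.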
     
  5. Mortuus17

    Joined:
    Jan 6, 2020
    Posts:
    105
    @DragonCoder Hmm, unless your code has any branches or variable bit shifts in it (the // ... part of your code ;) ), Latios' approach will be just as fast as using, for example, a ushort8 instead. I tend to prefer the latter, though, since it results in fewer bitwise operations (which will probably still be compiled away, depending on implementation details; performance will most likely be optimal either way) and it expresses more clearly that you are writing shorts into a block of memory. PLUS, with my lib you can use 256-bit vector types (float8 and friends), which either use AVX SIMD vectors (they handle twice as much data) or exploit instruction-level parallelism by handling two 128-bit vectors at once. Burst doesn't do this automatically, since it considers already-vectorized code not to be "vectorizable" any further. I'm getting lost in micro-optimizations again...

    But who knows... Maybe the specific job code will result in way faster machine code when using a short vector. I've seen it happen before, but I cannot predict the result without looking at the details of your code.

    Vector4 <-> float4 conversion is completely free (0 instructions) in Burst code for sure, and I strongly suspect that it is even free in Mono JIT code. No worries there. But always stick to the "rule of 128": the vector you use should ALWAYS be a multiple of 128 bits (16 bytes) wide. It is very important for performance reasons.
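
    To illustrate the Vector4 <-> float4 round trip with plain Unity.Mathematics types (not the SIMD library from my signature), here is a rough sketch that handles all four components of one 128-bit vector at once. It assumes the implicit Vector4/float4 conversions and the float4/uint4 overloads of math.asuint and math.asfloat; the PackedVector4 class and its method names are made up for the example.

```csharp
using Unity.Mathematics;
using UnityEngine;

// Hypothetical helper; packs two 16-bit values into each of the four components.
public static class PackedVector4
{
    // Low 16 bits of every component, one value per lane.
    public static uint4 LowShorts(Vector4 v)
    {
        float4 f = v;                    // implicit Vector4 -> float4 conversion
        return math.asuint(f) & 0xFFFFu;
    }

    // High 16 bits of every component, one value per lane.
    public static uint4 HighShorts(Vector4 v)
    {
        float4 f = v;
        return math.asuint(f) >> 16;
    }

    // Combines four low and four high values back into one Vector4.
    public static Vector4 Pack(uint4 lo, uint4 hi)
    {
        uint4 bits = (lo & 0xFFFFu) | (hi << 16);
        return math.asfloat(bits);       // implicit float4 -> Vector4 conversion
    }
}
```

    The shifts are constant and there are no branches, so this stays in the "just as fast" category described above.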
     