Question Half4 <-> Float4 is very expensive

Discussion in 'Entity Component System' started by sean_virtualmarine, Oct 1, 2021.

  1. sean_virtualmarine

    I'm processing a bunch of texture data (NativeArray<half4>) in a job. A deep profile showed that something like 70% of my processing time is spent implicitly converting half4s to float4s, because the half types don't define any arithmetic operators (+, -, *, /).

    I'd prefer not to write bitwise float operations myself. But I guess it's either that, converting the whole array during AsyncGPUReadback, or using float4 textures on the GPU across the board.

    Anyone with enough experience to weigh in? Am I on a fool's errand?
     
  2. apkdev

    Yeah, CPUs generally have no hardware support for half arithmetic. You're essentially emulating it in software, and that's pretty slow. I think the idea is that you do all your math in float and convert between half and float as rarely as possible.
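
To make the software emulation mentioned above concrete, here is a minimal C sketch of a bitwise half-to-float decode for the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits). The name `half_to_float` is hypothetical, and this is not claimed to be how Unity.Mathematics or Burst implement the conversion. It only illustrates why a scalar fallback costs many more instructions than a single hardware conversion.

```c
#include <stdint.h>
#include <string.h>

/* Decode the raw 16 bits of an IEEE 754 binary16 value into a float.
   Hypothetical helper for illustration only. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                           /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit leading 1 appears. */
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                shift++;
            }
            man &= 0x3FFu;
            bits = sign | ((127u - 15u + 1u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + (127u - 15u)) << 23) | (man << 13);
    }

    memcpy(&f, &bits, sizeof f);                   /* reinterpret bits as float */
    return f;
}
```

Every value takes several branches, shifts and masks here, which is the kind of per-element overhead a deep profile would attribute to the implicit conversion.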
     
  3. Mortuus17

    There is native hardware support for 16-bit floats in very specific Intel CPUs: https://en.wikipedia.org/wiki/AVX-512#BF16
    But until Burst supports AVX-512 and its numerous instruction set extensions (LLVM, the compiler behind Burst, does support them), your best bet is to convert your half vectors to float vectors explicitly at the very beginning of your job, perform your math on floats, and convert back to half vectors when writing back the results.

    Also, F16C (the instruction set extension that adds hardware support for half<->float conversion) is tied to AVX2 support, in Burst at least. Without AVX2, a conversion goes from ~4 clock cycles to at least 30. Before Burst 1.5, the slow path was used even when AVX2 was supported, so make sure you have at least that Burst version installed (yay me, I reported that bug).
    If you're not compiling for x86, I THINK there's nothing you can do about it at all. But I only know x86, so I really don't know for sure.
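
The "convert once, then do the math in float" advice can be sketched in plain C, outside of Burst. This is only an illustration under assumptions: `half4_bits`, `float4`, `decode_half`, `widen` and `sum_texels` are hypothetical names, and the hardware path is taken only when the compiler is told F16C is available (for example with `-mf16c` on GCC or Clang). Otherwise a scalar fallback is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__F16C__)
#include <immintrin.h> /* _mm_cvtph_ps, only when built with F16C enabled */
#endif

typedef struct { uint16_t x, y, z, w; } half4_bits; /* raw binary16 bits */
typedef struct { float x, y, z, w; } float4;

/* Scalar fallback: decode one binary16 value without hardware support. */
float decode_half(uint16_t h)
{
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits = (uint32_t)(h & 0x8000u) << 16;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: man * 2^-24 is exactly representable in float. */
        f = (float)man * (1.0f / 16777216.0f);
        return (h & 0x8000u) ? -f : f;
    }
    /* Infinity/NaN keep an all-ones exponent; normals are rebiased by 112. */
    bits |= (exp == 31u ? 0x7F800000u : (exp + 112u) << 23) | (man << 13);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Widen one half4 to float4, in a single instruction when F16C is enabled. */
float4 widen(half4_bits h)
{
    float4 out;
#if defined(__F16C__)
    __m128i packed = _mm_loadl_epi64((const __m128i *)&h);
    _mm_storeu_ps(&out.x, _mm_cvtph_ps(packed));
#else
    out.x = decode_half(h.x);
    out.y = decode_half(h.y);
    out.z = decode_half(h.z);
    out.w = decode_half(h.w);
#endif
    return out;
}

/* Convert each texel exactly once, then keep all arithmetic in float. */
float4 sum_texels(const half4_bits *texels, size_t count)
{
    float4 acc = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < count; i++) {
        float4 t = widen(texels[i]);
        acc.x += t.x;
        acc.y += t.y;
        acc.z += t.z;
        acc.w += t.w;
    }
    return acc;
}
```

The point of the structure is that the conversion sits at the boundary of the loop body, so no intermediate result is ever narrowed back to half until the final write, if one is needed at all.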
     
    Last edited: Oct 2, 2021
  4. sean_virtualmarine

    My saving grace is that I don't actually have to write back to the texture. I just have to read from it, a lot, at semi-random locations. (Its half4s are simulated ocean-wave data, and my job computes buoyancy.)
     
  5. SamOld

    Have you tested doing the conversion during readback? I don't know how that's implemented, but if the conversion happens on the GPU, it might use the fast hardware conversion path that texture sampling uses. It would double the readback bandwidth, though, and I have no idea how that tradeoff would go.
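
The bandwidth side of that tradeoff is simple arithmetic: a half4 texel is 4 channels of 2 bytes (8 bytes) and a float4 texel is 4 channels of 4 bytes (16 bytes). A small C sketch follows, where `readback_bytes` is a hypothetical helper and the 512x512 size used below is only an assumed example, not a figure from the thread.

```c
#include <stddef.h>

/* Bytes transferred when reading back a width x height four-channel texture. */
size_t readback_bytes(size_t width, size_t height, size_t bytes_per_channel)
{
    return width * height * 4 * bytes_per_channel;
}
```

For an assumed 512x512 texture, `readback_bytes(512, 512, 2)` gives 2,097,152 bytes for half4 and `readback_bytes(512, 512, 4)` gives 4,194,304 bytes for float4. Whether the extra transfer is cheaper than converting on the CPU would have to be measured.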