Question Clarification on ComputeBuffer.SetData layout requirements?

Discussion in 'General Graphics' started by SamOld, Nov 4, 2023.

  1. SamOld

    SamOld

    Joined:
    Aug 17, 2018
    Posts:
    333
    Hi, I'm looking for a little clarification on the general case layout requirements when passing structs to `ComputeBuffer.SetData` with reliable cross platform support. Looking around online, in some cases I see people doing it without specifying `LayoutKind` at all, and in some cases I see people simply summing the size of individual fields without any attention paid to things like alignment and padding.

    What are the actual requirements here?

    Does a struct with `LayoutKind.Sequential` reliably match the same struct definition in HLSL on all platforms?
     
  2. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,014
    Hi!
    No, `LayoutKind.Sequential` on its own does not reliably match the HLSL struct on all platforms.

    If you need cross-platform compatibility, don't use 3-component vectors. Vectors should be sorted by the number of components, from highest to lowest. If you have an array, use 4-component vectors inside.
    The safest option is to always use 4-component vectors everywhere.
    All of this is in addition to sequential layout, of course.
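
A minimal sketch of the "4-component vectors everywhere" advice, written so that it compiles without Unity. `Float4` is a hypothetical stand-in for `UnityEngine.Vector4` or `Unity.Mathematics.float4`, and `Particle` with its field names is invented for illustration. The matching HLSL struct would be assumed to declare three `float4` members in the same order.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a 4-component float vector type.
[StructLayout(LayoutKind.Sequential)]
public struct Float4
{
    public float X, Y, Z, W;
}

// Hypothetical element type: every member is a 4-component vector,
// so every member starts on a 16-byte boundary without explicit padding.
[StructLayout(LayoutKind.Sequential)]
public struct Particle
{
    public Float4 PositionAndRadius; // xyz = position, w = radius
    public Float4 VelocityAndAge;    // xyz = velocity, w = age
    public Float4 Color;
}

public static class Program
{
    public static void Main()
    {
        // This is the value you would pass as the buffer stride.
        int stride = Marshal.SizeOf<Particle>();
        int velocityOffset = (int)Marshal.OffsetOf<Particle>(nameof(Particle.VelocityAndAge));

        Console.WriteLine($"stride = {stride}");                 // stride = 48
        Console.WriteLine($"velocity offset = {velocityOffset}"); // velocity offset = 16
    }
}
```

Packing scalars into the spare `w` components, as above, is one way to avoid wasting the fourth slot.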
     
  3. c0d3_m0nk3y

    c0d3_m0nk3y

    Joined:
    Oct 21, 2021
    Posts:
    666
    I think for constant buffers you need to use what is called std140 layout in OpenGL. That's a cross-platform layout as far as I know.
    You can use 3-component vectors, but you always have to align them on 16-byte boundaries and pad them with a 4th component.
    So you could do something like this:

    float3 MyVec3;
    float Padding;

    float3 AnotherVec3;
    int MyInt;

    float2 MyVec2;
    float2 AnotherVec2;

    float2 YetAnotherVec2;
    int AnotherInt;
    float AnotherFloat;

    That's how it would look on the CPU side. On the GPU, if you have two 3-component vectors next to each other, the padding is implicit.

    Code (CSharp):
    When using the "std140" storage layout, structures will be laid out in
    buffer storage with its members stored in monotonically increasing order
    based on their location in the declaration. A structure and each
    structure member have a base offset and a base alignment, from which an
    aligned offset is computed by rounding the base offset up to a multiple of
    the base alignment. The base offset of the first member of a structure is
    taken from the aligned offset of the structure itself. The base offset of
    all other structure members is derived by taking the offset of the last
    basic machine unit consumed by the previous member and adding one. Each
    structure member is stored in memory at its aligned offset. The members
    of a top-level uniform block are laid out in buffer storage by treating
    the uniform block as a structure with a base offset of zero.

      (1) If the member is a scalar consuming <N> basic machine units, the
          base alignment is <N>.

      (2) If the member is a two- or four-component vector with components
          consuming <N> basic machine units, the base alignment is 2<N> or
          4<N>, respectively.

      (3) If the member is a three-component vector with components consuming
          <N> basic machine units, the base alignment is 4<N>.

      (4) If the member is an array of scalars or vectors, the base alignment
          and array stride are set to match the base alignment of a single
          array element, according to rules (1), (2), and (3), and rounded up
          to the base alignment of a vec4. The array may have padding at the
          end; the base offset of the member following the array is rounded up
          to the next multiple of the base alignment.

      (5) If the member is a column-major matrix with <C> columns and <R>
          rows, the matrix is stored identically to an array of <C> column
          vectors with <R> components each, according to rule (4).

      (6) If the member is an array of <S> column-major matrices with <C>
          columns and <R> rows, the matrix is stored identically to a row of
          <S>*<C> column vectors with <R> components each, according to rule
          (4).

      (7) If the member is a row-major matrix with <C> columns and <R> rows,
          the matrix is stored identically to an array of <R> row vectors
          with <C> components each, according to rule (4).

      (8) If the member is an array of <S> row-major matrices with <C> columns
          and <R> rows, the matrix is stored identically to a row of <S>*<R>
          row vectors with <C> components each, according to rule (4).

      (9) If the member is a structure, the base alignment of the structure is
          <N>, where <N> is the largest base alignment value of any of its
          members, and rounded up to the base alignment of a vec4. The
          individual members of this sub-structure are then assigned offsets
          by applying this set of rules recursively, where the base offset of
          the first member of the sub-structure is equal to the aligned offset
          of the structure. The structure may have padding at the end; the
          base offset of the member following the sub-structure is rounded up
          to the next multiple of the base alignment of the structure.

      (10) If the member is an array of <S> structures, the <S> elements of
           the array are laid out in order, according to rule (9).
    -------------------
    Correction: Unfortunately, this isn't true for Metal, as aleksandrk pointed out below.
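
For reference, rules (1) to (3) of the quoted std140 text can be sketched as a small offset calculator. This is only an illustration under stated assumptions: every component is 4 bytes (`float`/`int`), only scalars and vectors are handled, and the `Std140` class with its method names is hypothetical. It models the quoted rules only, so per the correction above it should not be read as describing Metal.

```csharp
using System;

public static class Std140
{
    // Base alignment in bytes for a scalar or vector with 4-byte components,
    // following rules (1), (2) and (3): 1 -> 4, 2 -> 8, 3 and 4 -> 16.
    public static int BaseAlignment(int components)
    {
        if (components < 1 || components > 4)
            throw new ArgumentOutOfRangeException(nameof(components));
        return components == 1 ? 4 : components == 2 ? 8 : 16;
    }

    // Computes the byte offset of each member, given its component count.
    public static int[] Offsets(params int[] componentCounts)
    {
        var offsets = new int[componentCounts.Length];
        int cursor = 0; // first free byte after the previous member
        for (int i = 0; i < componentCounts.Length; i++)
        {
            int align = BaseAlignment(componentCounts[i]);
            cursor = (cursor + align - 1) / align * align; // round up to alignment
            offsets[i] = cursor;
            cursor += componentCounts[i] * 4;              // bytes actually consumed
        }
        return offsets;
    }
}

public static class Program
{
    public static void Main()
    {
        // The member list from the post above:
        // float3, float, float3, int, float2, float2, float2, int, float
        Console.WriteLine(string.Join(", ", Std140.Offsets(3, 1, 3, 1, 2, 2, 2, 1, 1)));
        // 0, 12, 16, 28, 32, 40, 48, 56, 60

        // Two float3 members in a row: the second is pushed to offset 16.
        Console.WriteLine(string.Join(", ", Std140.Offsets(3, 3)));
        // 0, 16
    }
}
```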
     
    Last edited: Nov 13, 2023
  4. SamOld

    Thanks for the reply!

    The documentation could be a little clearer in this area, and this seems like a prime candidate for a C# analyzer! It would be cool if the compiler could check all types passed to `SetData<T>` and other similar methods for reasonable layouts. It's pretty dangerous to have an API like that which implies that any type will work, when the restrictions are actually so heavy.
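
As a rough idea of what such a check could look for, here is a runtime reflection sketch rather than a real compile-time analyzer. The rules it applies are assumptions taken only from this thread (no 12-byte members, each member's offset a multiple of its size), it only understands members of 4, 8 or 16 bytes, and `GpuLayoutCheck`, `Float2`, `Float3`, `Good` and `Bad` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for Unity's vector types.
[StructLayout(LayoutKind.Sequential)]
public struct Float2 { public float X, Y; }

[StructLayout(LayoutKind.Sequential)]
public struct Float3 { public float X, Y, Z; }

[StructLayout(LayoutKind.Sequential)]
public struct Good { public Float2 A; public Float2 B; public float C; public int D; }

[StructLayout(LayoutKind.Sequential)]
public struct Bad { public Float2 A; public float B; public Float2 C; public Float3 N; }

public static class GpuLayoutCheck
{
    // Returns a description of each problem found; empty if none were found.
    public static List<string> Validate(Type type)
    {
        var problems = new List<string>();
        if (!type.IsValueType || !type.IsLayoutSequential)
        {
            problems.Add($"{type.Name}: expected a struct with sequential layout");
            return problems;
        }

        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in type.GetFields(flags))
        {
            int size = Marshal.SizeOf(field.FieldType);
            int offset = (int)Marshal.OffsetOf(type, field.Name);

            if (size == 12)
                problems.Add($"{field.Name}: 12-byte member (3-component vector?)");
            else if (size != 4 && size != 8 && size != 16)
                problems.Add($"{field.Name}: {size}-byte member not handled by this sketch");
            else if (offset % size != 0)
                problems.Add($"{field.Name}: offset {offset} is not a multiple of {size}");
        }
        return problems;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(GpuLayoutCheck.Validate(typeof(Good)).Count); // 0
        foreach (string problem in GpuLayoutCheck.Validate(typeof(Bad)))
            Console.WriteLine(problem); // reports members C and N
    }
}
```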

    That's a great resource @c0d3_m0nk3y, thanks!
     
  5. aleksandrk

    On Metal the size of `float3` is 16 bytes. So if you add padding the way you described, the layout will be broken. There is a packed float3 type that is 12 bytes, but it's recent enough to not be supported on all relevant devices.
     
  6. aleksandrk

    Definitely :)
    We have something like that in the backlog.
     
  7. SamOld

    If we use 1, 2, and 4 component types (mixing and matching between floats and ints) is that guaranteed to be good?

    Cool, I'd love to see it hit the frontlog, as it were. Perhaps if you could provide a technically precise specification of the behaviour (including all of the little details like that Metal thing which I wouldn't have known from your first reply) the community could build it for now? I'd be happy to make it. With all due respect, when I hear a feature is in the Unity backlog (especially a feature relating to a years old API) I assume it won't appear anytime soon.
     
  8. SamOld

    I suppose the main thing we would need here is a comprehensive list of the layouts of all HLSL types, including information about how they may vary across all platforms, including with nested structs.
     
  9. aleksandrk

    No, each member has to be aligned to its own type's width. That is, if you have
    Code (CSharp):
    float2 a;
    float b;
    float2 c;
    you'll have 4 bytes of padding between `b` and `c`, because the offset of a `float2` needs to be a multiple of the size of `float2` (8 bytes).
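
To see why this matters on the C# side, here is a small sketch that can be run outside Unity. `Float2` is a hypothetical stand-in for `UnityEngine.Vector2` or `Unity.Mathematics.float2`; `Naive`, `Padded` and `Pad0` are made-up names. A C# struct made only of floats has 4-byte alignment, so sequential layout does not insert the padding described above and it has to be added by hand.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a 2-component float vector type.
[StructLayout(LayoutKind.Sequential)]
public struct Float2 { public float X, Y; }

// Direct mirror of the shader-side declaration: C# places C at offset 12.
[StructLayout(LayoutKind.Sequential)]
public struct Naive
{
    public Float2 A;
    public float B;
    public Float2 C;
}

// Same members with the 4 bytes of padding written out explicitly,
// so C lands on offset 16, a multiple of 8.
[StructLayout(LayoutKind.Sequential)]
public struct Padded
{
    public Float2 A;
    public float B;
    public float Pad0; // unused, only there to push C to offset 16
    public Float2 C;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine((int)Marshal.OffsetOf<Naive>(nameof(Naive.C)));   // 12
        Console.WriteLine(Marshal.SizeOf<Naive>());                         // 20
        Console.WriteLine((int)Marshal.OffsetOf<Padded>(nameof(Padded.C))); // 16
        Console.WriteLine(Marshal.SizeOf<Padded>());                        // 24
    }
}
```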

    Let me check, I believe someone was actually looking into it.
     
  10. SamOld

    As long as they're arranged largest to smallest that should be good then, right?

    Cool! Are analyzers like this the type of thing that would make their way into older versions, or would this be a 2023+ thing? Obviously an analyzer is theoretically an easy drop in for an old version. I'll probably just build one myself soon actually, it shouldn't be hard at all. But it would be good for others if it was done by default!
     
    Last edited: Nov 8, 2023
  11. c0d3_m0nk3y

    Thanks for pointing that out. Didn't know that.
     
  12. aleksandrk

    Yes.
    Technically, it's a feature, and we normally don't backport those.
    Enabling it by default would be great, but it needs to be done in several stages. Otherwise if someone updates to a newer version, they would suddenly start getting errors/warnings out of nowhere.

    Looks like it has been deprioritised, unfortunately :(
     
  13. SamOld

    Ah well, I'd be lying if I said I didn't expect that outcome. Thanks anyway.