
Weird performance behaviour with Jobs

Discussion in 'Data Oriented Technology Stack' started by xman7c7, Dec 19, 2018.

  xman7c7

    Hello,
    I have a small performance problem with calculations that use Jobs, Burst and SIMD.

    What I am doing: mesh manipulation with noise generators that create terrain. My only problem is iterating through entities and filling in the noise values. For brevity, I will only talk about iterating through the noise values, not about how I create the structures and so on.
    When I started on this mesh generation, I used a structure like this for the vertices, with IJobParallelFor as the job:

    Code (CSharp):
    using Unity.Entities;
    using Unity.Mathematics;

    public struct Vertex : IBufferElementData
    {
        public float3 Value; // x/y/z position of the vertex
    }
    Each element represents a point in space (x/y/z). The capacity of the buffer is set with [InternalBufferCapacity(Settings.Size)] so that the data is stored inside the chunk instead of on the heap. After some investigation I saw that I don't need x/z in my case, so I removed them and kept only y:

    Code (CSharp):
    using Unity.Entities;

    public struct Vertex : IBufferElementData
    {
        public float Value; // only the y (height) component is kept
    }
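To put numbers on what this change does to the data, here is a minimal sketch in plain C# (no Unity dependency) that computes the bytes reserved inline per entity before and after the change. It assumes float3 is three 4-byte floats, and `capacity` is a hypothetical stand-in for Settings.Size, whose real value is not given in the post; buffer headers and other components are ignored.

```csharp
using System;

// Hypothetical stand-in for Settings.Size; the real value is not shown in the post.
const int capacity = 128;

// Unity.Mathematics float3 holds three 32-bit floats; a plain float holds one.
int float3Bytes = 3 * sizeof(float); // 12 bytes per Vertex before the change
int floatBytes = 1 * sizeof(float);  // 4 bytes per Vertex after the change

// With [InternalBufferCapacity(capacity)] this many bytes are reserved inline per entity
// (ignoring the buffer header and any other components on the entity).
int inlineBytesBefore = capacity * float3Bytes;
int inlineBytesAfter = capacity * floatBytes;

Console.WriteLine($"float3 buffer: {inlineBytesBefore} bytes, float buffer: {inlineBytesAfter} bytes");
```

So dropping x/z makes each entity's inline buffer three times smaller, which also changes how many entities share a chunk.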
    After this change, my noise iteration suddenly became 2x to 3x slower. I thought it was because of SIMD (float3 vs float), so I changed how I generate the noise, but the problem stayed the same. So I started looking for what I was doing wrong, and found the following.
    I created several variants of the Vertex (NoiseValue) data structure:
    1x float1 - 1x1
    1x float4 - 1x4
    4x float1 - 4x1
    4x float4 - 4x4
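The memory cost of the four layouts above can be worked out directly. A minimal sketch, assuming float4 is four 4-byte floats and, as described in the post, that 1x4 writes one value per element and 4x4 writes one value into each of its four float4 fields; the `Layout` helper is hypothetical, written only for this comparison.

```csharp
using System;

// Bytes per noise element and the percentage of its floats that actually receive a value.
// fields  = number of float/float4 fields in the element
// lanes   = width of each field (1 for float, 4 for float4)
// written = floats per element that get a noise value
(int bytes, int percentWritten) Layout(int fields, int lanes, int written)
{
    int floats = fields * lanes;
    return (floats * sizeof(float), written * 100 / floats);
}

var l1x1 = Layout(1, 1, 1); // 1x float:  4 bytes, every float written
var l1x4 = Layout(1, 4, 1); // 1x float4: 16 bytes, only .x written
var l4x1 = Layout(4, 1, 4); // 4x float:  16 bytes, every float written
var l4x4 = Layout(4, 4, 4); // 4x float4: 64 bytes, one lane of each field written

Console.WriteLine($"1x1: {l1x1.bytes} B, {l1x1.percentWritten}% written");
Console.WriteLine($"1x4: {l1x4.bytes} B, {l1x4.percentWritten}% written");
Console.WriteLine($"4x1: {l4x1.bytes} B, {l4x1.percentWritten}% written");
Console.WriteLine($"4x4: {l4x4.bytes} B, {l4x4.percentWritten}% written");
```

By this arithmetic, 1x4 and 4x4 move 16 bytes per written value while 1x1 and 4x1 move 4, so if memory traffic were the bottleneck one would expect 1x1 and 4x1 to be the faster pair.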
    I found that 1x1 and 4x1 are slower than 1x4 and 4x4. That makes no sense to me: with 1x4 I fill only 1 value and with 4x4 only 4 values, so the other components are wasted and only increase the memory size. All of this used IJobParallelFor over chunks, so I wondered whether the problem could be this code:

    Code (CSharp):
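    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;

    // Job name is illustrative; the post shows only the body.
    struct ChunkNoiseJob : IJobParallelFor
    {
        // One element per chunk, so each Execute(index) call processes a whole chunk.
        public NativeArray<ArchetypeChunk> Chunks;

        public void Execute(int index)
        {
            var chunk = Chunks[index];
            // ...
        }
    }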
    Could it be that the chunks are too big? I made various test cases (see the source code) and got these results (all times are in ms):

    stats.png

    What is strange is that the larger slowdown appears only at resolutions 127 and 64, not at 255 (there is still a slowdown there, but a smaller one).

    It looks like IJobParallelFor is slower in some situations. Or could this be caused by the Burst compiler?
    I still don't know where the problem is. Could it be caused by cache misses in those specific setups, or am I doing something wrong?
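One way to test the idea that the chunks are the problem is to estimate how many entities fit in one chunk and, from that, how many Execute(index) calls an IJobParallelFor that processes one chunk per index receives. A rough sketch under loudly stated assumptions: 16 KiB chunks, hypothetical `entityCount` and `capacity` values, and only the inline Vertex buffer counted per entity (no chunk header, Entity component, buffer header or alignment); the `Split` helper is hypothetical.

```csharp
using System;

// Assumptions (not from the post): 16 KiB chunks, a hypothetical entity count and buffer
// capacity, and only the inline Vertex buffer counted per entity.
const int chunkBytes = 16 * 1024;
const int entityCount = 1024;
const int capacity = 128;

// Entities per chunk, and how many chunks (= IJobParallelFor indices) that produces.
(int perChunk, int chunks) Split(int elementBytes)
{
    int perChunk = chunkBytes / (capacity * elementBytes);
    if (perChunk == 0) throw new InvalidOperationException("one entity does not fit in a chunk");
    int chunks = (entityCount + perChunk - 1) / perChunk; // round up
    return (perChunk, chunks);
}

var float3Split = Split(3 * sizeof(float)); // Vertex with a float3 Value
var floatSplit = Split(sizeof(float));      // Vertex with a float Value

Console.WriteLine($"float3: {float3Split.perChunk} entities/chunk, {float3Split.chunks} chunks");
Console.WriteLine($"float:  {floatSplit.perChunk} entities/chunk, {floatSplit.chunks} chunks");
```

With these made-up numbers the float layout packs 32 entities per chunk instead of 10, so the same job gets 32 indices instead of 103 to spread over the worker threads. Whether something like this explains the measured slowdown would need to be confirmed with the Profiler and the real Settings.Size.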

    Are there any performance differences between IJobParallelFor, IJobChunk and IJobProcessComponentData?

    You can look at the source code; the important files/folders are:
    \Assets\WorkSpace\Utils\Settings.cs - setup data
    \Assets\WorkSpace\Systems - tested systems
    \Assets\PerformanceTests\TestSettings.cs - test options
    \Assets\WorkSpace\Utils\NoiseSettings.cs - noise input, but it's not important

    The other files are not important; they are only wrappers for time measurement and convenience.
    Creating and iterating over the different data structures is almost identical. I have tried to leave out as much code as possible that isn't needed to demonstrate the problem.

    BTW, if you look at the source code of the 1x4 and 4x4 systems, you could argue that they are not equivalent to 1x1 and 4x1, because the noise value is assigned only to x:
    Code (CSharp):
    noise[j].Value.x = result;
    but in my measurements the difference is negligible:

    DiffrentAssign.png
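The difference between writing only `.x` and writing the whole value can be shown in plain C#. This sketch uses System.Numerics.Vector4 as a stand-in for Unity's float4 (an assumption for illustration, not the type used in the post), and `result` is a placeholder for the noise value: assigning only X leaves Y, Z and W untouched, while assigning a new Vector4 overwrites all four fields.

```csharp
using System;
using System.Numerics;

// System.Numerics.Vector4 stands in for Unity's float4 (illustration only).
const int count = 8;
var partial = new Vector4[count]; // written component-wise, like noise[j].Value.x = result
var whole = new Vector4[count];   // written as a complete value

for (int j = 0; j < count; j++)
{
    float result = j * 0.5f; // placeholder for the noise value

    // Only the X field is stored; Y, Z and W keep what they held before (zero here).
    partial[j].X = result;

    // All four fields are stored, so the whole 16-byte element is overwritten.
    whole[j] = new Vector4(result, 0f, 0f, 0f);
}

Console.WriteLine($"partial[3] = {partial[3]}, whole[3] = {whole[3]}");
```

Both arrays end up with identical contents here because the untouched fields were already zero; either way, every element still occupies 16 bytes.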
     


    Last edited: Dec 19, 2018