
Discussion: Mono multithreading vs. Burst, which is faster? The tests didn't feel much different

Discussion in 'Burst' started by cai814037725, Dec 9, 2022.


Mono multithreading vs burst

  1. No difference

    1 vote(s)
    6.3%
  2. burst faster

    15 vote(s)
    93.8%
Multiple votes are allowed.
  1. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    I am working on a broken plugin and would like to know the performance difference between the two. Burst has a lot of syntax limitations, and if the difference is not significant, there is no need to migrate the codebase.
     
    Last edited: Dec 9, 2022
  2. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
    Burst + the Job System will probably be much faster, but it depends on how you implement it.
     
  3. CodeSmile

    CodeSmile

    Joined:
    Apr 10, 2014
    Posts:
    6,005
    You can easily see gains for Burst compiled code that is hundreds of times faster than regular C# code.
     
  4. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    Do you have test data? Mono + IL2CPP + multithreaded Parallel.For is also fast. Maybe DOP is the future, and Burst is worth a try.
     
  5. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    612
    Burst is separate from multithreading.
    You can multithread with Parallel.For and call into Burst code from there.
    That said, I think the job system will work better for most short-term jobs than C# threads/tasks, due to the managed/GC overhead there.
     
  6. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    Burst is free of GC overhead, but that does not necessarily make it fast. If you are dealing with a lot of data, adding, modifying, and copying structs is slower than doing the same with reference types. That's my personal guess.
     
  7. Saniell

    Saniell

    Joined:
    Oct 24, 2015
    Posts:
    195
    Can we stop guessing and use the profiler for once? You can pass structs by reference to avoid copies.
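For readers unfamiliar with passing structs by reference, here is a minimal plain C# sketch (not from the thread, and not Unity-specific). The `Particle` struct and the method names are hypothetical and exist only for illustration. It shows that a by-value call works on a copy, while `ref` and `in` parameters avoid the copy.

```csharp
using System;

// Hypothetical 64-byte struct, used only for illustration.
struct Particle
{
    public double X, Y, Z, W;
    public double VX, VY, VZ, VW;
}

static class RefDemo
{
    // By value: the whole struct is copied, so the caller's data is unchanged.
    public static void StepByValue(Particle p, double dt)
    {
        p.X += p.VX * dt;
    }

    // By reference: no copy, and the write lands in the caller's struct.
    public static void StepByRef(ref Particle p, double dt)
    {
        p.X += p.VX * dt;
    }

    // 'in' passes a read-only reference: no copy and no mutation.
    public static double SpeedSquared(in Particle p)
    {
        return p.VX * p.VX + p.VY * p.VY + p.VZ * p.VZ;
    }

    public static void Main()
    {
        var particles = new Particle[4];
        particles[0].VX = 2.0;

        StepByValue(particles[0], 0.5);      // modifies a temporary copy
        Console.WriteLine(particles[0].X);   // prints 0

        StepByRef(ref particles[0], 0.5);    // modifies the array element in place
        Console.WriteLine(particles[0].X);   // prints 1
    }
}
```

Whether the copy is actually expensive depends on struct size and on what the compiler inlines, so this is something to confirm in the profiler rather than assume.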
     
  8. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    I'm not familiar with Burst programming. There are relatively few tutorials, so I will test it when I have time. :)
     
  9. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Burst compiles the code directly into machine code with SIMD instructions using LLVM, while IL2CPP generates a C++ representation of the .NET IL that is then compiled to machine code. The difference in performance between the two depends heavily on what kind of code you're feeding into them and how compatible it is with vectorization. If the code cannot be vectorized at all, then the difference should be small.

    Vectorization means operating on multiple data elements with a single instruction. For example, let's say you have a loop that iterates through two int arrays (A and B) with 1024 items each, and writes the sum of each pair of items into a third array C. With normal scalar code, the CPU needs to perform 1024 additions, while with 128-bit vectorization it can do the same work using only 256 operations by reading, adding, and writing four elements at once.

    Vectorization also allows operating on all components of a Vector4, Vector3 and Vector2 at once.
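The array-sum example above can be sketched in plain C# with `System.Numerics.Vector<int>`. This is an illustration under stated assumptions, not how Burst works internally: Burst vectorizes ordinary loops automatically, whereas this sketch writes the vector operations by hand. The class and method names are hypothetical, and `Vector<int>.Count` depends on the hardware (4 lanes on a 128-bit target, more on wider ones).

```csharp
using System;
using System.Numerics;

static class VectorAddDemo
{
    // Scalar version: one addition per element. Returns the number of additions.
    public static int AddScalar(int[] a, int[] b, int[] c)
    {
        int ops = 0;
        for (int i = 0; i < a.Length; i++)
        {
            c[i] = a[i] + b[i];
            ops++;
        }
        return ops;
    }

    // Vector version: one addition per group of Vector<int>.Count elements.
    public static int AddVector(int[] a, int[] b, int[] c)
    {
        int width = Vector<int>.Count;
        int ops = 0;
        int i = 0;
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<int>(a, i);   // load 'width' elements of A
            var vb = new Vector<int>(b, i);   // load 'width' elements of B
            (va + vb).CopyTo(c, i);           // add all lanes, store into C
            ops++;
        }
        // Scalar tail for lengths that are not a multiple of the lane width.
        for (; i < a.Length; i++)
        {
            c[i] = a[i] + b[i];
            ops++;
        }
        return ops;
    }

    public static void Main()
    {
        var a = new int[1024];
        var b = new int[1024];
        var c = new int[1024];
        for (int i = 0; i < a.Length; i++) { a[i] = i; b[i] = 2 * i; }

        int width = Vector<int>.Count;
        Console.WriteLine($"scalar additions: {AddScalar(a, b, c)}");
        Console.WriteLine($"vector additions: {AddVector(a, b, c)} (lane width {width})");
    }
}
```

On a machine with 128-bit vectors the second count is 256, matching the numbers in the post. With wider registers it is smaller still.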
     
    Last edited: Dec 15, 2022
    Ryiah and ElevenGame like this.
  10. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
  11. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
  12. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
  13. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27

    Today I did another 10,000 particle motion tests:

    Mono + IL2CPP + Parallel.For: 74-80 FPS

    IL2CPP + Burst: 75-80 FPS

    ComputeShader: 72-95 FPS

    This shows that the performance gap is not very big; the key is algorithm optimization. ComputeShader is about 3 times faster than Burst in pure computing.
     
  14. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
  15. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    Burst's CPU usage is not high, so I wonder whether one of my settings is wrong?
     
  16. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    In fact, there is no point in discussing this issue. The ultimate goal is concurrent programming, and Burst provides a safer way to do it. Either way, the code is the same.
     
  17. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    612
    ... is that a benchmark taking FPS numbers from the in-editor statistics menu?

    That's a wildly inaccurate metric :)
     
  18. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    The built project gives the same result, and another measurement method reports 50 FPS. Of course, this statistic is unfair to ComputeShader, since all the load is on the GPU.
     
  19. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    Obviously, there is no method that is both high-performance and flexible. With no Dictionary, no List, and no HashSet, how do you write your code? That's the point. I don't know about the performance of Collections.Concurrent. It is difficult to write code with only arrays. :)
     
    Last edited: Jan 6, 2023
  20. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
  21. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    The test is meaningless. I need to change the way I think about writing code: use arrays instead of dictionaries and lists. That way the code can run anywhere.
     
  22. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    Rendering affects the test results. If your rendering load is high, the FPS drops. Therefore, the game's CPU and GPU load need to be balanced to get high performance. Burst is three times faster. ComputeShader is six times faster. Burst really does improve performance.
    Of course, the test results depend on your computer configuration. ;)
     
    Last edited: Jan 7, 2023
  23. VirtusH

    VirtusH

    Joined:
    Aug 18, 2015
    Posts:
    95
    Array: NativeArray<T>
    List: NativeList<T>
    Dictionary: NativeParallelHashMap<T, U>

    If you want to understand how fast burst is you need to benchmark burst code itself (Stopwatch class around the job would do it), or just use the profiler. The frame-rate is never an appropriate way to measure the performance of a small slice of code. Spoiler: Burst is much faster than you seem to think it is. I encourage you to explore it more, you'll be pleasantly surprised.
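A minimal sketch of the Stopwatch approach, written against plain .NET so it runs outside Unity. The `BestOfMs` helper and the particle arrays are hypothetical, and `Parallel.For` stands in for the work being measured. In a Unity project the same pattern would presumably wrap scheduling the job and waiting for it to complete.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class MicroBench
{
    // Runs 'work' a few times untimed, then returns the best wall-clock time
    // in milliseconds over 'runs' timed repetitions.
    public static double BestOfMs(Action work, int warmup = 3, int runs = 10)
    {
        for (int i = 0; i < warmup; i++) work();   // let the JIT and caches settle

        double best = double.MaxValue;
        var sw = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            work();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    public static void Main()
    {
        const int count = 10_000;   // particle count taken from the thread's test
        const float dt = 0.016f;
        var pos = new float[count];
        var vel = new float[count];
        for (int i = 0; i < count; i++) vel[i] = 1f;

        double serial = BestOfMs(() =>
        {
            for (int i = 0; i < count; i++) pos[i] += vel[i] * dt;
        });
        double parallel = BestOfMs(() => Parallel.For(0, count, i => pos[i] += vel[i] * dt));

        Console.WriteLine($"serial: {serial:F4} ms, Parallel.For: {parallel:F4} ms");
    }
}
```

Timing only the slice of code under test keeps rendering, vsync, and editor overhead out of the measurement, which a frame-rate counter cannot do.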
     
    MartinTilo likes this.
  24. Mortuus17

    Mortuus17

    Joined:
    Jan 6, 2020
    Posts:
    105
    This essentially boils down to two things, I'd say.

    1) What's the better compiler? Clang/LLVM or some Mono-JIT? Answer: Mono-JITs are a joke in comparison.
    2) What's better for performance - compile time validation or runtime validation? Answer: It's obvious.

    Overall answer: It's obvious.
     
    VirtusH likes this.
  25. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    Burst is indeed faster than Mono, but it is not fast enough when there are a large number of ray calculations, and SIMD code is not easy to write. ComputeShader cannot use List or HashMap, so you can only use arrays, and ComputeShader code is much more difficult to write. I think it is better to use Burst and ComputeShader in combination. It would be great if Burst could take advantage of GPU acceleration. :)
     
  26. Mortuus17

    Mortuus17

    Joined:
    Jan 6, 2020
    Posts:
    105
    It is indeed easy to write SIMD code with Unity.Mathematics. With C# unsafe you can even load int4 SIMD vectors from a NativeArray<int>.
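As a rough plain .NET analogue of viewing an int array as groups of four, the sketch below uses `MemoryMarshal.Cast` to reinterpret the memory without copying. The `Int4` struct is a hypothetical stand-in for Unity.Mathematics' `int4`, and this is not the Unity API itself. It only illustrates the reinterpretation idea: the struct's operator is ordinary scalar code here, not a guaranteed SIMD instruction.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for Unity.Mathematics.int4: four packed 32-bit ints.
struct Int4
{
    public int X, Y, Z, W;

    public static Int4 operator +(Int4 a, Int4 b) =>
        new Int4 { X = a.X + b.X, Y = a.Y + b.Y, Z = a.Z + b.Z, W = a.W + b.W };
}

static class ReinterpretDemo
{
    // Views the int[] as a span of Int4 (no copy) and adds the groups lane by lane.
    // Trailing elements that do not fill a whole Int4 are dropped by the cast.
    public static Int4 SumLanes(int[] data)
    {
        Span<Int4> groups = MemoryMarshal.Cast<int, Int4>(data.AsSpan());
        var total = new Int4();
        foreach (Int4 g in groups) total += g;
        return total;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6, 7, 8 };
        Int4 t = SumLanes(data);
        Console.WriteLine($"{t.X} {t.Y} {t.Z} {t.W}");   // prints 6 8 10 12
    }
}
```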
    There are many things that oppose the suggestion that Burst should... somehow make use of GPGPU: Some devices simply don't have a GPU... GPU compilation being completely different from CPU compilation... The GPU actually being the bottleneck in the vast majority of games while the CPU code sometimes doesn't even use multithreading, not even talking about taking 100% advantage of the CPU... You still being able to write compute shaders if you wish etc. etc. etc.
     
  27. cai814037725

    cai814037725

    Joined:
    Apr 25, 2020
    Posts:
    27
    I don't think SIMD code is easy to implement. My code is not a simple array loop; the logic is very complex and involves a lot of intersecting ray lines. To implement the SIMD splitting logic, one would need to create more arrays to store the data, which would make the computation slower and take up more memory. I'm not against Burst. I'm already using it; it's just not performing as expected. Currently the code takes between 4 ms and 70 ms to process the mesh in one run, so I would like to use ComputeShader to speed up some of the code, if the device supports it.

    As written, this plugin is less suitable for low-end devices, because dynamically generating meshes requires a lot of performance; on those devices the game would have to run after pre-processing instead. My goal is to be able to run it on PC devices.

    Burst performance is still good, so I'll finish my task in Burst first.