Could Unity add in editor tips and hints on how best to write Burst code?

Discussion in 'Entity Component System and C# Job system' started by Arowx, Dec 16, 2018.

  1. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    6,683
    CPUs have batch based SIMD instruction sets that can literally run multiple vector ops a cycle, and in theory the Burst compiler should be able to take advantage of these features.

    The thing is, as a programmer I don't have any idea of what Burst can or cannot do with the code I give it...

    So if the super clever Burst compiler gave me, the programmer, some hints or tips based on how well my code compiled, or on how it could be changed to take advantage of SIMD instruction sets and other optimizations, wouldn't that massively improve what I can do with ECS?

    Burst is a black box to most of us. A few clever programmers will dig into the code it generates and learn to tweak their code to get the best from it, but for the rest of us OK programmers, some context-relevant hints or tips in the editor would be ideal feedback.

    Also, as ECS is so complex, would some ECS tips for common misconceptions and mistakes be useful for getting people up to speed?

    At least until you bring out the Visual ECS programming where we just present the problem we want ECS to solve in graphical form and you do the rest...
     
  2. 5argon

    5argon

    Joined:
    Jun 10, 2013
    Posts:
    903
    I would like to see some examples of bad ECS code patterns that could be converted, and what tips would be given for each pattern.

    All I can think of already exists in the form of library limitations that prevent you from going that way in the first place, like HPC# enforcement or aliasing prevention, which allow good assembly to be generated.

    Burst can also vectorize a NativeArray iteration loop automatically, which basically applies SIMD to the linear access for you. What other kind of automatic SIMD optimization would you like to have? For example, if you have 3 floats each being added to the same value, should you be given a tip to combine them into a float3 in the first place? (It would be quite bizarre if it told me to change my data structure design.)
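
    A minimal sketch of the kind of linear-access loop this refers to, written in C++ rather than HPC# so that it stands alone (that swap is an assumption for illustration; it is not Burst or Unity code). `add_scaled` and its parameters are hypothetical names, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC) standing in for the no-aliasing guarantee mentioned above. Whether a given compiler actually emits SIMD for it depends on the compiler and its flags.

```cpp
#include <cstddef>

// Hypothetical helper: out[i] = a[i] + scale * b[i] over contiguous arrays.
// __restrict (a compiler extension, not standard C++) promises that the three
// arrays do not overlap. That is the kind of guarantee an optimizer needs
// before it can process several elements of the loop per instruction.
void add_scaled(float* __restrict out,
                const float* __restrict a,
                const float* __restrict b,
                float scale,
                std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + scale * b[i];  // same operation on every element, linear access
    }
}
```

    The loop body has no branches and no dependency between iterations, which is what makes it a good candidate for automatic vectorization.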
     
    Lurking-Ninja likes this.
  3. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    6,683
    Check out this thread https://forum.unity.com/threads/branch-misprediction-in-systems.594241/

    Here a developer is looking into how to optimize their ECS system; they get some feedback and benchmark a few different approaches. I ask them to share their code and they do. They are getting the distance between two points and then normalizing the direction vector, so they are doing four calculations when they only need three (direction, magnitude, normalize by division).
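
    The three-step version (direction, magnitude, normalize by division) can be sketched roughly as follows. This is plain C++ rather than the code from that thread, which is not reproduced here; `Vec3`, `DirectionAndDistance` and `direction_and_distance` are hypothetical names used only for illustration.

```cpp
#include <cmath>

// Hypothetical stand-in for a float3.
struct Vec3 { float x, y, z; };

// Hypothetical result type: the unit direction plus the distance it was derived from.
struct DirectionAndDistance { Vec3 direction; float distance; };

// Computes the length once and reuses it for the normalization,
// instead of calling separate distance and normalize helpers that
// would each compute the same square root.
DirectionAndDistance direction_and_distance(const Vec3& from, const Vec3& to)
{
    const Vec3 d{to.x - from.x, to.y - from.y, to.z - from.z};        // 1. direction
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);   // 2. magnitude
    if (len == 0.0f) {
        return {Vec3{0.0f, 0.0f, 0.0f}, 0.0f};  // coincident points: no direction
    }
    const float inv = 1.0f / len;                                     // 3. normalize by division
    return {Vec3{d.x * inv, d.y * inv, d.z * inv}, len};
}
```

    Returning the distance alongside the direction means the caller never has to recompute the square root.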

    This is in the inner loop of an ECS system, and their original code had function calls within this inner loop.

    What if ECS warned you that calling a function has a big overhead that can reduce the performance of your system, or that repeating the same calculations is less than optimal?

    These are just good optimization tips and tricks that take time to learn, and more time to test and profile to ensure they improve the compiled code's performance. The developers of the Burst compiler probably know every trick and tip in the book. The trouble is we don't, so we can end up giving the Burst compiler messy and slow code that it struggles to optimize.

    If you are a good or great programmer then you won't need any hints or tips, as you will provide the Burst compiler with streamlined data and inline vectorised code that it will make great SIMD code from. For the rest of us, a few tips and hints could mean the difference between thousands of cool effects and millions.
     
    Sylmerria and GliderGuy like this.
  4. 5argon

    5argon

    Joined:
    Jun 10, 2013
    Posts:
    903
    OK, I had been to that thread. I think those optimizations are pretty specific to the problem? We understand them because we are human and know the overall context. How can a machine tell whether a function call (which could get an unknown amount of variable data to work on) is big or small before running it, other than by preventing function calls altogether, which is very drastic? Then we couldn't use properties either, and indexers are properties, and properties are functions.

    Plus Burst already inlines things, so function calls not being excluded from the HPC# subset seems to be intended behaviour that is still performant by default.
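
    As a rough illustration of why a small call inside a loop need not cost anything, here is a sketch in plain C++ (not HPC# or Burst; `Particle` and both function names are hypothetical). Optimizing compilers commonly inline an accessor like this, so both loops tend to compile to much the same machine code, though that depends on the compiler and its settings.

```cpp
#include <cstddef>

// Hypothetical data type with a small accessor, comparable to a C# property getter.
struct Particle {
    float px;
    float vx;
    float next_position(float dt) const { return px + vx * dt; }
};

// Calls the accessor inside the loop.
float sum_next_positions(const Particle* p, std::size_t n, float dt)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += p[i].next_position(dt);
    }
    return sum;
}

// Same work with the accessor body written out by hand.
float sum_next_positions_by_hand(const Particle* p, std::size_t n, float dt)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += p[i].px + p[i].vx * dt;
    }
    return sum;
}
```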

    How do you determine the workload of code without actually running it? What definition of "inner/tight loop" could be detected from assemblies without a human looking at it?

    And if it were possible to develop an algorithm that could tell, for any algorithm, which repeated calculations (those not consisting of constant values) are a wasteful mistake rather than intended behaviour, that alone could be a big thesis paper. Imagine throwing that at everything to optimize it... Even within the HPC# constraints we can still get outside data from native containers, so it remains uncertain until runtime whether a calculation is wasted or not.

    I am interested to know some of the "every trick and tip in the book" that are detectable by pattern/machine, so I can at least avoid them manually first. I believe this area still requires human support. For a machine to do it, another possibility may be to train a neural network on bad patterns until it magically finds one for us, but an NN is good at producing results and not good at explaining why.
     
    Last edited: Dec 16, 2018
  5. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    655
    I would be happy if the Burst inspector could print the CPU cycles for the generated assembly lines.
     
    eizenhorn, FROS7 and 5argon like this.
  6. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    6,683
    This kind of feedback summary, with the hardware SIMD options/instructions that Burst could use if the program took a different approach or used a Burst optimization.
     
  7. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    655
    I expect the compiler to do all the optimizations for me without much feedback. There will of course be a limit to this (I am not expecting magic).

    If we could see the CPU cycles (intelligently summarized for branches / code paths), we could, as an alternative to profiling, assess the effect of changes to our C# code on the final assembly (if we don't want to, or can't, go there directly).
     
  8. 5argon

    5argon

    Joined:
    Jun 10, 2013
    Posts:
    903
    Since C# jobs are self-contained to some degree, I think a Unity-specific compiler optimization to allow rapid iteration on job design is very possible. Like a super-incremental compiler which knows not just the assembly level but the job/struct level. Imagine editing job code and the assembly updating instantly, just like you can hot-edit a shader file and see the changes instantly. Dock a small panel locked to that job struct, set a reference compile and see before/after CPU cycles. That would be a dream to work with. Currently I have to take screenshots and compare. Number of cycles would be much more objective.
     
    Jes28 and sngdan like this.
  9. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    655
    Exactly this.
     
  10. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    655
    @Joachim_Ante

    Is adding CPU cycles to the assembly something you could consider? I found myself today again in a situation where, like @5argon, I have to copy code into a text editor and go line by line....