
bolt performance vs C#

Discussion in 'Visual Scripting' started by zeropointblack, Nov 11, 2020.

  1. zeropointblack

    zeropointblack

    Joined:
    Jun 8, 2020
    Posts:
    147
    has anyone done or seen any kind of legitimate benchmark testing? and what kind of performance hit are we talking here? PS5 for example. pretty powerful hardware. would that render the difference basically negligible?

    and the figure of 5 to 8 times performance hit. that is to run the code, correct? so like 5 times as much code, right? obviously not triangles and framerate. so then what is the true difference?

    is it just not worth it to make a bolt to C# official automatic conversion in a game build output? how about doing it manually when the project is nearing completion? worth the time and effort?
     
  2. stuksgens

    stuksgens

    Joined:
    Feb 21, 2017
    Posts:
    134
    First, we have to define certain things... The performance of Bolt 1 really is inferior to that of native C#. (This is a fact.)

    But it's not as horrible as everyone says. The poor performance usually comes from more complex code that uses reflection... This does not mean that visual scripting is always 5 times heavier than C# code, just that in some extreme cases it can be around 5x slower... In practice, it is only somewhat slower, like running the game with 2 to 5 fewer frames per second... (assuming you don't have complex code)

    but, back to your question, is it worth rewriting the entire bolt code in C#?

    Of course not.:)

    Try to optimize the heavier code (such as terrain generation, pathfinding, among others). This kind of code is usually quite heavy, so rewriting it in C# should significantly improve the performance of your game...

    Simple code (even if rewritten) will make no difference, because the performance gain in C# will be negligible... You can use the Profiler; it will be a great help in identifying the most expensive code...

    ---

    A good point is that Unity plans to change the backend of its native visual scripting at some point, so performance should improve a lot in the next updates... so we just have to wait

    And unfortunately I don't have current comparison data (after all, putting cubes to rotate is not a valid example), but at some point I intend to make a comparison video about it because I'm also curious (I've never tested a direct comparison "in battle") ;)


    I hope I answered your question:D
    And whoever has more complete data, could share please, it would be really cool:p
     
  3. valentinopower154

    valentinopower154

    Joined:
    Jul 29, 2020
    Posts:
    4
    my opinion: bolt is easy to use and very complete, but after making only one graph the fps goes down from 600 to 120
     
  4. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    We're working on a new backend. Our preliminary testing gives us tremendous perf improvement (10x to 15x - however we still need to implement a few features that might change that).

    A typical AAA production will usually rewrite some visual scripts at the end when it's time to ship, as the purpose of visual scripting in that case is mainly to allow designers to iterate without a programmer; however once the iteration is done and the implementation locked down, a dev might reimplement it in code.

    Another approach is to write units for the perf intensive parts of your graphs and to use that instead of complex subgraphs.
     
    Haneferd likes this.
  5. Gladyon

    Gladyon

    Joined:
    Sep 10, 2015
    Posts:
    296
    If what makes Bolt1 slow is reflection, an efficient way to optimize it is to do the reflection only once, and cache the results.
    It's even possible to access members by creating access methods on the fly with only 2-3 IL instructions.
    Of course, it won't work on mobile...

    I'm curious about how you optimized it.
     
  6. zeropointblack

    zeropointblack

    Joined:
    Jun 8, 2020
    Posts:
    147
    looks like Bolt 2 would have been pretty cool. too bad unity canned it.

    ive since gone back to converting my code to C# manually (helps me learn more). bolt 1 is great for concepting code and troubleshooting. i still use it for that.
     
  7. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    On caching the reflection results: Bolt1 is already doing that. Still slow. The other major bottleneck is the constant boxing of values.

    On creating access methods on the fly from IL: that's a no go for IL2CPP platforms.
    As for how we optimized it: first, it's all experimental, so no promises:

    The current prototype we have is using a separate graph optimized for runtime, created automatically from the Bolt graph. This graph is using codegen to generate runtime nodes, and so it gets rid of reflection completely (except if you actively force it to use reflection).
    Also, it doesn't box primitive values. The goal eventually is to get to 0 boxing all the time if possible.
    The last source of perf gain is that, as we generate a separate graph for runtime, VS becomes a compiler; we can do the same kind of optimizations a compiler would do (strip unused nodes, pre-compute constant expressions, etc.)
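
To make the two compiler-style optimizations named above more concrete, here is a minimal sketch of stripping unused nodes and pre-computing constant expressions on a toy node graph. It is written in Python for brevity and is not the prototype's code: the `Const`, `Input`, and `BinOp` node types, the graph layout, and the function names are all assumptions made for illustration, and the graph is assumed to be acyclic.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Input:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str     # "add" or "mul"
    left: str   # id of the node feeding the left operand
    right: str  # id of the node feeding the right operand


Node = Union[Const, Input, BinOp]
Graph = Dict[str, Node]
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def fold_constants(graph: Graph) -> Graph:
    """Replace every BinOp whose operands are both constants with a Const."""
    folded: Graph = {}

    def visit(node_id: str) -> Node:
        if node_id not in folded:
            node = graph[node_id]
            if isinstance(node, BinOp):
                left, right = visit(node.left), visit(node.right)
                if isinstance(left, Const) and isinstance(right, Const):
                    node = Const(OPS[node.op](left.value, right.value))
            folded[node_id] = node
        return folded[node_id]

    for node_id in graph:
        visit(node_id)
    return folded


def strip_unused(graph: Graph, output_id: str) -> Graph:
    """Keep only the nodes reachable from the output node."""
    kept: Graph = {}
    pending = [output_id]
    while pending:
        node_id = pending.pop()
        if node_id in kept:
            continue
        node = kept[node_id] = graph[node_id]
        if isinstance(node, BinOp):
            pending += [node.left, node.right]
    return kept


def evaluate(graph: Graph, node_id: str, inputs: Dict[str, float]) -> float:
    """Interpret the graph by walking it from the requested node."""
    node = graph[node_id]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Input):
        return inputs[node.name]
    return OPS[node.op](evaluate(graph, node.left, inputs),
                        evaluate(graph, node.right, inputs))


# Hypothetical authoring graph: out = (2 + 3) * x, plus a node nothing reads.
authoring_graph: Graph = {
    "two": Const(2.0),
    "three": Const(3.0),
    "five": BinOp("add", "two", "three"),
    "x": Input("x"),
    "out": BinOp("mul", "five", "x"),
    "debug_only": BinOp("add", "x", "x"),
}

runtime_graph = strip_unused(fold_constants(authoring_graph), "out")
print(sorted(runtime_graph))                      # nodes that survive
print(evaluate(runtime_graph, "out", {"x": 4.0}))
```

The runtime graph computes the same result as the authoring graph but has fewer nodes to visit per evaluation, which is the kind of gain a separate runtime representation makes possible.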
     
    SugoiDev and stuksgens like this.
  8. Gladyon

    Gladyon

    Joined:
    Sep 10, 2015
    Posts:
    296
    Thanks for the answer.

    It's true that boxing is a real killer. It takes a lot of effort to remove it from generated code.
    If you manage to completely get rid of reflection and boxing, then you'll have something really fast, good job!
     
  9. stuksgens

    stuksgens

    Joined:
    Feb 21, 2017
    Posts:
    134
    Today I decided to do a simple performance test of C# vs Visual Script in its current state (Unity 2021.1b4)

    As I said before, this test is not very suitable since it does not really represent everything in a game (it's just rotating objects), yet it can be a good reference for those who want to know the actual current performance of visual scripting.

    I used an Intel i5-2500 processor with a 640x480 resolution scale. There are approximately 1000 objects rotating (using transform.Rotate)

    C#:
    min 4.78ms
    max 6.20ms
    avg 5.25ms | 190.5 FPS

    BOLT (VISUAL SCRIPT)
    min 37.70ms
    max 50.12ms
    avg 39.75ms | 25.2 FPS

    Visual Script in this scenario is about 8x slower than C#.:eek:

    I hope that this new backend can solve this problem, after all, performance is currently the worst part of the visual script:(


    Edit: I updated the FPS to be based on the average FrameTime, not the FPS Stats
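
For anyone who wants to redo the arithmetic, here is a small sketch (in Python purely for convenience; it is not part of the test project) that turns the average frame times above into FPS and a slowdown factor. The function names are made up for illustration, and the object count is taken as exactly 1000 although the post says "approximately".

```python
def fps_from_frame_time(frame_ms: float) -> float:
    """Frames per second implied by an average frame time in milliseconds."""
    return 1000.0 / frame_ms


def slowdown_factor(slow_ms: float, fast_ms: float) -> float:
    """How many times longer the slow frame takes than the fast one."""
    return slow_ms / fast_ms


# Average frame times reported above.
csharp_avg_ms = 5.25
bolt_avg_ms = 39.75
object_count = 1000  # assumption: "approximately 1000 objects" treated as exact

csharp_fps = fps_from_frame_time(csharp_avg_ms)
bolt_fps = fps_from_frame_time(bolt_avg_ms)
factor = slowdown_factor(bolt_avg_ms, csharp_avg_ms)

# Extra time the visual script adds, per frame and per rotating object.
extra_ms_per_frame = bolt_avg_ms - csharp_avg_ms
extra_us_per_object = extra_ms_per_frame * 1000.0 / object_count

print(f"C#:   {csharp_fps:.1f} FPS")
print(f"Bolt: {bolt_fps:.1f} FPS")
print(f"Slowdown: {factor:.2f}x")
print(f"Extra cost: {extra_ms_per_frame:.2f} ms per frame, "
      f"{extra_us_per_object:.1f} us per object")
```

The per-object figure only holds if everything else in the frame (rendering, editor overhead) costs the same in both runs, which this test setup assumes.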
     
    Last edited: Feb 11, 2021
  10. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    Sadly that's the usual trade-off for visual scripting solutions. We are working however to reduce that gap.
     
    stuksgens likes this.
  11. stuksgens

    stuksgens

    Joined:
    Feb 21, 2017
    Posts:
    134
    Yes, I recognize that. :)

    The ideal would be to have the same performance as C#. However, this is very difficult to implement... (only via code generation). But if the new backend can reduce the difference to 1/2 the performance of C#, that would be more than enough to be useful on mobile devices, which is currently the worst use case.:D

    Another factor is the snipers nodes. They do not suffer from the reflection problems, so moving some commonly used nodes to this format could be interesting...;)
     
  12. stuksgens

    stuksgens

    Joined:
    Feb 21, 2017
    Posts:
    134
    And one factor that few people can notice is that the editor will never compile your code in visual script:D.

    And that makes a BIG difference... :eek:

    Believe it or not, it was MUCH faster for me to write this simple code in visual script than in C#. That's because the editor took much longer to compile the code...

    So let's assume that you have to correct several C# scripts: the time it takes to change just one line is long; multiply that by 10 and you have a problem. With visual scripting this compilation time disappears completely.:cool:

    So in terms of prototyping, Unity visual scripting has a huge advantage over C# precisely because of the speed of iteration. In fact the biggest problem today is the lack of performance, because if it were a little better (1/2 of C#), we would not even need to rewrite the "heavy code" in C#; the iteration time would already make up for it.
     
    Haneferd likes this.
  13. theor

    theor

    Joined:
    Jul 15, 2014
    Posts:
    1
    That's what we're aiming for! We have a prototype that codegens nodes (e.g. Transform.setLocalRotation) to get rid of the reflection, but only when you're ready for a domain reload (so until then it still uses reflection to give you speed of iteration, but once you click somewhere, it batch-codegens all nodes not done yet)
     
    SugoiDev and stuksgens like this.
  14. Haneferd

    Haneferd

    Joined:
    Feb 3, 2018
    Posts:
    33
    Interesting and nice comparison. Could you elaborate a little bit on the measuring technique or tool you used to get the figures?
     
  15. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    712
    That's not 4x, it's roughly 7.5X slower. When discussing performance, it's the time that matters, not the framerate.

    Bolt is a run-time scripting virtual machine, and a 5 to 10x slower performance compared to native code (well, at least JIT'ed C#) is quite normal.
     
  16. stuksgens

    stuksgens

    Joined:
    Feb 21, 2017
    Posts:
    134
    I agree. I used the frame rate as a base instead of the average frame time because it's easier for other people to understand.:p
    But you pointed out a detail, how the code is compiled for execution... maybe IL2CPP can make a difference in this case. Another point is the way I did the test: I always use the Profiler for internal tests, and I know that it adds a certain amount of noise and usually affects performance whenever it is open, so I should have used the standalone Profiler and not the "common" one. :rolleyes:

    So I'm going to retake the test this week to be more precise, and include this information too. ;)
     
    Haneferd likes this.
  17. PanthenEye

    PanthenEye

    Joined:
    Oct 14, 2013
    Posts:
    130
    In my personal tests Bolt is roughly 5 times slower than C# on average, with the biggest hit coming from Bolt's event system (Start, Update, Custom Event, etc). Interestingly, Flow Canvas works with a very similar reflection-based runtime, and it runs only 60% slower than C# and over 3 times faster than Bolt. One thing I noticed is that Flow Canvas has an Update manager (only has 1 update instance no matter how many objects you instantiate), so it might particularly excel in tests that employ Update.

    I've also tested Flow Reactor, but despite it not being reflection based, it's about 40% slower than Flow Canvas and has a pretty busy custom node syntax and unexpected limitations like nodes not being able to reference self (this). It also has received no updates in a while.
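
The "one update instance" idea mentioned above can be sketched roughly as follows. This is a language-neutral illustration in Python, not Flow Canvas's actual implementation: `UpdateManager`, `Rotator`, and their methods are hypothetical names. The point is that the engine would call a single `tick` per frame, and the manager fans it out to every registered object in plain code.

```python
class UpdateManager:
    """Single entry point the engine would call once per frame."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def tick(self, delta_time):
        # Iterate over a copy so callbacks may unregister themselves safely.
        for callback in list(self._callbacks):
            callback(delta_time)


class Rotator:
    """Stand-in for a graph instance that rotates an object every frame."""

    def __init__(self, manager, degrees_per_second):
        self.angle = 0.0
        self.degrees_per_second = degrees_per_second
        manager.register(self.on_update)

    def on_update(self, delta_time):
        self.angle = (self.angle + self.degrees_per_second * delta_time) % 360.0


manager = UpdateManager()
rotators = [Rotator(manager, 90.0) for _ in range(1000)]

manager.tick(0.5)  # one call from the "engine" updates all 1000 objects
print(rotators[0].angle)
```

With this layout the cost of crossing from the engine into script code is paid once per frame instead of once per object.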
     
    Haneferd and stuksgens like this.
  18. stuksgens

    stuksgens

    Joined:
    Feb 21, 2017
    Posts:
    134

    Interesting:)

    The "Update", "Start", etc. methods are called from C++ and this creates an overhead. Unity has already commented on this in an article posted on the blog. This affects even the C# code itself.

    https://blogs.unity3d.com/en/2015/12/23/1k-update-calls/

    @theor-unity, is it possible to include this in visual scripting through nodes snipers? (i.e. call the Update method and others natively from C#, instead of from C++?)
    I know I can do this through UnityEvents, but if it were native it would be much better:p
     
    Haneferd likes this.
  19. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    I'm not sure that's the bottleneck here. Let me try the same benchmark (1000 rotating cubes) and get back to you
     
    stuksgens and Yavvn like this.
  20. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    712
    Even if the reflection results are cached, calling methods via MethodInfo or reading/writing fields via FieldInfo should be more expensive than doing it natively due to the extra indirection. There's room for improvement, but an interpreted language can only go so far because of that.
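
As a rough analogy for the point above, the sketch below contrasts a direct call, a by-name lookup on every call, and a lookup that is done once and cached. It uses Python's `getattr` as a stand-in for .NET reflection, so it illustrates only the shape of the problem, not `MethodInfo` itself; the `Transform` class here is a hypothetical stand-in, not Unity's.

```python
class Transform:
    """Hypothetical stand-in for an engine object."""

    def __init__(self):
        self.angle = 0.0

    def rotate(self, degrees):
        self.angle = (self.angle + degrees) % 360.0


def rotate_direct(transform, degrees):
    # The call target is known when the code is written.
    transform.rotate(degrees)


def rotate_by_name(transform, degrees, member="rotate"):
    # The member is looked up by name on every single call.
    getattr(transform, member)(degrees)


class CachedCall:
    """Looks the member up once, then reuses the resolved callable."""

    def __init__(self, target, member):
        self._call = getattr(target, member)  # lookup happens only here

    def __call__(self, *args):
        # Still one extra hop through this wrapper compared to a direct call.
        return self._call(*args)


direct, by_name, cached = Transform(), Transform(), Transform()
cached_rotate = CachedCall(cached, "rotate")

for _ in range(3):
    rotate_direct(direct, 90.0)
    rotate_by_name(by_name, 90.0)
    cached_rotate(90.0)

print(direct.angle, by_name.angle, cached.angle)
```

All three paths do the same work; they differ only in how much lookup and indirection happens per call, which is where the remaining overhead comes from even after caching.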
     
  21. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    Interpreted does not equal reflection. You can interpret a graph made of nodes using no reflection. That's what we're experimenting with right now (and it seems, no promises, to give a consistent 12x speedup)
     
    stuksgens likes this.
  22. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    712
    I meant indirection when it comes to interpreted languages, as what code will be executed next is stored in data.

    But good to hear about performance improvements! But wouldn't calling external C# functions still need reflection, similar to SendMessage()?
     
  23. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    the prototype we have can codegen nodes for external functions - use the nodes, click on the warning "some nodes are using reflection", it generates them.
     
    stuksgens likes this.
  24. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    712
    Oh, that's nice! Didn't know you were planning to use codegen on Bolt. Yeah, that opens up a lot of optimization possibilities.
     
  25. stuksgens

    stuksgens

    Joined:
    Feb 21, 2017
    Posts:
    134
    Did the test again, but there was no major change, except that I was right about the Profiler and the stats...

    The test was performed on Unity 2021.1.5f (HDRP 11)
    The scene has no lighting and no effects, just the camera and the cubes
    The resolution is the lowest possible (640x480px) so that the scripts are the limiting factor


    C# (222 FPS)
    • Avg - 4.50ms
    • Min - 4.32ms
    • Max - 6.90ms
    Visual Script (25 FPS)
    • Avg - 40ms
    • Min - 38ms
    • Max - 51ms

    Last time I used the Stats FPS, but after researching more about how it works, I found it doesn't really represent the actual FPS. So I took the opportunity to fix my previous post;)


    How the test was done:

    I open the Profiler, set the maximum number of frames to record (2,000) and wait a while until the editor's frame rate stabilizes.

    Then I check in which frame there is the highest consumption of resources in the scripts (with the exception of GC) and in which there is the lowest consumption...

    The average frame time is taken from the frames that share roughly the same value (this represents how the game will behave most of the time).

    And this time, the FPS is based directly on the average frame time.:)
     
    Haneferd likes this.
  26. gfrast

    gfrast

    Joined:
    May 5, 2014
    Posts:
    25
    I made some tests today and noticed that it's possible to include a flow graph in an AssetBundle and dynamically load behaviours, which could be a huge game-changer for a project that I'm working on.

    Obviously, converting such a graph into code (for the performance gain) before building would break this functionality.

    Is this only possible because Bolt currently uses reflection, and will future versions and optimizations get rid of this feature?
     
  27. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    It is possible because your graph uses only code already in the build. Reflection is one way but not the only one. Node-based codegen would work too, as long as you shipped the code for the nodes beforehand. Graph-based codegen would not.
     
    Haneferd likes this.
  28. gfrast

    gfrast

    Joined:
    May 5, 2014
    Posts:
    25
    Just to make sure i understood you correctly:
    Graph based codegen = Translate a node-sequence of a graph into code (i assume C++ or C#)
    Node based codegen = Translate the nodes into code, but keep the graphs to store the used node sequence as graph-asset.

    So in theory, if we only used standard Mono nodes or custom built nodes that ship with the game, we could still build asset bundles that contain graphs using only those nodes?

    Basically, the only downside would be that you can't add custom nodes anymore afterwards, correct? Hope i understood you correctly - that would be awesome!
     
    Haneferd likes this.
  29. theor-unity

    theor-unity

    Unity Technologies

    Joined:
    Feb 26, 2016
    Posts:
    171
    exactly !

    it does take a bit of planning - you need your build to contain all nodes you might use later. worst case you can always update your app after a bunch of DLCs to include more nodes, but in the meantime, you can do multiple asset-bundle-only updates containing no new code, only modified graphs.
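
The node-based approach discussed above can be pictured with a small sketch: the node code ships in the build, while graphs are plain data that only name those nodes. This is a Python illustration under stated assumptions, not Bolt's asset format; `NODE_REGISTRY`, the JSON layout, and the node names (`add`, `multiply`, `negate`, `clamp`) are all invented for the example.

```python
import json

# Code that shipped with the build: one callable per node type.
NODE_REGISTRY = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "negate": lambda a: -a,
}


def missing_nodes(graph):
    """Node types the graph needs that the shipped build does not contain."""
    used = {step["node"] for step in graph["steps"]}
    return sorted(used - set(NODE_REGISTRY))


def run_graph(graph, inputs):
    """Execute the steps in order, storing each result under its output name."""
    values = dict(inputs)
    for step in graph["steps"]:
        node = NODE_REGISTRY[step["node"]]
        args = [values[name] for name in step["args"]]
        values[step["out"]] = node(*args)
    return values[graph["result"]]


# A graph delivered later as data only (for example inside an asset bundle).
bundle_graph = json.loads("""
{
  "steps": [
    {"node": "add",      "args": ["x", "y"],   "out": "sum"},
    {"node": "multiply", "args": ["sum", "x"], "out": "scaled"}
  ],
  "result": "scaled"
}
""")

# A graph that needs a node type the build never shipped.
too_new_graph = {
    "steps": [{"node": "clamp", "args": ["x"], "out": "clamped"}],
    "result": "clamped",
}

print(missing_nodes(bundle_graph))   # nothing missing, safe to run
print(run_graph(bundle_graph, {"x": 2, "y": 3}))
print(missing_nodes(too_new_graph))  # needs an app update first
```

A check like `missing_nodes` is one way to decide whether a downloaded graph can run on the installed build or has to wait for an app update that adds the node code.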
     
    Haneferd and gfrast like this.
  30. gfrast

    gfrast

    Joined:
    May 5, 2014
    Posts:
    25
    thanks, that's great news, theor!

    another question - is there currently a way to kill a running coroutine event using the built-in units? i know you can pause a timer, but it would be more efficient to stop a whole coroutine if necessary in cases where you want to avoid having more of them running in parallel.

    @stuksgens
    i was curious about the performance as well, so i made a couple of checks. i tested one cube and rotated it 500,000 times within one frame - these are the average results:

    C#: ~144ms
    Bolt: ~5600ms
    C# via Bolt: ~143ms
    Bolt custom node: ~146ms

    What i take from that is that if you know that a task takes up a lot of performance, you can just convert it into a custom node and get essentially the same speed that you'd get when calling it from C#.

    Of course you won't convert everything into custom nodes, but you should be able to avoid a real performance impact doing that.

    PS: I measured using System.Diagnostics.Stopwatch
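
A stopwatch-style measurement like the one described above can be sketched as follows. This is a Python illustration using `time.perf_counter`, not the original test script; the `Cube` class and `time_calls` helper are hypothetical. The last lines only redo the arithmetic on the figures reported above.

```python
import time


class Cube:
    """Hypothetical stand-in for the rotated object."""

    def __init__(self):
        self.angle = 0.0

    def rotate(self, degrees):
        self.angle = (self.angle + degrees) % 360.0


def time_calls(action, iterations):
    """Run `action` repeatedly and return the elapsed wall time in ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        action()
    return (time.perf_counter() - start) * 1000.0


cube = Cube()
elapsed_ms = time_calls(lambda: cube.rotate(1.0), 500_000)
print(f"500,000 rotations took {elapsed_ms:.1f} ms on this machine")

# Ratios implied by the averages reported above (in ms).
csharp_ms, bolt_ms, custom_node_ms = 144.0, 5600.0, 146.0
bolt_vs_csharp = bolt_ms / csharp_ms
custom_node_overhead_pct = (custom_node_ms / csharp_ms - 1.0) * 100.0
print(f"Bolt vs C#: {bolt_vs_csharp:.1f}x")
print(f"Custom node overhead: {custom_node_overhead_pct:.1f}%")
```

Timing many iterations inside a single measurement, as in the post, keeps the timer's own overhead small relative to the work being measured.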
     