I think I've reached the limit of how far I can push c#. Should I switch to C++?

Discussion in 'Scripting' started by yoonitee, Feb 15, 2019.

  1. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Things like managed lists, arrays, function calls and Vector3 math are becoming a bottleneck in my procedural game.

    I like C# because it compiles for any platform. But I usually get 90% of sales from PC and maybe 10% or less from Mac.

    I want more performance. I tried compiling with IL2CPP but didn't notice much difference.

    So I'm thinking: should I consider porting some of the procedural mesh-calculation code into a C++ library? Or maybe try to get more performance from unsafe C# code?

    I'm going to try to use this library: https://github.com/Unity-Technologies/Unity.Mathematics to get faster math calculations.

    What tips can you give me? Is it worth the risk of unsafe code or C++, or should I stick with C#?
     
  2. Joe-Censored

    Joe-Censored

    Joined:
    Mar 26, 2013
    Posts:
    11,847
    So, what percent of your CPU cycles does the profiler say is occupied by your scripts, and what FPS are you hitting at that percentage?
     
  3. StarManta

    StarManta

    Joined:
    Oct 23, 2006
    Posts:
    8,775
    Unlikely. Currently, C#-compiled code is something like 75% of the speed of C++ compiled code, and the overhead of the bridge between C# and native will likely more than cancel out the meager gains.
     
    yoonitee likes this.
  4. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    We would need to know how you've written your code to make anything other than a wild guess. As @StarManta pointed out, there isn't that much of a performance difference between C# and C++, but there is overhead involved in calling into C++, and that overhead has the potential to eliminate any benefit from C++. Or, worse yet, you can even lose performance.

    Just as an example, I've linked an article that benchmarked C# against C++, C# against the Unity APIs, and C++ against the Unity APIs. Performance was higher for the first two combinations (C# <-> C++, C# <-> Unity APIs) than for the last one (C++ <-> Unity APIs).

    https://jacksondunstan.com/articles/3952
     
    yoonitee likes this.
  5. Gladyon

    Gladyon

    Joined:
    Sep 10, 2015
    Posts:
    389
    C# is not slow compared to C++.
    If you already have optimized your algorithm and your data layout, you can still have some interesting micro-optimizations in some cases.
    For example, when dealing with Vector3 you should assign the three components independently.
    Code (CSharp):
    // This is faster than 'dst = src;'
    dst.x = src.x;
    dst.y = src.y;
    dst.z = src.z;
    You can even access the components using the Vector3 as an array (using [0], [1], and [2] instead of x, y, and z), I haven't checked if it's faster or not, but it may make an algorithm more generic and remove a lot of duplicated code.


    But honestly, these micro-optimizations won't get you more than 20-30%, and only in very, very specific algorithms.
    The best way to optimize is always to:
    1. profile to find out where the time is spent
    2*. have an algorithm adapted to the situation (which includes lock-free multi-threading if possible)
    2*. have a good data layout in order to reduce CPU cache misses
    3. micro-optimize
    (*) Depending on the situation, a fast algorithm with a poor data layout may beat a slower algorithm with a good data layout, or the other way around.

    If you're computing over a large amount of data, I suggest you take a look at Unity ECS, the Unity job system, and Unity's Burst compiler.
    Depending on your situation, you may see a drastic performance improvement by using these three.
     
  6. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Yes I'm already doing this for vector math. Which is a bit of a pain and makes the vector math libraries pretty useless. Like writing the cross product in components etc.
    Problem with profiling is that it isn't very accurate. Especially Deep Profiling. You end up trying to optimize things that don't need optimizing because they were only slow when being profiled!

    The profiler is telling me that the Add() function for lists is a bottleneck. So I'm thinking of changing most lists to arrays if possible which is difficult when you don't know the eventual size of the array.

    And you can't deep profile a built app.

    The amount of time necessary to learn or test all the ways you can write c# code to be fast seems to outweigh the benefits of just using C++ in some cases. (Like the vector example above).

    IDK. It's frustrating. I mean the profiler did help a bit when I found a piece of code which was inside a for-loop which shouldn't have been. Meaning it was called thousands of times when it should only have been called a few times! Actually fixing that made my game run in O(n) time instead of O(n^2) time with n being the number of objects. So that was a good fix!
     
  7. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
  8. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    Simply, go with Jobs, Burst and venture even further into ECS.
     
  9. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Yes, I'll have to learn these things. I've heard of them but don't really know how they work. Also, I'm not sure they're suited to one-off things like procedural mesh generation. I thought they were more for code that is repeated lots of times, like swarms of bees and such.
     
    Last edited: Feb 16, 2019
  10. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,532
    I remember back in the day when C++ was considered the slow behemoth and people would code in assembly to get that edge.

    ...

    I think you should code in assembly!
     
  11. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Well, I am starting to miss the "simplicity" of C++. Unity and C# are like a black box sometimes, where you just try things out to see if they work. Whereas when making a game with C++ and OpenGL you can see most of what's going on. Mind you, you're more likely to have your game crash with memory errors in C++.
     
  12. MD_Reptile

    MD_Reptile

    Joined:
    Jan 19, 2012
    Posts:
    2,664
    I'm wondering if you have been doing your profiling in-editor or with builds? I often forget to build out and profile against a standalone, but it's an important step!
     
  13. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    Any reason why you are not using Unreal instead of Unity?

    People use ECS to generate meshes as well.
    You could probably make smart mesh generation using multithreading.
     
  14. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    Nah, that's a behemoth already. You should use machine code directly. :) Definitely.
     
  15. SparrowGS

    SparrowGS

    Joined:
    Apr 6, 2017
    Posts:
    2,536
    01000011 01101111 01100100 01100101 00100000 01101001 01101110 00100000 01100010 01101001 01101110 01100001 01110010 01111001 00100000 01001001 00100000 01110011 01100001 01111001 00101110
     
    Antypodish likes this.
  16. SparrowGS

    SparrowGS

    Joined:
    Apr 6, 2017
    Posts:
    2,536
    Try using a HashSet.
    Adding stuff to a list costs more the bigger the list is, IIRC?
     
  17. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    I like this online MMORPG example.
     
    DouglasPotesta likes this.
  18. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Yes, because my new laptop can't handle the Unreal editor. Plus it's too complicated for my brain.
     
  19. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    But apparently UE4 blueprint is more accessible.
    However, I asked, because Unreal is on C++.
     
  20. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    Lists are just backed by arrays anyway, so of course it's possible. Just do what lists do.

    Anyway the answer to the question of "should I switch..." is nearly always no. Do things in a smarter way instead.

    --Eric
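    Eric's "just do what lists do" can be sketched as a minimal growable buffer. This is an illustrative stand-alone sketch of the technique, not Unity API; all names are made up:

    ```csharp
    using System;

    // Minimal growable buffer: roughly what List<T> does under the hood.
    public class GrowableBuffer<T>
    {
        T[] _items = new T[4]; // small initial capacity, like List<T>
        public int Count { get; private set; }

        public void Add(T item)
        {
            if (Count == _items.Length)
                Array.Resize(ref _items, _items.Length * 2); // double when full
            _items[Count++] = item;
        }

        public T this[int i] => _items[i];

        // "Clearing" keeps the backing array, so reuse produces no garbage
        public void Clear() => Count = 0;
    }
    ```

    Reusing one of these per mesh (Clear() instead of new) avoids both the repeated grow-copies and the garbage-collection pressure.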
     
  21. Gladyon

    Gladyon

    Joined:
    Sep 10, 2015
    Posts:
    389
    You have to analyze the profiler results before taking any action.
    For example, if you look into the code of the 'Add()' function:
    https://referencesource.microsoft.c...s/generic/list.cs,9cc11588bffd57c1,references
    You'll see that, as expected, it does nearly nothing (a check, a jump, two increments and an assignment).
    So you will have a lot of trouble optimizing it; using an array instead of a list would only remove a check, a jump, an increment and, of course, a function call.
    If the code you have next to the 'Add()' is much larger than these few instructions, then optimizing the 'Add()' probably won't change much.
    Maybe the problem is that you add too many things and need a lighter algorithm; maybe the problem is that your list keeps growing its backing array (in which case you should instantiate it with a larger initial capacity); maybe it's something else.
     
    Kiwasi likes this.
  22. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    ECS.

    And: it's not really the language but the technique holding you back. Doing it in C++ most likely isn't going to speed things up for you. In fact, you could probably speed this up with regular C#. Why not publish all your code and let experienced C# devs look at it?

    But please by all means do it in C++ so you can prove that it's C# not your code design holding it back.

    This will only help you in DOTS (ECS/Jobs/Burst). It's much more likely that the solution you invented is the problem. Not language or library.

    People keep saying it to you but you are not interested in listening. You clearly stated the bottleneck is in Add() and you want to "solve" it by using an incompatible mathematics library?

    Again, just post the whole code for the generation of it, so people can test it, and optimise it. If you are worried about copyright, simply attach a license to it. At minimum, people will then be able to actually advise you instead of guessing.
     
    lordofduct, Ryiah, Antypodish and 2 others like this.
  23. Suddoha

    Suddoha

    Joined:
    Nov 9, 2013
    Posts:
    2,824
    In order to benefit from C++ you'd need to offload huge chunks of work, so that the interop calls become negligible, and leverage instructions that are not as easy to get at in C#; otherwise you can almost always stay on the C# side.

    But as people have already pointed out, you'll be able to keep up very well when you leverage the new job system and the related tooling. Best of all, it's already well integrated, and the engine usually takes care of common pitfalls that you might otherwise introduce with your own native plugins.

    Since you also specifically addressed replacing lists with arrays: the standard .NET collections are general-purpose implementations, just as the STL types in C++ are.

    You clearly have the option to implement your own list type that speeds up the operations you care about.
     
    MD_Reptile, Ryiah and hippocoder like this.
  24. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Hmmm... it's probably just that I'm overestimating how much a 4GHz CPU can do in 1/30th of a second. If that includes updating graphics, it probably leaves only about 10 million CPU cycles per frame.
    Then if I need to procedurally create 8 meshes with 2,000 triangles each, that gives about 600 CPU cycles per vertex to work with. So theoretically that should be plenty. I must be doing something wrong. Well, I'll update you if I find what it is! (There are too many lines of code to post here... which might be part of the problem, I guess.)
    I think I'll just work on the rest of the game and deal with optimizing at the end.

    I mean, calling Add() 26,000 times is probably not going to do anyone any good.

    So for example, in one case I'm dividing a mesh in two. I create two lists, go through each vertex in the list, and Add() it to one of the two lists depending on its position. I'm using lists because I don't know the final size of the arrays. Then those lists get garbage collected. Can you suggest a better way of doing that? I think the way I'm doing it right now is pretty bad.

    I think I should probably not create a new list each time but call "clear" on lists instead.
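    That reuse pattern (persistent lists plus Clear()) might look like the sketch below. It uses System.Numerics.Vector3 so it runs outside Unity; the class and plane-split criterion are illustrative, not the poster's actual code:

    ```csharp
    using System.Collections.Generic;
    using System.Numerics;

    class MeshSplitter
    {
        // Persistent lists, allocated once and reused across rebuilds
        readonly List<Vector3> _left = new List<Vector3>(32000);
        readonly List<Vector3> _right = new List<Vector3>(32000);

        public (List<Vector3> left, List<Vector3> right) Split(
            IReadOnlyList<Vector3> verts, float planeX)
        {
            _left.Clear();   // Clear() keeps the backing arrays: no GC garbage
            _right.Clear();
            for (int i = 0; i < verts.Count; i++)
            {
                if (verts[i].X < planeX) _left.Add(verts[i]);
                else _right.Add(verts[i]);
            }
            return (_left, _right);
        }
    }
    ```

    The caller must copy or consume the results before the next Split() call, since the same two lists are handed back each time.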
     
    Last edited: Feb 18, 2019
  25. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    Simply allocate lists big enough to hold the data; they'll need to be oversized.
    Or instead of expanding a list by a few elements at a time, expand it in blocks of, say, 1,000. Pick a minimum permitted size and set it at initialization. Reuse lists; don't delete, truncate, or shrink them. Those are just a few of many options. Instead of Add(), set elements by index; you'll need a separate length counter per list. 'Clear' is then simply setting that counter to 0; you don't need to clear or remove every list element. Think the array way, as has been suggested.
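    The "array way" described here can be sketched as follows (an illustrative stand-alone class; the block size and names are made up):

    ```csharp
    using System;

    // Oversized array reused as a list: "clear" is just resetting the count.
    class VertexBuffer
    {
        float[] _data;
        public int Count { get; private set; }

        public VertexBuffer(int minCapacity) { _data = new float[minCapacity]; }

        public void Set(int index, float value)
        {
            // Grow in large blocks (here 1000) instead of element by element
            if (index >= _data.Length)
                Array.Resize(ref _data, index + 1000);
            _data[index] = value;
            if (index >= Count) Count = index + 1;
        }

        public float Get(int index) => _data[index];

        public void Clear() => Count = 0; // never shrink or reallocate
    }
    ```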
     
    yoonitee likes this.
  26. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    Again, lists are backed by arrays. It would be very helpful if people using lists would read up on how they actually work, in order to avoid doing things that are counter-productive.

    Anyway, 8 meshes with 2000 triangles is trivial, and you can handle orders of magnitude more than that in C# without any issues. As I said above, it's not the language, it's doing things in a sensible and/or efficient way.

    --Eric
     
  27. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Well here are my stats in a particular function using the deep profiler:

    Add() 26000 times 4.6% 5.5ms
    get_item 27000 times 2.5% 3ms
    set_item 9000 times 1ms

    GetComponent I've tried to avoid a lot but it's one of the functions that executes slowly in the editor but fast in the build.

    I mean it's not terrible, but 5ms here and another 5ms there, and it really has to squeeze into under 20ms to create meshes on the fly without a noticeable jolt.
     
  28. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    List<T> doubles capacity every time it reaches the point it needs to expand. Since default capacity is 4 (once you've added the first element to it), it would only need to expand thirteen times to reach the point it can handle Add() 26,000 times.

    https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs
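    The doubling is easy to observe directly (a small stand-alone check; in .NET a List<T> starts at capacity 0, jumps to 4 on the first Add, then doubles, so 26,000 adds end at 4 × 2^13 = 32,768):

    ```csharp
    using System.Collections.Generic;

    class DoublingDemo
    {
        // Returns the backing-array capacity after `adds` calls to Add()
        public static int FinalCapacity(int adds)
        {
            var list = new List<int>(); // default capacity 0, then 4 after first Add
            for (int i = 0; i < adds; i++)
                list.Add(i);
            return list.Capacity;
        }
    }
    ```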
     
    IgorAherne and Antypodish like this.
  29. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Yep, but if it has to do an array copy 13 times, that's not good for frame rate if you can avoid it. IDK. Creating oversized static arrays and reusing them is probably my best option. It'll probably turn out that this wasn't the bottleneck anyway.
     
    Last edited: Feb 18, 2019
  30. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    Just ran some tests in Unity 2018.2.15. Add() without pre-sizing the array was 1.78ms. Add() with the capacity pre-set to 32,000 was 1.56ms.

    Code (csharp):
    public class Foo : MonoBehaviour {
        List<Vector3> vertices = new List<Vector3>();

        void Start () {
            vertices.Capacity = 32000;
            for (int index = 0; index < 26000; index++)
                vertices.Add(new Vector3(0.0f, 0.0f, 0.0f));
        }
    }
     
    yoonitee likes this.
  31. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    Thx, that is useful. I didn't know the capacity doubles.

    From line 49
    But the main thing is reusing an existing collection instead of regenerating a new one every frame.
    Also, the OP mentioned multiple meshes.

    Hence I assumed multiple collections were being handled, rather than a single one.
    I don't know if they are all the same size or different sizes.

    But if the hard size is 26k for each of the 8 meshes, you can easily initialize 8 collections beforehand, either arrays or lists. We have plenty of memory these days. At least we reduce the copy cycles.
     
  32. SparrowGS

    SparrowGS

    Joined:
    Apr 6, 2017
    Posts:
    2,536
    I think your test also includes the Vector3 constructor in those milliseconds?
     
  33. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    I know GetComponent was optimized not that long ago. However, instead of getting the component each time, you can simply cache the reference.
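    Caching the reference is usually a one-time lookup in Awake. A standard Unity sketch (Rigidbody is just an example component; this only compiles inside a Unity project):

    ```csharp
    using UnityEngine;

    public class CachedAccess : MonoBehaviour
    {
        Rigidbody _body; // looked up once, reused every frame

        void Awake()
        {
            _body = GetComponent<Rigidbody>();
        }

        void Update()
        {
            // Use _body here instead of calling GetComponent<Rigidbody>() per frame
            if (_body != null)
                _body.AddForce(Vector3.up);
        }
    }
    ```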
     
  34. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Another bottleneck is the following. I have a component, MyComponent A for example, which I'm returning from some function.
    And all I'm doing is this:
    Code (CSharp):
    if (A != null) {
        // do something with A
    }
    And just the null check is a huge bottleneck for some reason. I can probably work around it, but it's kind of strange that it takes so long just to check if A is null; the profiler shows about 4 sub-functions being called for it.
     
  35. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    I'm lazy and likely doing a number of things wrong with my tests. I only ran each test once instead of averaging them too and all of the results vary a good deal.
     
  36. SparrowGS

    SparrowGS

    Joined:
    Apr 6, 2017
    Posts:
    2,536
    Well yeah, if both tests include it, it doesn't really matter now, does it? Just pointing it out.

    If a null check is causing you a huge bottleneck, something is way wrong. Does your CPU have like 3 hertz?
     
  37. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Ha, no, it's 4GHz. But it's the way Unity works for some reason. I think != is overloaded when it comes to components and does something crazy.
     
  38. SparrowGS

    SparrowGS

    Joined:
    Apr 6, 2017
    Posts:
    2,536
    yoonitee likes this.
  39. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Well, it is in a for loop called about 1,000 times. But even so, I'm surprised this is what's slowing things down. It's just one of those annoyances to work around, I guess, as there doesn't seem to be a way to un-overload it.

    OK, well I've dealt with that one by rewriting it as:

    Code (CSharp):
    if (!((object)A == null)) {

    }
    It's hard to say if that's any quicker, but no additional functions are appearing in the profiler, so that's good. Because I'm setting it to null myself, it's OK to do this.

    Again, I think the deep profiler probably just adds extra milliseconds for every function call that aren't really there.
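    The effect of an overloaded equality operator can be reproduced outside Unity with a toy class standing in for UnityEngine.Object (everything here is illustrative). Casting to object, `ReferenceEquals`, and C#'s `is null` all bypass the overload:

    ```csharp
    using System;

    // Toy stand-in for a class with overloaded equality (like UnityEngine.Object)
    class FakeUnityObject
    {
        public static int OverloadCalls;

        public static bool operator ==(FakeUnityObject a, FakeUnityObject b)
        {
            OverloadCalls++; // in real Unity this does extra native-side work
            return ReferenceEquals(a, b);
        }
        public static bool operator !=(FakeUnityObject a, FakeUnityObject b) => !(a == b);

        public override bool Equals(object o) => ReferenceEquals(this, o);
        public override int GetHashCode() => base.GetHashCode();
    }
    ```

    Note that in real Unity code the overload exists for a reason (it reports destroyed objects as null), so bypassing it is only safe when, as the poster says, you set the reference to null yourself.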
     
    Last edited: Feb 18, 2019
  40. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    It is nice to know these things, sure. But we all learn things in action, I suppose? We do take for granted what Unity / C# gives us.

    Yet it could be quite counter-productive to read every single method beforehand, trying to understand what actually happens behind the scenes. Until we hit bottlenecks, like in this thread, or focus on performance as a priority. Then we dig deeper.

    Interestingly, the List<T> Class docs
    https://docs.microsoft.com/en-us/do...1?redirectedfrom=MSDN&view=netframework-4.7.2

    don't even mention the doubling of the capacity on the main page, unless I have missed it.
    So this information is not conveyed by default.
    And this is only one small tiny aspect of programming, among a million other things :D
     
    yoonitee likes this.
  41. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    You didn't miss anything. The only reason I knew to look for where it was mentioned is that it's a common trait of that collection across multiple languages. C++'s vector is the equivalent of C#'s List<T>, and it also grows geometrically (the exact factor is implementation-defined; doubling is common).
     
    Antypodish likes this.
  42. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    And you are doing this every frame? And tossing the list out to the garbage collector every frame? That will certainly tank any machine.

    You are starting at the wrong end for optimization. You shouldn't be looking at how to make an operation you call 27,000 times more efficient. Instead you should be looking at how to call the operation fewer times. Start from the top and optimize your game design. Then your timing. Then your algorithms. Then your data structures. And so on. Work your way down step by step. Micro-optimizations of individual methods should usually be the last thing you touch.
     
    SparrowGS likes this.
  43. Owen-Reynolds

    Owen-Reynolds

    Joined:
    Feb 15, 2012
    Posts:
    1,998
    A first-year computer science course explains the math. Suppose you insert 1,000 items and start with a capacity of 1. Copying items into double-sized arrays each time gives total copies of 1 + 2 + 4 + 8 + ... + 512 = 1,023. The magic of base 2 means you copied only about 1,000 items while adding 1,000 items. Pretty cool. We say add-to-end has amortized (average) cost of O(1).

    And I think this makes the point about switching to C++. Maybe C++ runs fully 3 times faster than C#. But C++ is hard to be good at if your first and only language is C#. It would be easier, and a bigger speed-up, to learn more data structure and algorithm tricks.
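    That copy count is easy to verify by simulation (a small stand-alone check; the sum 1 + 2 + ... + 512 comes out to 1,023, about one copy per item added):

    ```csharp
    class AmortizedCopies
    {
        // Total elements copied when growing a capacity-1 array by doubling
        // until it has absorbed n appends.
        public static int TotalCopies(int n)
        {
            int capacity = 1, copies = 0;
            for (int count = 0; count < n; count++)
            {
                if (count == capacity)
                {
                    copies += capacity; // copy everything into the double-sized array
                    capacity *= 2;
                }
            }
            return copies;
        }
    }
    ```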
     
  44. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    It's kind of the logical thing to do as well, I suppose.
    Not every frame, just when new geometry has to be built as you move about (like how No Man's Sky works). Maybe once every 2 seconds. Mostly it doesn't work too badly; you just see it jolt a bit when calculating new geometry.

    Well, yes, that's what I'm trying to do. I mean, if I have to take a list of random vertices and split them into two groups, there's not really any way to do it other than going through each one, one-by-one (not that I can think of). And if there are a lot of vertices it takes a lot of time. So it really does come down to the fastest way I can do that one particular thing (which is essentially dynamic adaptive tessellation). Unless... maybe if I stored the vertices of the mesh in a k-d tree...

    One thing I'm doing is taking a patch of geometry, which can be any mesh, subdividing the triangles, and then splitting it into 4 sub-patches. So the geometry increases as you get close to it (like in Kerbal Space Program). If it were in 1D it would be easier, because points could be sorted from left to right. But it's a bit trickier in 3D. Maybe there's a better algorithm than what I'm doing... I don't think doing a sort on the vertices would be any quicker.

    That's just what has to be done for this particular game. I could make another game, just not this one! Basically I've bitten off more than I can chew.
     
    Last edited: Feb 18, 2019
  45. tiancaiwrk

    tiancaiwrk

    Joined:
    Nov 23, 2017
    Posts:
    35
    If you have a heavy calculation problem, you can try a compute shader for some cases.
     
  46. SparrowGS

    SparrowGS

    Joined:
    Apr 6, 2017
    Posts:
    2,536
    Did you consider running your mesh calculations on a separate thread?

    Or, as some have pointed out, the new job system. (I can't say myself, I haven't tried it yet.)
     
    Kiwasi likes this.
  47. tiancaiwrk

    tiancaiwrk

    Joined:
    Nov 23, 2017
    Posts:
    35
    You have a large number of verts; you should not create a new list every frame. When adding items to a list, its backing array gets copied many times as it grows. I think you can make an array pool where the sizes are powers of 2.
    Example (pseudocode made compilable; the power-of-two helpers are local stand-ins for the original MathLib calls):
    Code (CSharp):
    using System.Collections.Generic;

    public class ArrayAllocator<T>
    {
        // Free buffers, keyed by their length
        protected readonly Dictionary<int, Queue<T[]>> _freeList = new Dictionary<int, Queue<T[]>>();

        public virtual T[] Allocate(int size, bool forcePowerOf2 = true)
        {
            return GetArray(forcePowerOf2 ? NextPowerOfTwo(size) : size);
        }

        protected T[] GetArray(int size)
        {
            if (size < 0)
                return null;
            Queue<T[]> queue;
            if (_freeList.TryGetValue(size, out queue) && queue.Count > 0)
                return queue.Dequeue();   // reuse a pooled buffer
            return new T[size];           // none pooled: allocate a new one
        }

        public virtual void DeAllocate(T[] data, bool forcePowerOf2 = true)
        {
            if (data == null)
                return;
            int size = forcePowerOf2 ? FloorPowerOfTwo(data.Length) : data.Length;
            Queue<T[]> queue;
            if (!_freeList.TryGetValue(size, out queue))
                _freeList[size] = queue = new Queue<T[]>();
            queue.Enqueue(data);          // return the buffer to the pool
        }

        static int NextPowerOfTwo(int v) { int p = 1; while (p < v) p <<= 1; return p; }
        static int FloorPowerOfTwo(int v) { int p = 1; while (p * 2 <= v) p <<= 1; return p; }
    }
     
    yoonitee likes this.
  48. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    Just throw it on another thread. That should solve 99% of your problems.
     
    Joe-Censored likes this.
  49. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Hmmm... you could be right. That's probably the best thing to do. But I'm sure I'll run into lots of confusing difficulties. Such as: I make geometry on the thread but it doesn't actually exist until a few frames later. And I'd better not put too much stuff on the queue or it will stack up. Well, I might give it a go. I could assign a mesh as "awaitingUpdate" and then put the mesh calculations on another thread.

    I assume most gamers will have multi-core CPUs these days.

    Probably the most sensible thing to do. Even though it hurts my brain.

    Guess I'll have to watch a few videos about the Unity Job System.
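    The "compute off the main thread, apply later" pattern described above can be sketched with a Task. This is a pure .NET illustration (names and the placeholder vertex math are made up); in Unity the final Mesh assignment must still happen on the main thread, e.g. by polling task.IsCompleted from Update():

    ```csharp
    using System.Numerics;
    using System.Threading.Tasks;

    class BackgroundMeshBuilder
    {
        // Kick off vertex generation on a worker thread; the caller applies
        // the result to the actual mesh once the task has completed.
        public static Task<Vector3[]> BuildAsync(int vertexCount)
        {
            return Task.Run(() =>
            {
                var verts = new Vector3[vertexCount];
                for (int i = 0; i < vertexCount; i++)
                    verts[i] = new Vector3(i, 0, 0); // placeholder procedural work
                return verts;
            });
        }
    }
    ```

    Unity's own Job System plus Burst is the engine-sanctioned version of this idea and avoids the managed-thread pitfalls.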
     
    Last edited: Feb 19, 2019
  50. Joe-Censored

    Joe-Censored

    Joined:
    Mar 26, 2013
    Posts:
    11,847
    According to Steam's January hardware survey, less than 1% of their users have only a single CPU core. Close to 30% have 2 cores, and over half of their users have 4 cores. The rest have more than 4 cores.
     
    Ryiah, yoonitee and Antypodish like this.