Search Unity

  1. Welcome to the Unity Forums! Please take the time to read our Code of Conduct to familiarize yourself with the forum rules and how to post constructively.
  2. Dismiss Notice

Threads and weird performance

Discussion in 'Scripting' started by Kragh, Nov 23, 2020.

  1. Kragh

    Kragh

    Joined:
    Jan 22, 2008
    Posts:
    656
    EDIT: Solved! Changing the Scripting Backend in player settings from Mono to IL2CPP made not only the threads run crazy faster, but made them report the same times. They actually ran so fast (Close to 0 in time), I figured their content may had been stripped because the "distance calculation" didn't leave the confines of the threads, and so might have just been scrapped in the compilation. So I took the distance calculations and put them to use outside the threads (And reported some results), and now my threads perform at a more sensible times, but still WAY faster than with Mono.
    _______________________________
    Hi there. I am baffled. And I need you to hear me out, though I seem to not be able to be concise...

    So I am working on a rather big system that uses lots of multithreading. And for a couple of days I have been trying to track down some weird performance difference between two different threads that performed VERY differently. I have been screaming and pulling out hairs.
    I put in the exact same workload in both, and one thread finished 4-5 times as fast as the other.
    I tried everything. Changed priority on the threads, and a lot of non sensical stuff I won't even admit to.
    The only difference was a "continue" in the bottom of one of the threads, that would bypass some code in a while loop.
    At some point I simply removed the "continue" and all the code below, and suddenly they behaved the same.
    I tried to reproduce it in a standalone project, and I managed to to do it even without the "continue" part.

    I have attached a package (Made in 2020.1.6f1) with one scene and one script.

    It has a script "ThreadTest.cs" and a scene called "ThreadTest"
    This script has two version of a thread that are identical (Plus some simple structs).
    ThreadA() and ThreadB()
    Only difference in the two threads is a condition before I do some actual work:

    Code (CSharp):
    1. if ((stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency > delay)
    2. {
    3. //Some code
    4. }
    vs:

    Code (CSharp):
    1. if ((stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency > 0.1)
    2. {
    3. //Some code
    4. }
    Nevermind the actual workload inside this condition, it is just there to make the thread do some numbers, and doesn't make much sense in this example. If you really want to know it calculates closest distance between two lines.

    In both threads I wait 0.1 seconds before calculating. Because reasons, who cares. But in one thread the "0.1" is a static value, in the other it is based on a variable called "delay".

    So here is what you need to do to reproduce:

    Start the project (Play). Select the GameObject "ThreadTest" and see how the two threads perform in the public inspector fields.
    If you get the same result as I, thread A is around 4-5 times faster than thread B.
    On my machine Thread A finishes in around 6 ms. Thread B finishes in around 25 ms.

    Now change the code for Thread B: replace the static "0.1" in the delay condition with the variable "delay" (Which should hold the same value and is already present in the thread).

    Start the project again.
    Now they perform identically.

    I simply don't get it. The actual workload of the thread has nothing to do with that condition. It happens INSIDE the scope of the condition (Four loops inside each other, and some calculation). And is the same in both threads.
    What is going on? Please, can someone make sense of this?
    Is it my way of measuring the time with Stopwatch? But why? I know Stopwatch is not a precise measure, but a 4-5 factor in difference. Really?
    And if you change the number of loops (Change the value of the variable "lines" in both threads) the difference is the same, just with higher times. Just be careful, changing the number scales up the amount of work rather quickly...!

    I need to understand this, because it has HUGE implications in performance. I need to either understand what I do wrong, or why thread performance would ever be affected by such a minor detail.
     

    Attached Files:

    Last edited: Nov 27, 2020
  2. Joe-Censored

    Joe-Censored

    Joined:
    Mar 26, 2013
    Posts:
    11,847
    Edit: reread your code, I misunderstood, rewrote my post

    If you want your thread to wait 0.1 seconds, you should just call this instead:
    Code (csharp):
    1. Thread.Sleep(100);
    That way your thread isn't just pegging the CPU core at 100% while it is repeatedly checking for a timer to expire. Instead it will just wake up and resume after 0.1 seconds.

    Something I think is a problem in your code though, is you are dividing 2 longs, which should result in a long. Then you are comparing that result to whatever type "delay" is (float?), and in the second version comparing it to type "double". My guess is it probably won't actually start until 1 second has passed, since the division will result in 0 until it results in 1. There is no decimal point in a "long".

    You might have a slight difference in when each starts processing too if converting long to float is different than the amount of time converting long to double for the comparisons. Not sure. It might also come down to running both threads at 100% CPU repeatedly checking for a timer to expire, causing a scheduling issue for your OS if you don't have any idle CPU cores available.
     
    Last edited: Nov 23, 2020
  3. Kragh

    Kragh

    Joined:
    Jan 22, 2008
    Posts:
    656
    Thx for your reply :)
    1) Yes, there is a lot of chaos of types here, and it is a mess. But it is the same mess across both threads. Just tried to change the types to longs, same times, same differences.
    Remember this is just a test, not actual code. The delay here has no function other than to showcase the problem. So using "Sleep" would... not show the problem ;)

    2) delay is a double. So both threads compare to a double.
    As for it waiting "1 second": No. You are right about long being with no decimal, and as such it can only divide to nearest long, but this is "ticks" I am working with, which has millions per second. And again, same "mess" in both threads, doesn't explain the huge difference between them.
    And also, I first measure the start time after this messy condition delay has been met. For both threads.

    3) Again, I first start measuring time after the condition has been met, no matter how messy it is in terms of value chaos.
    I have 18 cores on this CPU, so I am sure one is available. And the results are completely consistent, no matter what is going on on the computer.

    I thank you for your suggestions, but it would be even nicer if you actually tried out the package I attached :)
     
  4. Kragh

    Kragh

    Joined:
    Jan 22, 2008
    Posts:
    656
    EDIT: BEWARE, updated Thread's code from the package attached in first post. If you want to test with the package, you should put this in instead...

    Ok, here's the code for the two threads. Only difference between them is ThreadB has a clumsy condition wrapped around it, that delays it for 0 seconds. This should be even more clear in terms of the strangeness.

    ThreadA will finish in 2.8 seconds
    ThreadB will finish in 0.6 seconds. No difference in the actual algorithm, only in the condition wrapping ThreadBs workload, that really does nothing (The condition will be true on first inspection in the while loop).
    And no, this is not due to measuring errors on the Stopwatch, with so many loops I can physically see the difference in when the threads finishes and delivers their result.
    Again, this is just test code to showcase the problem I experience in much more complex code (Which I have then destilled to this), so no "But why would you ever have a while loop that terminates instantly in ThreadA?" or "Why are you comparing the same two lines over and over again?"

    Code (CSharp):
    1. void ThreadA ()
    2.     {
    3.         while (runThreadA)
    4.         {
    5.             runThreadA = false;
    6.             double preTicks = stopWatch.ElapsedTicks;
    7.  
    8.             Line3Double lineA = new Line3Double(new Vector3DoublePrecision(10, 20, 30), new Vector3DoublePrecision(100, 140, 180));
    9.             Line3Double lineB = new Line3Double(new Vector3DoublePrecision(-10, -20, -30), new Vector3DoublePrecision(-100, -140, -180));
    10.  
    11.             int lines = 1000;
    12.  
    13.             for (int i = 0; i < 8; i++)
    14.             {
    15.                 for (int j = 0; j < lines; j++)
    16.                 {
    17.                     double aStartX = lineA.startX;
    18.                     double aStartY = lineA.startY;
    19.                     double aStartZ = lineA.startZ;
    20.  
    21.                     double aEndX = lineA.endX;
    22.                     double aEndY = lineA.endY;
    23.                     double aEndZ = lineA.endZ;
    24.  
    25.                     double aDirX = lineA.dirX;
    26.                     double aDirY = lineA.dirY;
    27.                     double aDirZ = lineA.dirZ;
    28.  
    29.                     double aDotSelf = lineA.dotSelf;
    30.  
    31.                     for (int k = 0; k < 8; k++)
    32.                     {
    33.                         for (int l = 0; l < lines; l++)
    34.                         {
    35.                             double wX = aStartX - lineB.startX;
    36.                             double wY = aStartY - lineB.startY;
    37.                             double wZ = aStartZ - lineB.startZ;
    38.  
    39.                             double b = aDirX * lineB.dirX + aDirY * lineB.dirY + aDirZ * lineB.dirZ;
    40.                             double d = aDirX * wX + aDirY * wY + aDirZ * wZ;
    41.                             double e = lineB.dirX * wX + lineB.dirY * wY + lineB.dirZ * wZ;
    42.  
    43.                             double D = aDotSelf * lineB.dotSelf - b * b;
    44.                             double sc, tc;
    45.                             if (D < 0.0000001)
    46.                             {
    47.                                 // the lines are almost parallel
    48.                                 sc = 0.0f;
    49.                                 tc = (b > lineB.dotSelf ? d / b : e / lineB.dotSelf);
    50.                             }
    51.                             else
    52.                             {
    53.                                 sc = (b * e - lineB.dotSelf * d) / D;
    54.                                 tc = (aDotSelf * e - b * d) / D;
    55.                             }
    56.  
    57.                             double shortestX = wX + (sc * aDirX) - (tc * lineB.dirX);
    58.                             double shortestY = wY + (sc * aDirY) - (tc * lineB.dirY);
    59.                             double shortestZ = wZ + (sc * aDirZ) - (tc * lineB.dirZ);
    60.  
    61.                             double distance = shortestX * shortestX + shortestY * shortestY + shortestZ * shortestZ;
    62.                         }
    63.                     }
    64.                 }
    65.             }
    66.  
    67.             double postTicks = stopWatch.ElapsedTicks;
    68.             double time = ((postTicks - preTicks) / System.Diagnostics.Stopwatch.Frequency) * 1000;
    69.             elapsedTimeA = time;
    70.         }
    71.     }
    72.  
    73.     void ThreadB()
    74.     {
    75.         long startTicks = stopWatch.ElapsedTicks;
    76.         double delay = 0;
    77.  
    78.         while (runThreadB)
    79.         {
    80.             if ((double)(stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency > delay)
    81.             {
    82.                 runThreadB = false;
    83.                 double preTicks = stopWatch.ElapsedTicks;
    84.  
    85.                 Line3Double lineA = new Line3Double(new Vector3DoublePrecision(10, 20, 30), new Vector3DoublePrecision(100, 140, 180));
    86.                 Line3Double lineB = new Line3Double(new Vector3DoublePrecision(-10, -20, -30), new Vector3DoublePrecision(-100, -140, -180));
    87.  
    88.                 int lines = 1000;
    89.  
    90.                 for (int i = 0; i < 8; i++)
    91.                 {
    92.                     for (int j = 0; j < lines; j++)
    93.                     {
    94.                         double aStartX = lineA.startX;
    95.                         double aStartY = lineA.startY;
    96.                         double aStartZ = lineA.startZ;
    97.  
    98.                         double aEndX = lineA.endX;
    99.                         double aEndY = lineA.endY;
    100.                         double aEndZ = lineA.endZ;
    101.  
    102.                         double aDirX = lineA.dirX;
    103.                         double aDirY = lineA.dirY;
    104.                         double aDirZ = lineA.dirZ;
    105.  
    106.                         double aDotSelf = lineA.dotSelf;
    107.  
    108.                         for (int k = 0; k < 8; k++)
    109.                         {
    110.                             for (int l = 0; l < lines; l++)
    111.                             {
    112.                                 double wX = aStartX - lineB.startX;
    113.                                 double wY = aStartY - lineB.startY;
    114.                                 double wZ = aStartZ - lineB.startZ;
    115.  
    116.                                 double b = aDirX * lineB.dirX + aDirY * lineB.dirY + aDirZ * lineB.dirZ;
    117.                                 double d = aDirX * wX + aDirY * wY + aDirZ * wZ;
    118.                                 double e = lineB.dirX * wX + lineB.dirY * wY + lineB.dirZ * wZ;
    119.  
    120.                                 double D = aDotSelf * lineB.dotSelf - b * b;
    121.                                 double sc, tc;
    122.                                 if (D < 0.0000001)
    123.                                 {
    124.                                     // the lines are almost parallel
    125.                                     sc = 0.0f;
    126.                                     tc = (b > lineB.dotSelf ? d / b : e / lineB.dotSelf);
    127.                                 }
    128.                                 else
    129.                                 {
    130.                                     sc = (b * e - lineB.dotSelf * d) / D;
    131.                                     tc = (aDotSelf * e - b * d) / D;
    132.                                 }
    133.  
    134.                                 double shortestX = wX + (sc * aDirX) - (tc * lineB.dirX);
    135.                                 double shortestY = wY + (sc * aDirY) - (tc * lineB.dirY);
    136.                                 double shortestZ = wZ + (sc * aDirZ) - (tc * lineB.dirZ);
    137.  
    138.                                 double distance = shortestX * shortestX + shortestY * shortestY + shortestZ * shortestZ;
    139.                             }
    140.                         }
    141.                     }
    142.                 }
    143.  
    144.                 double postTicks = stopWatch.ElapsedTicks;
    145.                 double time = ((postTicks - preTicks) / System.Diagnostics.Stopwatch.Frequency) * 1000;
    146.                 elapsedTimeB = time;
    147.             }
    148.         }
    149.     }
    My biggest concern is I have no idea why one is so much faster than the other. And going forward how am I ever to know whether a given thread is actually performing as good as it should/could? I mean, if one minor stupid detail can change the performance by a factor of 4-5, without even making much logical sense, I feel powerless in terms of optimizing.
    And for this system I need to have it run as fast as possible.
     
    Last edited: Nov 24, 2020
  5. Kragh

    Kragh

    Joined:
    Jan 22, 2008
    Posts:
    656
    The new package with update threads code
     

    Attached Files:

  6. Joe-Censored

    Joe-Censored

    Joined:
    Mar 26, 2013
    Posts:
    11,847
    The fact it is ticks is meaningless. A long is a long is a long, and a long cannot be a value greater than 0 while less than 1. It is either 1 or 0. So the lowest value which is greater than 0.1 for a long is 1. You were doing division in that line to convert it to seconds before the comparison, so the lowest value in seconds which is greater than 0.1 of type long is 1 second. But it looks like you've altered your script.

    It is doubtful anyone is going to download the package for safety reasons.


    But no it probably won't though. Look at the math here.
    Code (CSharp):
    1.  
    2.         long startTicks = stopWatch.ElapsedTicks;
    3.         double delay = 0;
    4.  
    5.         while (runThreadB)
    6.         {
    7.             if ((double)(stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency > delay)
    8.             {
    9.  
    So startTicks = ElapsedTicks. You then subtract startTicks from ElapsedTicks. Even though a few lines have gone by, it is very possible the result of that subtraction will be 0 so long as ElapsedTicks has not incremented since you set them equal to each other. You then divide 0 by StopWatch.Frequency, and when you divide 0 by anything the answer is still 0. You then check if 0 is greater than delay, which is also 0. So the check resolves to false, because 0 is not greater than 0.

    Personally I'd get rid of all this ambiguity, comment out this line we're arguing about, and just replace if with "if(true)" and test it. You've got to narrow down the problem here, and you're not making that easier by making untested assumptions about what is happening.
     
  7. Kragh

    Kragh

    Joined:
    Jan 22, 2008
    Posts:
    656
    The problem was that IF I wrote "true" (Which I had obviously tried), the difference in timings disappeared and both threads performed equally bad. BAD!. It is like you have not read my post, and think that it is ThreadB that is the problem. It is not.
    It was like by writing this stupid condition (And it IS stupid, we agree on that, no need to try and fix it!) made my actual code (All the stuff INSIDE the scope of the condition!) perform magically good...
    That can't be explained away with focusing on longs, and ticks and equal signs and all that stuff. It is irrelevant, and I really am a little baffled that you continue to focus on it :D.
    It has no relevance...
    That is why I wanted you to just take a look at the package... And of course one can download a Unity package. It is not like its an .exe file, or that you won't be able to inspect its content.

    But you are right concerning the condition, it will prbably return true on second evaluation (Though I suspect more than one tick will actually elapse by entering the while part), not first.
    But as my "preTicks" is first set after it has evaluated to true, I don't really see this changing the timings.
    And it doesn't really matter what the actual time said anyways, because ThreadA simply just finished way later than threadB, no timing measure needed to "feel" that. It was a factor of 4-5, remember...
    So the problem still just was: "Why does the inclusion of the stupid condition make my thread run at 4.5x the speed of a more "clean" thread?". Which is still just strange.

    Anyways... I have solved it. Not by changing the code. But by changing the Scripting Backend in the player settings.
    Changed it form Mono (JIT) to IL2CPP (AOT), which not only made my threads run CRAZY faster overall, but also made them report the same (Or close to) times.

    So. Solved... I'm glad you took a look. But a little sad you used most of your time on the content of the condition.