Why is Unity Physics.Processing 3x Slower on Xbox One X?

Discussion in 'Windows' started by stonstad, Nov 6, 2018.

  1. stonstad

    stonstad

    Joined:
    Jan 19, 2018
    Posts:
    660
    This is a carryover from an earlier discussion, here. Terrific suggestions were offered on how to analyze and improve console performance. The feedback from the Unity community is greatly appreciated.

    The question for this thread is very specific. It is not a question of how to make Unity physics faster. It's a question of why Unity physics is 3x slower on Xbox compared to a PC with proven inferior computing power.

    A few things to consider:
    • This is not an issue of ID@Xbox access. A difference in compute power is ruled out below.
    • This is not an issue of being GPU bound or having insufficient fill-rate or bandwidth. It's purely a CPU-bound problem relating to Unity physics.
    • This is not compiler or build settings related. It isn't even compile output related. The exact same x64 binary is used on console as PC (Release, No Debugger, x64).

    First -- allow me to demonstrate to you that an Intel i7-8650u 4 core (1.90 GHz base, 4.2 GHz limited burst) processor is slower than an Xbox One X AMD Jaguar 8 core (2.3 GHz base) processor.



    This graph shows execution time (in seconds) for prime number factorization, from a single thread up to twelve concurrent threads. With a single thread the AMD Jaguar and Intel i7-8650u are evenly matched. At two threads the Xbox One X pulls slightly ahead, and it holds this lead for any number of threads, even after diminishing returns are reached. There is no significant performance gain beyond four concurrent threads.
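
    For anyone who wants to reproduce the general shape of this test without the linked code, here is a minimal Python sketch of a factorization benchmark run at increasing levels of concurrency. It is not the code linked at the end of this post. The function names (prime_factors, time_workers) and the use of trial division are assumptions made for illustration. It uses worker processes rather than threads because CPython threads share the global interpreter lock and would not scale on CPU-bound work.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def prime_factors(n):
    """Return the prime factors of n in ascending order (trial division)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def time_workers(workers, number):
    """Factor `number` once per worker, concurrently, and return elapsed seconds.

    Hypothetical helper: one task per worker, so the total work grows
    with the worker count, as in a per-thread benchmark.
    """
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(prime_factors, [number] * workers))
    return time.perf_counter() - start
```

    Calling time_workers(n, some_large_number) for n from 1 to 12 would produce a timing curve of the same kind as the graph, although absolute numbers will differ by language and runtime. On platforms that start worker processes by spawning, the calls need to sit under an `if __name__ == "__main__":` guard.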

    In terms of CPU compute, the Xbox One X AMD Jaguar is equal to or faster than the Intel i7-8650u. At 4 concurrent threads the Xbox One X is 30% faster. This point can be argued, but whether it is 30% faster or 30% slower doesn't really matter, and I'll explain why.

    Now here is a test involving N rigid bodies with no collisions. 10 rigid bodies are added to the scene each frame until the frame rate drops below 50 FPS. If the frame rate once again rises above 50 FPS more rigid bodies are added. Equilibrium is reached and this informs how many non-sleeping rigid bodies may be added until the game becomes essentially CPU-bound.



    About this test. There are no lights, no shadows, no post processing. No collisions either -- just cubes spinning from an initial rigid body torque with zero angular drag. They spin in place indefinitely, get it? There are no static colliders or modification to transforms. GPU utilization is minor -- so we are not GPU bound in any way.
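
    The spawn loop described above can be sketched outside Unity as a small simulation. This is an illustration only: the linear cost model (frame time grows in proportion to the number of bodies) and the names find_equilibrium, per_body_ms and base_ms are assumptions, not part of the test project.

```python
def find_equilibrium(per_body_ms, base_ms=0.0, target_fps=50.0,
                     batch=10, max_frames=100_000):
    """Add `batch` bodies per frame until the simulated frame time
    exceeds the budget for `target_fps`; return the body count at
    which the frame rate first dropped below the target.

    Assumed cost model: frame_ms = base_ms + per_body_ms * bodies.
    """
    budget_ms = 1000.0 / target_fps  # 20 ms per frame at 50 FPS
    bodies = 0
    for _ in range(max_frames):
        frame_ms = base_ms + per_body_ms * bodies
        if frame_ms > budget_ms:
            break  # below target FPS: stop adding bodies
        bodies += batch
    return bodies


# Hypothetical per-body costs: the second platform is 3x more expensive.
fast = find_equilibrium(per_body_ms=0.01)
slow = find_equilibrium(per_body_ms=0.03)
```

    Under this model, a platform whose per-body cost is three times higher settles at roughly one third as many bodies, which is the relationship the test is designed to expose.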



    The slower i7-8650u processor is somehow able to run 3x as many rigid bodies (200% more) as the Xbox One X at 50 FPS. You're probably ready to see profiler telemetry. It's physics all the way down. CPU-bound physics.

    i7-8650u (16ms average)


    Xbox One X (16ms average)


    Wait, you say? They're the same at approximately 16ms. Yes they are -- except the Xbox One X is calculating physics for 1/3 as many bodies as the i7-8650u.
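
    To make that comparison explicit, here is the arithmetic as a short Python sketch. The body counts are hypothetical placeholders chosen only to reflect the 3:1 ratio; the real counts come from the test runs, not from this snippet.

```python
def per_body_cost_us(frame_ms, bodies):
    """Average physics cost per rigid body, in microseconds."""
    return frame_ms * 1000.0 / bodies


# Hypothetical counts: same ~16 ms of physics, one third as many bodies.
pc_cost = per_body_cost_us(16.0, 3000)
xbox_cost = per_body_cost_us(16.0, 1000)
slowdown = xbox_cost / pc_cost  # about 3: each body costs ~3x as much
```

    Equal frame times with one third as many bodies means each body costs about three times as much to simulate, whatever the absolute counts are.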

    It's nice that Unity preserves the PhysX method signatures in the call hierarchy. The PhysX methods seem relatively fast, and they are rolled up to Physics.Processing. That method is solely responsible for the 3x decrease in performance on Xbox One X. The method has had issues once before.

    Prime Factorization Code (Link)
    Rigid Body Creation Code (Link)

    I can share a zip of the rigid body test project. It uses a Unity store asset for the GPU/CPU statistics, which is why I am not directly linking to it here.

    *edited to add information regarding a Unity test project.
     
    Last edited: Nov 6, 2018
  2. yant

    yant

    Unity Technologies

    Joined:
    Jul 24, 2013
    Posts:
    596
    Wondering if comparing prime factorization to physics simulation is still apples to apples. Prime factorization is a pure number cruncher, as opposed to physics, which uses quite a lot of memory operations. Anything from dcache/icache size to branch prediction specifics would affect that, wouldn't it? I've just spoken to Nvidia representatives and they told me there is actually a minor bug that's been there for a while that forces the solver to compute all islands on one thread, but splitting it up is estimated to gain only 30-40% in performance. My understanding is that Xbox CPUs are sort of expected to perform slower than their desktop "equivalents".
     
  3. stonstad

    stonstad

    Joined:
    Jan 19, 2018
    Posts:
    660
    The performance disparity raises questions. For example: within Physics.Processing, is there branching for different platforms (PC vs Xbox)? How many concurrent threads are allocated to PhysX for physics on PC vs XB1X? How are physics threads pooled, created/destroyed, and locked, and what is the optimization point for this behavior on PC vs XB1X?
     
  4. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,680
    PhysX does not distinguish between running on PC vs Xbox on UWP.
     
  5. stonstad

    stonstad

    Joined:
    Jan 19, 2018
    Posts:
    660
    OK. Does Unity Physics.Processing though?

    Separately, it looks like the number of threads used by PhysX is assignable. So getting right to the point, we do not assign a different number of threads based on platform?

    I don't doubt anyone's expertise -- I'm just trying to understand the performance discrepancy. I even annoy myself sometimes.

    I downloaded the PhysX SDK -- and I am thinking of running a simulation on both test rigs to see if there is a difference in performance. I don't have access to Unity source -- maybe Physics.Processing is ten lines of code and all very straightforward. No room for platform optimization. :p
     
  6. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Memory bandwidth on xb1 is significantly limited in comparison to desktops. Also you should do the test with no rendering to remove the draw call hit on CPU.
     
  7. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,680
    No. The only part of the engine that does is the on-screen keyboard code.
     
  8. stonstad

    stonstad

    Joined:
    Jan 19, 2018
    Posts:
    660
    It will be interesting to see how the new Unity built-in physics system performs. I was somewhat disappointed that no mention was given at the keynote regarding whether performance is improved by the change.