
What is the most performant way of casting a lot of rays

Discussion in 'DOTS Physics' started by Micz84, Nov 3, 2019.

  1. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    369
    I am writing a local avoidance system with ray casts and I thought it would be faster. Here is my job:
    Code (CSharp):
    [BurstCompile]
    private struct AgentDetectionJob : IJobForEach<Translation, Rotation, NavAgent>
    {
        // Read-only view of the physics world that the rays are cast against.
        [ReadOnly]
        public PhysicsWorld physicsWorld;

        // Local-space forward axis shared by all agents.
        public float3 forward;

        public CollisionFilter filter;

        public void Execute([ReadOnly] ref Translation translation, [ReadOnly] ref Rotation rotation, ref NavAgent navAgent)
        {
            // Ray length scales with the agent's move speed.
            var rayDist = navAgent.moveSpeed * 2;
            // Start the ray at the agent's surface rather than at its center.
            var start = translation.Value + math.mul(rotation.Value, forward * navAgent.radius);
            RaycastInput raycastInput = new RaycastInput
            {
                Start = start,
                End = translation.Value + math.mul(rotation.Value, forward * rayDist),
                Filter = filter
            };
            var factor = 1f;

            // Any hit along the ray zeroes the factor.
            if (physicsWorld.CastRay(raycastInput, out var raycastHit))
                factor = 0;

            navAgent.obstacleDistanceFactor = factor;
        }
    }
    I have about 7600 agents, each performing one ray cast per frame. It takes about 11 ms per thread, and I have 8 threads. Here is my OnUpdate code:
    Code (CSharp):
    CollisionFilter filter = CollisionFilter.Default;
    filter.CollidesWith = 1u << 1;

    var handle = new AgentDetectionJob()
    {
        physicsWorld = physicsWorldBuild.PhysicsWorld,
        forward = new float3(0,0,1),
        filter = filter,
    }.Schedule(this, inputDeps);
    handle.Complete();
    return handle;
     

  2. steveeHavok

    steveeHavok

    Joined:
    Mar 19, 2019
    Posts:
    480
    The handle.Complete() is going to kill you here. Rather than each agent casting a single ray in a job that is completed immediately, it would be better to batch all the rays up, handle all the queries in one system and collect the results in an array that each agent can read from appropriately.
    Most of the Physics samples use a 'Physics Scene Basic Elements' prefab. This contains a RayTraceCamera node that is disabled by default:
    [screenshot: upload_2019-11-4_12-35-1.png]
    If you enable this, you should see a raytraced version of the scene in the top right corner, e.g.:
    [screenshot: upload_2019-11-4_12-36-15.png]
    Check out the RayTrace scripts as a reference for your own system. The RayTraceSystem should be throwing out 10,000 rays per frame, and you can check the cost on your own hardware.
     
  3. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    369
    When I do not have this handle.Complete(), it does not work properly and even crashes. I have probably messed something up. I thought it would be possible to use a raycast command for batched raycasting.
    I will check the example, thank you.
     
  4. MNNoxMortem

    MNNoxMortem

    Joined:
    Sep 11, 2016
    Posts:
    685
    @Micz84 - the crash aside - what @steveeHavok likely meant is not that you must not use handle.Complete(), but that you should have a single job for all your raycasts in one frame. Create one job that takes the input data for all 7600 agents and performs the raycasts in parallel, then call handle.Complete() on that batch job.
     
    steveeHavok likes this.
  5. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,739
    I'm not sure I understand the difference between what you just described and what @Micz84 showed in first post. Isn't the AgentDetectionJob just one job that does parallelized raycasts for all 7600 agents?

    Do you mean that instead of reading translations/rotations, calculating raycast inputs and writing to navAgent, the new job shouldn't do anything except casting rays from a pre-filled array of raycastInputs and storing results, and that's what'll make it much faster?

    What should the new job look like?
     
    Last edited: Nov 5, 2019
    steveeHavok likes this.
  6. steveeHavok

    steveeHavok

    Joined:
    Mar 19, 2019
    Posts:
    480
    Hmm, you might be right there, I need to read the original post a little closer!
    Instead of looping over each block of (input/task/output) logic, we've been tending towards a pattern that collects the input information together, performs the task on all of it while recording the results into an output buffer, then processes the output data.

    So
    1. foreach(input)=>inputs
    2. foreach(input in inputs)->task(input)=>outputs
    3. foreach(output in outputs)->assign(output)
    rather than
    1. foreach(input)->(task(input)->output, assign(output));
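
As a rough illustration of the three-step pattern above, applied to the job from the first post, here is a minimal sketch. It assumes the Entities and Unity Physics packages of that period (IJobForEachWithEntity, CollisionWorld.CastRay). The job names, the `inputs` and `factors` arrays, and the NavAgent field layout are assumptions made for illustration and are not taken from the samples.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;
using Unity.Mathematics;
using Unity.Physics;
using Unity.Transforms;

// Assumed layout of the NavAgent component, inferred from the first post.
public struct NavAgent : IComponentData
{
    public float moveSpeed;
    public float radius;
    public float obstacleDistanceFactor;
}

// Step 1: foreach(input) => inputs. Only builds ray descriptions.
[BurstCompile]
public struct BuildRayInputsJob : IJobForEachWithEntity<Translation, Rotation, NavAgent>
{
    public float3 forward;
    public CollisionFilter filter;
    public NativeArray<RaycastInput> inputs; // one slot per agent in the query

    public void Execute(Entity entity, int index,
        [ReadOnly] ref Translation translation,
        [ReadOnly] ref Rotation rotation,
        [ReadOnly] ref NavAgent agent)
    {
        float3 dir = math.mul(rotation.Value, forward);
        inputs[index] = new RaycastInput
        {
            Start = translation.Value + dir * agent.radius,
            End = translation.Value + dir * (agent.moveSpeed * 2),
            Filter = filter
        };
    }
}

// Step 2: foreach(input in inputs) -> task(input) => outputs. Only casts rays.
[BurstCompile]
public struct CastRaysJob : IJobParallelFor
{
    [ReadOnly] public CollisionWorld world;
    [ReadOnly] public NativeArray<RaycastInput> inputs;
    public NativeArray<float> factors; // 0 when the ray hit something, 1 otherwise

    public void Execute(int index)
    {
        bool hit = world.CastRay(inputs[index], out Unity.Physics.RaycastHit closest);
        factors[index] = hit ? 0f : 1f;
    }
}

// Step 3: foreach(output in outputs) -> assign(output). Only writes results back.
// Must run over the same entity query as step 1 so the indices line up.
[BurstCompile]
public struct AssignFactorsJob : IJobForEachWithEntity<NavAgent>
{
    [ReadOnly] public NativeArray<float> factors;

    public void Execute(Entity entity, int index, ref NavAgent agent)
    {
        agent.obstacleDistanceFactor = factors[index];
    }
}
```

In a sketch like this, both arrays would be sized to the number of agents in the query, and each job would be scheduled with the previous job's handle as its dependency, so no Complete() is needed between the steps.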

    Having refreshed myself on the RayTrace code, I see it also has a Complete in the mix (which should be possible to remove if we go back and review it). The Complete is at a more granular level, which may be the core difference.
    Step one for @Micz84 would definitely be to check the raytrace functionality in the samples on local hardware to see how performant it is.
    For me, I need to set up a few benchmark tests to compare raycasting speeds using the different coding patterns.
     
    PhilSA likes this.
  7. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    369
    @steveeHavok I have tested on the character controller sample.
    RayTracerSystem takes about 1.6 ms.
    RayTracerJob takes 7 ms in total, spread across 5 workers; on average that is about 1.4 ms per worker.

    So if it casts 10000 rays each frame, then it is more than 10 times faster than my raycasting.
    I am casting about 7600 rays and it takes 13 ms, and my job is spread across 8 workers, so the comparison is even worse.

    My system:
    Intel i7 7700k
    16 GB ram
    GTX 1080
     

    steveeHavok likes this.
  8. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,739
    Do your NavAgent entities have colliders on them? Query performance can be significantly affected by the number of colliders in the world and the number of colliders intersecting the maximum range of the cast. So maybe the comparison is not fair.
     
    Last edited: Nov 6, 2019
  9. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    369
    You might be right. Every agent has a capsule collider, and the rays are quite short, from about 3 to 10 meters depending on velocity.
    I will probably take a different approach to local avoidance and not use ray casts at all.
     
  10. steveeHavok

    steveeHavok

    Joined:
    Mar 19, 2019
    Posts:
    480
    Can you drop the RayTrace components into your own project/scene? Then you'd know for sure if it's the colliders. With rays that short, the number of overlapping agents shouldn't be that bad.
     
  11. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    369
    I have added the raytracer to my project. Performance may be related to the number of colliders.
    RayTracerSystem takes about 9.3 ms.
    RayTraceJob takes about 40 ms spread across 5 workers (8 ms on average).
    There are more than 10000 agents with capsule colliders and about 8000 static wall colliders.
     

  12. MNNoxMortem

    MNNoxMortem

    Joined:
    Sep 11, 2016
    Posts:
    685
    I had a similar scenario lately. 10k rays with PhysicsWorld.CastRay took ~500 ms, whereas the same raycast input against the same MeshCollider (exactly the same input) took about 9 ms with a Burst-compiled IJobParallelFor job.

    So, if you can, try to separate the raycast calculation from using the results, as the raycasts can obviously be parallelized very well.
     
  13. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    3,311
    Bigger picture just a single raycast like that will fail spectacularly when you get to the actual avoidance logic. It's just not enough information to sort things out and you will get all sorts of game breaking behavior.

    Generic crowd solutions are quite complex at any scale, and are also seldom the entire solution. A bunch of agents all fighting to get to the same point isn't good gameplay.

    However, there are simpler approaches that might be good enough and you can often just make it part of your game design. Sometimes design is the most cost effective way to solve hard problems.

    For example, our game has a lot of NPCs, but it's a few hundred and it's not an RTS. We solve this with a simple grid system where agents reserve destination slots and reserve slots N units ahead on the path. We allow agents to cross over each other while moving, which removes one of the harder problems in the puzzle, but it guarantees no conflict at the final destination. Together with using grouping to simply avoid having a lot of agents in the same area, it works great for our use case.
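
To make the slot-reservation idea a bit more concrete, here is a minimal sketch of a grid where each cell can be held by one agent at a time. It is a hypothetical illustration only: the SlotGrid class, its method names, and the use of integer cell coordinates and agent ids are all assumptions, not the system described above.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: at most one agent id may hold each grid cell.
public sealed class SlotGrid
{
    private readonly Dictionary<(int x, int y), int> owner =
        new Dictionary<(int x, int y), int>();

    // Reserve a cell for an agent. Succeeds if the cell is free
    // or is already held by the same agent.
    public bool TryReserve(int x, int y, int agentId)
    {
        if (owner.TryGetValue((x, y), out int current))
            return current == agentId;
        owner[(x, y)] = agentId;
        return true;
    }

    // Release a cell, but only if this agent is the one holding it.
    public void Release(int x, int y, int agentId)
    {
        if (owner.TryGetValue((x, y), out int current) && current == agentId)
            owner.Remove((x, y));
    }
}
```

An agent would call TryReserve for its destination cell and for the next N cells on its path, and pick a different destination when the reservation fails, which is one way to "change what the agent wants" instead of letting agents fight over a cell.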

    The main problem to solve from a gameplay perspective is what do you do when an agent can't get to where they want to be. You can just let them all fight it out algorithmically, which is a horrible solution from a gameplay standpoint. Or you can attack it from other angles, like change what the agent wants.
     
    Radu392 and SenseEater like this.
  14. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,739
    Can you elaborate on this? Is this what your test was?:

    Test 1:
    • 10k rays
    • in [BurstCompile] IJobForEach
    • 500ms
    Test 2:
    • 10k rays
    • in [BurstCompile] IJobParallelFor
    • 9ms
    Are you sure you didn't just forget the [BurstCompile] in Test 1?
     
  15. MNNoxMortem

    MNNoxMortem

    Joined:
    Sep 11, 2016
    Posts:
    685
    @PhilSA No, the two scenarios are different:
    Test 1
    10k IJobParallelFor jobs with an input length of 1 (10k Complete() calls, zero parallelization): 500 ms

    Test 2
    1 IJobParallelFor job with an input length of 10k, a batch size of 64 and 1 Complete(): 9 ms

    Technically they were the very same job - just different input parameters.

    Both were run in the editor with Safety Checks on, so this is certainly no meaningful performance benchmark. I merely wanted to visualize how Complete() kills any gains if you do no actual parallelization :) (as stevee has pointed out).
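
The two scheduling shapes compared above might look roughly like the following. This is a sketch, not the code that was actually measured: the RaycastJob follows the batched raycast example in the Unity Physics documentation, while the RaycastScheduling class and its method names are assumptions for illustration.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Physics;

[BurstCompile]
public struct RaycastJob : IJobParallelFor
{
    [ReadOnly] public CollisionWorld world;
    [ReadOnly] public NativeArray<RaycastInput> inputs;
    public NativeArray<RaycastHit> results;

    public void Execute(int index)
    {
        world.CastRay(inputs[index], out RaycastHit hit);
        results[index] = hit;
    }
}

public static class RaycastScheduling
{
    // Test 1 shape: one job of length 1 and one Complete() per ray.
    // The main thread waits for every single ray, so nothing overlaps.
    public static void OneJobPerRay(CollisionWorld world,
        NativeArray<RaycastInput> inputs, NativeArray<RaycastHit> results)
    {
        var oneInput = new NativeArray<RaycastInput>(1, Allocator.TempJob);
        var oneResult = new NativeArray<RaycastHit>(1, Allocator.TempJob);
        for (int i = 0; i < inputs.Length; i++)
        {
            oneInput[0] = inputs[i];
            new RaycastJob { world = world, inputs = oneInput, results = oneResult }
                .Schedule(1, 1)
                .Complete();
            results[i] = oneResult[0];
        }
        oneInput.Dispose();
        oneResult.Dispose();
    }

    // Test 2 shape: one job over all rays, split into batches of 64,
    // with a single Complete() at the end.
    public static void OneBatchedJob(CollisionWorld world,
        NativeArray<RaycastInput> inputs, NativeArray<RaycastHit> results)
    {
        new RaycastJob { world = world, inputs = inputs, results = results }
            .Schedule(inputs.Length, 64)
            .Complete();
    }
}
```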
     
    PhilSA likes this.
  16. M_R

    M_R

    Joined:
    Apr 15, 2015
    Posts:
    554
    Parallelization can speed things up by at most your number of cores (and I don't think your machine has 55 cores).
    What kills the performance in that case is scheduling 10k jobs - each one has overhead.

    You can measure the effect of parallelization by comparing 1 IJob (manually looping) against 1 IJobParallelFor, or by varying the batch count (1 vs 10k).
    You can also try IJobParallelForBatch (but if Burst properly inlines the Execute method it may not matter).
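
A minimal way to set up that comparison could look like the sketch below. The job and class names and the Stopwatch-based timing are assumptions for illustration, and timings taken in the editor with safety checks enabled are only rough indications.

```csharp
using System.Diagnostics;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Physics;

// Baseline: a single job that loops over every ray on one worker.
[BurstCompile]
public struct SerialRaycastJob : IJob
{
    [ReadOnly] public CollisionWorld world;
    [ReadOnly] public NativeArray<RaycastInput> inputs;
    public NativeArray<RaycastHit> results;

    public void Execute()
    {
        for (int i = 0; i < inputs.Length; i++)
        {
            world.CastRay(inputs[i], out RaycastHit hit);
            results[i] = hit;
        }
    }
}

// Same work, but the job system may spread the indices across workers.
[BurstCompile]
public struct ParallelRaycastJob : IJobParallelFor
{
    [ReadOnly] public CollisionWorld world;
    [ReadOnly] public NativeArray<RaycastInput> inputs;
    public NativeArray<RaycastHit> results;

    public void Execute(int index)
    {
        world.CastRay(inputs[index], out RaycastHit hit);
        results[index] = hit;
    }
}

public static class RaycastBenchmark
{
    // Wall-clock time in milliseconds for the single-worker baseline.
    public static double TimeSerialMs(CollisionWorld world,
        NativeArray<RaycastInput> inputs, NativeArray<RaycastHit> results)
    {
        var sw = Stopwatch.StartNew();
        new SerialRaycastJob { world = world, inputs = inputs, results = results }
            .Schedule()
            .Complete();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Wall-clock time in milliseconds for one parallel job.
    // Try batchCount = 1, 64, and inputs.Length to see the effect of batching.
    public static double TimeParallelMs(CollisionWorld world,
        NativeArray<RaycastInput> inputs, NativeArray<RaycastHit> results,
        int batchCount)
    {
        var sw = Stopwatch.StartNew();
        new ParallelRaycastJob { world = world, inputs = inputs, results = results }
            .Schedule(inputs.Length, batchCount)
            .Complete();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

With batchCount equal to inputs.Length the parallel job collapses into a single batch, so its time should be close to the serial baseline; the gap between that and a small batch count is the part attributable to parallelization.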
     
    MNNoxMortem likes this.