
[Resolved] Run RaycastCommand on multiple threads?

Discussion in 'Entity Component System' started by Pr0x1d, Jun 10, 2021.

  1. Pr0x1d


    Joined:
    Mar 29, 2014
    Posts:
    46
    Hi,

    I am making a custom lighting solution, mainly for research, though I might use it in live projects. I hit a
    slowdown: I wanted to raycast faster, so I switched to RaycastCommand, but it does not improve performance.

    It is the same as or slower than a plain loop with Physics.Raycast. It does not seem to split the work across other threads; instead it runs on the main thread, one command at a time. Is there a way to put it onto other threads?

    Update:
    Fixed by using an IJobParallelFor job to create the RaycastCommands and then passing its JobHandle as the dependency to RaycastCommand.ScheduleBatch.



     
    Last edited: Jun 11, 2021
  2. RecursiveEclipse


    Joined:
    Sep 6, 2018
    Posts:
    298
    If you call .Complete() immediately, it tends to run on one thread. Try to feed the dependency into another job if you can.
     
    Last edited: Jun 10, 2021
  3. RecursiveEclipse


    Joined:
    Sep 6, 2018
    Posts:
    298
    Also, the third parameter is 'minCommandsPerJob'. Setting it to the same length as your array doesn't let the job system divide the array: each thread is given that many indices to start with unless it runs out, so a single batch covering the whole array can only run on one thread.
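    To make the batching point concrete, here is a minimal plain C# sketch (no UnityEngine types, so it compiles anywhere) of one way to pick a 'minCommandsPerJob' well below the array length. The class name, the method name and the "about four batches per worker" rule of thumb are assumptions for illustration, not Unity API or an official recommendation. Inside Unity the worker count could be read from JobsUtility.JobWorkerCount (Unity.Jobs.LowLevel.Unsafe), assuming your Unity version exposes it.

```csharp
using System;

// Hypothetical helper: picks a batch size well below the array length so the
// job system has several batches to hand out to worker threads.
public static class BatchSizing
{
    public static int SuggestMinCommandsPerJob(int commandCount, int workerCount, int batchesPerWorker = 4)
    {
        if (commandCount <= 0) throw new ArgumentOutOfRangeException(nameof(commandCount));

        // Aim for roughly `batchesPerWorker` batches per worker thread (assumed heuristic).
        int targetBatches = Math.Max(1, workerCount * batchesPerWorker);

        // Integer division rounds down; never return less than 1.
        return Math.Max(1, commandCount / targetBatches);
    }
}
```

    For 40,000 commands and 8 workers this yields 1250, i.e. 32 batches the job system can spread across threads instead of one batch of 40,000.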
     
    Last edited: Jun 10, 2021
  4. Pr0x1d


    Joined:
    Mar 29, 2014
    Posts:
    46
    I tried that and it didn't seem to work, but I found a solution: prepare the RaycastCommands in one job and use that job as the dependency for ScheduleBatch. That worked. I am now doing more of the logic after that, but this part is fixed.

    Also, I have to keep 'minCommandsPerJob' at 15, because in the job I am iterating 15 times; if I give it any other number, it just breaks multithreading.
     
  5. RecursiveEclipse


    Joined:
    Sep 6, 2018
    Posts:
    298
    I may be missing something, or I may be wrong to treat "minCommandsPerJob" as the equivalent of "innerloopBatchCount" in IJobParallelFor, but you want a number lower than your array length. If you have 4 threads and 16 items, a value of 4 would give each thread 4 items to work with; anything >= 16 would give all the work to 1 thread.
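    The 4-threads/16-items example can be modeled with a few lines of plain C#. This is a simplified sketch that only counts batches; it does not reproduce Unity's actual scheduler (which hands batches out to workers dynamically), and BatchModel.Split is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Simplified model, not Unity's scheduler: splits `length` items into
// consecutive batches of at most `batchSize` items. A batch is the smallest
// unit of work one thread picks up, so the batch count is an upper bound on
// how many threads can share the work.
public static class BatchModel
{
    public static List<(int Start, int Count)> Split(int length, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<(int Start, int Count)>();
        for (int start = 0; start < length; start += batchSize)
            batches.Add((start, Math.Min(batchSize, length - start)));
        return batches;
    }
}
```

    Split(16, 4) gives four batches that four threads can share, while Split(16, 16), or any larger batch size, gives a single batch that only one thread can run.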

    But if you only have 15 raycasts, it might actually be slower no matter what because of job scheduling overhead. With RaycastCommand you're also somewhat limited in what you can turn into a job after the batch, because non-DOTS colliders break the job system (managed objects such as Collider can't be accessed inside jobs).
     
  6. Pr0x1d

    Pr0x1d

    Joined:
    Mar 29, 2014
    Posts:
    46
    It is 15 raycasts per probe. I got it working and can go above 15, but giving a higher 'minCommandsPerJob' does not seem to change performance. Currently I have around 16,000 probes; some are occluded, so around 12,000 are active, times 15 raycasts each (about 180,000 raycasts), so it's quite a number.
     
  7. RecursiveEclipse

    RecursiveEclipse

    Joined:
    Sep 6, 2018
    Posts:
    298
    So are you in a loop that isn't included in the screenshots? If that's the case, you could make arrays that are "probe count * 15" in length, and when you access the results, use an offset (which should be "probe index * 15"). Then you can bump up the minCommandsPerJob.

    If you're in a loop, you're paying the Schedule + Allocation + Complete() cost for each probe.
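    A minimal sketch of the flat "probe count * 15" layout, in plain C#. The class and method names are hypothetical; RaysPerProbe = 15 and the sun ray in the last slot follow what Pr0x1d describes in this thread. The same index math applies to both the RaycastCommand array and the RaycastHit results array.

```csharp
using System;

// Hypothetical helper for one flat command/result array holding a fixed
// number of rays per probe (15 here: 14 sample directions + 1 sun ray).
public static class ProbeRays
{
    public const int RaysPerProbe = 15;
    public const int SunSlot = RaysPerProbe - 1; // last slot holds the sun ray

    // Length of the flat arrays for a given number of probes.
    public static int TotalCommands(int probeCount) => checked(probeCount * RaysPerProbe);

    // Flat index of ray `ray` belonging to probe `probe` (offset = probe * 15).
    public static int Index(int probe, int ray)
    {
        if ((uint)ray >= RaysPerProbe) throw new ArgumentOutOfRangeException(nameof(ray));
        return probe * RaysPerProbe + ray;
    }

    // Inverse mapping: which probe and which ray slot a flat index belongs to.
    public static (int Probe, int Ray) FromIndex(int index) =>
        (index / RaysPerProbe, index % RaysPerProbe);
}
```

    For example, 12,000 active probes need flat arrays of 180,000 entries, and probe 2's sun ray sits at index 2 * 15 + 14 = 44. Since all probes then go into one ScheduleBatch call, the per-probe Schedule + Allocation + Complete() cost is paid once per batch instead of once per probe.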
     
    Last edited: Jun 11, 2021
  8. Pr0x1d


    Joined:
    Mar 29, 2014
    Posts:
    46
    Yes, I am in a loop and I am doing index * 15: the first 14 are directions and the 15th is the sun direction.
     
  9. FUTC


    Joined:
    Aug 28, 2020
    Posts:
    6
    I am having the same issue. I am working on simulating a CT scan in Unity and need to speed up about 9 million raycasts. I tried to implement a solution with IJobParallelFor and RaycastCommand, but my knowledge of Unity jobs is very limited and I definitely goofed something up: the whole calculation still only runs on one of my CPU cores and is now way slower than just calling Physics.Raycast sequentially. Could you post your working implementation that managed to split the work across several threads? Unfortunately, there seems to be very little information on this problem out there.
     
  10. Pr0x1d


    Joined:
    Mar 29, 2014
    Posts:
    46
    Hi, I don't have that setup anymore, but basically what you want to do is:
    1. Create a NativeArray of the required size for the rays
    2. Create an IJobParallelFor job that fills this NativeArray with ray data such as position and direction
    3. Schedule this job and keep its JobHandle
    4. Pass this JobHandle to RaycastCommand.ScheduleBatch as the dependency handle
    This will force the job system to run RaycastCommand in parallel, as the first job is an IJobParallelFor job.
     
  11. FUTC


    Joined:
    Aug 28, 2020
    Posts:
    6
    Thanks for the reply. I believe I have implemented these steps, though I am not entirely sure it is actually working.
    Here is a simplified version of my code that I made for testing.
    Code (CSharp):
        [BurstCompile]
        struct SetupJob : IJobParallelFor
        {
            public NativeArray<RaycastCommand> Commands;
            public Vector3 Origin;
            public NativeArray<Vector3> DetectorPoints;
            public void Execute(int index)
            {
                Commands[index] = new RaycastCommand(Origin, DetectorPoints[index]);
            }
        }
    Code (CSharp):
    void CalcIrradiationLengthAsyncSingle()
    {
        var results = new NativeArray<RaycastHit>(4000000, Allocator.TempJob);
        var commands = new NativeArray<RaycastCommand>(4000000, Allocator.TempJob);

        //getting directions for the rays to be cast in
        NativeArray<Vector3> directions = new NativeArray<Vector3>(commands.Length, Allocator.TempJob);
        for (int i = 0; i < 4000000; i++)
        {
            directions[i] = xrays[i].GetDetectorPoint();
            //origins[i] = Vector3.zero; // origins is not declared in this simplified snippet
        }

        //set up a new job for the raycast commands to be filled
        var setupJob = new SetupJob()
        {
            Commands = commands,
            Origin = Vector3.zero,
            DetectorPoints = directions
        };

        JobHandle deps = setupJob.Schedule(commands.Length, 1, default(JobHandle));

        deps = RaycastCommand.ScheduleBatch(commands, results, 1, deps);

        deps.Complete();

        //evaluating, saving the results
        var hits = 0;
        for (int i = 0; i < 4000000; i++)
        {
            if (!results[i].transform.name.Equals("Detektor"))
            {
                xrays[i].AddSection(results[i].point);
                hits++;
            }
        }
    }
    Sadly, I'm not sure this is working correctly. I did look into the Unity Profiler and it says pretty much all workers are idle, but this code blocks the main thread, so no frames are updated, and that also blocks the Profiler. Did I correctly follow your description? How can I make sure it is actually running in parallel?

    Edit: As a test, I cast the same number of rays sequentially (simply in a for loop) and it takes about the same time (around 9 seconds), so my attempt at parallelizing this probably isn't working.
     
    Last edited: Dec 17, 2021
  12. Pr0x1d


    Joined:
    Mar 29, 2014
    Posts:
    46
    Hi,

    I tried out your code with small modifications so I could actually run it, and it works fine for me.

    What I found out: make sure you have Use Job Threads enabled.
    [Screenshot: the Use Job Threads option enabled]
    This allows jobs to use multiple threads, not only the main thread.

    Second, you have the [BurstCompile] attribute, so make sure you have Burst > Enable Compilation enabled.
    [Screenshot: the Burst > Enable Compilation option enabled]

    This enables Burst compilation for all scripts that use the [BurstCompile] attribute.

    Here is the modified code; I did not touch the jobs or the scheduling.

    Code (CSharp):
    using System.Collections.Generic;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using UnityEngine;

    public class RayCastTest : MonoBehaviour
    {
        List<Vector3> dirs = new List<Vector3>();
        List<Vector3> outPoints = new List<Vector3>();

        //Fill my directions
        private void Awake()
        {
            for (int i = 0; i < 40000; i++)
            {
                dirs.Add(new Vector3(Random.Range(0f, 1f), Random.Range(0f, 1f), Random.Range(0f, 1f)));
            }
        }

        //Call it every frame
        public void Update()
        {
            CalcIrradiationLengthAsyncSingle();
        }

        [BurstCompile]
        struct SetupJob : IJobParallelFor
        {
            public NativeArray<RaycastCommand> Commands;
            public Vector3 Origin;
            public NativeArray<Vector3> DetectorPoints;
            public void Execute(int index)
            {
                Commands[index] = new RaycastCommand(Origin, DetectorPoints[index]);
            }
        }

        void CalcIrradiationLengthAsyncSingle()
        {
            var results = new NativeArray<RaycastHit>(40000, Allocator.TempJob);
            var commands = new NativeArray<RaycastCommand>(40000, Allocator.TempJob);

            //getting directions for the rays to be cast in
            NativeArray<Vector3> directions = new NativeArray<Vector3>(commands.Length, Allocator.TempJob);
            for (int i = 0; i < 40000; i++)
            {
                directions[i] = dirs[i];
                //origins[i] = Vector3.zero; // Not Used?
            }

            //set up a new job for the raycast commands to be filled
            var setupJob = new SetupJob()
            {
                Commands = commands,
                Origin = Vector3.zero,
                DetectorPoints = directions
            };

            JobHandle deps = setupJob.Schedule(commands.Length, 1, default(JobHandle));

            deps = RaycastCommand.ScheduleBatch(commands, results, 1, deps);

            deps.Complete();

            //Modified (Tons of GC Alloc)
            //evaluating, saving the results
            outPoints.Clear(); // otherwise the list grows without bound, frame after frame
            var hits = 0;
            for (int i = 0; i < 40000; i++)
            {
                if (results[i].transform == null)
                    continue; // this ray hit nothing
                if (!results[i].transform.name.Equals("Detektor"))
                {
                    outPoints.Add(results[i].point);
                    hits++;
                }
            }

            // TempJob allocations must be disposed, or they leak every frame
            commands.Dispose();
            results.Dispose();
            directions.Dispose();
        }
    }
     
  13. FUTC


    Joined:
    Aug 28, 2020
    Posts:
    6
    Thank you so much for your help, you really are a godsend in this situation.
    I didn't know I had to install a preview version of the jobs system through the Unity Package Manager. I did this and enabled "Use Job Threads". A test case I built, which casts 8 million rays successively a couple of times, now seems about 50% faster, but one problem still persists. I have been monitoring my CPU cores while running the simulation and have observed some strange behavior (this behavior stayed the same after enabling job threads).
    While running the simulation I can see spikes of CPU usage on all of my cores simultaneously; usage drops again for a few seconds and then spikes again on all cores at the same time. Here is a screenshot of 4 CPU threads in Task Manager.
    [Screenshot: Task Manager, CPU usage of 4 threads]
    As you can see, the spikes are quite apparent, moving from idle to about 30% usage on all cores and back to idle a second later. All of my 12 CPU cores/24 threads behave like this when I run the simulation. To me this looks like Unity is only running a very short section of the workload on multiple threads and the rest on a single core.

    [Screenshot: Task Manager, top core at 100% between the spikes on the lower cores]
    As you can see in this second screenshot, right between the first two spikes on the lower 3 cores, the top core is pegged at 100%. I'm pretty sure the other activity on the top core is just other applications. This consistently looks like very short bursts (around a second) run multithreaded, and then the heavy lifting is done by a single thread for several seconds. I would like to minimize the compute time as much as possible, because I need to apply this to casting hundreds of millions of rays in batches of a couple million each. Ideally Unity would not use only 20-30% of each core at once, but would spend as much time multithreaded as possible and use as close to 100% of each core as possible. I believe I read that Unity by default only has access to a couple of cores, but I can't remember where I read this or what the suggested solution was. Also, this doesn't seem to be the issue, as all cores are being used, just in short, weak bursts.

    Do you have any idea what could be causing this behavior and how I can further optimize the multithreading?
     
  14. DreamersINC


    Joined:
    Mar 4, 2015
    Posts:
    131
    IMO, for something like this you should be using the Unity Physics package for DOTS instead of the standard API. That would allow you to write a single Burst-compiled IJobChunk for the raycasting.
    Additionally, is this in a build or in the Editor? I find the only honest way to performance-test DOTS is in a build.
     
    Pr0x1d likes this.
  15. Pr0x1d


    Joined:
    Mar 29, 2014
    Posts:
    46
    If you want better performance, as DreamersINC said, use DOTS or go for GPU ray tracing; it seems very unnecessary to cast this many rays.

    Here is what I found in the Profiler:
    [Screenshot: Profiler capture of the test code]

    1. Preparing the ray directions takes a lot of time before the jobs even start running.

    2. GC allocation is humongous [the red part], because your evaluation part is not multithreaded and is constantly reading and then writing data. Either use specialized functions that don't allocate garbage, or make this multithreaded too by doing the evaluation on separate threads and then only reading the clean output array.

    3. I hope you did not forget to Dispose of the NativeArrays you created. You can quickly run out of memory: even with Allocator.TempJob you need to dispose of the old data, otherwise it just piles up, taking up RAM and paging space until Unity crashes.
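    For point 2, here is a plain C# sketch of an evaluation loop that does not allocate per hit: it compares integer IDs instead of reading transform.name (which returns a newly allocated string on each call in Unity) and writes into a preallocated output array. HitRecord is a hypothetical stand-in for RaycastHit so the sketch compiles outside Unity. In Unity, RaycastHit.colliderInstanceID (0 when nothing was hit, assuming a Unity version that has this property) and the detector collider's GetInstanceID() could take the place of ColliderId and detectorId.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for Unity's RaycastHit, so this compiles outside Unity.
public struct HitRecord
{
    public int ColliderId;  // 0 means the ray hit nothing
    public Vector3 Point;   // world-space hit point
}

public static class HitFilter
{
    // Copies the points of all hits that are neither misses nor the detector
    // into `output` and returns how many were written. Only integer
    // comparisons: no strings and no per-hit allocations.
    public static int CollectNonDetectorHits(ReadOnlySpan<HitRecord> hits, int detectorId, Span<Vector3> output)
    {
        int count = 0;
        for (int i = 0; i < hits.Length; i++)
        {
            int id = hits[i].ColliderId;
            if (id == 0 || id == detectorId)
                continue; // miss, or the ray reached the detector
            output[count++] = hits[i].Point;
        }
        return count;
    }
}
```

    Called with a preallocated output array as long as the results, the loop allocates nothing; the returned count tells you how many output entries are valid.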

    So if you write your code cleanly, you can get 100% on all cores; you just have to move all of the overhead onto those cores.
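    One way to see why the cores only spike briefly is Amdahl's law: if only a fraction p of the total time runs in parallel, n threads give an overall speedup of 1 / ((1 - p) + p / n). The sketch below simply evaluates that formula in C#; the 20% figure in the note after it is hypothetical, not a measurement from this thread.

```csharp
using System;

// Amdahl's law: overall speedup when only `parallelFraction` of the work
// runs on `threads` threads and the rest stays single-threaded.
public static class Amdahl
{
    public static double Speedup(double parallelFraction, int threads)
    {
        if (parallelFraction < 0 || parallelFraction > 1)
            throw new ArgumentOutOfRangeException(nameof(parallelFraction));
        if (threads < 1)
            throw new ArgumentOutOfRangeException(nameof(threads));

        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);
    }
}
```

    With p = 0.2 on 24 threads the speedup is only about 1.24x, while p = 1 would give 24x. That is why moving the ray preparation and the evaluation off the main thread matters more than tuning the raycast batch itself.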
     
  16. Antypodish


    Joined:
    Apr 29, 2014
    Posts:
    10,769
    @FUTC I suggest you learn how to use the Profiler to see the issues with multithreaded code.
     
    Pr0x1d likes this.
  17. laurentlavigne


    Joined:
    Aug 16, 2012
    Posts:
    6,327
    I think it's caused by the name string comparison; when you remove that and replace it with a collider ID int comparison, it goes away.
    After a couple of fixes I get a nice boost on an i7-1280P. Thanks everyone for the legwork.
    [Screenshot: profiler results after the fixes]

    Code (CSharp):
    using System;
    using System.Collections.Generic;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;
    using Random = UnityEngine.Random;

    public class RaycastCommand_test : MonoBehaviour
    {
        List<Vector3> dirs = new();
        List<Vector3> outPoints = new(40000);
        JobHandle _dependency;
        NativeArray<RaycastHit> _results;
        public bool completeOnLateUpdate;
        public bool vanillaRaycast;

        //Fill my directions
        void Awake()
        {
            for (var i = 0; i < 40000; i++) dirs.Add(new Vector3(Random.Range(-100f, 100f), Random.Range(-100f, 100f), Random.Range(-100f, 100f)));
        }

        public GameObject prefab;
        public int instanceCount = 1000;
        public float radius = 100;
        NativeArray<RaycastCommand> _commands;
        NativeArray<Vector3> _directions;

        void Start()
        {
            for (var i = 0; i < instanceCount; i++) Instantiate(prefab, Random.insideUnitSphere * radius, quaternion.identity);
        }

        //Call it every frame
        public void Update()
        {
            if (vanillaRaycast) RaycastVanilla();
            else
                RaycastJobbified();
        }

        void RaycastVanilla()
        {
            outPoints.Clear();
            for (var i = 0; i < 40000; i++)
                if (Physics.Raycast(Random.insideUnitSphere * 100, dirs[i], out var hit))
                    outPoints.Add(hit.point);
        }

        [BurstCompile]
        struct SetupCommandJob : IJobParallelFor
        {
            public NativeArray<RaycastCommand> commands;
            [ReadOnly] public NativeArray<Vector3> detectorPoints, origins;
            public void Execute(int index) { commands[index] = new RaycastCommand(origins[index], detectorPoints[index]); }
        }

        void RaycastJobbified()
        {
            _results = new NativeArray<RaycastHit>(40000, Allocator.TempJob);
            _commands = new NativeArray<RaycastCommand>(40000, Allocator.TempJob);
            _directions = new NativeArray<Vector3>(_commands.Length, Allocator.TempJob);
            for (var i = 0; i < 40000; i++) _directions[i] = dirs[i];
            //set up a new job for the raycast commands to be filled
            //(this test reuses the direction array as the ray origins)
            var setupCommandsJob = new SetupCommandJob() {commands = _commands, origins = _directions, detectorPoints = _directions};
            _dependency = setupCommandsJob.Schedule(_commands.Length, 1, default);
            _dependency = RaycastCommand.ScheduleBatch(_commands, _results, 1, _dependency);
            if (!completeOnLateUpdate) CompleteThenDoTheRestOfTheWork();
        }

        void LateUpdate()
        {
            if (completeOnLateUpdate)
                CompleteThenDoTheRestOfTheWork();
        }

        void CompleteThenDoTheRestOfTheWork()
        {
            // nothing was scheduled this frame (e.g. vanillaRaycast is on) or it was already processed
            if (!_results.IsCreated) return;

            _dependency.Complete();
            outPoints.Clear();

            //evaluating, saving the results
            for (var i = 0; i < 40000; i++)
            {
                // collider ID = 0 means no hit
                if (_results[i].colliderInstanceID == 0)
                    continue;
                outPoints.Add(_results[i].point);
            }
            _commands.Dispose();
            _results.Dispose();
            _directions.Dispose();
        }
    }
     


    Last edited: Oct 9, 2022
    AngryAndrew likes this.