
Long jobs and job priority

Discussion in 'Entity Component System and C# Job system' started by Aithoneku, Sep 13, 2018.

  1. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    I hope it's OK to touch two topics in one thread - the main reason is that they're IMHO tightly coupled.

    I converted some heavy calculations of a project to jobs. The problem is that these jobs take a really long time (ca. 100-1000 ms), which causes the following problems:
    • The log warning "Internal: JobTempAlloc has allocations that are more than 4 frames old - this is not allowed and likely a leak" - this is printed for any job (including ones allocating nothing) if it lasts longer than 4 frames. This is really common in the project I'm working on.
    • Higher-priority, shorter jobs are blocked, because all worker threads are working on long-running jobs (the project generates lots of them).
    And finally, some of these jobs are actually not as important as others. Yes, I can prioritize them by sorting them before scheduling; but what about jobs which need to be scheduled later? (Example: prepare 50 jobs, sort by priority, schedule. A few frames later: 10 of them are done. I have 20 new jobs, half of which have higher priority than half of the already scheduled ones.)

    So, is there a way to prioritize jobs? Is there a way to reserve some worker threads for shorter, higher-priority jobs?

    Note: the calculations which need to be done are already separated into lots of jobs; what I'm trying to say is that this is not a case of "one single huge blob job doing everything".
     
  2. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    282
    What's the case where you need a lot of separate jobs? Is it just IJob, or is it IJobParallelFor or IJobParallelForBatch?
     
  3. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    For now, it's IJob. It could be distributed using IJobParallelFor, but I don't see how it would help, because of dependencies. The current system works like this: j0 - j1 - j2 - j3, where j(i) is one job. j(i) depends on j(i-1) (except j0, obviously). The only one which can be distributed using IJobParallelFor is j2. j3 depends on all sub-jobs of j2. Another thing is that this whole job string (j0 - j1 - j2 - j3) represents one "bigger task", and I have lots of "job strings" like that.

    Edit: So even if j0 to j2 were separated into lots of smaller jobs, j3, depending on the previous ones, would wait most of the time (up to 1000 ms) until the previous ones are finished, spawning the warning log.

    So using IJobParallelFor would make one job shorter, yes, but it would not help in any way with prioritizing, nor with the problem that it blocks other, higher-priority jobs.
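    For context, a chain like the one described above is expressed with JobHandle dependencies; here is a minimal sketch (the job struct names are placeholders for illustration, not the project's real code):

    Code (CSharp):
        using Unity.Jobs;

        struct PreprocessJobA : IJob { public void Execute() { /* j0: pre-process */ } }
        struct PreprocessJobB : IJob { public void Execute() { /* j1: pre-process */ } }
        struct MainWorkJob    : IJob { public void Execute() { /* j2: main work   */ } }
        struct PostprocessJob : IJob { public void Execute() { /* j3: post-process */ } }

        // Each Schedule call receives the previous handle as a dependency,
        // so j0 -> j1 -> j2 -> j3 run strictly in order:
        JobHandle h0 = new PreprocessJobA().Schedule();
        JobHandle h1 = new PreprocessJobB().Schedule(h0);
        JobHandle h2 = new MainWorkJob().Schedule(h1);
        JobHandle h3 = new PostprocessJob().Schedule(h2);
        // Completing h3 waits for the whole chain.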
     
  4. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    I have a similar problem which can't be solved with IJobParallelFor. I'm researching IJobParallelForBatch, but it's not officially documented, and the page linked here points to a non-existent page. I haven't found anything about it on the web yet... could you please paste a link to some documentation/example?
     
  5. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    282
    Disagree. If your jobs have large loops, then IJobParallelFor (with Burst, of course) gives you a HUGE execution speed-up and your job chain executes very fast. Anyway, without some code snippets and the core idea of why you use this chain, I can't tell more.
     
  6. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    If I understand the job system correctly, the main difference between IJobParallelFor and IJobParallelForBatch is only that in the case of IJobParallelFor, the Execute method means "calculate the i-th item", while in the case of IJobParallelForBatch it means "execute from the i-th to the j-th item". That can be useful when one item is processed very quickly (so the overhead of calling Execute is significant) or when one iteration of your algorithm needs to work on multiple items at a time.

    In other words, I don't see how either of them helps with the problem. Or, if there is a way one of them helps, then I believe the other would help too, in the same way.
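    For reference, these are the two Execute signatures being compared (empty bodies, just the shape of the API; note that IJobParallelForBatch was undocumented at the time, so treat this as a sketch):

    Code (CSharp):
        using Unity.Jobs;

        struct PerItemJob : IJobParallelFor
        {
            // Called once per index: "calculate the i-th item".
            public void Execute(int index) { }
        }

        struct PerRangeJob : IJobParallelForBatch
        {
            // Called once per batch: "execute items startIndex .. startIndex + count - 1".
            public void Execute(int startIndex, int count) { }
        }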
     
  7. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    Thanks for the quick reply... but if that's the case, I think neither of them can help with my problem.
     
  8. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    I can try that. Actually, I'm going to anyway. But it's going to take a lot of time, and I simply don't believe that it would give me that big an execution speed-up (like from 1000 ms to 50 ms). That's why I created this thread - to find out whether it's worth trying or whether to leave the job system.

    Everything is about processing data. At some point, I receive data A. I want to transform them into data B. It's very computationally expensive. Jobs j0 and j1 pre-process A. j2 does the main work. j3 post-processes the data, preparing them for integration on the main thread. Doing j0 to j2 takes a lot of time.

    Each job processes the data in a slightly different domain (it doesn't make sense to implement them all in one job).

    This is just pseudo-code:

    Code (CSharp):
    class App
    {
        DataProcessorChain m_dataProcessorChain;

        // Called at initialization time
        void InitializeSystem()
        {
            // prepare instances of DataProcessor a, b, c and d

            m_dataProcessorChain = new DataProcessorChain();
            m_dataProcessorChain.AttachDataProcessor(a);
            m_dataProcessorChain.AttachDataProcessor(b);
            m_dataProcessorChain.AttachDataProcessor(c);
            m_dataProcessorChain.AttachDataProcessor(d);
        }

        // Updates the running system
        void UpdateSystem()
        {
            var newlyReceivedData = FetchData();
            if (newlyReceivedData != null)
            {
                foreach (DataBatch data in newlyReceivedData)
                {
                    m_dataProcessorChain.ScheduleProcessingData(data);
                }
            }

            m_dataProcessorChain.IntegrateOnMainThread();
        }

        List<DataBatch> FetchData()
        {
            // Returns data to process, or null if there are none
        }

        void IntegrateData(DataBatch processedData)
        {
            // After the data have been processed, they are integrated on the main thread here
        }
    }

    class DataProcessorChain
    {
        // The processors a, b, c and d, in order
        List<DataProcessor> m_dataProcessors;

        // Data batches that are currently being processed
        List<ProcessedData> m_waitingForDataProcessing = new List<ProcessedData>();

        void ScheduleProcessingData(DataBatch data)
        {
            JobHandle jobHandle = default(JobHandle);

            foreach (DataProcessor dataProcessor in m_dataProcessors)
            {
                jobHandle = dataProcessor.ScheduleProcessingData(data, jobHandle);
            }

            m_waitingForDataProcessing.Add(new ProcessedData(data, jobHandle));
        }

        void IntegrateOnMainThread()
        {
            for (int i = 0; i < m_waitingForDataProcessing.Count; )
            {
                ProcessedData processedData = m_waitingForDataProcessing[i];
                if (processedData.jobHandle.IsCompleted)
                {
                    m_waitingForDataProcessing.RemoveAt(i);

                    // In order to make the native containers available on the main thread
                    processedData.jobHandle.Complete();

                    // Takes the processed data and integrates them with the rest of the game
                    app.IntegrateData(processedData.data);
                }
                else
                {
                    ++i;
                }
            }
        }
    }

    // This is actually an abstract class; a, b, c and d are instances of different classes deriving from
    // the base class, implementing different steps of processing the data.
    // This pseudo-code just shows what each of them does in general.
    class DataProcessor
    {
        JobHandle ScheduleProcessingData(DataBatch data, JobHandle dependency)
        {
            Job job = new Job()
            {
                m_fieldA = data.m_fieldA, // input
                m_fieldB = data.m_fieldB, // output
            };

            return job.Schedule(dependency);
        }

        struct Job : IJob
        {
            [ReadOnly]
            public NativeContainer m_fieldA;

            [WriteOnly]
            public NativeContainer m_fieldB;

            public void Execute()
            {
                // Transform data from m_fieldA to m_fieldB
            }
        }
    }
     
  9. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    Please be aware that I'm not an expert on jobs. What I wrote is just how I understand the situation; don't take it as absolute truth.
     
  10. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    140
    You could make sure that your jobs only use a specific number of worker threads by manually adding dependencies between the different long jobs, thus making one or more long-job chains. The downside is that even when there are no short jobs, the other CPU cores will not work on the long jobs.

    As the worker thread count is fixed to the number of cores, and an IJob cannot be interrupted (to prevent context switches), a good solution using the job framework is probably not possible. For these kinds of tasks I would suggest spawning actual Threads, although I'm pretty sure Unity will present a solution for this in the future.
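    A minimal sketch of that plain-thread approach with a priority-ordered queue (all names here are made up for illustration; a real implementation would also need shutdown handling):

    Code (CSharp):
        using System;
        using System.Collections.Generic;
        using System.Threading;

        class PriorityWorker
        {
            // (priority, work) pairs; the highest priority is picked first.
            readonly List<KeyValuePair<int, Action>> m_queue = new List<KeyValuePair<int, Action>>();
            readonly object m_lock = new object();

            public PriorityWorker()
            {
                new Thread(Run) { IsBackground = true }.Start();
            }

            public void Enqueue(int priority, Action work)
            {
                lock (m_lock)
                {
                    m_queue.Add(new KeyValuePair<int, Action>(priority, work));
                    Monitor.Pulse(m_lock);
                }
            }

            void Run()
            {
                while (true)
                {
                    Action next;
                    lock (m_lock)
                    {
                        while (m_queue.Count == 0)
                            Monitor.Wait(m_lock);
                        // Re-sort on every pick, so items enqueued later with a
                        // higher priority jump ahead of already queued work.
                        m_queue.Sort((a, b) => b.Key.CompareTo(a.Key));
                        next = m_queue[0].Value;
                        m_queue.RemoveAt(0);
                    }
                    next();
                }
            }
        }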
     
  11. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    140
  12. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    282
    A terrible waste of jobs - why create 4 jobs for each data unit? Why not gather all fetched data into one native array, pass that array to m_dataProcessorChain.ScheduleProcessingData(data) without the foreach, and after that just create 4 parallel jobs (one for each processor)?
     
  13. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    That would help with my problem, but to make it optimal, the number of jobs has to be based on the number of cores your processor has, not fixed at 4. Do you know if there's a way to know the number of cores/processors available?
     
  14. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    282
    No. It's a parallel job; it automatically splits across all available worker threads and free resources (based on the batch count per thread).
     
  15. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    I understand, and I guess 4 is a safe assumption, but if you had 8 cores and only scheduled 4 jobs, you would be missing the chance to get some extra performance.
     
  16. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    282
    Absolutely not - as I see it, you don't understand. I said 4 because the topic starter has 4 different data processors with different logic. If one parallel job (IJobParallelFor) can do all the work, it should be just one; more parallel jobs absolutely don't give performance gains - on the contrary, there will be a loss from the scheduling of the jobs. To repeat: IJobParallelFor splits across ALL available worker threads (if the resources on those worker threads are free).
     
  17. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    Maybe I didn't read with enough care. Sorry.
     
  18. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    Thanks for the suggestions. But it seems to me that Burst optimizations won't compensate for the penalties caused by that.

    I see, so they're going to fix that. That's good news.

    But my problem is that the warning appears even when I allocate nothing. Just an empty job with a loop doing nothing, long enough to take more than 4 frames, triggers the warning. That's what I was trying to say with "this is printed for any job (including ones allocating nothing) if it lasts longer than 4 frames" in the original post.

    But as I wrote above, if they're going to fix that, it's OK.

    Jobs j0 to j3 on one DataBatch represent one task which must be done. These tasks have different priorities, and I need some of them to be finished sooner than the others. If I put all the data together, I would have to wait not ~1 s for the first batch to be done, but ~20 s until all of them are finished. Another problem is that integration on the main thread is expensive. The most expensive part is creating colliders, so it's spread over several frames. So the current flow is:
    • Schedule batch #1, batch #2, #3, etc., up to ~#20
    • After ~1 s, when batch #1 is finished, fetch the data (jobHandle.Complete())
    • Now, each frame, the data are partially integrated (one collider created + lots of different things done)
    • So at this moment, batch #1 is being integrated on the main Unity thread while batch #2 is being processed by the job system
    • When the batch is integrated, the user can see the results and interact with them. It took ~2 s.
    • Soon, batch #2 is completed and its integration begins, while batch #3 is being processed by the job system.
    But with the flow you suggest, it would be:
    • Schedule batches #1 - #20 in one single job string (j0 - j3)
    • Wait ~20 s until the last job (j3) is completed
    • Now, begin integrating the data.
    • So at this moment, batch #1 is being integrated on the main Unity thread while the job system does nothing.
    • When the first batch is integrated, the user can see the results and interact with them. It took ~21 s.
    And I would like to remind you that the result of j(i) depends on j(i-1) (except j0). So, for example, j1 and j2 cannot run in parallel for one data batch.
     
  19. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    You don't need to think of it like that. Look at the Execute method of IJobParallelFor; it takes an index. Now look at the Schedule method; it takes the number of iterations (array length) and a batch size (innerloop batch count). Now, what does this mean:

    • You have, let's say, 100 items you want to process.
    • Create one (!) job (implementing IJobParallelFor) and, when scheduling, set the array length parameter to your item count (100).
    • Now the Unity job system will call the Execute method 100 times, with the index (the parameter of the Execute method) different each time: 0, 1, 2, ..., 99.
    • Each call can be made from a different worker thread! Unity will split the range across different threads, so you don't need to care about the core count, you don't need to care about the job worker thread count, and you don't need to schedule multiple jobs.

    Now, what's the batch size (innerloop batch count)? It means: when a worker thread starts processing your job, it processes N (= batch size) items in one run.

    (Warning: big simplification follows.) So, for example, if the batch size (innerloop batch count) is set to 10, each worker thread will process 10 consecutive items in one loop. So if the Unity job system has 4 worker threads, the first thread will call Execute with indices 0, 1, 2, ..., 9, the second thread with 10-19, the third with 20-29 and the last with 30-39. When the first thread is done (this depends on many factors), it starts calling Execute with indices 40-49, and so on until all items are processed. If the Unity job system has 8 worker threads, all of them will start processing your job (the first with indices 0-9, the second with 10-19, etc., up to the last with 70-79).

    How do you choose the batch size (innerloop batch count)? It's up to you. The main factor is: how long does it take to process one item (how long will one Execute(index) call take)? If it's fast, it's good to use a big batch size in order to decrease internal overhead (when a worker thread starts processing some range, it must notify the other threads so they can work on different ranges). If processing one item takes longer, it's better to use a smaller batch size so the items can be distributed across all threads.
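    In code, the two numbers described above are simply the two arguments of Schedule; here is a minimal sketch (the job and the values are arbitrary, just for illustration):

    Code (CSharp):
        using Unity.Collections;
        using Unity.Jobs;

        struct ScaleJob : IJobParallelFor
        {
            public NativeArray<float> m_values;

            public void Execute(int index)
            {
                m_values[index] *= 2f; // process the i-th item
            }
        }

        // 100 items ("array length"), handed to worker threads in batches
        // of 10 ("innerloop batch count"):
        var values = new NativeArray<float>(100, Allocator.TempJob);
        var handle = new ScaleJob { m_values = values }.Schedule(values.Length, 10);
        handle.Complete();
        values.Dispose();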
     
  20. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    140
    This should give you the actual number of threads used for jobs:
    Unity.Jobs.LowLevel.Unsafe.JobsUtility.MaxJobThreadCount

    The plain processor count is available through System.Environment.ProcessorCount, but this includes HT 'cores'.
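    Side by side (the exact semantics of these values may vary by Unity version):

    Code (CSharp):
        // Worker threads the C# Job System may use:
        int jobThreads = Unity.Jobs.LowLevel.Unsafe.JobsUtility.MaxJobThreadCount;

        // Logical processor count, which includes hyper-threaded "cores":
        int logicalCores = System.Environment.ProcessorCount;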
     
  21. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    469
    I'm probably going to run into a similar problem by the time I convert my game to use the job system.

    A long-running queue of jobs, with priority possibly changing over time. Specific example: loading/generating/saving terrain and creating AI-nav/render meshes for said terrain (all custom C# code). It takes a dozen or more seconds to load up the entire terrain, but if the player moves significantly during this time, I want to change the priority of jobs so that closer areas are loaded first.

    Parts of this code will benefit immensely if I can convert them to use the Burst compiler - specifically, generating the terrain and the render meshes is rather math-heavy. But I have my doubts that I can use the job system, due to the varying priority and long queue of jobs.

    The job system seems to be tailored towards game-loop-related computation, but various tasks like loading terrain are not strictly tied to the game loop, yet reasonably easy to make multi-threaded.

    I don't really see a reasonable way to shoehorn these background tasks into the job system without interfering with the game-loop-related things (animation, physics, culling, etc.).

    A good in-between would be to allow using the Burst compiler on non-job methods - of course with the same, or possibly more, restrictions applied. Then we could get the speed-up associated with it but use custom scheduling/priority.

    I'm not a compiler/language writer, so maybe I'm forgetting some gotchas which prevent using the Burst compiler on other methods. It just seems awesome to me to use the Burst compiler for some specific math-heavy code without using the job scheduling system - since I've already got such a system in place.
     
    Aithoneku likes this.
  22. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    Very nicely written. However, I read somewhere (sorry, I don't have the link at the moment, I'll try to find it) that the Burst compiler is quite tightly coupled with the job system, and AFAIK Unity doesn't plan to support it outside the job system.

    Btw., if you already have your own job system implemented, this (managed code in the job system) might help you while converting it to the Unity C# Job System, if you encounter a similar problem to the one I did (from your description it seems likely).
     
  23. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    I had a similar problem in Unity 5 (before the job system). I solved it by queuing terrain chunks into a thread simulator (made with a coroutine); if the priority changed, all I had to do was place the "urgent" job at the beginning of the queue instead of at the end. I guess you're doing something similar. I don't know if such a thing is possible with the job system - probably not - but honestly speaking, I haven't made the time to think about it.
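    A minimal sketch of that coroutine-based "thread simulator" idea (the class, method names and one-item-per-frame budget are made up for illustration):

    Code (CSharp):
        using System;
        using System.Collections;
        using System.Collections.Generic;
        using UnityEngine;

        class ChunkWorker : MonoBehaviour
        {
            // Urgent work is inserted at the front, normal work at the back.
            readonly LinkedList<Action> m_queue = new LinkedList<Action>();

            public void Enqueue(Action work, bool urgent)
            {
                if (urgent) m_queue.AddFirst(work);
                else        m_queue.AddLast(work);
            }

            void Start()
            {
                StartCoroutine(ProcessQueue());
            }

            IEnumerator ProcessQueue()
            {
                while (true)
                {
                    // Do at most one piece of work per frame to stay responsive.
                    if (m_queue.Count > 0)
                    {
                        Action work = m_queue.First.Value;
                        m_queue.RemoveFirst();
                        work();
                    }
                    yield return null;
                }
            }
        }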

    I'm just starting to learn all this stuff, but from what I've read, as long as you use ECS you can benefit from the Burst compiler even if you don't use the job system. Please correct me if I'm wrong. On the other hand, I don't think there's a way to change priorities on already scheduled jobs.

    Not sure about this; I understood that the Burst compiler is tightly coupled with ECS - I don't know if I'm misunderstanding something. It would be nice if you could remember the link and post it here.

    I understand the way IJobParallelFor works. I was only trying to figure out how to implement jobs in a hypothetical case in which you couldn't use IJobParallelFor for some reason. I thought that was my case, but fortunately I later learned it wasn't, and now I find it hard to think of a situation in which IJobParallelFor couldn't be used.
     
  24. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    That's what this thread is about. We don't know about such an option, but we need it.

    ECS uses the job system to work. So using ECS means using jobs (in the background). So when using ECS, you benefit from Burst not because of ECS, but because of the job system under ECS.

    In that case, I guess it depends on the properties of such a case. But in general, I would not take the core/thread count into account. Just create jobs logically - depending on the context, on the problem being solved. But that's just my very general, very uneducated opinion.
     
  25. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    Agree.

    Thanks for the clarification.

    I really don't know how the job system works internally, but in my experience outside Unity, when I have full control of the threads, the maximum performance is achieved when the number of concurrent jobs equals n-1, where n is the number of processor cores. So if I have 4 cores I would only have 3 worker threads, if I had 8 cores I would have 7 worker threads, and so on. From what I see in the profiler, it seems to me Unity is using the same strategy. That's why I think, in the hypothetical case where you couldn't use IJobParallelFor for some reason and you still wanted to split the work into several chunks, knowing the number of cores would help you make a better decision.
     
  26. M_R

    M_R

    Joined:
    Apr 15, 2015
    Posts:
    215
    you can keep your own "job queue" and only schedule one job at a time:

    Code (pseudocode):
    OnUpdate() {
        if (jobHandle.IsCompleted) {
            jobHandle.Complete();
            if (queue.Count > 0) jobHandle = queue.Dequeue().Schedule();
            DoMainThreadWork();
        }
    }
    you can then reorder your queue as you like
     
  27. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    354
    Thank you!

    I assume SystemInfo.processorCount does exactly the same.
     
  28. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    Couldn't find it, but instead I found another thread which says the opposite, so either they changed their minds or I remember it wrongly. In any case, I was wrong about this.

    I understand that, but I still disagree. However, I think this is not that crucial, and Julian answered with "Unity.Jobs.LowLevel.Unsafe.JobsUtility.MaxJobThreadCount" (I didn't know about that).

    (Btw., don't use "System.Environment.ProcessorCount" for that purpose, because - as Julian already wrote - it can contain the number of hyper-threads, which can be twice the physical core count. Plus, Unity's value can change in the future - for example, they might conclude that MaxJobThreadCount can equal n rather than n-1. Or the opposite: on high-core-count systems they might choose to use only n-2 threads to avoid context switching caused by other processes and the operating system itself. Or they might reserve one thread for something completely different - like audio/video streaming. Or - how I would love to see that - for "starting" an additionally loaded scene in order to avoid/decrease freezes when adding scene chunks.)
     
  29. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    469
    Yeah, something like that is what I'm currently planning. I won't try it until 2018 LTS has been out for a while, though. I may just end up queuing up to, say, ~0.1 seconds' worth of jobs, possibly depending on each other to prevent them loading all the worker threads (so that any Unity-queued jobs will not have to wait for my jobs).
     
  30. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    Yes, this is definitely possible, but IMHO it creates quite a huge performance penalty, because you won't use the full CPU power. Either you schedule one such job at a time and you have the power of only one core (plus the penalty of the delay between ending one job and starting the next), or you schedule n-1 jobs (where n is the worker thread count) and then all other jobs have the power of only one core. Either way, one set of jobs won't use as much CPU power as it could. In these cases, the Burst optimization would have to have a really, really huge effect in order to compensate for these problems.

    So for these reasons, we're giving up on the C# Job System and using standard threads. Should I mark this thread closed?
     
  31. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    469
    Assuming that Burst will properly SIMDify math and do a reasonable translation to native code with its additional guarantees: you get approximately a 2-3x improvement from Unity's Mono to C++ (if I recall older benchmarks correctly) and an additional 2-4x depending on how well your code works with SIMD - so plausibly, math-heavy code will be 4-12x faster.

    Of course, there's also the option of doing it manually (compiling Rust/C/C++/D into a lib), but that's a lot more involved than tagging a mathy method with an attribute.
     
  32. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    Note: this answer is meant just as discussion out of interest. As I already wrote, we gave up on using the job system; the integration is just too time-consuming.

    (This is really just out of curiosity:) do you have some links and/or benchmarks? I couldn't find much with quick Googling now, but from what I remember, I didn't see reports of such huge improvements. Of course, I might remember it wrongly, and I'm currently talking just about Burst, not about ECS - I'm aware that ECS is capable of bringing really huge performance improvements (like >10x), but that's not just because of a more optimized compiler, but also because of the architecture design.

    Actually... I spent a lot of time integrating the job system with an existing codebase, and I have some experience from an older project with implementing a native library in C/C++, and my conclusion is that - at least in this case - going with a native library is much, very, very much (I cannot emphasize how much) easier, simpler and faster (to implement). But on the other hand, I have to admit I don't have experience with direct usage of SIMD, so I cannot claim I'm able to write code as optimized and fast as possible in C/C++.

    Converting the current code to C# jobs is definitely - at least for the current project I'm working on - not just a matter of "tagging a mathy method with an attribute". Quite the opposite: it increases code complexity, decreases readability and makes debugging harder.

    (A side, slightly more personal note: I'm actually really curious how much of a speed improvement the Burst compiler could bring to the problem I'm currently working on. I would like to continue with the integration just to see it - even if it would be thrown away at the end - but as you're probably aware, in real life we don't have time to do everything and must be picky about how we spend the time we have. In this case, it's time to stop trying.)
     
  33. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    469
    Big sidenote to start with: using proper algorithms is always step 2, after profiling. If you're iterating over a linked list of 10k+ nodes randomly spread in memory, going to C++ won't save you. Changing algorithms to be more cache-friendly and changing data formats so you can do operations on whole bunches of data at once is vastly more efficient, and can be anything from a 2x to a 500x improvement.

    A benchmark to take with a grain of salt: a janky small raytracing test ported around: https://aras-p.info/blog/2018/03/28/Daily-Pathtracer-Part-3-CSharp-Unity-Burst/

    with these numbers:
    164 Mray/s - unity3d burst + new math lib
    140 Mray/s - unity3d burst
    136 Mray/s - c++
    67.1 Mray/s - C# .net core
    28.1 Mray/s - unity3d IL2CPP
    13.3 Mray/s - unity3d mono

    Later on he made it a bit less janky, with a more complex scene and better data structures, and optimized the C++ version with SIMD use: https://aras-p.info/blog/2018/04/16/Daily-Pathtracer-10-Update-CsharpGPU/
    But it seems to be missing the IL2CPP/Mono versions. Anyway:
    187 Mray/s - c++ sse / optimized
    82.3 Mray/s - burst
    53 Mray/s - .net core
    xx - unity mono

    It is likely a combination of all these things:
    1) Mono often takes ~1.5-2.5x longer than 'real' .NET, due to a less advanced JIT compiler
    2) .NET often takes ~1.5-2.5x longer than 'native' ahead-of-time-compiled code, due to the more dynamic nature of JIT, etc.
    3) Scalar code often takes ~1.5-2.5x longer than SIMD'd code, depending on how math-heavy it is. Mono does not seem to vectorize anything, and the SIMD libraries have not changed performance in my tests.

    These numbers are very handwavy and I can't really find a source for them, but it's the feeling I get from reading performance posts over the years.
     
  34. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    469
    & to add to that - the trend I've seen is that the better you write your code, the more difference mono vs c++ will make.
     
  35. Aithoneku

    Aithoneku

    Joined:
    Dec 16, 2013
    Posts:
    64
    Thanks for the links and info; it's very interesting.