Question Performance Limitation with using multiple C# Task

Discussion in 'Scripting' started by gdosu, Mar 8, 2022.

  1. gdosu

    Joined: Aug 24, 2020
    Posts: 13
    Hello. First of all, sorry for my poor English.
    This question is somewhat theoretical.
    I'm developing a kind of image streaming application.

    This is how it works (image1.png):
    1. Get a list of 1000+ local files. Each file contains a compressed 4K JPEG image.
    2. Sort the file list and convert it to a ConcurrentQueue.
    3. Call Task.Run() multiple times. Each task loops until all files are converted.
    4. Wait for an initial delay (about 10 s) to fill a "buffer".
    5. In Update, which is called every frame, change the texture of a RawImage.

    This is simplified code.

    Code (CSharp):
    void Awake()
    {
        // Start numberOfTasks workers on the thread pool.
        taskList = new List<Task>();
        for (int i = 0; i < numberOfTasks; i++)
            taskList.Add(Task.Run(() => Work()));
    }

    // Runs on a worker thread until the file queue is empty.
    void Work()
    {
        while (FileQueue.TryDequeue(out var item))
        {
            item.Process();    // run pipeline
            ImageQueue.Enqueue(item);
        }
    }

    // Runs on the main thread: shows at most one decoded image per frame.
    void Update()
    {
        if (ImageQueue.TryDequeue(out var item))
        {
            MyTexture.LoadRawTextureData(item.data);
            MyTexture.Apply();
        }
    }

    So far I can reach 15-20 FPS; the goal is 30 FPS.
    Before optimizing, I have to write a report on this.

    Here are a table and chart of the results (image2.png).
    As you can see, there is a bottleneck: performance stops improving as the number of tasks increases.
    (The test machine runs macOS on a 24-core Intel Xeon W.)

    Setting aside the poor performance of the pipeline3 function, the time required by every pipeline function also increases as the number of tasks increases.

    I was asked for a "technical explanation" of this.
    I gave an abstract answer about "parallel slowdown", but it wasn't enough, and I wasn't a good student, so I'm in trouble.

    Please let me know if there are any related articles, manuals, guides, or even keywords that I can investigate.

    Thanks for your time.
     
    Last edited: Mar 8, 2022
  2. lordofduct

    Joined: Oct 3, 2011
    Posts: 8,380
    I'm not sure what exactly you're asking for here.

    I'm assuming this is an assignment. OK...

    In the example above, there are a few things that can cause a slowdown. Things like:

    Disk IO - I'm assuming your 'Process' method is doing some IO, since you refer to 4K images. That's usually something you load from disk. The storage medium (whatever it may be) has bandwidth limitations. Just because you request some data from it doesn't mean it'll be ready immediately. And if you ask it for 100 files at once, well, those are all going to have to wait their turn.
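
    One rough way to check whether disk IO is the limit is to time the file read separately from the rest of the pipeline. The sketch below is only an illustration: StageTimer, TimeMs and TimeRead are hypothetical names, and it assumes each image is read in one go with File.ReadAllBytes, which may not match what Process actually does.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper (not from the original code) for timing pipeline stages.
public static class StageTimer
{
    // Runs an action and returns the elapsed wall-clock time in milliseconds.
    public static double TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Times only the disk read for one file and hands the bytes back to the caller.
    public static double TimeRead(string path, out byte[] bytes)
    {
        byte[] local = null;   // out parameters cannot be assigned inside a lambda
        double ms = TimeMs(() => local = File.ReadAllBytes(path));
        bytes = local;
        return ms;
    }
}
```

    If the read time per file grows as you add tasks while the decode time stays flat, the storage is the likely limit. Keep in mind that the OS file cache can make a second read of the same file much faster than the first, so results from repeated runs may look better than a cold start.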

    ThreadPool size - Task.Run doesn't just spin up new threads willy-nilly. Rather, it queues the task to be run on the next available worker thread from the ThreadPool. If one isn't available, the task just sits and waits until one is ready. The number of threads available is bottlenecked not just by how many cores your processor has, but also by memory constraints (each thread in .NET allocates a new block of working memory for its stack and other things). This is especially limiting if you target a 32-bit build rather than a 64-bit one. Just because you run on a 24-core 64-bit processor doesn't mean your build can take advantage of it: a 32-bit build is still restricted to the same constraints as any other 32-bit software. And of course there are the physical limitations of your computer. Your OS will spin up N threads, but it still has to balance them across your cores, and your computer only has so much RAM regardless of build target.
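
    To see the limits that apply to your own build, you can ask the runtime directly. This is a minimal sketch using standard .NET calls; the ThreadPoolInfo class and Describe method are hypothetical names, and the values reported under Unity's scripting runtime may differ from a desktop .NET runtime.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not from the original code) that reports thread pool limits.
public static class ThreadPoolInfo
{
    // Returns a one-line summary of the values that bound how many
    // Task.Run workers can actually execute at the same time.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        return $"logical processors: {Environment.ProcessorCount}, " +
               $"64-bit process: {Environment.Is64BitProcess}, " +
               $"pool worker threads: {minWorker}..{maxWorker}";
    }
}
```

    Logging ThreadPoolInfo.Describe() once at startup, next to numberOfTasks, would show whether you are asking for more workers than the pool readily provides.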

    Unity itself - Unity is a wildcard simply because you're not necessarily running on a traditional .NET CLR. Depending on the version and target platform, all sorts of variables come into play that can become bottlenecks. Though... I doubt this is the topic that your professor is looking to discuss. But who knows... maybe it is? I don't know what the chapter you're on is even about.

    Which actually gets me to my next point... I don't really feel like listing every possible bottleneck, so I've just highlighted a couple. Also, I have no idea what goes on in Process, which is yet another variable.

    So...

    Usually... (and in my opinion, this is the fatal flaw of traditional learning methods like college/school)... usually, the topic/chapter you're currently learning is a BIG hint as to what the professor expects you to identify the problem to be.

    ...

    Think of it this way. Remember how in early physics classes your teacher/professor always tells you that you can ignore friction? The reason being... you're not there yet. You're learning the basics right now, and introducing all that hubbub just overcomplicates the subject matter.

    Sure... there are other things going on in the system that could be slowing things down (causing friction). But we're not learning about those right now.

    So. Ask yourself.

    What are you learning about right now?

    What topic is it that your professor is expecting you to pick up on right now?

    Cause at the end of the day... that's usually what a class is asking you. It's not asking you to solve real world problems. It's asking you to figure out what your professor expects you to figure out.

    Hence reason 8712 that I dropped out of college.
     
    Last edited: Mar 8, 2022
  3. gdosu

    Joined: Aug 24, 2020
    Posts: 13
    Last edited: Mar 15, 2022
  4. Bunny83

    Joined: Oct 18, 2010
    Posts: 3,528
    Well, there are countless possible issues. While it's always great to profile and measure performance, I can't make much sense of your chart and table ^^. No units on any values :)

    As others have already mentioned on SO, file IO will sooner or later bite you. With an SSD it's probably less of an issue, but the bus still has limited throughput. Also note that if hyperthreading is being counted, you do not have 24 cores, just 24 logical processors. In that case you only have 12 physical cores, and the ALUs and FPUs of each core are shared by 2 logical processors. Read more about hyperthreading on Wikipedia. Since your "Work" is mainly number crunching, going from 12 physical cores to 24 logical processors does not double performance. Intel claims that hyperthreading can improve performance by 15% to 30% through better use of the pipeline. However, that's quite a generic claim. As you can read on Wikipedia, it depends heavily on the nature of the threads being executed. In some cases it could even have a negative impact.
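
    As a back-of-the-envelope illustration of those numbers, here is a tiny sketch. SpeedupEstimate and MaxSpeedup are hypothetical names, and the model is a deliberate simplification: it assumes perfectly parallel, compute-bound work and ignores IO, memory bandwidth and garbage collection.

```csharp
// Hypothetical helper (not from the original code) for a rough speedup ceiling.
public static class SpeedupEstimate
{
    // Rough upper bound on speedup over a single thread:
    // one unit of throughput per physical core, plus a fractional
    // gain from hyperthreading (for example 0.15 for 15%).
    public static double MaxSpeedup(int physicalCores, double hyperthreadingGain)
    {
        return physicalCores * (1.0 + hyperthreadingGain);
    }
}
```

    With 12 physical cores and a 15% to 30% gain, MaxSpeedup gives roughly 13.8x to 15.6x, well short of the 24x you might expect from counting logical processors. Adding tasks beyond that point can't help much and can hurt through extra contention.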

    Another thing you have to keep in mind is memory allocation and garbage collection. 4K images require a lot of memory, not only in compressed form but especially uncompressed. Your "Work" is currently a black box for us, so we have no idea what you're doing there.
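
    To put a number on "a lot of memory", here is a small sketch. FrameMemory and its methods are hypothetical names, and the figures assume that "4K" means 3840x2160 and that each frame is decoded to 4 bytes per pixel (RGBA32); your actual format may differ.

```csharp
// Hypothetical helper (not from the original code) for frame memory arithmetic.
public static class FrameMemory
{
    // Bytes needed to hold one uncompressed frame.
    public static long BytesPerFrame(int width, int height, int bytesPerPixel)
    {
        return (long)width * height * bytesPerPixel;
    }

    // Megabytes (10^6 bytes) of pixel data produced per second at a given frame rate.
    public static double MegabytesPerSecond(int width, int height, int bytesPerPixel, int fps)
    {
        return BytesPerFrame(width, height, bytesPerPixel) * fps / 1e6;
    }
}
```

    Under those assumptions one frame is 33,177,600 bytes (about 31.6 MiB), and 30 FPS means allocating, copying and uploading roughly 995 MB of pixel data every second. If each frame gets a fresh byte array, that is a lot of work for the allocator and the garbage collector.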

    Have you actually profiled the performance of your Update method? Loading / uploading 4k images onto the GPU is probably not that fast either and has to be done on the main thread.
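
    If you don't want to dig into the Unity Profiler yet, a crude option is to time the texture calls yourself and compare them against the frame budget. The sketch below is plain C# with no Unity dependency; UpdateCostTracker is a hypothetical helper, and feeding it from a System.Diagnostics.Stopwatch wrapped around LoadRawTextureData and Apply is an assumption about how you would wire it in.

```csharp
using System;

// Hypothetical helper (not from the original code) that accumulates per-frame timings.
public sealed class UpdateCostTracker
{
    private double totalMs;
    private double maxMs;
    private int samples;

    // Record how long one Update call took, in milliseconds.
    public void Add(double elapsedMs)
    {
        totalMs += elapsedMs;
        maxMs = Math.Max(maxMs, elapsedMs);
        samples++;
    }

    public double AverageMs => samples == 0 ? 0.0 : totalMs / samples;

    public double MaxMs => maxMs;

    // Time available per frame at a target frame rate (about 33.3 ms at 30 FPS).
    public static double BudgetMs(double targetFps) => 1000.0 / targetFps;

    // True if even the slowest recorded call fits into one frame at the target rate.
    public bool FitsBudget(double targetFps) => maxMs <= BudgetMs(targetFps);
}
```

    If FitsBudget(30) comes back false, no amount of extra worker tasks will get you to 30 FPS, because the main thread alone is already too slow. Note that a CPU-side timer may not capture all of the GPU-side cost of the upload, so treat the result as a lower bound.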

    In step 4 you said you let it "buffer" for 10 seconds before you start playing. What is the performance when you stop all the background threads and just let Update empty the buffer? Does it actually reach a decent framerate?

    Video streaming is quite a complex topic. In a lot of cases the actual updating of the output is the real bottleneck. Though it's hard to tell from the information given :)