
UnityAsync: Allocation free async coroutines

Discussion in 'Scripting' started by MuckSponge, Jul 18, 2017.

  1. MuckSponge

    UnityAsync is a coroutine framework for Unity built around the async API. It is not only a more efficient and effective replacement for Unity's IEnumerator coroutines, but also integrates Unity seamlessly with .NET 4+ asynchronous APIs.

    With this library you can:
    • Write allocation-free coroutines
    • Seamlessly integrate with Task-based and async APIs
    • Integrate with IEnumerator-based coroutines and YieldInstructions
    • Easily switch sync contexts (main to background and vice-versa)
    • Define your own custom await instructions (allocation free)
    • Return results at the end of your coroutine
    Performance
    Rest assured: UnityAsync coroutines will generally perform better than Unity's built-in coroutines because:
    • They rarely cause heap allocations
    • They don't weave in and out of native code
    • They don't rely on a monolithic state machine
    Benchmarks with 100,000 simultaneous update loop coroutines show a performance increase of over 150% (that's 2.5x), and this doesn't test against `YieldInstructions`. A test with uncached `WaitForSeconds` using a random number as input yields a performance increase of over 290% (3.9x). The benchmarks included the time it took to instantiate the coroutines.

    The single downside is that if you want to nest multiple coroutines inside each other, you must store a reference to them, which means returning a `Task` object. This causes a 226-byte heap allocation, whereas Unity's coroutines allocate less memory (though they still allocate on the heap). This often isn't a problem because nested coroutines rarely execute in their entirety every frame, and these allocations are still tiny.

    Usage
    Let's say we want to replace a pretty straight-forward update loop IEnumerator coroutine:
    Code (csharp):
    using UnityEngine;
    using System.Collections;

    ...

    IEnumerator UpdateLoop()
    {
        while(true)
            yield return null;
    }

    void Start()
    {
        StartCoroutine(UpdateLoop());
    }

    ...
    UnityAsync coroutines are defined by async methods, which can return void, `Task`, or `Task<TResult>`:
    Code (csharp):
    using UnityEngine;
    using UnityAsync;

    ...

    async void UpdateLoop()
    {
        while(true)
            await new WaitForFrames(1);
    }

    void Start()
    {
        UpdateLoop();
    }

    ...
    Easy-peasy, right? WaitForFrames is an `IAwaitInstruction`. When you await it, it spawns a `Continuation<T>`, which is (automatically) inserted into a queue and evaluated every frame until it is finished; in this case it will take one frame. We could return Task instead of void if we wanted to store the coroutine.
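    The mechanism described above can be sketched in plain C# without UnityEngine. This is a minimal illustration under assumptions, not UnityAsync's actual code: `FrameLoop` is a hypothetical stand-in for the library's manager object, and `FrameLoop.Tick()` plays the role of one Update call. The awaiter is a struct whose `OnCompleted` queues the compiler-generated continuation, and the queue is checked once per tick until the requested number of frames has elapsed.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical stand-in for the manager object; Tick() represents one Update().
public static class FrameLoop
{
    struct Pending
    {
        public int FramesLeft;
        public Action Continuation;
    }

    static readonly List<Pending> pending = new List<Pending>();
    static readonly List<Action> ready = new List<Action>();

    public static int FrameCount { get; private set; }

    internal static void Enqueue(int frames, Action continuation) =>
        pending.Add(new Pending { FramesLeft = frames, Continuation = continuation });

    public static void Tick()
    {
        FrameCount++;
        int keep = 0;
        for (int i = 0; i < pending.Count; i++)
        {
            Pending p = pending[i];
            if (--p.FramesLeft <= 0)
                ready.Add(p.Continuation);   // finished: resume after the scan
            else
                pending[keep++] = p;         // still waiting: write back the decremented copy
        }
        pending.RemoveRange(keep, pending.Count - keep);

        // Resumed methods may await again; those new entries wait for the next Tick.
        foreach (Action continuation in ready)
            continuation();
        ready.Clear();
    }
}

// Hypothetical awaitable struct: `await new WaitForFrames(n)` resumes after n ticks.
public readonly struct WaitForFrames
{
    readonly int frames;
    public WaitForFrames(int frames) => this.frames = frames;
    public Awaiter GetAwaiter() => new Awaiter(frames);

    public readonly struct Awaiter : INotifyCompletion
    {
        readonly int frames;
        public Awaiter(int frames) => this.frames = frames;
        public bool IsCompleted => frames <= 0;   // zero frames: continue synchronously
        public void OnCompleted(Action continuation) => FrameLoop.Enqueue(frames, continuation);
        public void GetResult() { }
    }
}

public static class Example
{
    public static int Iterations;

    // Mirrors the UpdateLoop above: one iteration per tick.
    public static async void UpdateLoop()
    {
        while (true)
        {
            await new WaitForFrames(1);
            Iterations++;
        }
    }

    // Returns how many ticks elapsed while awaiting.
    public static async Task<int> MeasureWait(int frames)
    {
        int start = FrameLoop.FrameCount;
        await new WaitForFrames(frames);
        return FrameLoop.FrameCount - start;
    }
}
```

    Because the continuation is invoked directly inside `Tick()`, the async method resumes on whatever thread drives the loop, which in Unity would be the main thread.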

    GitHub
    There are many more usage examples on GitHub. Grab the code and give it a try! Note that you will need to install Unity's IncrementalCompiler package and use .NET 4.5 API, not .NET 3.5.

    Await instructions / awaitables
    Built-in:
    • WaitForFrames
    • WaitForSeconds
    • WaitForUnscaledSeconds
    • WaitUntil
    • WaitWhile
    Unity:
    • IEnumerator*
    • YieldInstruction*
    • AsyncOperation
    • ResourceRequest**
    Others:
    • Task
    • Task<TResult>
    • …anything that implements GetAwaiter()
    *Will spin up a Unity Coroutine, causing allocations.
    **Very small delegate allocation.

    You can now also implement your own custom IAwaitInstructions. It's super simple and very easy to make allocation-free.
     
    Last edited: Nov 10, 2018
    jjobby, am1105, alexzzzz and 2 others like this.
  2. MuckSponge

    Here's an example of a simple FPS counter:
    Code (CSharp):
    async void Start()
    {
        const int sampleSize = 50;

        while(true)
        {
            float t = Time.time;
            await Await.Updates(sampleSize);
            t = Time.time - t;

            Debug.Log($"{(int)(sampleSize / t)} FPS");
        }
    }
    One could of course turn this into an actual MonoBehaviour and have the frame sample size modified at run time with minimal modifications.
     
    Last edited: Nov 10, 2018
  3. lordofduct

    How is this allocation free?

    Boxing structs requires allocation on the heap.

    Objects like Tasks require allocation on the heap.

    ...

    Though overall, yeah, support for newer C# features like await is pretty dang awesome. We're still stuck on an older version of Unity because a project we're on keeps us tied to that version... but in due time we'll definitely be moving forward. Looking forward to it.
     
  4. MuckSponge

    The structs aren't getting boxed. I use static generic classes to provide a similar functionality to interfaces but without the boxing. Every awaiter type essentially has its own continuation collection thanks to generics, so nothing's being cast. This does make extending the API difficult because there are no nice abstract classes to use.
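    Here is a rough sketch of the static-generic-class technique, assuming hypothetical names (`IInstruction`, `InstructionQueue<T>`, `Scheduler`) rather than UnityAsync's real types. Each closed type such as `InstructionQueue<WaitFrames>` gets its own arrays, and because `T` is constrained to a struct implementing the interface, `items[i].IsCompleted()` is a constrained call on the array slot: the struct is neither boxed nor copied.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract: each instruction decides for itself when it is done.
public interface IInstruction
{
    bool IsCompleted();
}

public struct WaitFrames : IInstruction
{
    int remaining;
    public WaitFrames(int frames) => remaining = frames;

    // Mutates in place; this works because it is called on the array slot, not on a copy.
    public bool IsCompleted() => --remaining <= 0;
}

// One set of arrays per instruction type T, so nothing is stored as an interface.
public static class InstructionQueue<T> where T : struct, IInstruction
{
    static T[] items = new T[16];
    static Action[] continuations = new Action[16];
    static int count;
    static readonly List<Action> ready = new List<Action>();

    // Runs once per closed type T, so every instruction type registers its own processor.
    static InstructionQueue() => Scheduler.Register(Process);

    public static void Add(T item, Action continuation)
    {
        if (count == items.Length)
        {
            Array.Resize(ref items, count * 2);
            Array.Resize(ref continuations, count * 2);
        }
        items[count] = item;
        continuations[count] = continuation;
        count++;
    }

    static void Process()
    {
        int keep = 0;
        for (int i = 0; i < count; i++)
        {
            if (items[i].IsCompleted())   // constrained call: no boxing
            {
                ready.Add(continuations[i]);
            }
            else
            {
                items[keep] = items[i];
                continuations[keep] = continuations[i];
                keep++;
            }
        }
        Array.Clear(continuations, keep, count - keep);   // release finished delegates
        count = keep;

        foreach (Action continuation in ready)
            continuation();
        ready.Clear();
    }
}

public static class Scheduler
{
    static readonly List<Action> processors = new List<Action>();

    internal static void Register(Action process) => processors.Add(process);

    // One call per frame; in Unity this would run from a single Update().
    public static void Tick()
    {
        for (int i = 0; i < processors.Count; i++)
            processors[i]();
    }
}
```

    The trade-off mentioned above shows up here too: there is no common base class to extend, so each new instruction type is wired in through the generic constraint rather than through inheritance.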

    Yes, Tasks require heap allocation, but we can't do much about that. You only need one if you need to return a result, and in that case the task will probably take some time, so the allocation has comparatively little impact as long as it isn't done every frame.

    EDIT: I originally wrote that starting a coroutine costs more in allocations than a Task does, but I had it the wrong way around. Creating a coroutine does seem to have a lot less CPU overhead, though.
     
    Last edited: Jul 21, 2017
  5. BlackPete

    What kind of tests/profiles did you run to determine this?

    I've always had a fuzzy notion that async/await may have slightly more overhead than a coroutine, but since I never bothered to do a deep dive into it, I couldn't be sure exactly what the tradeoffs were.
     
  6. makeshiftwings

    Interesting; I hope you keep working on this. I got into an argument with a Unity dev a while ago where they insisted that Coroutines were better in all cases and that they were certain that async would cause way more allocations than Coroutines, despite it being obvious that they had never tried it. Unity dev wanted me to create a framework, run perf tests, and collect data to prove that async could be used instead of Coroutines, which is not my &$^#@ job. ;) I'm glad someone out there is doing it for them though.
     
  7. makeshiftwings

    One thing to add: C# 7 adds support for async methods returning ValueTask<T>, which is an allocation-free version of Task<T>. So that will help with async methods that need to return a value.
     
    lordofduct likes this.
  8. MuckSponge

    I thought I'd try to benchmark a worst-case scenario: 10,000 coroutines running simultaneously on the main thread, all yielding execution every frame. This means every single coroutine must enter the state machine once per frame. Unity ends up taking roughly 20ms to process these coroutines each frame, while the equivalent async/await coroutines take roughly 8ms. This test was performed in the editor. I imagine in a standalone build the outcome would be proportionally similar, but I will admit that I have not tested this, as there could be some optimisation I didn't consider.

    I did notice that Unity's WaitForSeconds yield instruction doesn't appear to cause any per frame overhead so I'm wondering if this uses the equivalent of Invoke(string, float) or something. Mine still enters the state machine every frame to check if the duration has been exceeded.

    That thread was pretty much what inspired me to do this. If you wrap everything in a Task, which is what most people would do, it will cause an allocation. I wanted to work out if I could await things that fundamentally were not Tasks. I examined the amazing work of @alexzzzz here and determined that it could be done.

    I presumed the same thing, but I'm not so sure that is how it will work. It looks like a ValueTask<T> is only more effective if the result can be returned synchronously. That means it only makes sense to use it for tasks that won't take very long, under the assumption that they will return their result before the awaiter's OnCompleted() needs to be called. A ValueTask<T> has an internal Task<T> field which the awaiter uses if the task isn't finished in time, so it would end up being less efficient in most circumstances.
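    A small sketch of the case where ValueTask<T> actually pays off, assuming .NET Core 2.1+ (or the System.Threading.Tasks.Extensions package on older runtimes); `LengthCache` and its methods are hypothetical. When the value is cached, the method returns synchronously and no Task is allocated; otherwise the ValueTask<int> simply wraps an ordinary Task<int>, so the slow path still allocates.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LengthCache
{
    // Not thread-safe; fine for a single-consumer illustration.
    static readonly Dictionary<string, int> cache = new Dictionary<string, int>();

    public static ValueTask<int> GetLengthAsync(string key)
    {
        // Fast path: the result is known synchronously, so no Task object is allocated.
        if (cache.TryGetValue(key, out int cached))
            return new ValueTask<int>(cached);

        // Slow path: the ValueTask just wraps a regular (heap-allocated) Task<int>.
        return new ValueTask<int>(LoadAsync(key));
    }

    static async Task<int> LoadAsync(string key)
    {
        await Task.Delay(10);   // stands in for real asynchronous work
        int length = key.Length;
        cache[key] = length;
        return length;
    }
}
```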

    The whole Task<T> situation sounds worse than it is. It only causes an allocation when you create the Task, and you only do that when you "start" the coroutine (int result = await Foo(), where Foo() returns a Task<int>). If you await a method like this:

    Code (CSharp):
    async void Start()
    {
        while(true)
            await OneFrame();
    }

    async Task OneFrame()
    {
        await Await.NextUpdate();
    }
    It will create a Task every frame, as you can't await a void method (so replacing Task with void will make it not awaitable). This is of course a silly way to do it; see my next post for the correct way to handle an update loop without allocations.
     
    Last edited: Nov 10, 2018
    makeshiftwings likes this.
  9. MuckSponge

    There's no good reason why you'd need to create a Task just to wait one frame. You can simply refactor the code to this:
    Code (CSharp):
    async void Start()
    {
        while(true)
            await Await.NextUpdate();
    }
    And no more Task allocations. This is what happens with Unity's coroutines anyway. As soon as you yield return StartCoroutine you're essentially instantiating Unity's own task-like Coroutine object. At least with async/await the entry point (async void Start in this example) requires no allocations, as it's not a Task and it's not a Coroutine. The only gotcha is you can't await it (awaitable methods must return a Task). Await.NextUpdate() returns an awaitable struct, so it doesn't require a Task or get wrapped in a Task like methods must. By making heavy use of generics we can process masses of these structs without boxing.

    Here are a few allocation metrics:
    • yield return StartCoroutine() allocates 56 bytes every time (and takes slightly less frame time)
    • await AsyncMethod() allocates 336 bytes every time (and takes slightly more frame time)

    IEnumerators usually allocate less, while async has less CPU overhead per yield statement. Both are capable of not allocating anything while performing the routine itself.

    TLDR
    If you start coroutines often or nest them frequently, use IEnumerators. If you use them as pseudo update loops, don't like writing out dedicated IEnumerator methods and using the StartCoroutine API, or like to do most of your asynchronous logic inline, use async.
     
    Last edited: Nov 10, 2018
    makeshiftwings likes this.
  10. Invertex

    Is this still going to be thread-safe in the same ways Coroutines are though?
    Also, I'd definitely recommend using an FPS counter as well to judge real-world performance impact, instead of relying solely on the profiler, which can itself add overhead depending on the task.

    Regardless though, looks like some great work and should be handy!
     
  11. MuckSponge

    Hopefully! I lock the collections used for managing the continuations, and they will only ever process the continuations on the main thread. I'm not entirely sure where you'll end up after the await, though. Probably in Unity's SynchronizationContext, which means a little bit of extra work to get back onto the thread pool if awaiting any of these awaiters from another thread. Having said all of this, I haven't tried it, so be warned!

    I did have Unity crash on me once, and endured some exceptions on scene changes but I think this was due to the singleton MonoBehaviour that manages all of the continuations being destroyed on scene change, which I've since fixed. I'm not really sure what happens if you destroy a MonoBehaviour in the middle of using async coroutines. I think it just keeps executing until it's finished because the continuations are just Actions not inherently tied to the method or object they came from. IEnumerator-based coroutines get around this weirdness probably by checking to see if the MonoBehaviour is still active at each yield statement and stopping if it has been disabled or destroyed. Because these async coroutines are not tied to a MonoBehaviour, this isn't possible.

    A wise idea!
     
    Last edited: Jul 21, 2017
  12. MuckSponge

    Just a heads up, peeps: I'm getting a few Unity crashes when trying to use these for production (still not sure what the cause of these is). It also turns out Unity's technique of coupling coroutines to MonoBehaviours is very useful. I have an async coroutine which changes the alpha of a graphic over time. If I unload the scene while this coroutine is still running, the coroutine keeps going, the graphic of course no longer exists, and I get null reference exceptions.

    I might look into modifying the API so that you can optionally pass a MonoBehaviour to your awaiters which will tell them to only invoke the continuation if the behaviour still exists.

    Please let me know if you experience any crashes using my code, and provide samples of the code and context causing the problem so we can work out how to fix it.
     
  13. KelsoMRK

    If the MonoBehaviour running the coroutine is destroyed as part of the scene change, then this shouldn't happen. If you DontDestroyOnLoad it, then that would be expected behavior if it needs resources that are destroyed.

    Ultimately, if you're just comparing which of a set of methodologies is faster, then the Profiler should be your source of truth, as the overhead it adds is irrelevant to the comparison.
     
  14. BlackPete

    Ah. Yes. Definitely run a test in a standalone build. Editor tests are largely meaningless if you're coming up with performance numbers.

    At some point the programmer will have to get into the practice of cleaning up their awaiters when destroying monobehaviours, because your awaiters may not necessarily be the only awaiters in use. But yes, this is definitely something to keep in mind when using async/await and be sure to check the object's lifetime.

    Even with Unity's coroutines, I still stop them before letting objects be destroyed because I feel like letting coroutines die in some unknown state is a good way to add bugs.
     
    monoky and lordofduct like this.
  15. MuckSponge

    I find that comparing a method's contribution to frame time in the Editor is almost always a good indication of how two different methods compare in proportion to each other. What it won't tell you is the true impact.

    But for the sake of argument I've tested standalone. Average frame time contribution from running 10,000 coroutines:
    If we compare frame rate itself without using a profiler (I had to use 100,000 coroutines here as there's some artificial 120 FPS cap on macOS - setting Application.targetFrameRate doesn't help):
    Overhead can be pretty much ignored in the FPS benchmark because the frame time contribution of the coroutines vastly outweighs that of the engine itself (so removing the engine overhead would barely impact the figures).

    The act of destroying an object responsible for executing the coroutine will cause the coroutine to stop on its next yield. This is because the MonoBehaviour's CoroutinesDelayedCalls method will no longer be invoked and therefore the coroutine will never progress, the reference to it will be lost (as the MonoBehaviour is destroyed) and it will be collected as garbage at some point in the future. Calling StopCoroutine will have the exact same effect but it's more work for you. If the coroutine relies on some other object that gets destroyed you'd have to put a null check in the coroutine, as the object could be destroyed at any point during execution of the routine.

    I'm hosting the code on BitBucket so it should make it easier for people to access the latest copy. Will update first post once I finish (adding MonoBehaviour dependencies and performing some refactors ATM).
     
  16. MuckSponge

    Code is now on BitBucket. Redesigned the implementation a fair bit. Reduced awaiting overhead by cutting down on use of structs. In my previous test where I got 37 FPS I'm now getting over 50. Remember Unity coroutines only managed 17.

    Awaits can be tied to specific game objects by using an AsyncBehaviour component. Like how with Unity's coroutines you can call StartCoroutine on a MonoBehaviour, these awaits can be called from an AsyncBehaviour (derived from MonoBehaviour). This allows you to await on different game objects throughout a single coroutine, though typically your awaiting script would derive from AsyncBehaviour so you'd naturally keep it attached to that instance. If the AsyncBehaviour gets disabled, the coroutine will pause (and can be resumed by enabling the behaviour). If it gets destroyed it will stop.

    To use awaits without an AsyncBehaviour, just call "await Await.NextUpdate()", etc. The Await class is there as a shortcut so you don't need to find an instance of AsyncBehaviour. There's always a single instance present in the scene.
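    The pause/stop behaviour described above can be approximated without Unity as follows. This is a sketch of the idea, not UnityAsync's implementation: `Owner` is a hypothetical stand-in for an AsyncBehaviour's enabled/destroyed state, and `OwnedScheduler.Tick()` for its per-frame update. A continuation whose owner is disabled stays queued; one whose owner is destroyed is dropped, so the async method never resumes past that await.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for an AsyncBehaviour's lifetime state.
public sealed class Owner
{
    public bool Enabled = true;
    public bool Destroyed;
}

public sealed class OwnedScheduler
{
    readonly List<(Owner owner, Action continuation)> pending = new List<(Owner owner, Action continuation)>();
    readonly List<Action> ready = new List<Action>();

    // Awaiting this resumes the caller on the next Tick in which the owner is enabled.
    public NextFrameAwaitable NextFrame(Owner owner) => new NextFrameAwaitable(this, owner);

    internal void Register(Owner owner, Action continuation) => pending.Add((owner, continuation));

    // Call once per frame.
    public void Tick()
    {
        int keep = 0;
        for (int i = 0; i < pending.Count; i++)
        {
            var entry = pending[i];
            if (entry.owner.Destroyed) continue;                              // stop: dropped, never resumed
            if (!entry.owner.Enabled) { pending[keep++] = entry; continue; }  // pause: stays queued
            ready.Add(entry.continuation);
        }
        pending.RemoveRange(keep, pending.Count - keep);

        foreach (Action continuation in ready)
            continuation();
        ready.Clear();
    }
}

public readonly struct NextFrameAwaitable : INotifyCompletion
{
    readonly OwnedScheduler scheduler;
    readonly Owner owner;

    public NextFrameAwaitable(OwnedScheduler scheduler, Owner owner)
    {
        this.scheduler = scheduler;
        this.owner = owner;
    }

    public NextFrameAwaitable GetAwaiter() => this;
    public bool IsCompleted => false;
    public void OnCompleted(Action continuation) => scheduler.Register(owner, continuation);
    public void GetResult() { }
}

// Hypothetical usage: counts the frames in which the loop actually ran.
public sealed class FrameCounter
{
    public int Frames;

    public async void Run(OwnedScheduler scheduler, Owner owner)
    {
        while (true)
        {
            await scheduler.NextFrame(owner);
            Frames++;
        }
    }
}
```

    Dropping the continuation also means any `finally` blocks in the awaiting method never run, which is one reason explicit cancellation can be preferable when cleanup matters.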
     
    KelsoMRK likes this.
  17. BlackPete

    What I meant is that by stopping the coroutine myself, I'm actively thinking about how it should be stopped gracefully, think about the dependencies, and handle any clean-up that might be necessary.

    For example, if I have nested coroutines going on, I'd need to carefully consider whether I have any coroutines that must finish (like writing a save game file) before stopping it outright. If I allow myself to get into the habit of simply stopping coroutines without caring about running subcoroutines, bugs will happen sooner or later.

    With async/await, I use a CancellationToken for this. With coroutines, yeah, it's more plumbing, but it's a necessary evil.
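    A minimal sketch of the CancellationToken approach in plain .NET, with hypothetical names (`GameSession`, `PlayAsync`, `SaveAsync`). The loop observes the token and exits when it is cancelled, while the must-finish step (a stand-in for writing a save game file) deliberately does not receive the token, so cancellation cannot interrupt it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class GameSession
{
    public static async Task PlayAsync(List<string> log, CancellationToken token)
    {
        try
        {
            while (true)
            {
                // Throws TaskCanceledException (an OperationCanceledException) once the token is cancelled.
                await Task.Delay(10, token);
                log.Add("tick");
            }
        }
        catch (OperationCanceledException)
        {
            log.Add("cancelled");
        }

        // Must-finish work: it is not given the token, so cancellation cannot interrupt it.
        await SaveAsync(log);
    }

    static async Task SaveAsync(List<string> log)
    {
        await Task.Delay(5);   // stands in for writing a save file
        log.Add("saved");
    }
}
```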
     
  18. MuckSponge

    Yeah, totally understand that now :) Hopefully these awaiters will reduce the plumbing needed in situations like this: since Tasks can nest UnityAsync awaits, which makes them analogous to Unity's nested coroutines, one can now just use a CancellationToken.
     
  19. Roujucad

    Hello!

    I'm very excited by your API; it works fine for me and it's really easy to use.

    But after building a little project, I had a problem with VS: debugging.

    Here is some simple code that rotates an object.

    Code (CSharp):
    using UnityAsync;
    using UnityEngine;

    public class AsyncRotator : MonoBehaviour {
        [SerializeField] float speed = 100f;
        async void Start () {
            var i = 0; // Simple variable to watch while debugging
            while (true)
            {
                await Await.NextUpdate();
                i++;
                transform.Rotate(Vector3.up, speed * Time.deltaTime, relativeTo: Space.Self);
            }
        }
    }
    Put a breakpoint in the function, attach, and press Play: you can't inspect variables.
    I found this problem with VS 2017 (15.3.3).

    Do you have the same issue? ^^
     
  20. MuckSponge

    I'm testing under macOS with an experimental MonoDevelop build and it seems fine. I've had huge issues with Visual Studio for Mac and JetBrains Rider. They either find all the .NET 4.6 references correctly but fail miserably at locating UnityEngine or the other way around. I haven't tried VS under Windows yet. Perhaps try playing first (through editor) and then attach. If you're still having problems I'll give it a try on my desktop.

    EDIT: I can confirm that under MonoDevelop (experimental build), it only works if you start the game before attaching. If you attach before starting it won't work. My first guess would be that VS under Windows has the same behaviour.

    Just a heads up: my AsyncBehaviour implements its own constructor, which is bad practice for MonoBehaviours, so some time in the near future I'll need to resolve this (probably by using a virtual Awake method), which could slightly change the API. You shouldn't need to make any changes if you're just calling Awaiters.NextUpdate(), etc., as that uses a default, ever-present AsyncBehaviour which you don't need to inherit from.
     
    Last edited: Sep 10, 2017
  21. Roujucad

    Attach & Play and Play & Attach give the same result for me with VS ^^'

    When stepping through, the type of "this" is interesting (screenshot: Spy.png)
    [breakpoint on the "i++;" line]

    With $this I can see fields, but I can't see the variables in my function's scope.

    If I use a synchronous Start, "this" is "normal", but entering an async function changes "this".
    I don't get the same behaviour in a classic .NET Core application.

    Do you know anything about that?

    And yes, it would be nice to change AsyncBehaviour to use a virtual Awake, but it's not urgent. I always prefer static awaiters: I don't like tying my components to lots of MonoBehaviour base classes (I often use my own custom MonoBehaviour), so static awaiters backed by a single AsyncBehaviour are fine for me, although we may have to worry about object destruction.
     
  22. kkc0923

    Any news? Is this project still alive?
     
  23. MuckSponge

    Not sure what's going on there. My experience with debugging await/async is pretty minimal.

    I haven't really put any more work into this recently but if you would like to see any improvements just let me know :)
     
  24. alexzzzz

    Recent versions of Unity have a ResourceRequest class that is like AsyncOperation (it derives from it) but supports returning a result (it has an additional asset property). It's like Task<T> vs Task.

    Example:
    Code (CSharp):
    private void Start()
    {
        var request = Resources.LoadAsync<Material>("Materials/MyMaterial");
        request.completed += OnRequestCompleted;
    }

    private void OnRequestCompleted(AsyncOperation asyncOp)
    {
        var myMaterial = (Material)((ResourceRequest)asyncOp).asset;
        // ... do stuff with myMaterial …
    }
    It's a perfect candidate for awaiting:
    Code (CSharp):
    var myMaterial = (Material)await Resources.LoadAsync<Material>("Materials/MyMaterial");
    I wrote an awaiter for ResourceRequest, but it allocates a delegate in OnCompleted:
    Code (CSharp):
    using System;
    using System.Runtime.CompilerServices;
    using UnityEngine;

    public static class ResourceRequestExtensions
    {
        public static ResourceRequestAwaiter GetAwaiter(this ResourceRequest request) => new ResourceRequestAwaiter(request);

        public struct ResourceRequestAwaiter : INotifyCompletion
        {
            private readonly ResourceRequest request;

            public ResourceRequestAwaiter(ResourceRequest request)
            {
                this.request = request;
            }

            public UnityEngine.Object GetResult() => request.asset;
            public bool IsCompleted => request.isDone;

            // The lambda captures `action` to adapt it to Action<AsyncOperation>,
            // so every call allocates a closure and a delegate.
            public void OnCompleted(Action action) => request.completed += _ => action();
        }
    }
    PS
    *crying out loud*
    It's a shame that ResourceRequest is not generic. Resources.LoadAsync<T> should have been implemented returning ResourceRequest<T> instead of loosely typed ResourceRequest.
     
    Last edited: Jan 15, 2018
  25. kkc0923

    Recently, I've been investigating how async/await and UnityAsync can improve code compactness and performance. My project depends heavily on UniRx for asynchronous operations in both the game client and our in-house toolset in the Unity Editor.

    But I found that UnityAsync only works when isPlaying is true, because it depends on MonoBehaviour's Update(), which is not called when isPlaying is false (Edit mode).

    Do you have any plans to support Edit mode?
    (e.g. UniRx has its own editor thread dispatcher that uses UnityEditor.EditorApplication.update)
     
  26. sleepandpancakes

    Do you know how this compares, performance- and memory-wise, to More Effective Coroutines?

    This looks like a really exciting API. The only thing keeping me from using it is not having an easy way to integrate it with IEnumerators.
     
  27. MuckSponge

    This is a good idea and is something I'll look into.

    No, but that's something worth testing. I haven't checked the codebase yet, but it looks like More Effective Coroutines follows the conventional monolithic state machine IEnumerator approach and, to avoid allocations, the yielded data is a float. My guess would be that certain behaviour is encoded into the 4 bytes and the state machine decodes this. Such a system potentially isn't as flexible as UnityAsync because the evaluation of the data is tied up in the state machine. UnityAsync has the following strengths:
    • You can declare the awaitable data in an object-oriented fashion (more so in the new version)
    • You can use it via the async/await API, so it has tight integration with the modern .NET world
    • You can define your own custom awaitables (in the version I'll release soon)
    • Most kinds of awaitable data use their own state machine instead of being pushed through a monolithic one, so this reduces internal complexity to some extent and may positively impact performance
      • Instead of decoding the data and choosing one of n actions to perform (like wait 1 frame, wait n frames, wait n seconds, etc.), it calls data.IsCompleted() and the data object handles the logic.
    I've recently recoded this from scratch to gain a number of benefits including being able to write your own awaitable objects to be used with UnityAsync. I'm moving away from BitBucket and I've been sitting on this for far too long so I really should just release it on GitHub so others can play around with it. The motivation behind it was so that I could implement a tweening system without having to write any of the plumbing that goes with it (the new version of UnityAsync does this for you).

    I'll see if I have some time tonight to release the new version and then I can see how it compares to More Effective Coroutines but the thing to take away from this isn't so much the performance gains, but the more versatile, modern API and the ability to define custom awaitable types which encapsulate their behaviour. In a few hours, I created a tweening system using UnityAsync, and the actual awaitable code occupied just 13 lines (with generous formatting).

    Perhaps you could avoid this delegate allocation by storing the delegate in an instance variable and invoking it in IsCompleted:
    Code (csharp):
    code snipped
    It's been a while since I've messed with custom awaiters though so I might be wrong. I was wrong and now I'm embarrassed. Move along, people!
     
    Last edited: Nov 11, 2018 at 6:00 AM
  28. MuckSponge

    Uploaded the new version to GitHub. Massive improvements to the API: it should be much easier to use and extend. This came at a slight performance cost (the 300% performance increase has been reduced to 150%, but this is still 2.5x as fast as Unity's built-in coroutines, and it can still get up to 300% under some circumstances).
     
  29. N3uRo

  30. MuckSponge

    More Effective Coroutines perform 525% faster (ouch, my ego!). To be fair, just about the fastest you could possibly get the async API with simple frame yield behaviour is still about 350% slower, so it's never going to be as fast. I just tested with a very simple awaitable struct which just inserts the continuation into an array, which gets invoked next frame. I'll do some profiling - see if I can get UnityAsync to perform a bit faster. The main issue (ignoring the unavoidable async overhead) is it needs to copy a struct roughly 4-5 times per awaitable per frame. Might be able to reduce this to 2 times.
     
    Last edited: Nov 11, 2018 at 7:31 AM
  31. dadude123

    Wouldn't it be better to just use UniRx.Async? It's a sub package that can be used alone (without the rest of UniRx)
    https://github.com/neuecc/UniRx

    The neuecc guy has already dealt with tons of other problems and features.

    Or does this implementation do something fundamentally different?

    Has anyone compared the two? Afaik neuecc has been working on this for a long time now
     
  32. MuckSponge

    Just tried UniRx.Async with my benchmark test (recording how long it takes to execute 1000 frames with 100,000 coroutines running simultaneously, skipping the first 2 frames). UniRx.Async takes ~42.8 seconds, UnityAsync takes ~17.3 seconds, MEC takes ~3.3 seconds. UnityAsync is about 150% faster, MEC is about 1200% faster. This is without UniRx's MicroCoroutines (which use the IEnumerator API, not async). I was using UniTask.Yield(), not sure if there is a better alternative.

    EDIT: Redid the benchmarks because of a transposition error. The above numbers are averages over 3 runs.
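    For reference, the percentages follow from the run times as (speedup − 1) × 100, with the UniRx.Async time as the baseline; this is the same convention as "150% (that's 2.5x)" in the first post.

```latex
\begin{aligned}
\text{speedup} &= \frac{t_{\text{UniRx.Async}}}{t}, \qquad \%\ \text{faster} = (\text{speedup} - 1)\times 100\\
\text{UnityAsync: } \frac{42.8}{17.3} &\approx 2.47 \;\Rightarrow\; \approx 147\%\ \text{faster}\\
\text{MEC: } \frac{42.8}{3.3} &\approx 12.97 \;\Rightarrow\; \approx 1197\%\ \text{faster}
\end{aligned}
```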
     
    Last edited: Nov 11, 2018 at 12:13 PM
  33. dadude123

    dadude123

    Joined:
    Feb 26, 2014
    Posts:
    692
    Oh those are some surprising results.
    I wonder what causes such a huge slowdown in UniRx.Async.
     
  34. VergilUa

    VergilUa

    Joined:
    Dec 22, 2014
    Posts:
    851
    Just wondering, can you run the same test, but with UniRx MicroCoroutines? Just to compare the two.

    MicroCoroutines are far superior to default coroutines in terms of speed and allocations.
    But if async is way better, I might consider transitioning to async instead.
     
  35. MuckSponge

    MuckSponge

    Joined:
    Jul 11, 2015
    Posts:
    41
    UniRx's MicroCoroutines average around 3.4 seconds, which is within the margin of error of MEC. Note that the MicroCoroutines do not use the async API - if they did, they wouldn't be able to average any better than about 11.8 seconds (please, someone correct me if I'm wrong).

    I could of course package my own IEnumerator coroutine solution within UnityAsync, but that ruins the point a bit because it wouldn't use the async API (I've since changed my mind: it could be useful for awaiting IEnumerators). I do believe, however, that it would be possible to implement a pooled IEnumerator<T>-based coroutine system with performance similar to MEC and MicroCoroutines, but with the advantage that custom yield instructions would be easy to implement. From what I understand, MEC's state machine is hard-coded and MicroCoroutines' state machine is kept as basic as possible for performance reasons.

    One advantage of UniRx.Async, apart from its massive feature list, is that it can return UniTasks instead of Tasks. I haven't tested this, but it should mean you can await async coroutines without causing allocations. I think the documentation or code comments state that it will be allocation-free in some circumstances. What those are, I'm not sure. It would definitely be a good idea to incorporate something like that into UnityAsync.
     
    Last edited: Nov 14, 2018 at 10:14 AM
  36. VergilUa

    VergilUa

    Joined:
    Dec 22, 2014
    Posts:
    851
    Those are quite nice results.

    I'm not really sure if it's even worth transitioning then.
    If I really needed to offload heavy logic to other threads, I'd use jobs or jobs + ECS + Burst instead.

    The slowdown is probably caused by context switching. If it isn't, then I don't know.
     
  37. MuckSponge

    MuckSponge

    Joined:
    Jul 11, 2015
    Posts:
    41
    Yeah probably something to do with context switching. Difficult to tell exactly because the overhead of deep profiling can skew the results.

    Reduced UnityAsync's execution time to 16.5 seconds by switching to arrays instead of Queue<T>s in the latest commit.
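    The post doesn't show the change itself, but one common way to replace a `Queue<T>` of continuations with plain arrays is to keep two preallocated buffers and swap them each frame, so nothing is enqueued or dequeued individually. A minimal Python sketch of that idea, assuming continuations are plain callables; `ContinuationBuffer` is a hypothetical name, not UnityAsync's type:

```python
class ContinuationBuffer:
    """Hypothetical double-buffered, array-backed store of continuations."""

    def __init__(self, capacity: int = 16):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._pending = [None] * capacity  # filled while the current frame runs
        self._running = [None] * capacity  # drained by tick()
        self._count = 0                    # live entries in _pending

    def add(self, continuation):
        if self._count == len(self._pending):
            # Grow by doubling (amortised O(1)); rare once the buffers have warmed up.
            self._pending.extend([None] * len(self._pending))
        self._pending[self._count] = continuation
        self._count += 1

    def tick(self) -> int:
        """Invoke everything scheduled before this call; return how many ran."""
        # Swap the arrays so continuations added while draining land in the next frame.
        self._pending, self._running = self._running, self._pending
        count, self._count = self._count, 0
        for i in range(count):
            continuation = self._running[i]
            self._running[i] = None  # clear the slot so the array can be reused as-is
            continuation()
        return count
```

    The swap is what keeps a continuation that reschedules itself from running twice in the same frame.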

    Jobs may have the potential to be very performant, but they can be difficult to code, and well-written async code can probably achieve similar performance if for whatever reason you can't rely on Burst compiler optimisations. Each has its trade-offs (most C# code and libraries use async Task APIs, for example).
     
  38. dadude123

    dadude123

    Joined:
    Feb 26, 2014
    Posts:
    692
    Hmm, interesting.
    What I need is being able to `await` methods, and being able to complete them from somewhere else using `UniTaskCompletionSource` (so I can call SetResult()).

    Not sure whether this can be implemented here at all, or whether something like a TaskCompletionSource is possible.
     
  39. MuckSponge

    MuckSponge

    Joined:
    Jul 11, 2015
    Posts:
    41
    I might need you to provide an example of what you want to do here. I have to admit my knowledge of the nuances of the async programming framework might be a little patchy here and there, so I'm not quite sure what you're asking. TaskCompletionSource.SetResult can be called from anywhere: inside another Task, inside an async void method, inside a regular method, etc.
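    As a language-neutral illustration of completing an awaitable "from somewhere else": in Python's asyncio, a `Future` created with `loop.create_future()` plays roughly the role of `TaskCompletionSource`, and `set_result` stands in for `SetResult`. This is an analogy, not UnityAsync or UniTask code; `wait_for_signal` is a hypothetical helper.

```python
import asyncio


async def wait_for_signal(source: asyncio.Future) -> str:
    # Suspends until some other code calls source.set_result(...).
    return await source


async def main() -> str:
    loop = asyncio.get_running_loop()
    source = loop.create_future()  # roughly the TaskCompletionSource role
    waiter = asyncio.create_task(wait_for_signal(source))
    # Completed from "somewhere else": a plain callback that knows nothing about the waiter.
    loop.call_soon(source.set_result, "done")
    return await waiter


result = asyncio.run(main())
print(result)
```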

    On a side note, I'm currently working on a custom coroutine runner to replace Unity's built-in runner (and also so that I can await IEnumerators without relying on StartCoroutine). Results so far are promising, though I still have some features to add. I get 4.1 seconds in my benchmark, putting this less than a second behind MEC and MicroCoroutines. The big advantage? It is a drop-in replacement, so Unity's YieldInstructions and CustomYieldInstructions work. MicroCoroutines only supports yield return null and MEC only supports its own yield instructions. I still need to add support for a few more YieldInstructions and for scheduling on LateUpdate, FixedUpdate, and end of frame, but this probably won't affect performance, and I haven't fully optimised it yet.
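    A rough Python sketch of the kind of runner described above, assuming generators stand in for IEnumerators. A yielded generator runs inline as a nested coroutine, `None` resumes on the next tick, and a `keep_waiting`-style instruction is polled once per tick. `CustomYieldInstruction`, `WaitFrames` and `CoroutineRunner` here are hypothetical stand-ins, not Unity's classes, and this is not the poster's implementation.

```python
import types


class CustomYieldInstruction:
    """Hypothetical analogue of a custom yield instruction: the runner polls
    keep_waiting once per tick and resumes the routine when it returns False."""

    @property
    def keep_waiting(self) -> bool:
        raise NotImplementedError


class WaitFrames(CustomYieldInstruction):
    """Example instruction: resumes the routine `frames` ticks after it yields."""

    def __init__(self, frames: int):
        self.remaining = frames

    @property
    def keep_waiting(self) -> bool:
        # Polled exactly once per tick by the runner, so count down here.
        self.remaining -= 1
        return self.remaining > 0


class _Routine:
    __slots__ = ("stack", "waiting_on")

    def __init__(self, gen):
        self.stack = [gen]      # nested generators; the top one is running
        self.waiting_on = None  # instruction being polled, if any


class CoroutineRunner:
    def __init__(self):
        self.routines = []

    def start(self, gen):
        routine = _Routine(gen)
        if self._step(routine):  # run synchronously up to the first yield
            self.routines.append(routine)

    def tick(self):
        still_running = []  # rebuilt each tick for simplicity
        for routine in self.routines:
            if routine.waiting_on is not None and routine.waiting_on.keep_waiting:
                still_running.append(routine)
                continue
            routine.waiting_on = None
            if self._step(routine):
                still_running.append(routine)
        self.routines = still_running

    def _step(self, routine) -> bool:
        """Advance until the routine parks for a frame; False once it has finished."""
        while routine.stack:
            try:
                value = next(routine.stack[-1])
            except StopIteration:
                routine.stack.pop()  # nested routine done: resume its parent now
                continue
            if isinstance(value, types.GeneratorType):
                routine.stack.append(value)  # `yield child()` runs the child inline
            elif isinstance(value, CustomYieldInstruction):
                routine.waiting_on = value
                return True
            else:
                return True  # None (or anything else) means "resume next tick"
        return False
```

    Keeping a per-routine stack is what lets a parent wait on a nested routine without registering it with the runner separately.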

    EDIT: Obviously all of Unity's yield instructions cause heap allocations, but I'm going to implement a pooling system for reusable ones like WaitForSeconds. I'm sure it won't be quite as performant as MEC for non-null yield instructions, but it will probably still be much faster than Unity's built-in runner.
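    A minimal pooling sketch in Python, assuming the runner hands an instruction back to the pool once the routine waiting on it has resumed. `WaitForSecondsLike` and `InstructionPool` are hypothetical names, not Unity or UnityAsync types.

```python
class WaitForSecondsLike:
    """Hypothetical stand-in for a reusable timed yield instruction."""
    __slots__ = ("seconds",)

    def __init__(self, seconds: float):
        self.seconds = seconds


class InstructionPool:
    def __init__(self):
        self._free = []  # instructions returned by the runner, ready for reuse

    def rent(self, seconds: float) -> WaitForSecondsLike:
        if self._free:
            inst = self._free.pop()
            inst.seconds = seconds  # reset state before handing it out again
            return inst
        return WaitForSecondsLike(seconds)  # pool empty: allocate a new one

    def release(self, inst: WaitForSecondsLike) -> None:
        self._free.append(inst)

    @property
    def free_count(self) -> int:
        return len(self._free)
```

    The catch with pooling is ownership: user code must not keep or reuse an instruction after yielding it, because the pool may hand the same object to another coroutine.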
     
    Last edited: Nov 14, 2018 at 10:13 AM
  40. MuckSponge

    MuckSponge

    Joined:
    Jul 11, 2015
    Posts:
    41
    My IEnumerator coroutine solution is pretty much complete now (minus the pooling and integration with async) and gets 4.2 seconds in my benchmark, which makes it ~27% slower than MEC in terms of per-yield overhead. Now remember, this is a drop-in replacement for Unity's coroutine runner so it makes sense to compare it to Unity's built-in coroutine runner, right? Well, the built-in coroutine runner gets 134.0 seconds in the benchmark which makes it... over 3090% slower than my solution and they should be functionally equivalent :eek: (except I don't support edit mode yet).
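    Reading "% slower" as the ratio of run times minus one, and taking MEC's 3.3 s from the earlier benchmark post, the figures check out:

```latex
\begin{aligned}
\text{custom runner vs MEC:}\quad \left(\frac{4.2}{3.3} - 1\right) \times 100\% &\approx 27\%\ \text{slower} \\
\text{built-in runner vs custom runner:}\quad \left(\frac{134.0}{4.2} - 1\right) \times 100\% &\approx 3090\%\ \text{slower}
\end{aligned}
```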
     