
Canceling a scheduled IJob

Discussion in 'Data Oriented Technology Stack' started by orionburcham, Jan 26, 2018.

  1. orionburcham

    Joined:
    Jan 31, 2010
    Posts:
    491
    Concerning the new Job system: Is there any way to cancel a job, once it’s been scheduled?

    The fact that I haven’t found a mention of this in the documentation makes me think I’m thinking about jobs the wrong way.

    Thanks for any info!
     
  2. superpig

    Quis aedificabit ipsos aedificatores? Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,184
    Yeah, I think you are. Jobs are supposed to be deterministic and 'as-if-single-threaded'; semantically, scheduling a job is the same thing as completing it. (It's possible that the scheduler _did_ start and finish your job immediately on the main thread when you scheduled it, after all. There is no guarantee that jobs run on other threads, especially given that the job system is supported on platforms that don't even have threading, like WebGL.) So once you've scheduled a job, it can't be cancelled, because whether the cancellation arrived 'in time' or not would be non-deterministic.
     
  3. Per

    Joined:
    Jun 25, 2009
    Posts:
    456
    That’s worrying. I think it’s still useful to be able to cancel a job. Not all jobs will be that quick; one of the main reasons for scheduling work on other processors is that it isn’t quick enough for real time. There are many situations in which you will want to cancel at least the unstarted jobs and replace them, or have some sort of context to check within your job so it can exit early. Is there some mechanism for this (given that there’s no static variable access)?
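
    A minimal sketch of the two mechanisms asked about here, written in Python purely as a language-agnostic illustration. It is not Unity's C# Job System API, which (per the reply above) offers no cancellation. It uses the standard library's `concurrent.futures`, where `Future.cancel()` only succeeds for a job that has not started yet, plus a shared `threading.Event` that a running job polls so it can exit early. The function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(n, cancel):
    """A stand-in for a long-running job that polls a cancellation flag."""
    total = 0
    for i in range(n):
        # Check the flag every 1024 iterations to keep the polling cost low.
        if i % 1024 == 0 and cancel.is_set():
            return None  # abandoned: the caller must ignore this result
        total += i * i
    return total


def demo():
    gate = threading.Event()    # keeps the only worker busy until released
    cancel = threading.Event()  # shared "please stop" flag
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(gate.wait)  # occupies the single worker thread
        queued = pool.submit(sum_of_squares, 10_000, cancel)
        # Still pending behind the blocker, so it can be removed from the queue.
        unstarted_cancelled = queued.cancel()
        # A job that does run can only stop cooperatively, by seeing the flag.
        cancel.set()
        running = pool.submit(sum_of_squares, 10_000, cancel)
        gate.set()  # release the worker so the remaining job executes
        early_exit = running.result()
    return unstarted_cancelled, early_exit


print(demo())  # (True, None)
```

    Unstarted work can simply be dropped from the queue; work that is already running has to cooperate by checking a flag the scheduler knows nothing about.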
     
  4. laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    Jobs aren't like threads that you keep running in the background and that automatically feed on data... It'd be nice to have a Unity equivalent of that.
     
  5. orionburcham

    Joined:
    Jan 31, 2010
    Posts:
    491
    Thanks for the info!

    It helps a lot to hear how you guys are thinking about the job system: that jobs are intended to be treated 'as-if-single-threaded'. But in truth, they aren't single-threaded on most platforms. It's not too hard to think of cases where it seems like you'd want to cancel them. If you don't mind, please consider the following example:

    1. The game detects user input
    2. Processing that input, it decides that the player character should move 1 unit to the right. It schedules a new job to do this. Because of other scheduled jobs, this new job is not executed before the end of the current frame.
    3. At the start of the next frame, the game determines that 'moving 1 unit to the right' was a mistake. Actually, the player character should have moved 1 unit forward.

    In real world time, the "move 1 unit to the right" job has not yet been processed. What should the code do?

    It might seem like a silly example, but this type of case becomes more likely when doing something like client-side prediction. As instructions from an authoritative server reach the client, the client may need to cancel behaviors that are currently in progress (to be replaced by what the server says should have happened). Those turnaround times can be very short if the client is hosting the server locally.

    But back to the first, basic example- what should the code do in this case? How can it guarantee that the player character is never rendered as having moved 1 unit to the right?

    Thanks for any help!
     
    Last edited: Jan 26, 2018
  6. Peter77

    Joined:
    Jun 12, 2013
    Posts:
    4,005
    In WebGL jobs run on the main thread afaik, because there is no multi-threading in WebGL yet.
     
  7. orionburcham

    Joined:
    Jan 31, 2010
    Posts:
    491
    True. The post should have been more specific. I've added a clarification.
     
  8. recursive

    Joined:
    Jul 12, 2012
    Posts:
    591
    You could probably flag something so the result of the calculation is discarded/ignored or the previous state is rolled back to.
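
    One way to read the "flag the result" suggestion, applied to the move-right/move-forward example: stamp each scheduled job with a generation number and apply a result only if its stamp is still the latest. This is a minimal Python sketch, assuming results are applied on a single thread after the job completes; `LatestOnly` is a hypothetical helper, not part of Unity. It does not stop the stale work; it only guarantees the stale result is never applied.

```python
import threading


class LatestOnly:
    """Hypothetical helper: apply a job's result only if no newer request exists."""

    def __init__(self):
        self._lock = threading.Lock()
        self._generation = 0
        self.applied = []

    def begin(self):
        # Called when a job is scheduled; returns a stamp the job carries along.
        with self._lock:
            self._generation += 1
            return self._generation

    def complete(self, generation, result):
        # Called when a job finishes; stale results are dropped, not applied.
        with self._lock:
            if generation != self._generation:
                return False
            self.applied.append(result)
            return True


moves = LatestOnly()
right = moves.begin()    # frame N: schedule "move right"
forward = moves.begin()  # frame N+1: the decision changes to "move forward"
print(moves.complete(right, "right"))      # False: superseded, ignored
print(moves.complete(forward, "forward"))  # True
print(moves.applied)                       # ['forward']
```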

    I do this in a couple of places in a heavily threaded project that uses a lot of REST calls. In that project, if an image tile is assigned an image, it kicks off an asynchronous process to check for, download, or load the image from the disk cache, and sets a callback to notify the result.

    If the user scrolls through the tiles and the tile containing the image is no longer visible, we cancel the subscription callback, but the fetch process continues to completion (even if it has no subscribers) and notifies any remaining handlers that could have been added in the meantime.

    We put the result of the image download into an LRU cache regardless of where we got it from, so if the user scrolls back, it'll get immediately displayed.
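
    The tile pattern above can be sketched in Python with hypothetical names (`TileFetcher`, `on_fetch_complete`). The actual download is assumed and simulated by calling `on_fetch_complete` directly. The point is that unsubscribing removes the callback without stopping the fetch, and every result lands in an LRU cache (here built on `OrderedDict`).

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


class TileFetcher:
    """Hypothetical fetcher: unsubscribing never stops the fetch itself."""

    def __init__(self, cache):
        self.cache = cache
        self.subscribers = {}  # url -> callbacks still interested in the result

    def request(self, url, callback):
        cached = self.cache.get(url)
        if cached is not None:
            callback(cached)  # already fetched: display immediately
            return
        self.subscribers.setdefault(url, []).append(callback)

    def unsubscribe(self, url, callback):
        callbacks = self.subscribers.get(url, [])
        if callback in callbacks:
            callbacks.remove(callback)

    def on_fetch_complete(self, url, image):
        # The fetch always runs to completion; cache the result regardless,
        # then notify whichever handlers remain (possibly none).
        self.cache.put(url, image)
        for callback in self.subscribers.pop(url, []):
            callback(image)


shown = []
fetcher = TileFetcher(LRUCache(capacity=2))
fetcher.request("tile_a.png", shown.append)       # tile becomes visible
fetcher.unsubscribe("tile_a.png", shown.append)   # scrolled away before load
fetcher.on_fetch_complete("tile_a.png", "IMG_A")  # fetch still finishes
print(shown)                                      # []
fetcher.request("tile_a.png", shown.append)       # scrolled back
print(shown)                                      # ['IMG_A'], from the cache
```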
     
  9. orionburcham

    Joined:
    Jan 31, 2010
    Posts:
    491
    That makes sense (and is really cool)!

    Though at least in my example case, it’s not just a calculation that’s being done on a separate thread, but the actual Transform change.

    It might not be acceptable for the job to just go ahead and complete, because for at least a tiny sliver of time the Transform could be in an undesirable state, and that window could coincide with rendering executing on the main thread.

    We’d need to be able to guarantee that the incorrect movement never occurred.

    Unless of course I’m missing something, or still not grokking the basics. It’s definitely happened before. :p
     
    Last edited: Jan 27, 2018
  10. snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,355
    The best answer is to design it so your scenario doesn't exist. Canceling work in threads is generally only something you do when IO is involved or when you have long-running jobs. Neither of those is a scenario the job system was really designed for.

    The specific best design depends on all the details. But my goal would be to make all jobs complete within a frame by parallelizing the data correctly, so that I know, say, 99.9% of the time my job to update transform positions completes fast enough that it's not competing with the network tick.

    I haven't had the chance to play around with it much yet, but I would measure the average time it takes for a job to start executing after it's scheduled. There are a number of ways the scheduler could handle that, resulting in anything from the low/mid nanosecond range to several milliseconds. The more aggressive approaches require spinning the CPU more, so it's a fine line you have to balance.
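
    A rough way to take that measurement, sketched in Python with a standard thread pool rather than Unity's scheduler, so the absolute numbers say nothing about Unity; only the technique carries over. Each job records how long after its submission it began running. The function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def _started_after(submitted_at):
    # Runs on a worker: how long after submission did execution begin?
    return time.perf_counter() - submitted_at


def measure_start_latency(samples=200, workers=4):
    latencies = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(samples):
            # One job at a time, so this measures latency on an idle pool.
            future = pool.submit(_started_after, time.perf_counter())
            latencies.append(future.result())
    latencies.sort()
    return {
        "mean": sum(latencies) / len(latencies),
        "p50": latencies[len(latencies) // 2],
        "p99": latencies[int(len(latencies) * 0.99)],
        "max": latencies[-1],
    }


stats = measure_start_latency()
for name, seconds in stats.items():
    print(f"{name}: {seconds * 1e6:.1f} us")
```

    Measuring under realistic load (many jobs queued at once) would give a more honest picture than this idle-pool baseline.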

    Back to networking though, my approach has always been make it very tolerant. Don't expect perfect data. Just like you could be working with stale data, the network can also just drop data. So your logic on the client just has to accept that as a fact of life and by design be able to smooth it out in a good way.
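
    A common way to "smooth it out" is to ease the client's predicted state toward the latest authoritative value rather than snapping to it. Here is a minimal Python sketch under that assumption. `smooth_toward` and its `rate` parameter are illustrative, and the exponential form is one choice among several (linear interpolation over a fixed window is another).

```python
import math


def smooth_toward(current, target, rate, dt):
    """Close a fraction of the gap to `target` each frame instead of snapping.

    Using 1 - exp(-rate * dt) makes the correction independent of frame rate.
    """
    alpha = 1.0 - math.exp(-rate * dt)
    return current + (target - current) * alpha


# The client predicted x = 0, but the server says x = 2. Later packets may be
# late or lost, so the client keeps easing toward the last target it heard.
x = 0.0
for frame in range(30):
    x = smooth_toward(x, 2.0, rate=10.0, dt=1 / 60)
print(round(x, 3))  # 1.987
```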
     
  11. recursive

    Joined:
    Jul 12, 2012
    Posts:
    591
    I'll add some more to this as well.

    On top of fault-tolerance, the job system sounds like it's predicated on minimal task switching:
    1. If C# work is queued, do it until it's done.
    2. If no C# work is queued, idle or run native-level jobs.

    This design avoids unnecessary task switching. If we add the ability to cancel, we then have to task switch twice, once to cancel the job, then to start the next one that would be queued anyway.
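
    A simplified model of the loop described above, assuming a single worker and a plain FIFO queue. It is a Python illustration, not Unity's actual worker implementation, and the `None` shutdown sentinel is a convention of this sketch.

```python
import queue
import threading


def worker(jobs, results):
    """Run queued jobs to completion, one after another; block when idle."""
    while True:
        job = jobs.get()  # blocks (idles) until work is queued
        if job is None:   # sentinel used here to shut the worker down
            break
        # No preemption and no cancellation: once taken, a job runs to the
        # end, so the worker never pays for an extra switch out of a job.
        results.append(job())


jobs = queue.Queue()
results = []
thread = threading.Thread(target=worker, args=(jobs, results))
thread.start()
for n in (1, 2, 3):
    jobs.put(lambda n=n: n * n)
jobs.put(None)
thread.join()
print(results)  # [1, 4, 9]
```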

    Another thing I thought about is this: if you cancel jobs that share data or operate on shared data, that could introduce other problems as well. For example, if one job is a producer and the other is a consumer, you'd have to cancel both. If either of those jobs has dependent jobs, you'd have to keep cancelling all of the dependencies. With even a mildly complicated graph you'd get a runaway task-switching scenario, or you'd easily wind up with orphaned jobs that never rejoin the main thread.
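
    To make the dependency point concrete: cancelling one job means cancelling everything downstream of it, which is a reachability walk over the job graph. Here is a small Python sketch with a made-up graph; the job names and the `dependents` mapping are hypothetical.

```python
from collections import deque


def jobs_to_cancel(root, dependents):
    """Everything that must be cancelled along with `root` (breadth-first).

    `dependents` maps a job to the jobs that consume its output.
    """
    doomed = {root}
    frontier = deque([root])
    while frontier:
        job = frontier.popleft()
        for child in dependents.get(job, ()):
            if child not in doomed:
                doomed.add(child)
                frontier.append(child)
    return doomed


graph = {
    "produce_positions": ["consume_positions", "update_stats"],
    "consume_positions": ["write_transforms"],
    "write_transforms": ["cull"],
    "update_stats": [],
}
print(sorted(jobs_to_cancel("produce_positions", graph)))
# ['consume_positions', 'cull', 'produce_positions', 'update_stats', 'write_transforms']
```

    Every job in that set may be queued, running, or finished at the moment of cancellation, which is exactly the timing uncertainty described above.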

    While I haven't made a full production project with an ECS, I have played around with Entitas and EgoCS. The best approaches I've found for this sort of situation are either adding some kind of "deleted" or "changed" flag (so that only flagged changes are applied from a buffer to the "frame reference" object), or accumulating changes and resolving what happens at the end of the frame (or at some other sync point(s) if necessary).
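
    Applied to the Transform example from earlier in the thread, the buffer-plus-flag approach means the job never writes the Transform itself. Instead, it fills a pending buffer, and the main thread commits that buffer at a sync point only if it is still wanted, so the wrong move is never visible. Here is a Python sketch with hypothetical types. The job runs synchronously here; in practice the sync point must come after the job has completed.

```python
from dataclasses import dataclass


@dataclass
class Transform:
    """Hypothetical stand-in for the 'frame reference' object."""
    x: float = 0.0
    z: float = 0.0


@dataclass
class PendingMove:
    """Buffer a job writes into instead of touching the Transform directly."""
    dx: float = 0.0
    dz: float = 0.0
    changed: bool = False
    deleted: bool = False


def move_job(pending, dx, dz):
    # Worker side: only the buffer is written.
    pending.dx, pending.dz, pending.changed = dx, dz, True


def sync_point(transform, pending):
    # Main thread, between frames: commit only changes that are still wanted.
    if pending.changed and not pending.deleted:
        transform.x += pending.dx
        transform.z += pending.dz
    pending.changed = pending.deleted = False


t, buf = Transform(), PendingMove()
move_job(buf, dx=1.0, dz=0.0)  # "move 1 unit to the right"
buf.deleted = True             # main thread: that move was a mistake
sync_point(t, buf)
print(t)                       # Transform(x=0.0, z=0.0)
move_job(buf, dx=0.0, dz=1.0)  # "move 1 unit forward" instead
sync_point(t, buf)
print(t)                       # Transform(x=0.0, z=1.0)
```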

    This reminds me of a discussion I had with the first AAA Technical Director I worked for, in which this question was asked during the job interview: "If an RTS unit is both damaged below 0 HP and healed in the same frame, when do you delete the object?"

    The correct answer was to sum up all of the healing and damage, taking into account any other effects active only at the start of the current frame. Then you test health, and only if the unit died do you recycle it before the next frame and play its death animation. If it's alive, you change its health and begin applying any newly acquired effects from that frame.
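
    The interview answer translates into a simple per-frame resolve step. Here is a Python sketch under stated assumptions. A single `damage_taken_pct` stands in for "effects active at the start of the frame", and effects gained during the frame are queued rather than applied immediately. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    hp: int
    # Effect active at the start of the frame, e.g. 50 = takes half damage.
    damage_taken_pct: int = 100
    effects_next_frame: list = field(default_factory=list)


def resolve_frame(unit, heals, damages, new_effects):
    """Sum all healing and damage for the frame, then test for death once."""
    damage = sum(damages) * unit.damage_taken_pct // 100
    unit.hp += sum(heals) - damage
    if unit.hp <= 0:
        return "dead"  # recycle before the next frame, play the death animation
    unit.effects_next_frame.extend(new_effects)
    return "alive"


unit = Unit(hp=10)
# Applied one event at a time with damage first, 10 - 8 - 5 = -3 would kill
# the unit before the heal is ever seen. Summed for the frame, it survives.
print(resolve_frame(unit, heals=[6], damages=[8, 5], new_effects=["regen"]))  # alive
print(unit.hp)  # 3
```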

    @orionburcham - For some threading with cancellation abilities, might I recommend UniRx? I used it in a product that shipped recently, and I was able to easily build custom schedulers, async webrequest->deserialization chains, and cancellation and error-handling logic on top of the API it provides. It's really good for abstracting/wrapping the lower-level .NET threading if that's closer to what you need.
     
    Last edited: Jan 27, 2018