Canceling a scheduled IJob

orionburcham · Jan 26, 2018

Concerning the new Job system: Is there any way to cancel a job, once it’s been scheduled?

The fact that I haven’t found a mention of this in the documentation makes me think I’m thinking about jobs the wrong way.

Thanks for any info!

superpig · Jan 26, 2018

orionburcham said:

The fact that I haven’t found a mention of this in the documentation makes me think I’m thinking about jobs the wrong way.

Yeah, I think you are. Jobs are supposed to be deterministic and 'as-if-single-threaded'; semantically, scheduling a job is the same thing as completing it. (It's possible that the scheduler _did_ start and finish your job immediately on the main thread upon you scheduling it, after all. There is no guarantee that jobs run on other threads, especially given that the job system is supported on platforms that don't even have threading, like WebGL.) So, once you've scheduled a job it can't be cancelled, as whether the cancellation would be 'in time' or not would be non-deterministic.

Per · Jan 26, 2018

That’s worrying. I think it’s still useful to be able to cancel a job. Not all jobs will be that quick; one of the main reasons for scheduling things on other processors is that whatever it is isn’t quick enough for real time. There are many situations in which you will want to cancel at least the unstarted jobs and replace them, or have some sort of context to check within your job so it can exit early. Is there some mechanism for this (given that there’s no static variable access)?

laurentlavigne · Jan 26, 2018

Jobs aren't like threads that you keep running in the background and that automatically feed on data... it'd be nice to have the Unity equivalent of that.

orionburcham · Jan 26, 2018

Thanks for the info!

It helps a lot to hear how you guys are thinking about the job system: that jobs are intended to be treated 'as-if-single-threaded'. But in truth, they aren't single-threaded on most platforms. It's not too hard to think of cases where it seems like you'd want to cancel them. If you don't mind, please take the following example:

1. The game detects user input
2. Processing that input, it decides that the player character should move 1 unit to the right. It schedules a new job to do this. Because of other scheduled jobs, this new job is not executed before the end of the current frame.
3. At the start of the next frame, the game determines that 'moving 1 unit to the right' was a mistake. Actually, the player character should have moved 1 unit forward.

In real world time, the "move 1 unit to the right" job has not yet been processed. What should the code do?

This might seem like a silly example, but this type of case becomes more likely when doing something like client-side prediction. As instructions from an authoritative server reach the client, the client may need to cancel behaviors that are currently in progress (to be replaced by what the server says should have happened). Those turnaround times can be very short if the client is hosting the server locally.

But back to the first, basic example: what should the code do in this case? How can it guarantee that the player character is never rendered as having moved 1 unit to the right?

Thanks for any help!

Peter77 · Jan 26, 2018

orionburcham said:

they aren't single-threaded.

In WebGL jobs run on the main thread afaik, because there is no multi-threading in WebGL yet.

orionburcham · Jan 26, 2018

True. The post should have been more specific. I've added a clarification.

recursive · Jan 26, 2018

You could probably flag something so the result of the calculation is discarded/ignored or the previous state is rolled back to.

I do this in a couple of places in a heavily threaded project that uses a lot of REST calls. In that project, if an image tile is assigned an image, it kicks off an asynchronous process to check/download/load from disk cache and sets a callback to notify the result.

If the user scrolls through the tiles and the tile containing the image is no longer visible, we cancel the subscription callback, but the fetch process continues to completion (even if it has no subscribers) and notifies any remaining handlers that could have been added in the meantime.

We put the result of the image download into an LRU cache regardless of where we got it from, so if the user scrolls back, it'll get immediately displayed.
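A minimal sketch of the "flag it so the result is discarded" idea applied to a Unity job, assuming the job only computes a value and the main thread decides whether to apply it. `ComputeMoveJob`, `PredictedMover`, `RequestMove`, and `Invalidate` are hypothetical names for illustration; only `IJob`, `JobHandle`, and `NativeArray<T>` come from Unity.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job: computes a target position but never touches the Transform.
public struct ComputeMoveJob : IJob
{
    public Vector3 Start;
    public Vector3 Offset;
    public NativeArray<Vector3> Result; // length 1

    public void Execute()
    {
        Result[0] = Start + Offset;
    }
}

// Hypothetical component: applies the job's result unless it was invalidated.
public class PredictedMover : MonoBehaviour
{
    NativeArray<Vector3> result;
    JobHandle handle;
    bool pending;
    bool discard;

    void OnEnable()
    {
        result = new NativeArray<Vector3>(1, Allocator.Persistent);
    }

    public void RequestMove(Vector3 offset)
    {
        var job = new ComputeMoveJob { Start = transform.position, Offset = offset, Result = result };
        // Chain on the previous handle so two jobs never write Result concurrently.
        handle = job.Schedule(handle);
        pending = true;
        discard = false;
    }

    // Call this when the move turns out to be wrong (e.g. a server correction).
    public void Invalidate()
    {
        discard = true;
    }

    void LateUpdate()
    {
        if (!pending) return;
        handle.Complete();                     // the job still runs to completion
        if (!discard) transform.position = result[0];
        pending = false;
    }

    void OnDisable()
    {
        handle.Complete();
        if (result.IsCreated) result.Dispose();
    }
}
```

The job is never cancelled here; its output is simply ignored. Because the only write to the Transform happens on the main thread after the flag check, an invalidated move is never applied.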

orionburcham · Jan 27, 2018

That makes sense (and is really cool)!

Though at least in my example case, it’s not just a calculation that’s being done on a separate thread, but the actual Transform change.

It might not be acceptable for the job to just go ahead and complete, because for at least a tiny sliver of time, the Transform could be in an undesirable state. That time could potentially fall when the renderer decides to execute on the main thread.

We’d need to be able to guarantee that the incorrect movement never occurred.

Unless of course I’m missing something, or still not grokking the basics. It’s definitely happened before.

snacktime · Jan 27, 2018

The best answer is you design it so your scenario doesn't exist. Canceling work in threads is generally only something you do when IO is involved or you have long-running jobs. Neither of those is a scenario the job system was really designed for.

The specific best design depends on all the details. But some combination of making all jobs complete in a frame by correctly parallelizing the data would be my goal. So I know that say 99.9% of the time my job to update transform positions completes fast enough that it's not competing with the network tick.

I haven't had the chance to play around with it much yet, but I would measure the average amount of time it takes for a job to start executing after it's scheduled. There are a number of ways they could do that, which would result in anything from low/mid nanosecond ranges to several milliseconds. The more aggressive approaches require spinning the CPU more, so it's kind of a fine line you have to balance.

Back to networking though, my approach has always been to make it very tolerant. Don't expect perfect data. Just like you could be working with stale data, the network can also just drop data. So your logic on the client just has to accept that as a fact of life and by design be able to smooth it out in a good way.
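One rough way to take the measurement suggested above: have a trivial job record a timestamp when it starts, then compare it with the timestamp taken just before scheduling. This is a sketch only; `TimestampJob` and `JobLatencyProbe` are hypothetical names, and the job is deliberately not Burst-compiled so it can call `Stopwatch.GetTimestamp()` as ordinary managed code. Note that `Complete()` may run the job on the calling thread, so the numbers are only indicative.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;
using Stopwatch = System.Diagnostics.Stopwatch;

// Hypothetical job: records the moment it starts executing.
public struct TimestampJob : IJob
{
    public NativeArray<long> StartedAt; // length 1

    public void Execute()
    {
        StartedAt[0] = Stopwatch.GetTimestamp();
    }
}

// Hypothetical probe: averages schedule-to-start latency over many frames.
public class JobLatencyProbe : MonoBehaviour
{
    double totalMs;
    int samples;

    void Update()
    {
        var startedAt = new NativeArray<long>(1, Allocator.TempJob);

        long scheduledAt = Stopwatch.GetTimestamp();
        JobHandle handle = new TimestampJob { StartedAt = startedAt }.Schedule();
        JobHandle.ScheduleBatchedJobs(); // ask worker threads to pick up scheduled jobs now
        handle.Complete();

        // Convert Stopwatch ticks to milliseconds.
        totalMs += (startedAt[0] - scheduledAt) * 1000.0 / Stopwatch.Frequency;
        samples++;
        startedAt.Dispose();

        if (samples % 100 == 0)
        {
            Debug.Log($"Average schedule-to-start latency: {totalMs / samples:F4} ms over {samples} samples");
        }
    }
}
```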

recursive · Jan 27, 2018

I'll add some more to this as well.

On top of fault-tolerance, the job system sounds like it's predicated on minimal task switching:
1. If C# work is queued, do it until it's done.
2. If no C# work is queued, idle or run native-level jobs.

This design avoids unnecessary task switching. If we add the ability to cancel, we then have to task switch twice, once to cancel the job, then to start the next one that would be queued anyway.

Another thing I thought about is this: If you cancel jobs that would be sharing data or operating on shared data, that could potentially introduce other problems as well. For example, if one job is a producer and the other is a consumer, you'd have to cancel both. If either of those jobs has dependent jobs, you'd have to keep cancelling all of the dependents too. With even a mildly complicated graph you'd get a runaway task-switching scenario, or you'd easily wind up with orphaned jobs that never rejoin the main thread.

While I haven't made a full production project with an ECS, I have played around with Entitas and EgoCS. The best approaches I've learned for dealing with this sort of situation are either adding some kind of delete or "changed" flag (which then only applies changes from a buffer to the "frame reference" object), or accumulating changes and resolving what happens at the end of the frame (or at some other sync point(s) if necessary).

This reminds me of a discussion I had with the first AAA Technical Director I worked for, in which the question was asked during the job interview: "If an RTS unit is both damaged below 0 HP and healed in the same frame, when do you delete the object?"

The correct answer was to sum up all of the healing and damage, taking into account any other effects active only at the start of the current frame. Then you test health, and only then, if the unit died, recycle it before the next frame and play its death animation. If it's alive, you change its health and begin applying any newly acquired effects from that frame.
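The accumulate-then-resolve answer can be sketched in plain C# (no Unity types needed). The `Unit` class, its method names, and the numbers are made up for illustration; the point is only that damage and healing are summed during the frame and health is tested once at the sync point.

```csharp
using System;

// Hypothetical unit: effects are queued during the frame and resolved once.
public sealed class Unit
{
    public int Health { get; private set; }
    public bool IsAlive => Health > 0;

    private int pendingDamage;
    private int pendingHealing;

    public Unit(int health)
    {
        Health = health;
    }

    // During the frame, changes are only accumulated, never applied.
    public void QueueDamage(int amount) { pendingDamage += amount; }
    public void QueueHeal(int amount) { pendingHealing += amount; }

    // Call once at the end-of-frame sync point: apply the net change, then
    // let the caller test IsAlive to decide whether to recycle the unit.
    public void ResolveFrame()
    {
        Health += pendingHealing - pendingDamage;
        pendingDamage = 0;
        pendingHealing = 0;
    }
}

public static class Demo
{
    public static void Main()
    {
        var unit = new Unit(10);
        unit.QueueDamage(15); // applied immediately, this alone would drop health to -5
        unit.QueueHeal(8);
        unit.ResolveFrame();  // net change: 10 + 8 - 15 = 3
        Console.WriteLine($"{unit.Health} {unit.IsAlive}"); // prints "3 True"
    }
}
```

Because nothing is applied until `ResolveFrame`, the order in which the damage and the heal arrive within the frame does not matter.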

@orionburcham - For some threading with cancellation abilities, might I recommend UniRx? I used it in a product that shipped recently, and I was able to easily build custom schedulers, async webrequest->deserialization chains, and cancellation and error handling logic on top of the API it provides. It's really good for abstracting/wrapping the lower-level .NET threading if that's closer to what you need.

eddietree · Jan 20, 2021

Sometimes jobs can get stuck and we need a clean way to abort without having to restart the Unity Editor. Is there a way?

jasons-novaleaf · Jan 21, 2021

The jobs should only run in Play mode, so if your Play mode doesn't terminate, that sounds like a problem. Are you sure it's jobs that are causing the problem?

eddietree · Jan 21, 2021

This problem occurs on Unity 2020.1. You can easily test this by writing a job that has a spinning while loop (while(true){}), pressing Play, and then trying to unpress Play.

tertle · Jan 21, 2021

Infinite loops are bad anywhere.

snacktime · Jan 21, 2021

eddietree said:

This problem occurs on Unity 2020.1. You can easily test this by writing a job that has a spinning while loop (while(true){}), pressing Play, and then trying to unpress Play.

What you want is basically not done in any sane engine/library/framework. Some problems are just a case of "don't do that", like using up all available memory or stack overflows. Aborting a thread forcibly isn't even supported in recent versions of .NET. If something creates a thread that goes out of control, you terminate the process.

Nyanpas · Jan 22, 2021

I use the job system to generate millions of vertices for building generation which is also non-deterministic. This is a case where I would really like to have async jobs where I can also to some extent specify how much CPU power the jobs can use. I am looking into System.Threading.Tasks for a standalone version, but for the Unities I would like some more "integrated" ways of doing it.

eddietree · Jan 22, 2021

just want to abort the job, seems like a reasonable request, or at least force stop within the editor as a debug thing

superpig · Jan 22, 2021

eddietree said:

just want to abort the job, seems like a reasonable request, or at least force stop within the editor as a debug thing

Unfortunately there's no particularly easy way to do this right now. If you're really stuck you could try using a debugger to poke in a NullReferenceException.

jasons-novaleaf · Jan 22, 2021

eddietree said:

just want to abort the job, seems like a reasonable request, or at least force stop within the editor as a debug thing

Maybe you could instead split your job up into chunks, and check a .shouldAbort==true status between each chunk.

recursive · Jan 22, 2021

Maybe look into SharedStatic<T> as a cancel token that's set from the main thread and read by your job loop code? Haven't tried this.
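A sketch of what the two suggestions above (check an abort flag between chunks, use SharedStatic<T> as the token) might look like together. This is untested: `CancelFlag`, `CancellableLoopJob`, and `CancellableLoopRunner` are hypothetical names, the job is left without Burst compilation to keep the assumptions small, and it requires the Burst package for `SharedStatic<T>`.

```csharp
using System.Threading;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical holder for a process-wide cancel flag (0 = keep going, 1 = stop).
public static class CancelFlag
{
    // Empty type used only as a unique key for the shared static.
    private class Key { }

    public static readonly SharedStatic<int> Value = SharedStatic<int>.GetOrCreate<Key>();
}

// Hypothetical long-running job that polls the flag between units of work.
public struct CancellableLoopJob : IJob
{
    public int Iterations;
    public NativeArray<int> Completed; // length 1: how many iterations finished

    public void Execute()
    {
        for (int i = 0; i < Iterations; i++)
        {
            // Volatile read so the check is not hoisted out of the loop.
            if (Volatile.Read(ref CancelFlag.Value.Data) != 0)
                return;

            Completed[0] = i + 1; // stand-in for the real work
        }
    }
}

public class CancellableLoopRunner : MonoBehaviour
{
    NativeArray<int> completed;
    JobHandle handle;

    void OnEnable()
    {
        Volatile.Write(ref CancelFlag.Value.Data, 0);
        completed = new NativeArray<int>(1, Allocator.Persistent);
        handle = new CancellableLoopJob { Iterations = 100000000, Completed = completed }.Schedule();
    }

    void OnDisable()
    {
        // Ask the job to stop, then wait for it to notice and return.
        Volatile.Write(ref CancelFlag.Value.Data, 1);
        handle.Complete();
        Debug.Log($"Job finished {completed[0]} iterations before stopping");
        completed.Dispose();
    }
}
```

This is cooperative cancellation only: the job stops at the next flag check, so a loop that never reaches the check (such as an empty `while(true){}`) still cannot be interrupted.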

rulk · Feb 4, 2021

If we do not have a way to cancel a job, how do we do graceful shutdown and memory clean up?

Say I have a number of jobs all operating on one read only persistent NativeArray.
So these jobs had been scheduled and now I'm waiting for a result.
Before jobs have time to complete, I un-press "Play" button in the editor.
If I dispose the NativeArray while jobs are still running, I get an error. So basically I need to delay the exit from Play mode until all jobs have completed? Can this be done?

What is the correct way of disposing the NativeArray in this case?

DreamingImLatios · Feb 4, 2021

Call Complete() on the JobHandle before disposing. Yes, if your job is extremely long-running, it can make exiting Play mode a little laggy.
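A minimal sketch of that shutdown order for the case described in the question: several jobs reading one persistent NativeArray. `SumJob` and `SharedDataOwner` are hypothetical names; the pattern is simply to combine the handles, call `Complete()`, and only then dispose.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job: reads the shared input and writes one result.
public struct SumJob : IJob
{
    [ReadOnly] public NativeArray<float> Input;
    public NativeArray<float> Output; // length 1

    public void Execute()
    {
        float sum = 0f;
        for (int i = 0; i < Input.Length; i++) sum += Input[i];
        Output[0] = sum;
    }
}

public class SharedDataOwner : MonoBehaviour
{
    NativeArray<float> input;
    NativeArray<float> outputA;
    NativeArray<float> outputB;
    JobHandle allJobs;

    void Start()
    {
        input = new NativeArray<float>(1024, Allocator.Persistent);
        outputA = new NativeArray<float>(1, Allocator.Persistent);
        outputB = new NativeArray<float>(1, Allocator.Persistent);

        // Both jobs only read Input, so they may run at the same time.
        JobHandle a = new SumJob { Input = input, Output = outputA }.Schedule();
        JobHandle b = new SumJob { Input = input, Output = outputB }.Schedule();
        allJobs = JobHandle.CombineDependencies(a, b);
    }

    // Runs when the object is destroyed, including when Play mode is exited.
    void OnDestroy()
    {
        allJobs.Complete(); // block until every job using the arrays has finished
        if (input.IsCreated) input.Dispose();
        if (outputA.IsCreated) outputA.Dispose();
        if (outputB.IsCreated) outputB.Dispose();
    }
}
```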

nyscersul · Mar 9, 2021

Try reconsidering what the job system is as a task scheduler. If you are running a long-standing for loop of some kind, you can instead set up the job to run one iteration per job schedule. The main thread can then easily manage the progress and be aware of it, while the long-standing job is broken up into bite-size pieces. When it comes time to cancel, just don't schedule the next iteration.

I'm using a combination of jobs and threads in my game, due to the need for managed code, and the current setup seems to be proving that using jobs exclusively is much more effective, namely in the case of locks and such. So if you set up the job to reuse the data each time it is scheduled, and run one iteration per job, you have much more control. It may slow things down for a long-running iteration, but it will also avoid the lagginess you are experiencing.

Generally scheduling anything to run for a long time and not be able to stop is a bad thing. Breaking up the iteration will solve that in any format where you do it. For example it might enable you to run said process on a single core without losing your framerate completely.
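A sketch of the one-chunk-per-schedule approach described above, using a toy workload (summing integers). `SumChunkJob`, `ChunkedWork`, and the sizes are hypothetical; the main thread schedules the next chunk only after the previous one finishes, and cancelling just means not scheduling again.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job: processes one short slice of a longer computation.
public struct SumChunkJob : IJob
{
    public int Start;
    public int Count;
    public NativeArray<long> Total; // length 1, carried across chunks

    public void Execute()
    {
        long sum = Total[0];
        for (int i = Start; i < Start + Count; i++) sum += i;
        Total[0] = sum;
    }
}

public class ChunkedWork : MonoBehaviour
{
    const int TotalItems = 1000000;
    const int ChunkSize = 10000;

    NativeArray<long> total;
    JobHandle handle;
    int next;
    bool running;
    bool cancelRequested;

    void OnEnable()
    {
        total = new NativeArray<long>(1, Allocator.Persistent);
        next = 0;
        cancelRequested = false;
        ScheduleNext();
    }

    // "Cancelling" only prevents further chunks; the current one still finishes.
    public void Cancel()
    {
        cancelRequested = true;
    }

    void Update()
    {
        if (!running || !handle.IsCompleted) return;

        handle.Complete(); // still required before the main thread touches the data
        running = false;

        if (cancelRequested || next >= TotalItems) return;
        ScheduleNext();
    }

    void ScheduleNext()
    {
        int count = Mathf.Min(ChunkSize, TotalItems - next);
        handle = new SumChunkJob { Start = next, Count = count, Total = total }.Schedule();
        next += count;
        running = true;
    }

    void OnDisable()
    {
        handle.Complete(); // at most one short chunk to wait for
        if (total.IsCreated) total.Dispose();
    }
}
```

With this layout, the worst-case wait on cancel or on exiting Play mode is the duration of a single chunk, which can be tuned through the chunk size.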

Nyanpas · Mar 9, 2021

I suddenly thought of the job system as a boomerang: depending on the skill of the thrower, it will come back after being tossed. If it returns, it is also up to the skill of the thrower to catch it correctly.

Is this a good analogy?

GXMark · Jul 26, 2023

I really think IJob should have a cancel, because there are principally two scenarios in which you may want to use another thread. Case 1: small repetitive functionality which you don't want on your main thread. Case 2: longer-running functionality, such as de/serialization activities, that can't be split so easily into repetitive work. Now let's say you have thousands of them. If for any reason during the game or at editor time (for example, using ExecuteInEditMode) you decide you want to stop/cancel your processing, then you really don't want to have to wait until the job buffer empties. It's laggy, wastes time, and is not particularly elegant. So please, Unity, can you look into working out a cancel hook for your job system? It's these kinds of little extras that make for a great game engine.

davenirline · Jul 27, 2023

Why? I think there's just a misunderstanding on what the job system is for. It's not for long running processes. Just let it finish.

GXMark · Jul 27, 2023

You mean as long as it's for what you like doing with it!

Thygrrr · Jul 30, 2023

I keep getting errors from the subscene loading job when Play mode is exited before that job has completed. (I load about 100 subscenes over 5 seconds.) If there is a recompile or asset refresh coming after exiting, this seems to lead to the dreaded corrupt entity header file bug.

There is no way to design around this because Play mode may end at any time.

Unity, you absolutely need to put in code that reliably kills these threads on ExitPlayMode if you won't provide a cancellation token or similar to cleanly abort these jobs - and wait for them to actually cancel and complete!

The same is true for async Start and the like. These Tasks keep running after Play mode ends, and if they instantiate assets, they will then do so in the editor scene instead of the play scene. Unrelated to jobs, but very related to Unity's application life cycle management.
