
Asynchronously getting data from the GPU (DirectX 11 with RenderTexture or ComputeBuffer)

Discussion in 'Shaders' started by jimhug, Nov 21, 2014.

  1. jimhug

    jimhug

    Joined:
    Nov 14, 2014
    Posts:
    7
    I have been building a Constructive Solid Geometry renderer in Unity that does most of the hard calculation work on the GPU. So long as I can keep both the data and the computation on the GPU things go well. Unfortunately, I occasionally need to get a small amount of data back from the GPU to the CPU. I've tried using both ComputeBuffer.GetData and Texture2D.ReadPixels to do this and they both take many milliseconds to return a single word of data. I'm pretty sure the reason for the slowness is that the GPU stalls while reaching a sync point before delivering the data.

    I don't need to get the data immediately when I call. I am happy with several frames of delay before I receive the data. However, I am not happy with the frame rate stutters this causes in my app. I've tried all sorts of ways of rearranging these calls to give the GPU time to complete filling the buffers before the call, but nothing seems to help.

    What I would like is an asynchronous version of either the ComputeBuffer.GetData method or of the Texture2D.ReadPixels function (although the Texture2D one seems less likely to work). To be more concrete, right now I retrieve the data by calling:

    myComputeBuffer.GetData(myDataBuffer);

    This function stalls for a couple of milliseconds and then returns the data. I'd like to call something like:

    myComputeBuffer.GetData( (Array data) => HandleData(data));

    In this case it may take 10s of milliseconds before my HandleData callback is invoked, but there wouldn't be any GPU pauses in the meantime.

    Does this request make sense? Is there a way to do this currently that I just haven't figured out?

    My current scary solution idea is to write a native plugin that uses the underlying DirectX11 asynchronous read functions in order to make this work. However, that looks like a world of conflicting memory and multi-threading pain that I'd love to avoid if there was a better approach.
     
    namanam likes this.
  2. maxxa05

    maxxa05

    Joined:
    Nov 17, 2012
    Posts:
    186
    I'd be interested in this too!
     
  3. braaad

    braaad

    Joined:
    Oct 4, 2012
    Posts:
    102
    Can you clarify what you have tried? Are you saying you have tried waiting a few frames but it's causing it to stutter? Or are you referring to the general issue?

    I just did a few tests on my PC. The latency on reading a buffer is roughly 0.6ms; this is the inherent latency you are going to get reading from the GPU. Since you said a small amount of data, I am going to assume bandwidth is not an issue. Anything above this I would assume is caused by the main thread blocking while it waits for the GPU to finish processing your threads.

    To confirm that waiting a few frames fixes the issue, I caused a stall on purpose by reading the data back immediately. I then called the same code inside a coroutine that waited a few frames before reading it back. The stalling code was much slower, as expected, and the coroutine code ran at the expected 0.6ms.

    If you have already tried all this (like I said, it wasn't clear to me what you have tried) and it is still stalling, are you 100% sure you are waiting long enough for the threads to complete? Are you sure it's not writing back into the buffer before you can read from it?
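
    The deferred readback braaad describes can be sketched roughly like this. This is only an illustration of the timing idea, not code from the thread: the kernel index, the buffer name "Result", and the three-frame wait are all assumptions.

    ```csharp
    using System.Collections;
    using UnityEngine;

    public class DeferredReadback : MonoBehaviour
    {
        public ComputeShader shader;   // assumed to have a kernel 0 writing to "Result"
        ComputeBuffer buffer;
        float[] results = new float[1];

        void Start()
        {
            buffer = new ComputeBuffer(1, sizeof(float));
            shader.SetBuffer(0, "Result", buffer);
            shader.Dispatch(0, 1, 1, 1);
            // Calling GetData here would stall the CPU until the GPU catches up.
            // Instead, give the GPU a few frames to finish before reading back.
            StartCoroutine(ReadBackLater(3));
        }

        IEnumerator ReadBackLater(int frames)
        {
            for (int i = 0; i < frames; i++)
                yield return null; // let the GPU drain its pipeline
            buffer.GetData(results); // now only pays the transfer latency, not the stall
        }

        void OnDestroy()
        {
            buffer.Release();
        }
    }
    ```

    The wait only hides the pipeline flush; the final GetData still blocks the main thread for the transfer itself, which is the residual cost discussed below.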
     
  4. jimhug

    jimhug

    Joined:
    Nov 14, 2014
    Posts:
    7
    Thanks for your suggestions and feedback on what is working for you! Your concrete numbers are helpful and I should have included my own in my initial post.

    I am seeing CPU pauses of up to 6ms when I try to read the data immediately - are these similar to what you see when creating a stall? I have tried waiting one full frame before reading and in that case I get down to about 1.2ms. I may be able to reach your 0.6ms by waiting longer, or it may be that my GTX 650 on my specific motherboard has a larger inherent latency than your system.

    However, my goal is to get rid of that final 0.6-1.2ms of latency. I am building this experience for the Oculus Rift, and 0.6-1.2ms is a significant cost to pay on any given frame when you need to hit a stable 75FPS. I am pretty sure that the asynchronous APIs at the lower DirectX 11 level can perform a read with no visible CPU or GPU waiting at all. This works because the call initiates the transfer of data from the GPU, and the callback is not invoked until the memory transfer is complete. Waiting a few frames on a Unity coroutine solves the issue of ensuring the GPU thread has completed. However, it doesn't solve the problem of asynchronously requesting the data in one frame and then checking whether it has been successfully received in another.

    Thanks again for your help - Jim
     
  5. braaad

    braaad

    Joined:
    Oct 4, 2012
    Posts:
    102
    I don't see a way around this personally; you might reduce it via a native plugin, but that is a lot of work for ~5fps unless you REALLY need it.

    What async API are you referring to? Internally Unity would be using this http://msdn.microsoft.com/en-us/library/windows/desktop/ff476428(v=vs.85).aspx. There is no way around the latency going from GPU to CPU memory, you can only hide it. You will either block the main thread or the rendering thread. I would love to be proven wrong though.
     
  6. Torigas

    Torigas

    Joined:
    Jan 1, 2014
    Posts:
    63
    I know I am necro-ing a thread here, but I ran into the same issue. If I read back from the GPU every frame, ComputeBuffer.GetData() causes a performance drop from 80 fps to about 30 fps.

    Did anyone find a good solution to speed up the GetData() part?
    Currently I'm reading 896 bytes per frame from my buffers. I don't get how it could possibly cause such a massive slowdown.
     
  7. NavyFish

    NavyFish

    Joined:
    Aug 16, 2013
    Posts:
    28
    The slowdown is due to a pipeline stall. CPU execution blocks on the GetData() call, flushes all pending commands to the GPU, waits for the GPU to flush its own pipeline, and then finally initiates the transfer. The transfer itself is fairly quick - it's all the pipeline cleanup/flushing that must take place beforehand.

    The general way to hide this latency is through asynchronous transfers, where the CPU doesn't block on the call, thus allowing the GPU to perform the transfer at a point which is convenient for it to do so. This eliminates the lag felt from the stall, but data will be several frames old when received (although that usually doesn't matter).

    You can do this easily in OpenGL, not sure about D3D (although I'd be very surprised if you can't). Problem is that it's fairly low-level and Unity doesn't expose this functionality (yet, hopefully).
     
  8. Plutoman

    Plutoman

    Joined:
    May 24, 2013
    Posts:
    257
    I would very much like this functionality; currently there's no way to get data back from the GPU at all in a real game environment. You could put up a feedback request on their site; that's the only way something might happen (that I could see).
     
  9. Torigas

    Torigas

    Joined:
    Jan 1, 2014
    Posts:
    63
    It appears that this pipeline stall does not happen when you use actual compute shaders instead of graphical shaders.
     
  10. Plutoman

    Plutoman

    Joined:
    May 24, 2013
    Posts:
    257
    Doing a GetData() from a buffer still causes a pipeline stall, regardless of how the buffer was filled. I use compute shaders extensively to fill buffers, but it's not possible to retrieve anything. For most things it doesn't particularly matter, but it would make a lot more sense for me to start producing voxels via compute shaders if I could retrieve data without blocking threads.

    As it is, it's faster to thread it, because the pipeline stall takes longer than the entire generation process itself, and it stalls the main thread. (I don't particularly care about when the data comes in, but I need to not block the main thread; otherwise the player experiences stutter as other parts of the world generate, and there's no way to load balance it.)
     
  11. NavyFish

    NavyFish

    Joined:
    Aug 16, 2013
    Posts:
    28
    I wonder if there are any plugins out there which leverage the low-level native graphics API that Unity provides in order to grant access to the data asynchronously. If not, it seems like there's an opportunity there.
     
  12. NavyFish

    NavyFish

    Joined:
    Aug 16, 2013
    Posts:
    28
  13. karp505

    karp505

    Joined:
    Jul 24, 2014
    Posts:
    18
    I hadn't seen this thread and started a new one along the same lines a couple of days ago: http://forum.unity3d.com/threads/optimizing-texture2d-readpixels.355916/#post-2303125

    hippocoder agrees there that there is no really good way to do it in Unity and that a plugin would be required. I would love to see that plugin, and I have been thinking about giving it a shot myself. Unity is partially built on top of OpenGL, right? In that case it shouldn't be too difficult, but then again I've never made a plugin before. Does anyone have any good resources on it?
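
    For what it's worth, the C# side of a native plugin is just P/Invoke bindings. A minimal sketch follows; the DLL name and both entry points here are hypothetical, not from any real plugin, and the native side still has to be written separately (in C++ against D3D11 or OpenGL):

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    public static class NativeReadback
    {
        // "AsyncReadbackPlugin" is a hypothetical native DLL dropped into Assets/Plugins.
        const string Dll = "AsyncReadbackPlugin";

        // Kicks off an async copy of the GPU resource; entry point name is illustrative.
        [DllImport(Dll)]
        public static extern int RequestReadback(IntPtr resource);

        // Polls for completion and, if ready, copies the result into destination.
        [DllImport(Dll)]
        public static extern int RetrieveReadback(IntPtr resource, float[] destination, int count);
    }
    ```

    On the Unity side, ComputeBuffer.GetNativeBufferPtr() (or Texture.GetNativeTexturePtr()) gives you the native resource pointer to pass across the boundary.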
     
  14. NavyFish

    NavyFish

    Joined:
    Aug 16, 2013
    Posts:
    28
  15. rishi-ranjan

    rishi-ranjan

    Joined:
    Oct 7, 2015
    Posts:
    22
    I have tried this with a native plugin that captures the resource using ScreenGrab, which uses CopyResource and Map internally. Capturing into a Texture2D works with this script, but it carries the penalty of ReadPixels. I want to capture using a RenderTexture, but after a couple of calls to CopyResource the graphics driver crashes.

    I can share my sample code if anyone wants to try it out.
     
    Last edited: Oct 10, 2015
    jason-fisher likes this.
  16. NavyFish

    NavyFish

    Joined:
    Aug 16, 2013
    Posts:
    28
    Certainly interested to see the code. I've been away from this project for a while; other priorities have had my attention.

    As I'm using Compute Buffers, I need to force Unity to run in D3D11 mode. This sucks, but one nice thing is that the native plugin only needs to support a D3D back end. The links above show how I plan to do it, but I also have never written a native plugin for Unity, so there will be a bit of a learning curve involved. I will report back once I get started, but can't say when exactly that might be.
     
  17. jcowles

    jcowles

    Joined:
    May 25, 2016
    Posts:
    11
    It's 2016, and ReadPixels continues to stall the GPU, with no alternative.

    A native plugin is still the only way around it (confirmed by Unity support).

    Just FYI, if anyone finds this thread looking for a solution.
     
    Torigas likes this.
  18. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    It really ought to be added to Unity though :/ Otherwise it's one of those "oh... UE4 doesn't have this problem" scenarios I'm sure a closed-source engine is keen to avoid.
     
    Flurgle and Martin_H like this.
  19. DrKucho

    DrKucho

    Joined:
    Oct 14, 2013
    Posts:
    140
    I'm in a similar need: to grab render texture data from the GPU in async mode. Unity does async upload of textures to the GPU, but I guess download is something that not many people need, so they don't :(
     
  20. TEBZ_22

    TEBZ_22

    Joined:
    Jan 28, 2014
    Posts:
    37
    Please!

    If anyone from Unity is listening, please make something like ComputeBuffer.AsyncGetData() that returns a boolean to indicate whether data is available or not.

    And a ComputeShader.Done() would be nice, to be able to chain compute shaders.

    I can't really see why this is so much different from a Camera.Render() taking a picture of a texture?

    /Thomas
     
    Last edited: Aug 1, 2016
    radiantboy likes this.
  21. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    Is that really necessary? I am not that much into compute shaders, as I only started using them about half a year ago for procedural planets. But I thought that if you require the result of a previous dispatch, you can stack the dispatches by using different kernels; at least that is what I did.
    In my specific case, kernel 1 calculated a 257x257 position grid, kernel 2 reused this data to create a 256x256 normal map, and kernel 3 reused that as well to create 64x64 vertex positions.
    But maybe this is not what you have in mind or doesn't suit your use case?

    IMHO there is a good chance that anyone who uses compute shaders to calculate ComputeBuffers or textures and needs that information on the CPU (e.g. for bounding boxes or LOD calculations) urgently needs that function to avoid stalls.
    I second that request of an async call to get the data back.
     
    Last edited: Aug 2, 2016
  22. TEBZ_22

    TEBZ_22

    Joined:
    Jan 28, 2014
    Posts:
    37
    Well... I'd like to do some constantly repeating calculations that go from a graphical representation of the "world" to something that isn't really graphical at all in the end, but a single numerical value. I'm not in a big hurry (3/10s is maybe OK, but faster is better :) ). A rather drastic reduction. I have to figure out what's best to put on the GPU and what on the CPU.
    At the same time, the game goes on, and I'd like that to stay smooth.
     
  23. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    +1

    I always come back to this same problem when trying to work around it in my procedural universe game. Getting minimal information asynchronously back from the GPU to the CPU is really a core feature that would help a lot!
     
  24. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Has anyone checked whether this problem still exists in 5.5? If not, are there any prospects of "async get data" ever being implemented? Otherwise the whole compute shader thing is impractical for realtime use, or everything has to run inside the GPU.

    For example, I hoped to use a particle system and line renderer to visualize stuff from the GPU, but getting data from the GPU synchronously ruins fps. So I'm doomed to implement my own particle system inside the GPU.

    And that's only one example. Currently I have a list of problems that could be fixed if getting data didn't stall the pipeline.
     
  25. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    I implemented this a while ago for a prototype. The prototype was cancelled before we could properly test it, but it was working in our simple use case. I guess I could share the source code if anyone is interested. It was for DirectX 11 and textures only (compute buffers would be easy to add). It is from an unfinished prototype, so there are likely some bugs and other problems.
    Is anybody interested?
     
  26. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Hell yeah, dude! Bugs don't matter, I'm interested in the principle. And yes, I don't think it would be hard to do this for compute buffers.
     
  27. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Here you go. I cleaned it up a little and added a few comments. I verified that it at least works with a very simple example. I also added ComputeBuffer support (really a copy-and-paste addition). Let me know if it works in a real-world scenario. I'm curious how fast it is with all those memory copies and marshalling.
     
    Lipoly, TerraUnity, kaiyum and 4 others like this.
  28. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    That is really cool. Will have to try this with GPU marching cubes.

    Can you queue/check multiple buffers (chunks) per frame?
     
  29. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Thanks a lot! I'm going to try it as soon as I get over this flu I caught. Can't focus on coding currently ;/
     
  30. BLadeLaRus

    BLadeLaRus

    Joined:
    Jul 20, 2014
    Posts:
    6
    Thanks man. Will test it tomorrow. If it is working at least in some cases, it will be a huge help!
     
  31. BLadeLaRus

    BLadeLaRus

    Joined:
    Jul 20, 2014
    Posts:
    6
    OK. I successfully copied a buffer of 1024*1024*20 floats in async mode with your lib in 60-70ms. Looks very promising. As a next step I will try to migrate our engine to compute shaders.
     
  32. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Yes, as many as you want.

    Cool. I assume 60ms from request to successful retrieve. It should generally take one or two frames. Did you try ComputeBuffer.GetData before? Can you compare?
     
  33. BLadeLaRus

    BLadeLaRus

    Joined:
    Jul 20, 2014
    Posts:
    6
    Yes, 60ms from AsyncTextureReader.RequestBufferData till a successful AsyncTextureReader.RetrieveBufferData.
    I performed a small comparison test between ComputeBuffer.GetData and AsyncTextureReader.
    With a buffer of 1024x1024x40 floats:
    60-70ms for AsyncTextureReader
    78-100ms for ComputeBuffer.GetData()
    Not so significant, but during this period AsyncTextureReader isn't freezing the rendering thread, unlike ComputeBuffer.GetData(). And that's cool. Small fps drops still appear, but they are almost imperceptible (maybe because of the large memcpy).
    I know all of this is not thread safe. But if you overload your RetrieveBufferData() function so it takes just a native pointer structure, it can be run on a different thread; that was also successful for me. In that case there are no fps drops at all.
     
  34. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Yeah, I was curious about the fps impact, not how long it takes. Looks like it is working correctly then.

    I see that GetNativeBufferPtr can cause thread synchronization, according to the docs. I guess I could cache the pointers in AsyncTextureReader to make it faster and keep the user-friendliness at the same time. Otherwise people could send a buffer pointer to a texture function and vice versa.

    Did you experience any crashes or freezes? It crashed on me a few times when closing the Unity editor. I didn't care, since I didn't know whether anybody would use it...
     
  35. BLadeLaRus

    BLadeLaRus

    Joined:
    Jul 20, 2014
    Posts:
    6
    There was one crash of the Unity editor, when I gave AsyncTextureReader.RetrieveBufferData a smaller array than the original buffer. My mistake. After that, no crashes, but a few times the Unity Bug Reporter appeared; I don't know why.
    In the built application there were also no crashes. But it's a very simple one, and not very stressful for the lib. Future real use will show.
     
  36. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Today I managed to test your code. The plan was to call RequestBufferData and then, 2 frames later, RetrieveBufferData, because as the docs say, the copy operation must be finished before the Map method can be called on the resource without stalling the pipeline. That takes 2 frames. I expected RetrieveBufferData to take <1ms, because only 4 bytes had been requested. But I found that the RetrieveBufferData method still takes a similar amount of time as ComputeBuffer.GetData, which implies a pipeline stall.

    Profiler said that it was ComputeBuffer.GetNativeBufferPtr line that caused most delay.

    So, I tried to have a cached IntPtr of buffer I used and pass it to the modified RequestBufferData and RetrieveBufferData methods.

    And then Unity crashed. Further tests showed that the crash didn't happen instantly: about 6-7 cycles of the Request/Retrieve methods completed successfully, and only then did the crash happen.

    Putting ComputeBuffer.GetNativeBufferPtr back to the Request/Retrieve methods removes the crash chance.

    Conclusion: calling the ComputeBuffer.GetNativeBufferPtr method before calling the methods from the plugin syncs the GPU and CPU, and that reduces or eliminates the chance of a crash. Having ComputeBuffer.GetNativeBufferPtr cached and passed to the Request/Retrieve methods leaves the possibility of the plugin method being called at the wrong time, and that causes the crash, as it seems.

    Michal_ do you have any insights of what's going on and how can this be fixed?


    UPDATE:

    A few more observations:

    The RequestBufferData method is the one that causes the crash if it is changed to take a cached buffer IntPtr as a parameter. But it works fine if it gets the buffer itself and calls GetNativeBufferPtr from inside.

    The RetrieveBufferData method doesn't cause a crash if it receives the cached buffer ptr from outside.

    I also noticed that the minimum delay caused by the RetrieveBufferData method is reached if it is called 5 frames or more after RequestBufferData. Though, I have rather heavy compute buffer computational contents. So, if RetrieveBufferData is called 5 or more frames after RequestBufferData, it only takes 10ms in 30% of calls and <1ms in 70% of calls. In the profiler it looks like it takes <1ms for 3-4 calls, then 10ms for 1-2 calls, and it goes in periods. (GetData takes 40ms in this task.) So it looks like RetrieveBufferData can read the data really fast, but not every frame for some reason. Maybe it's because RequestBufferData still calls GetNativeBufferPtr, and that somehow affects RetrieveBufferData.
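
    The request-then-poll workflow being tested here can be sketched as a coroutine. This uses the RequestBufferData/RetrieveBufferData names from the thread, but the exact signatures, the Status.Succeeded enum value, and the five-frame wait are assumptions for illustration:

    ```csharp
    using System.Collections;
    using UnityEngine;

    public class GpuReadbackPoller : MonoBehaviour
    {
        public ComputeBuffer buffer;       // filled by a compute shader elsewhere
        float[] results = new float[1];

        IEnumerator ReadBack()
        {
            // Kick off the async copy on the GPU; this call should not block.
            AsyncTextureReader.RequestBufferData(buffer);

            // Give the GPU a few frames to finish the copy before mapping it.
            for (int i = 0; i < 5; i++)
                yield return null;

            // Poll until the data has actually arrived; each failed poll is cheap.
            while (AsyncTextureReader.RetrieveBufferData(buffer, results)
                   != AsyncTextureReader.Status.Succeeded)
                yield return null;

            // results[] now holds data that is several frames old, but no stall occurred.
        }
    }
    ```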
     
    Last edited: Dec 17, 2016
  37. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    I fixed a bug and added several error checks. I'll add native pointer caching next. And maybe I'll add an option to preallocate temp buffers to avoid this operation on first request call.

    @Zolden It is hard to imagine why calling GetNativePointer would give different results than using a cached pointer. The only reasons for a crash that come to mind are that the pointer is invalid or that the request is called on the wrong thread. Do you have multi-threaded rendering enabled? Do you call AsyncTextureReader on the main thread only? I'll take a look if you give me a minimal repro project.
     
  38. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    It seems like the only difference is that the GetNativeBufferPtr method forces a sync. I tested it by passing both a cached pointer and the buffer to a modified version of your RequestBufferData.

    Code (CSharp):

    public static Status RequestBufferDataCS(IntPtr bufPtr, ComputeBuffer buffer)
    {
        Status status;
        if (buffer == null || bufPtr == IntPtr.Zero)
            status = Status.Error_InvalidArguments;
        else
        {
            buffer.GetNativeBufferPtr(); // does nothing here except force a sync
            status = (Status)RequestBufferData(bufPtr);
        }
        return status;
    }
    Here I pass the cached pointer to the plugin method, and this works without crashes, simply because the buffer.GetNativeBufferPtr() line is included, which does nothing except force a sync. If I comment the "buffer.GetNativeBufferPtr();" line away, the crash randomly happens after a few successful requests.

    I checked if the pointer is valid, and it was ok. Also, I didn't enable multi-threaded rendering and made sure it's off. And I call AsyncTextureReader methods from the main thread. Here's my little testing unity 5.5 project.

    There's a simple compute shader that's being dispatched from Update(). Then I call methods from a slightly modified AsyncTextureReader. If you comment away that buffer.GetNativeBufferPtr(); line from the RequestBufferDataCS method, you may witness the crash.

    The RetrieveBufferDataCS method works without the need to call buffer.GetNativeBufferPtr() and causes no crash, but sometimes it still takes 10-12ms to finish, and that may or may not be related to the fact that the threads are still being forced to sync sometimes. It would be interesting to know whether RetrieveBufferData can take <1ms on every call.
     
    jason-fisher likes this.
  39. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Yeah, but if multi-threaded rendering is turned off, then there is nothing to synchronize. The only additional cost should be the cost of the function call itself. I'd like to see what exactly GetNativeBufferPtr does...
    OK, I tried many, many times, and this is what I get. First of all, fastGpuReadTest.Update takes roughly 0ms both with GetNativeBufferPtr and without it. There is just no difference. And second of all, it doesn't crash. At all. I tried play/pause/stop like a million times. I tried alt-tabbing. I kept it running for minutes. It is rock solid on my PC.
    It feels like it is hw/os/driver/Unity related. I have Win10 and recent NVidia drivers. What do you have? Can you try a different driver and/or Unity version?

    Also, if you look at RequestBufferData implementation, you'll see that it doesn't really do much. It allocates some memory on first call but then it only calls CopyResource function. An async copy function that can silently fail but shouldn't crash.

    I can try it on different pc when I get the chance but I don't think I have access to other configuration than win10/nvidia...
     
  40. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Based on the delay it causes, I would say it inserts a command into the GPU command buffer and pauses the main thread until that command is executed by the GPU. That may be the synchronization between the main thread and the GPU threads. It would also guarantee that the command buffer is empty, and thus that its state is predictably the same for each request. But it's only a hypothesis.

    What video card do you have? Sync-based delays depend on the amount of time the GPU spends executing the compute buffer threads. If your card is fast enough, it may always be <1ms.

    I have Win 8.1 with an NVidia GTX 750M video card with fairly up-to-date drivers. I sent the project to a friend with Win10, and he reported the crash as well.

    Though, I managed to reproduce the no-crash scenario: the probability of the crash depends on GPU load. What the shader does is increment each element of a 1024x1024 data array, but it does so in a loop. It was 100 cycles initially, and that caused the crash on my computer. But the more I reduce the number of loops, the lower the chance of a crash. With <10 loops, the crash doesn't happen. With more, it may happen after, say, 50 successful GPU reads. With 100, it happens after 2 successful reads.

    So, Michal_, I think you may have a more powerful video card than I do, so the crash would happen on your computer if the shader were forced to do much more work. Here is the project, which already has 1000 loops in the shader. See if it causes the crash. If it doesn't, just open testGpuRead.compute and increase the number of loops in the 14th line. Make it big enough to lower fps below 60. (The buffer.GetNativeBufferPtr line is already commented away in the request method, so it's already in a "ready to crash" state.)

    Also, I'd like to ask BLadeLaRus and jason-fisher to try running this project too and tell us whether there's a crash, and whether it happens with a smaller or bigger number of loops in the shader.
     
  41. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    Windows Server 2016 (Win10 DirectX/NV drivers)
    950GTX @ 375.86, i7 850 (8 cores w/HT)

    5.5b7 - no issues with any values for the loop, tested from 5 to 20000 and left 200 running for several minutes.

    5.5f3 - no issues
    5.6b01 - no issues

    I did also comment out the Debug.Logs, disable vsync and enable graphic jobs. It looks like 220 or so is the maximum loop value I can do and stay above 60 fps.
     
    Last edited: Dec 19, 2016
    MD_Reptile likes this.
  42. MD_Reptile

    MD_Reptile

    Joined:
    Jan 19, 2012
    Posts:
    2,664
    I tested it briefly on my integrated Intel HD graphics card (it's a pain getting Unity to cooperate with my dedicated card in this notebook), and although the test gets a very low FPS of ~5 or 6, it doesn't seem to crash, even after several minutes. It seems to keep succeeding and then delaying a moment, saying "notready retrieve" in the console... but still no crashes for me. This is on 5.5.0f3, Win 7 x64, DX11 mode in Unity.
     
  43. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    It still doesn't crash for me, but I think I know what it is now. I tried running it with the DirectX debug layer enabled, and this is what I found:
    Code (CSharp):
    D3D11 CORRUPTION: ID3D11DeviceContext::CopyResource: Two threads were found to be executing functions associated with the same Device[Context] at the same time.
    It still works for me for some reason, but it obviously isn't right. So I went back to actually read the manual :) It turns out you have to call native code through GL.IssuePluginEvent to make sure it is called on the rendering thread. I didn't try it yet. I'll let you know.

    Edit: I just realized the retrieve method has to be called from the main thread. I can't get any data back from IssuePluginEvent, and it runs on a different thread anyway. I'm not sure this has a correct solution...
     
    Last edited: Dec 19, 2016
  44. MD_Reptile

    MD_Reptile

    Joined:
    Jan 19, 2012
    Posts:
    2,664
    I don't know whether it would be applicable here, but what about append/consume buffers? Isn't that about the fastest in/out of the GPU you can do? I was starting to think that was the way to go for sending data back and forth a lot... but since I try to avoid that, I haven't tried it yet, and can't say whether it works like I think it does, or whether it would apply here!

    EDIT: see this video for an implementation:


    around 11 minutes in ^

    Like, you wouldn't send any actual texture data in or out, just changes to float4 arrays or something.... maybe?
     
    Last edited: Dec 19, 2016
  45. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Append/consume buffers won't let you read any data on the cpu. They may help you keep entire algorithm on the gpu but that depends on your specific needs. For example, we had ocean simulation running on the gpu and we needed to read the result on cpu to use it in physics simulation. Append buffer wouldn't help us there.

    Anyway, I tried to use GL.IssuePluginEvent with the request method and it works. No more errors. The retrieve method is a bigger problem. I can imagine three ways to make it thread safe. One would require pinning managed memory; that means using unsafe code, so I guess that's out of the question. Or I could manually sync the threads on the native side; that would likely be very slow. The third option is to introduce an additional temp buffer and memory copy, and use an atomic operation to signal when the data is ready. Number 3 is probably the most realistic. I'll try it tomorrow.
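
    The third option (temp buffer plus an atomic flag) can be sketched engine-free with plain C#. This is only an illustration of the handoff pattern under the assumption of one outstanding request at a time; all names are made up:

    ```csharp
    using System;
    using System.Threading;

    // Render thread copies finished GPU data into a temp buffer and raises a flag;
    // the main thread polls the flag and, when set, copies the temp buffer out.
    class ReadbackMailbox
    {
        float[] temp = new float[4];
        int ready; // 0 = empty, 1 = data waiting; accessed via Interlocked only

        // Called on the render thread once the (simulated) GPU copy completes.
        public void Publish(float[] gpuResult)
        {
            Array.Copy(gpuResult, temp, temp.Length); // copy 1: DX object -> temp
            Interlocked.Exchange(ref ready, 1);       // signal: data is ready
        }

        // Called on the main thread every frame; returns true once data arrived.
        public bool TryRetrieve(float[] destination)
        {
            if (Interlocked.CompareExchange(ref ready, 0, 1) != 1)
                return false;                                      // nothing ready yet
            Array.Copy(temp, destination, destination.Length);     // copy 2: temp -> user
            return true;
        }
    }
    ```

    The cost of this design is exactly the extra memory copy Michal_ mentions, in exchange for never touching the DirectX context from the main thread.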
     
  46. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    jason-fisher and MD-Reptile thanks for the testing, guys.

    Glad you found something new, so there's still hope this stuff will work for me. I'll wait, then, and see if you can figure things out.

    Well, the retrieve method works OK; it's the request one that causes the crash. It would be cool if there were a solution.

    Have you updated the code on git, so I could try whether calling through GL.IssuePluginEvent fixed the crash issue?

    I still haven't had a chance to check whether the retrieve method consumes more time than it should in a situation where the request method didn't cause any delay by calling GetNativeBufferPtr. There's a chance it will do fine. Though, I don't know the plugin-side specifics: are the Map/Unmap methods too slow to run on the main thread? Or that memcpy?
     
  47. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    That's just a coincidence. The retrieve method has the same problem. It calls DirectX methods (Map/Unmap), and those are not thread safe: Unity is using the ID3D11DeviceContext on the render thread and I'm using it on the main thread. It is just a matter of time until it crashes for someone.

    Nope. I'll do it tomorrow. I need some sleep
     
  48. MD_Reptile

    MD_Reptile

    Joined:
    Jan 19, 2012
    Posts:
    2,664
    Bummer. That YouTube link makes it hard to understand what exactly was going on, but I had thought it was basically describing a way to hand data back and forth, similar to using a stack. I guess I misunderstood the uploader.
     
  49. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    @Zolden, have you tried using OnPostRender instead of Update?
     
  50. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    I have solved the DirectX race condition problems (minus any bugs I possibly introduced; I'm not super focused). The DX debug layer no longer gives me any errors. There are still possible race conditions in the plugin, though. I originally wrote it as single-threaded code, so it needs some changes...

    There are some side effects. There are now two memory copies:
    1. copy from the DX object to native system memory on the render thread (the only thread that can touch DX objects)
    2. copy from native system memory to the managed memory supplied by the user on the main thread (the thread that executes your scripts)
    Both are internally handled by the Retrieve...Data() methods, and the first can happen in an unspecified part of the frame (to avoid thread sync). All that means you have to call the Retrieve method at least twice. But really, just call it every frame until it succeeds.
    Another side effect is that the result can be available with one more frame of delay. You can try calling the retrieve method twice a frame to get the result sooner, in Update and OnPostRender for example. Update will request the first copy, it will happen in the middle of the frame, and OnPostRender will do the second copy.

    I also added resource pointer caching to avoid another thread sync.
    Let's see how fast it is now.
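
    The twice-a-frame polling described above could look roughly like this on the script side. The AsyncTextureReader method names follow the thread, but the signatures and Status.Succeeded value are assumptions, and OnPostRender only fires on a component attached to a camera:

    ```csharp
    using UnityEngine;

    public class TwoPhaseRetrieve : MonoBehaviour
    {
        public ComputeBuffer buffer;   // filled by a compute shader elsewhere
        float[] results = new float[1];
        bool pending;

        void Update()
        {
            if (!pending)
            {
                AsyncTextureReader.RequestBufferData(buffer);
                pending = true;
            }
            // First poll: may schedule the render-thread copy into native memory.
            TryRetrieve();
        }

        void OnPostRender()
        {
            // Second poll, after the render thread has run mid-frame:
            // copies native system memory into the managed results array.
            TryRetrieve();
        }

        void TryRetrieve()
        {
            if (pending &&
                AsyncTextureReader.RetrieveBufferData(buffer, results)
                    == AsyncTextureReader.Status.Succeeded)
            {
                pending = false; // results[] is now valid, a frame or two old
            }
        }
    }
    ```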
     
    Noisecrime and jason-fisher like this.