Asynchronously getting data from the GPU (DirectX 11 with RenderTexture or ComputeBuffer)

Discussion in 'Shaders' started by jimhug, Nov 21, 2014.

  1. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    Is the base delay always measured in frames, or are you assuming 60 fps? I.e., with vsync off and 200 fps, would it at most be a few ms faster because it is polling more frequently, or is there an actual frame-related delay?
     
  2. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Congratulations, it works perfectly:

    1. Doesn't crash anymore.
    2. The first Retrieve takes 0.01 ms, and the second one takes <1 ms (for a 1024x1024 array of floats).
    3. A Request takes 0.02 ms.

    Now tell me please, will it be possible to make it work conveyor-style with 6 buffers? The thing is: to guarantee that the 2nd Retrieve succeeds in 100% of checks, I have to wait 4 frames after the Request, and then 2 more frames are consumed by the two Retrieves. But I would like to get data from the GPU every frame. So, I plan to do this:
    Code (csharp):
    Frame 1:   Request(buffer1)   Retrieve1(buffer3)   Retrieve2(buffer2)
    Frame 2:   Request(buffer2)   Retrieve1(buffer4)   Retrieve2(buffer3)
    Frame 3:   Request(buffer3)   Retrieve1(buffer5)   Retrieve2(buffer4)
    Frame 4:   Request(buffer4)   Retrieve1(buffer6)   Retrieve2(buffer5)
    Frame 5:   Request(buffer5)   Retrieve1(buffer1)   Retrieve2(buffer6)
    Frame 6:   Request(buffer6)   Retrieve1(buffer2)   Retrieve2(buffer1)
    ...
    Will this work? Won't the second internal copy of the data you mentioned interfere between different Retrieve calls? Will the Ptr be cached and returned for each of the buffers?
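    In code, the rotation would be something like this (just a sketch with my own names; the real version would also check the Status each call returns):

    Code (csharp):
    // 6-buffer conveyor: request the newest buffer, finish the two oldest.
    // buffers[0..5] are the ComputeBuffers, arrays[0..5] their float[] targets.
    void Update() {
        int i = Time.frameCount % 6;
        AsyncTextureReader.RequestBufferData(buffers[i]);                                 // Request
        AsyncTextureReader.RetrieveBufferData(buffers[(i + 2) % 6], arrays[(i + 2) % 6]); // Retrieve1
        AsyncTextureReader.RetrieveBufferData(buffers[(i + 1) % 6], arrays[(i + 1) % 6]); // Retrieve2
    }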

    Anyway, thank you very much. This is already a huge benefit, being able to read data without dropping fps on any frame.

    Good idea, gonna try, maybe it will save a frame indeed.
     
    jason-fisher likes this.
  3. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    Just curious if you have an updated test/performance test project to share.

    I would be interested in integrating this with a multi-threaded, high-speed priority queue/octree setup -- use coroutines to manage the job queue, where we can WaitForEndOfFrame() and abort a read if the camera moves too far. Basically, it's an octree dive based on frustum/LOD/distance; the leaves are added to a per-thread priority queue with their distance to the camera as the priority value. These then get fed to the multi-threaded job queue, where each thread pulls the chunk closest to the camera, generates noise into an isosurface, and uses MC or DC to produce triangles.

    I think I want to use an async GPU streaming setup to replace the multi-threaded CPU-based noise/triangle generation and treat multiple buffers like threads, with the noise/triangles returned for caching/storage/physics.
     
  4. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Sure, here.

    The fun thing is I found a configuration that only takes 2 frames to get the data. I don't understand how it works, but Debug.Log shows that it does.

    On the first frame, in Update(), I call Request and the first Retrieve. On the second frame, in OnPostRender(), I call the second Retrieve. And the data is there in 100% of calls.
     
    MD_Reptile likes this.
  5. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    I need to play with this some more .. this is exciting.
     
    Last edited: Dec 21, 2016
  6. MD_Reptile

    MD_Reptile

    Joined:
    Jan 19, 2012
    Posts:
    2,664
    I'm curious why everybody seems to put the compute shader file in the Resources folder. Is that just for the convenience of not having to drag and drop to assign it in a script (where you're dispatching it or whatever), or is there a bigger reason I didn't know about?
     
  7. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    It wasn't quite working for me .. I was only getting a NotReady retrieve and a 0 value printed, and then the counter would start over again.

    I was able to get it working with a couple of changes but haven't dug into the profiler yet.

    Code (csharp):
    void Update () {
        if (cycleCounter == -1) {
            Debug.Log("start cycle @" + Time.frameCount);
            _shader.Dispatch(kiBufferWrite1, 32, 32, 1);
            doCycleCounter();
        }

        if (cycleCounter == 0) {
            if (AsyncTextureReader.RequestBufferData(buffer1) != AsyncTextureReader.Status.NotReady) {
                Debug.Log("succeed request @" + Time.frameCount);
                doCycleCounter();
            }
        }

        if (cycleCounter == 1) {
            if (AsyncTextureReader.RetrieveBufferData(buffer1, dataArray) != AsyncTextureReader.Status.NotReady) {
                Debug.Log("retrieve in update: " + dataArray[1024 * 1024 - 1] + " @" + Time.frameCount);
                doCycleCounter();
            }
        }
    }

    public void OnPostRender() {
        if (cycleCounter == 1) {
            if (AsyncTextureReader.RetrieveBufferData(buffer1, dataArray) != AsyncTextureReader.Status.NotReady) {
                Debug.Log("retrieve in post: " + dataArray[1024 * 1024 - 1] + " @" + Time.frameCount);
                doCycleCounter();
            }
        }
    }

    public void OnPreRender() {
        if (cycleCounter == 1) {
            if (AsyncTextureReader.RetrieveBufferData(buffer1, dataArray) != AsyncTextureReader.Status.NotReady) {
                Debug.Log("retrieve in pre: " + dataArray[1024 * 1024 - 1] + " @" + Time.frameCount);
                doCycleCounter();
            }
        }
    }

    void doCycleCounter() {
        cycleCounter++;
        if (cycleCounter == 2)
            cycleCounter = -1;
    }
    Now the debug output includes the frame count and looks like:

    Code (csharp):
    start cycle @1
    succeed request @1
    retrieve in update: 100 @3
    start cycle @4
    succeed request @4
    retrieve in post: 200 @7
    start cycle @8
    succeed request @8
    retrieve in post: 300 @12
    start cycle @13
    succeed request @13
    retrieve in post: 400 @17
    So the first cycle completes very quickly -- just 2 frames later, and it's ready during the update cycle -- but the rest seem to be ready in post, 3 or 4 frames later. In those cases, waiting for Update() could cost an additional frame.

    This is at about 160 fps with vsync off -- 5.6-7 ms of CPU time per frame (most of it spent on Debug.Log, of course). I still need to play with the profiler some more, but that sounds like 30 ms instead of 60 ms, @Michal_? About 2 frames still sounds right for 60 fps.

    [edit] Added OnPreRender to the code .. an updated run looks like:
    Code (csharp):
    start cycle @1
    succeed request @1
    retrieve in update: 100 @3
    start cycle @4
    succeed request @4
    retrieve in pre: 200 @7
    start cycle @8
    succeed request @8
    retrieve in pre: 300 @12
    start cycle @13
    succeed request @13
    retrieve in pre: 400 @17
    -- basically, the timing through at least frame 17 is the same, but the data is available in OnPreRender instead of OnPostRender, which is probably more useful to someone?

    And we can see that the shader is updating the same buffer over and over again, so you should be able to stream in data as a multi-step calculation evolves.
     
    Last edited: Dec 21, 2016
  8. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    The shader is being loaded/referenced by code here, which defaults to the Resources folder when loading the .compute file:

    Code (csharp):
    _shader = Resources.Load<ComputeShader>("testGpuRead");
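    The alternative is just a serialized field you assign by drag-and-drop, which works from any asset folder -- so Resources.Load is mostly a convenience:

    Code (csharp):
    // Inspector-assigned alternative; no Resources folder required.
    public ComputeShader _shader;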
     
    Last edited: Dec 21, 2016
  9. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    I don't think you can guarantee when exactly the result is ready. It is partially hardware dependent, and it is impossible to say exactly without knowing how Unity synchronizes its threads. Let me clarify how it works now; I'll update the readme later.

    First you need to know that the CPU and GPU aren't working on the same frame at any given time. The CPU is always at least one frame ahead of the GPU. This is described at the bottom of this page. It is important to know that the difference doesn't have to be one frame; three frames is in fact the default. It is controlled by the application, and the user can even change it through the driver's control panel. Now let's see what the plugin does.
    1. You call AsyncTextureReader.RequestTextureData on the main thread. It asks Unity to run the request on the render thread (IssuePluginEvent). That should happen later in the same frame; it depends on how much work the render thread already has and/or on how Unity syncs the render and main threads.
    2. RequestTextureData is called on the render thread by Unity. It asks the GPU to copy the texture to system memory. This will take at least one frame, probably more, depending on how many frames the CPU is ahead of the GPU.
    3. The texture is copied to system memory one or more frames later to avoid a CPU/GPU sync.
    4. You call AsyncTextureReader.RetrieveTextureData on the main thread. It asks Unity to run the method on the render thread. That should happen later in the same frame.
    5. RetrieveTextureData is called on the render thread by Unity. It copies from the DX texture in system memory to a temp buffer in system memory.
    6. You call RetrieveTextureData again on the main thread. If the previous step has already finished, the texture is copied from the temp system memory buffer to your managed buffer.
    This should explain why calling RetrieveTextureData twice per frame can help you. If you call Retrieve in Update, it will be executed on the render thread sometime later that frame, probably before OnPostRender. If you then call Retrieve again in OnPostRender, you can successfully retrieve the data.
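    In code, the polling pattern these steps imply looks roughly like this (just a sketch; error statuses are ignored):

    Code (csharp):
    // Request once, then poll Retrieve each frame until the copy has landed (steps 4-6).
    IEnumerator ReadTexture(Texture tex, float[] result) {
        AsyncTextureReader.RequestTextureData(tex);    // steps 1-3 run over the next frame(s)
        while (AsyncTextureReader.RetrieveTextureData(tex, result) == AsyncTextureReader.Status.NotReady)
            yield return null;                         // poll once per frame (or twice, as described above)
        // 'result' is now valid, several frames after the request
    }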

    Hope it makes sense. It is difficult to explain. It needs a picture :)
     
  10. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    Well, my knowledge here is limited to the article Michal_ linked, and I can hardly explain why it works in my case but not on your side. Maybe it's hardware dependent indeed. The best approach is probably not to rely on a constant number of frames between the requests, but rather to build a simple state machine and check that each stage has finished before moving on to the next one.
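    A minimal version of that state machine might look like this (a sketch, using the plugin calls from above):

    Code (csharp):
    bool waiting; // false = issue a new request, true = poll for the data

    void Update() {
        if (!waiting) {
            if (AsyncTextureReader.RequestBufferData(buffer) != AsyncTextureReader.Status.NotReady)
                waiting = true;   // request accepted, start polling
        } else if (AsyncTextureReader.RetrieveBufferData(buffer, dataArray) != AsyncTextureReader.Status.NotReady) {
            waiting = false;      // data arrived; next frame issues the next request
        }
    }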

    Anyway, I'm deeply satisfied with how the scripts work, and I'm going to try to use them in a real simulation. I will report the results in this thread.

    Michal_, do you plan to improve the scripts? For example, adding support for custom-length structs for the data array to be filled with GPU data? I have no problem getting the data through a float[] array, but struct support would be convenient.

    Also, if I want to use multiple buffers, will the script remember a Ptr for each one? Can it remember more than one Ptr?
     
  11. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    Take a look at the code I shared above -- it does what you are suggesting here and works fine. You can just copy/paste it over the Update/OnPostRender from the test project you shared. The big change is really testing the retrieval's return value before advancing:

    if (AsyncTextureReader.RetrieveBufferData(buffer1, dataArray) != AsyncTextureReader.Status.NotReady) { }
     
    Zolden likes this.
  12. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    I don't plan any active development. I don't use it in my projects or at work. That said, I don't mind fixing bugs and adding new features as long as it doesn't take too much of my time.
    I assume you're talking about native buffer/texture pointers? Yes, you can use as many as you want. Just keep in mind that every buffer has two copies in system memory, and there is currently no way to reclaim that memory even if you no longer need it. Probably not a big deal on desktop.
     
  13. BLadeLaRus

    BLadeLaRus

    Joined:
    Jul 20, 2014
    Posts:
    6
    Great job. It works solid and stable now. I looked at your native C++ plugin source code and have a question: can the same behavior be reproduced on other versions of DirectX (10, 12), or even on OpenGL? I ask because I tried running my game on a few other computers, and some of them have Windows 10 installed with DirectX 12, and it's not working on those PCs.
     
  14. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Two things:
    1. Yes, it can be extended to support additional rendering APIs. Both OpenGL and DX12 support async resource copy. And I'm sure Vulkan supports it too, but I know next to nothing about Vulkan.
    2. Windows 10 has full support for DX11, and every GPU that supports DX12 also supports DX11. This plugin should work perfectly fine on Win10; in fact, I developed it on Win10. So, unless you build your app specifically for DX12, it should run. Maybe there is a bug in the implementation. I need more information.
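    If in doubt, log which graphics API the player actually got on those machines:

    Code (csharp):
    // The plugin expects Direct3D11.
    Debug.Log(SystemInfo.graphicsDeviceType);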
     
    Torigas likes this.
  15. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    So, I've tried to use the script in my actual project, and it worked great. GPU read operations take < 0.1ms, read delay is 2-3 frames, and in practice the outcome is what I actually expected.

    Here's an example. The ground, tank, and projectiles are computed on the GPU as a huge number of physically interacting particles, but the VFX are created by particle systems on the CPU side, so I pass the explosions' coordinates to the CPU. And we can see there's no noticeable delay between the GPU-side physical explosions and their CPU visualization:

    https://gfycat.com/BrightCautiousCougar

    So, thanks again Michal_, your code has been pretty useful.
     
    Last edited: Jan 6, 2017
    jason-fisher and MD_Reptile like this.
  16. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    That is really cool. Could you solve IK on the GPU and animate a large golem? The effect seems perfect for mud or flesh as it is.
     
  17. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    In general, solving IK is just math, so it can be done on the GPU. My project already has bones implemented that can support buildings and animate moving things, and they are computed on the GPU as well. For example, the tank is held together by a few bones (visualized as green lines in the image), and its gun's tilt is controlled by two bones whose length is controlled by the player:



    So, I could make a golem-like creature and animate it with a system of bones. Though, since its legs wouldn't be able to intersect in two dimensions, it would be a kind of unnatural monster with non-intersecting legs.
     
  18. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Looks great! I'm happy it is useful to someone. That it wasn't a waste of time...
     
  19. Aaron-Meyers

    Aaron-Meyers

    Joined:
    Dec 8, 2009
    Posts:
    305
    So glad I found this thread and your plugin! Thank you!
     
  20. strich

    strich

    Joined:
    Aug 14, 2012
    Posts:
    374
    Hey guys, I really love what you've all done here -- and thanks for working on it open-source. I was curious after reading the thread: it seems like you've all worked hard to ensure there is zero sync locking, but in my case I think I need to ensure the result is available within a max of 1-2 frames. Is it possible with this to just deadlock the CPU and wait for the GPU, if required?
     
  21. Zolden

    Zolden

    Joined:
    May 9, 2014
    Posts:
    141
    I think you might just use ComputeBuffer.GetData() for this case; it makes the most recent GPU data available on the CPU side in the current frame by stalling the GPU pipeline.
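    I.e. simply:

    Code (csharp):
    // Blocking readback: stalls until the GPU has finished writing the buffer.
    buffer.GetData(dataArray);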
     
  22. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    It is possible to wait for the GPU, but that would happen on the render thread, and you would still need to get the result on the main thread... So it's theoretically possible but complicated. If you don't mind the stalls, just use Unity's methods.
     
    strich likes this.
  23. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    I am using this inside of a coroutine that looks like this:

    Code (csharp):
    while (!GPUReady()) {
        yield return null; // skip a frame and try again
    }

    If you want to start blocking after N frames, I think you could just do this:

    Code (csharp):
    int maxFrames = 2, frame = 0;
    while (!GPUReady()) {
        if (frame < maxFrames) yield return null; // wait normally for the first N frames
        frame++;                                  // after that, spin without yielding (blocks the main thread)
    }

    But it might be better to do something like this:

    Code (csharp):
    int maxFrames = 2, frame = 0;
    while (!GPUReady()) {
        if (!paused && frame > maxFrames) paused = true; // signal the rest of the game to hold still
        frame++;
        yield return null;
    }
    paused = false;

    where 'paused' is a global bool that your other loops watch.
     
    Last edited: Jan 31, 2017
    strich likes this.
  24. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    @Michal_ -- should this work with Texture3D? I'm able to create a Texture2D, SetPixels, Apply, and RequestTextureData without issues, but if I switch to Texture3D it crashes Unity in RetrieveTextureData.

    Edit: Never mind -- ID3D11Texture2D is why. :)
     
  25. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Yeah, it doesn't. That's a good point; I didn't think about that. It wouldn't be difficult to add if there is demand. I'll add some error checking at the very least.
     
  26. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    https://github.com/digitalsanity/AsyncTextureReader

    I gave it an initial shot .. I probably duplicated functions where I could have overridden, but retrieving no longer crashes and the data looks right. I am not certain this memcpy is 100% correct -- it seems close anyway:

    Code (csharp):
    // Copy the mapped 3D texture row by row, slice by slice. Each (slice, row)
    // pair gets its own offset; the source uses DepthPitch/RowPitch because the
    // driver may pad rows/slices.
    for (unsigned int depth = 0; depth < desc.Depth; ++depth) {
        for (unsigned int row = 0; row < desc.Height; ++row)
        {
            char* dest = ((char*)cpuResource->cpuBuffer) + (depth * desc.Height + row) * desc.Width * pixelSize;
            char* src = ((char*)resource.pData) + depth * resource.DepthPitch + row * resource.RowPitch;
            memcpy(dest, src, desc.Width * pixelSize);
        }
    }
    Should we copy a slice instead of a row of a slice at a time?

    Can the compute shader write into this (RWTexture3D) even if it's not a RenderTexture with enableRandomWrite set in Unity?

    Edit:

    OK, maybe I wasn't going about this the right way. My intention was to use this to generate a GPU-noise isosurface and extract multiple mipmap levels for octree generation, which can be sent to a CPU-based marching-type function to build the mesh. The density data is written in the compute shader to an RFloat RWTexture3D, basically as RWTexture3D[float3(id.x, id.y, id.z)] = 0.3.

    I would like to use the resulting dataArray[] to feed the mesher and also to feed back into the GPU noise/density function on occasion.

    .. so I don't think that will work with [float3()]. I will have to keep a separate array/texture normalized to integer pixel space, because I would like to keep using [float3()] for a separate GPU-based meshing pipeline.
     
    Last edited: Feb 2, 2017
  27. strich

    strich

    Joined:
    Aug 14, 2012
    Posts:
    374
    Hey guys,
    I finally got around to implementing this in my own project to help scale it out. It definitely helps reduce overhead a lot! However, I did notice that it still costs quite a bit of time when finally pulling in a float4[4096*4096] array:
    [screenshot: upload_2017-3-11_18-37-40.png -- profiler timings]

    The screenshot above shows some odd results, as the combined 60 ms delay is not real -- I am getting around 60 FPS during those periods. I double-checked that I don't have vsync on, too.

    Is this the best-case scenario? I did notice that smaller arrays definitely decrease the delay, which makes me wonder if this is just the best that can be done for large arrays.

    I did hope it would be truly async, though. I suppose marshalling the resulting array still costs a lot of time?
     
  28. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    That's one big piece of memory: 4096*4096*float4 is more than 250 MB. Let's break it down. There's a DMA transfer from GPU memory to system memory, a copy to temp system memory for thread synchronization, and a final copy to managed memory -- possibly with marshalling, though I think no additional work is done for a simple float array. So that is more than 750 MB transferred (more if marshalling isn't free). I guess it just can't be done any faster.
    I mean, it is perfectly possible there is a bug or an unnecessary bottleneck somewhere in my code. I never profiled it (or truly used it). But it wouldn't surprise me if your use case was a little too much. The original use case was getting an ocean height field for physics simulation: 512x512x4 -- 1 MB of data...
     
  29. strich

    strich

    Joined:
    Aug 14, 2012
    Posts:
    374
    It is unlikely that you're wrong! My test case is just that -- I intend to request much less from the GPU in production, but my lazy, inefficient tests produced these results. I was curious whether anything was wrong, but it sounds like in your expert opinion those kinds of delays are about right.
     
  30. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    Hi Michal_

    having asynchronous GPU readback missing in Unity3D was such a pain while creating a GPU ComputeShader based procedural planet engine.
    I haven't tried adding your code to my prototype yet, but will do so soon.

    THANK YOU SO MUCH FOR YOUR WORK Michal_ (and also everyone testing and supporting)!!!

    Really, really appreciated. Wherever you live, I will for sure be happy to buy you a few, or better many, beers!!!
    :)
     
  31. onehand

    onehand

    Joined:
    Dec 3, 2015
    Posts:
    8
    Wow! Thanks so much @Michal_ for this.

    I tested this with my procedural terrain project but noticed memory leaks in the Unity process after each run, even if I release my buffers in OnDestroy().

    Would adding a call to free() after the memcpy() operation in the RetrieveBufferData() method work? I generally have 8 chunks of terrain data active at any given time, which amounts to around 250 MB total (similar to @strich above), so swapping these out asynchronously starts bringing my memory utilization up pretty high.
     
  32. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Yeah, the memory is returned to the system when the plugin is unloaded. That doesn't happen when you enter/exit playmode, so it looks like a leak.

    I don't know if it would break something. I would have to take a look, but it certainly wouldn't release the DX staging buffer.

    I think it just needs some sort of "ReleaseTempMemory(Texture)" function to give the user a choice. Sharing temp memory would be even better, but that's complicated.
     
  33. strich

    strich

    Joined:
    Aug 14, 2012
    Posts:
    374
    So, does every request allocate new memory that is only freed on plugin unload?

    Another question I expect to run into soon enough: since this is a static class, can one make request calls to multiple ComputeBuffer instances in the same frame? Or can this thing only handle a single call at a time at the moment?
     
  34. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    There's temp memory for every unique texture/buffer you use. Multiple requests for one texture do not consume additional memory. And yes, you can request data from multiple textures/buffers at the same time; that's why every texture/buffer instance has its own temp memory.
     
  35. strich

    strich

    Joined:
    Aug 14, 2012
    Posts:
    374
    Gotcha. It might make sense, as another poster noted, to expose a sort of `Release(ComputeBuffer buffer)` method, similar to what ComputeBuffer itself provides.
     
  36. strich

    strich

    Joined:
    Aug 14, 2012
    Posts:
    374
    Is it possible to start a delayed rolling request every frame for a ComputeBuffer? As it stands right now, I get a response every couple of frames and only make a new request once the previous one has come in. Can one make a request every frame and get a response every frame, obviously still with the couple-of-frames delay?
     
  37. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    That is not supported at the moment. It would require more involved logic and more intermediate buffers to avoid potential data collisions.
    What you can do right now is rotate multiple buffers: use buffer A on the 1st frame, buffer B on the 2nd frame, buffer A again on the 3rd frame, and so on. You'll probably need more than 2 buffers; you'll have to experiment here.
     
  38. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Added an option to release the temp buffers: AsyncTextureReader.ReleaseTempResources(Texture/Buffer). I didn't test it much, so let me know if it breaks something.
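    Usage is just, e.g. when a buffer will no longer be read back:

    Code (csharp):
    AsyncTextureReader.ReleaseTempResources(buffer); // frees the plugin's temp copies for this buffer
    buffer.Release();                                // then release the ComputeBuffer itself as usual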
     
    joergzdarsky likes this.
  39. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    Will test it this weekend, thank you for the update Michal_ !!
     
  40. onehand

    onehand

    Joined:
    Dec 3, 2015
    Posts:
    8
    @Michal_ The ReleaseTempResources() method works wonderfully. Thanks so much for this fantastic plugin.
     
    joergzdarsky and Michal_ like this.
  41. Hacaw

    Hacaw

    Joined:
    Jul 3, 2014
    Posts:
    1
    Hello guys, I've been reading this thread for a while and I have a similar problem.
    I'm trying to take a screenshot in Samsung Gear VR from Unity, and I'm currently using the piece of code below. It works 100% for taking screenshots, but each screenshot takes at least 1 second to complete and blocks the whole application, and I need to take 3 screenshots of an object.
    Can I use @Michal_'s plug-in to make my function async so it doesn't stall the GPU? I want to show at least a loading screen while the screenshot is being processed.
    P.S.: Application.CaptureScreenshot doesn't work in Samsung Gear VR.

    Code (CSharp):
    private IEnumerator SaveScreenshot_RenderToTexAsynch(string filePath) {
        // Wait for graphics to render
        yield return new WaitForEndOfFrame();

        RenderTexture rt = new RenderTexture(Screen.width, Screen.height, 24);
        Texture2D screenShot = new Texture2D(Screen.width, Screen.height, TextureFormat.RGB24, false);

        Camera.main.targetTexture = rt;
        Camera.main.Render();

        RenderTexture.active = rt;
        // ReadPixels is the synchronous readback that stalls the application
        screenShot.ReadPixels(new Rect(0, 0, Screen.width, Screen.height), 0, 0);
        Camera.main.targetTexture = null;
        RenderTexture.active = null; // Added to avoid errors
        Destroy(rt);

        yield return 0;

        byte[] bytes = screenShot.EncodeToPNG();
        File.WriteAllBytes(filePath, bytes);
    }
     
    Last edited: Apr 24, 2017
  42. Epitaque

    Epitaque

    Joined:
    Jul 28, 2015
    Posts:
    8
    If you ever get the time, @Michal_, I would really appreciate it if you added support for 3D textures. This would allow me to generate 3D noise on the GPU for marching cubes.
     
  43. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
    I added support for Texture3D in #76 above; it may work for you -- https://github.com/digitalsanity/AsyncTextureReader

    It's been a while and I don't remember the details at the moment, but there were some other issues/limitations with RWTexture3D, and I think I ended up using a striped array in the shader and code. I think I would have needed an intermediate shader/step to copy data to slices anyway.
     
    Zolden likes this.
  44. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Sorry guys, I didn't check the forum for a couple of months. Is there still demand for 3D textures?
     
  45. 00jknight

    00jknight

    Joined:
    Mar 28, 2014
    Posts:
    34
    I've just been writing my GPU algorithms to keep as much data as possible on the GPU. I render with DrawMeshInstancedIndirect and calculate the MVP matrices on the GPU using data computed in compute shaders. Then there's no GetData call at all.
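    The skeleton of that setup is roughly the following (a sketch; the kernel and buffer names are my own):

    Code (csharp):
    ComputeBuffer argsBuffer;

    void Start() {
        // Indirect args: index count, instance count, start index, base vertex, start instance.
        uint[] args = { mesh.GetIndexCount(0), (uint)instanceCount, 0, 0, 0 };
        argsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
        argsBuffer.SetData(args);
        material.SetBuffer("matrixBuffer", matrixBuffer); // the vertex shader reads the matrices directly
    }

    void Update() {
        shader.Dispatch(kernel, instanceCount / 64, 1, 1); // recompute the matrices on the GPU
        Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, argsBuffer);
    }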
     
  46. unitydevist

    unitydevist

    Joined:
    Feb 3, 2009
    Posts:
    45
    Thank you for making this, Michal_; it answers a lot of questions I've had about getting texture data out of Unity quickly. I'm also interested in using this to capture RenderTextures or camera RGB as well as depth textures. I don't see support for that, and swapping the DebugTexture for a render texture assigned to the camera resulted in errors when I tried it. Is there a way to make this work?
     
  47. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    I'm not sure I understand. You should be able to render your camera into your render texture and then retrieve the data. As long as you don't use MSAA or an unsupported texture format, you should be fine.
    What exactly are you trying to do? Maybe give me a little code snippet...
     
    unitydevist likes this.
  48. R_RT

    R_RT

    Joined:
    Jan 22, 2017
    Posts:
    7
    What he probably wants to do is:

    1. Create a new RenderTexture.
    2. Set that RenderTexture in your example "test" project as the "DebugTexture" in the "Test" script, and also set that same RenderTexture in the "Target Texture" slot of the camera.

    Then he'd probably iterate over the texture data every frame to retrieve the RGB and depth data that the camera is rendering, and then save the textures or plug them back into Unity (which would be really cool -- I could not get it to work either so far).

    This results in the request status error "Error_InvalidArguments" until the camera's target texture is set to none again.
     
  49. Michal_

    Michal_

    Joined:
    Jan 14, 2015
    Posts:
    365
    Ok. I'll try that when I have time. As a workaround, you could probably copy the camera texture to a second render texture in OnRenderImage and then try to retrieve the data from that copy...
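    Something like this (untested sketch; copyRT is a render texture you create yourself with a matching size/format):

    Code (csharp):
    void OnRenderImage(RenderTexture src, RenderTexture dest) {
        Graphics.Blit(src, copyRT);                    // keep a copy the plugin can read from
        Graphics.Blit(src, dest);                      // pass the camera image through unchanged
        AsyncTextureReader.RequestTextureData(copyRT); // then poll RetrieveTextureData as usual
    }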
     
  50. R_RT

    R_RT

    Joined:
    Jan 22, 2017
    Posts:
    7
    Got it working without errors: the exact point in the render loop at which "RequestTextureData" is called seems to matter. Calling it for the first time in Start() did not work for me.

    Now I call it for the first time (and only once) in OnPostRender, before calling "RetrieveTextureData". Thereafter, I call "RequestTextureData" directly after the retrieve command every time. I don't know why, but this works for me.
    But since we get back RGBA values, I am still stuck on how to efficiently retrieve the depth texture values.

    What I do now is use a shader to write the depth data into the RGB channels of another texture and then read that. This seems to work quite well, but it is surely not the most efficient way.
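    For reference, the C# side of that workaround amounts to this (sketch; depthMaterial is my name for a material whose shader samples _CameraDepthTexture and writes the depth into the color channels):

    Code (csharp):
    cam.depthTextureMode |= DepthTextureMode.Depth;  // ask Unity to render the depth texture
    Graphics.Blit(null, depthRT, depthMaterial);     // encode depth into the RGB of depthRT
    AsyncTextureReader.RequestTextureData(depthRT);  // read it back like any other texture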