
[Windows] Issue with Nvidia NVS 810 and multiple displays

Discussion in 'Windows' started by Tmpsrm, Feb 19, 2019.

  1. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hello!

    I am seeing an issue with an Nvidia NVS 810 GPU when using multiple displays. The card is made up of 2 separate GPUs; both get maxed out by the first output, and each additional display slows things down further.

    I need to be able to output to 6 (+1) different screens, and it seems a bug in Unity is preventing me from doing so. I believe each screen is waiting on the previous one, causing some kind of feedback loop of
    PostLateUpdate.PresentAfterDraw
    and slowing down the game to a crawl.

    Here is some info from the various debugging steps I tried:

    • Scene contents do not matter. An empty scene and a complex one have the same results.
    • PostLateUpdate.PresentAfterDraw/Graphic.PresentAndSync/Device.Present
      represents 99.5% of rendering time and seemingly is the issue.
    Hardware:
    • GPU is up to date, tried multiple versions of the drivers
    • Tried using different monitors and cables, no change.
    • Slowness is directly impacted by the number of screens plugged in
    • Issue does not happen on other GPUs I got to try.
    • There is another graphics card plugged in to handle a 7th monitor, that handles the main rendering. I do not know how to change that for testing.
    Software
    • I tried enabling/disabling V-sync in both Unity and the Nvidia control Panel. No change.
    • I tried both in editor and compiled build, development build and not. No change.
    • Rendering multiple displays in the same output is fine, but as soon as the "Game" window is dragged to another display, the slowdowns begin (the main display is on another GPU and does not have the same issue as the NVS 810).
    • Disabling cameras on the additional screens impacts slowness as well.
    • No difference between Forward and Deferred rendering paths.
    • This is with the latest version of Unity on Windows 10.
    I can run additional tests if needed. This is a very important issue and sadly I do not have control over hardware, so another GPU is not an option. I would really appreciate some help to figure this out.

    Thanks!
     
  2. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    Are they regular monitors or some kind of special displays? What does the GPU usage look like in Unity Profiler or task manager? Are the monitors plugged in directly into the GPU? How are you making it draw on multiple displays?
     
  3. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hi Tautvydas-Zilys, thanks for the quick reply!

    Are they regular monitors or some kind of special displays?

    Special displays, I guess. They are in fact splitters that emulate 1080p displays. Note that the issue does not exist in other applications not built in Unity, and the displays are set up to request 30FPS.

    What does the GPU usage look like in Unity Profiler or task manager?

    100% in all of them, and the same in the Nvidia profiler. The Unity GPU profiler shows 99.5% of rendering time dedicated to
    Device.Present
    and clocks in at about 100ms per frame for an empty scene.

    Are the monitors plugged in directly into the GPU?

    The splitters are plugged directly into the GPU using Nvidia-provided Mini DisplayPort to HDMI adapters.

    How are you making it draw on multiple displays?

    Multiple cameras and a script that enables additional displays.


    Thanks!
     
  4. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    How many monitors do you reach before the GPU usage gets to 100%? What's the frame rate at that point?

    Does the same thing happen if you set background clear to solid color instead of skybox and use an empty scene?

    Could you open the player log and paste the snippet about GfxDevice? It should look something like this:

    Code (csharp):
    GfxDevice: creating device client; threaded=1
    Direct3D:
        Version:  Direct3D 11.0 [level 11.1]
        Renderer: NVIDIA GeForce GTX 1060 6GB (ID=0x1c03)
        Vendor:
        VRAM:     6061 MB
        Driver:   23.21.13.8813
     
  5. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hi Tautvydas,

    Sorry about the late answer, I had some issues accessing the computer. Here are the answers to your questions:

    How many monitors do you reach before the GPU usage gets to 100%? What's the frame rate at that point?

    The first monitor spikes the first GPU to 100% instantly. If I move the output to a screen plugged into the second GPU (it's a single card with 2 GPUs that have individual outputs), that one spikes instead. Unity reports about 30 FPS.


    Does the same thing happen if you set background clear to solid color instead of skybox and use an empty scene?

    Yes, I am already using solid color. The same thing happens with skybox, and even with "Don't Clear".

    Here's the player log:

    Code (CSharp):
    GfxDevice: creating device client; threaded=1
    Direct3D:
        Version:  Direct3D 11.0 [level 11.1]
        Renderer: NVIDIA Quadro P4000 (ID=0x1bb1)
        Vendor:
        VRAM:     8089 MB
        Driver:   25.21.14.1881
    Thank you once again for your help; I'll have access to the computer for the next couple of hours, so I'll be able to try things or give you more info if needed.
     
  6. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    Does the computer also have Quadro P4000 in it in addition to NVS 810 plugged into its PCIe slots?
     
  7. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Yes, this is the other graphics card I mentioned. The NVS 810 is there to add outputs to the Quadro, which (theoretically) handles rendering. That's what Nvidia suggested I do.
     
  8. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    And which GPU reaches 100%? Is it the quadro or the NVS810?
     
  9. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    The NVS. The Quadro sees barely any use. Since the NVS has two GPUs on one card, only the GPU that I output to from Unity gets maxed out.

    This happens when using the MultiDisplay options and also when just dragging a Game window to that screen in the editor.
     
  10. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    It almost sounds like transferring frames from the Quadro GPU to the NVS GPU is the bottleneck. Windows Task Manager actually tells you which GPU engine is busy - do you know which one it is?

    What happens if you plug the displays into the Quadro (I believe it still has at least 3 outputs, right?) - does the lag go away?
     
  11. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Here's what the task manager looks like, with one game window in one of the NVS screens. The Quadro also has a game window and the editor and is fine.



    What happens if you plug the displays into the Quadro (I believe it still has at least 3 outputs, right?) - does the lag go away?

    Unfortunately I cannot test that as I don't have access to the machine's plugs. However, from the testing I was able to do before I'm fairly confident that the lag would go away. That being said I do need 7 outputs and cannot change the current configuration of the machine.

    Thanks!
     
  12. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    There is a command line argument that controls which GPU the rendering takes place on ("-gpu"). Can you try it with various numbers? Like "-gpu 0", "-gpu 1", "-gpu 2", "-gpu 3" just to see if it makes any difference?

    Could you also capture a Windows Performance Recorder trace with "CPU usage" and "GPU activity" checkboxes checked? Send that to me via private message.

    Lastly, could you try any other 3D app or game to see if it's suffering from the same issue?
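
    The "-gpu" test suggested above can be scripted so each index is tried the same way. The following is a minimal sketch that only builds the command lines; "MyGame.exe" is a hypothetical player name, and the index range 0 to 3 simply mirrors the numbers mentioned in the post.

```python
from typing import List


def build_gpu_commands(player_path: str, gpu_indices: List[int]) -> List[List[str]]:
    """Return one argument list per GPU index, using the "-gpu N" flag from the post."""
    return [[player_path, "-gpu", str(index)] for index in gpu_indices]


# "MyGame.exe" is a placeholder; substitute the path of the standalone build.
commands = build_gpu_commands("MyGame.exe", [0, 1, 2, 3])

for command in commands:
    # Print the command line that would be run for this GPU index.
    print(" ".join(command))
```

    Each argument list can be handed to subprocess.run on the test machine, noting the GPU usage in Task Manager for every run.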
     
  13. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Interesting! Here's the graph with GPU 1 (the Quadro):



    Here is with GPU 2:



    There is an interesting drop in usage on GPU 2! Not sure what that means.

    I tested with another 3D application, and the same issue does not happen. GPU usage for the NVS rises only by a small amount on the GPU that's being used, as expected.

    I also sent you the report privately. Thanks!
     
  14. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    Oh my. Well, some things are super weird. Here's what your frame looks like on the GPU:
    upload_2019-2-22_11-33-21.png

    Light green is Unity. Somehow, there's one draw call that takes 12 ms on the GPU. And then there's a blit operation that takes 9.5 ms. To be honest, I have no idea what it's drawing there.

    Next is Dark Green, which is DWM. One of its draw calls takes 6 ms.

    Finally, the magenta one is TeamViewer. It takes 10 ms of GPU time _every frame_.

    This is puzzling. I bet that TeamViewer has something to do with it, but I can't be certain. Is there any possibility you can do the profiling on the machine without remoting in?

    Lastly, I would try using something like Nvidia Nsight or PIX to look at what is actually using GPU resources. It should give much more detailed information than Windows Performance Analyzer in this case.
     
  15. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    Hey I just noticed this! Are you testing in the editor? That could also be the issue. You should always check performance in Standalone player.
     
  16. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hey there! I have tested without TeamViewer and got the same issue. I just happen to not have physical access to the device right now, but the results are consistent with what I saw when testing hands-on.

    I have been testing both in the Unity editor and outside it. The sample I sent you was running the app standalone, but maybe something went wrong? It should not say Unity.exe, as the editor was not running at the time. I'll try getting a fresh recording and look into Nsight or PIX.
     
  17. sergiobd

    sergiobd

    Joined:
    May 14, 2014
    Posts:
    37
    Hi Tmpsrm and Tautvydas-Zilys.

    I'm running into very similar issues. I have 6 monitors plugged into an NVS 810, and an empty scene gives me 99% GPU usage. In my case, memory usage is also almost reaching 100%.
    Were you guys able to solve this? Please help!
     
  18. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    Is Unity using up the GPU? What is using up the memory? Are there other GPUs present on the machine? If so, which one does Unity use for rendering?
     
  19. sergiobd

    sergiobd

    Joined:
    May 14, 2014
    Posts:
    37
    Yes Tautvydas, my app is using the GPU. NVS 810 actually has 2 stacked GPUs, so, maybe, as you say, the frame is rendered in one GPU and copied to the other one, causing some performance issues. I will try to use 4 monitors (only 1 GPU) and then I'll get back to you. Thanks for your super quick response.
     
  20. sergiobd

    sergiobd

    Joined:
    May 14, 2014
    Posts:
    37
    Actually, the most critical problem is the uneven framerate. The GPU profiler shows that RenderTexture.SetActive is the source of the problem, even in an empty scene with no post-processing effects. Note that I am using HDRP.
    upload_2019-7-12_16-44-51.png
     
  21. Tmpsrm

    Tmpsrm

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hey @sergiobd, were you able to find a solution? I'm gonna get back onto this project soon and I still haven't resolved it myself...

    Thanks!
     
  22. TommyNL

    TommyNL

    Joined:
    Oct 19, 2016
    Posts:
    2
    Hey all,
    Not sure if anyone is still working on this half a year later, but I've been trying to find more information about it and there is hardly any.

    We've been trying to get 6 projectors (1920x1200) running for a project and figured out some stuff but we have questions as we're facing a project with 9x 4K projectors (I know wicked).

    We tested with a system containing an RTX 2070 and a 1050Ti for additional outputs (Nvidia has a max of 4 displays per card), and we also noticed that with Unity's Display.displays[x].Activate(); as soon as you activate a screen on the other GPU, performance drops dramatically and the 1050Ti sits at 100%.
    So we got a new system with dual RTX 2070 Supers, thinking that 2 powerful cards would handle it without an issue, but behold... same performance drop. (I almost quit my job over this.)
    So I tested a few scenarios
    - 3 screens on 1 GPU on a basically empty project: ~1200 FPS
    - 2 screens on 1 GPU and 1 screen on GPU2 on a basically empty project: ~100FPS
    - 3 screens on 1 GPU running the deadwood demo: ~40FPS
    - 2 screens on 1 GPU and 1 screen on GPU2 running the deadwood demo: ~40FPS
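
    One way to read the four measurements above is as frame time rather than FPS. The sketch below only does the conversion (1000 ms divided by FPS) for the figures quoted in the list; it does not say where the time is spent.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate into the time spent on one frame, in milliseconds."""
    return 1000.0 / fps


# FPS figures copied from the scenario list above.
scenarios = {
    "empty project, 3 screens on 1 GPU": 1200,
    "empty project, 2 screens on GPU 1 + 1 screen on GPU 2": 100,
    "deadwood demo, 3 screens on 1 GPU": 40,
    "deadwood demo, 2 screens on GPU 1 + 1 screen on GPU 2": 40,
}

for name, fps in scenarios.items():
    print(f"{name}: {frame_time_ms(fps):.2f} ms per frame")
```

    Seen this way, the empty project goes from under 1 ms to about 10 ms per frame once a second GPU is involved, while the heavy demo stays at about 25 ms either way.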

    So it seems it wasn't simply GPU power, as a heavy load did not suffer. With the GPU-Z tool running I saw a very particular thing: once you activate a screen on the other GPU, the bus interface load, a.k.a. PCI-e bandwidth, was at 100%.
    Our motherboard was an X570 without SLI support, meaning one 16x and one 4x slot. I brought in my personal rig with dual 8x slots and, behold, instead of ~100FPS I got ~170FPS... not great, but better. For some reason Unity likes pumping data back and forth without actually utilizing the other GPU.

    Now, one thing we didn't have the hardware for was a 16x + 16x slot or SLI (as we didn't have an NVLink bridge), which would turn 2 GPUs into one (sort of) and maybe not have the same issue with PCI-e bandwidth?

    But we did try something else. Instead of separate fullscreen windows, we went with a single borderless fullscreen window: we aligned all 6 screens in a nice 3x2 grid (doing this in the Nvidia Control Panel is best) and then started the application via a .bat script at 5760x2400 resolution (all 6 projectors). Behold: ~1000FPS and no PCI-e bandwidth limitation. We even returned one of the 2070 Supers and used our spare 1050Ti, and it still gets ~1000FPS.
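
    The spanned resolution for a grid of identical projectors is just columns times width by rows times height. Here is a minimal sketch of that arithmetic, plus one possible launch command line. The post does not say which arguments the .bat script passed, so the use of Unity's standalone player arguments -screen-width, -screen-height and -popupwindow is an assumption, and "MyGame.exe" is a hypothetical player name.

```python
from typing import List, Tuple


def spanned_resolution(columns: int, rows: int, width: int, height: int) -> Tuple[int, int]:
    """Total desktop size covered by a columns x rows grid of identical displays."""
    return columns * width, rows * height


def spanning_args(player_path: str, columns: int, rows: int, width: int, height: int) -> List[str]:
    """Argument list for one borderless window covering the whole grid (assumed flags)."""
    total_width, total_height = spanned_resolution(columns, rows, width, height)
    return [
        player_path,
        "-screen-width", str(total_width),
        "-screen-height", str(total_height),
        "-popupwindow",  # borderless window; assumption, not confirmed by the post
    ]


# Six 1920x1200 projectors in a 3x2 grid, as described above.
args = spanning_args("MyGame.exe", 3, 2, 1920, 1200)
print(" ".join(args))
```

    The same helper reproduces the DSR figures quoted later in the thread: three 3840x2400 outputs side by side give 11520x2400, and a 3x2 grid gives 11520x4800.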

    So what we've learned:
    - don't use Display.displays[x].Activate(); across multiple GPUs
    - Unity only uses one GPU to render
    What I'd still like to test:
    - Does SLI help with the two points mentioned? Is there any difference between DX12 and Vulkan?
    - Would dual 5700 XTs, both at 16x on PCI-e 4th gen, make a difference?

    So our next challenge is 9x 4K projectors. It will need at least 3 GPUs just for output, but that massive resolution will be a bottleneck, especially if we're limited to one GPU having to do all the work.
    Anyone have some ideas?

    ---------------------------
    As a follow-up test I activated DSR at 4x, which means each projector is 3840x2400, and I'm running into issues again:
    Having 3 projectors (11520 x 2400) on 1 GPU, empty project, gives about 700FPS
    Dividing that across 2 GPUs, it starts to stutter, slipping back to ~25FPS about once a second

    If I go to all 6 projectors but with a canvas of only 11520 x 2500, so just 100 pixels on the 3 projectors on the 2nd GPU, it's down to ~22FPS. PCI-e bandwidth is about 85-95%.
     
    Last edited: Jun 10, 2020
  23. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    You're correct, Unity will only ever use one GPU for rendering (with the exception of SLI/CrossFire). If a monitor is hooked up to another GPU, Windows will, behind the scenes, use the PCIe bus to transfer the image there every frame.
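
    To get a feel for what a per-frame transfer over PCIe costs, here is a rough back-of-the-envelope sketch. It assumes uncompressed frames at 4 bytes per pixel, exactly one copy per frame, and the theoretical PCIe 3.0 rate of roughly 0.985 GB/s per lane; none of these are stated in the thread, and real traffic may be higher (for example, if staging copies are involved), so treat the result as a lower bound.

```python
# Approximate theoretical PCIe 3.0 throughput per lane (assumption: Gen3 link).
PCIE3_GB_PER_S_PER_LANE = 0.985


def copy_bandwidth_gb_per_s(width: int, height: int, fps: float, bytes_per_pixel: int = 4) -> float:
    """GB/s needed to move one uncompressed frame of this size across the bus every frame."""
    return width * height * bytes_per_pixel * fps / 1e9


def link_bandwidth_gb_per_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe 3.0 link with this many lanes."""
    return lanes * PCIE3_GB_PER_S_PER_LANE


# Example: one 1920x1200 display on the second GPU at 100 FPS over a 4x slot.
needed = copy_bandwidth_gb_per_s(1920, 1200, 100)
available = link_bandwidth_gb_per_s(4)
print(f"needed: {needed:.2f} GB/s, link: {available:.2f} GB/s, share: {needed / available:.0%}")
```

    Because the cost scales with resolution times frame rate, an uncapped frame rate or a 4K output pushes the required bandwidth up quickly.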

    It would be interesting to see if it behaves any better in 2020.2 alpha. We made some presentation changes there.
     
  24. TommyNL

    TommyNL

    Joined:
    Oct 19, 2016
    Posts:
    2
    But I find it very interesting that borderless spanning is way less taxing than separately activated displays.
    So I tried 2020.2.13 alpha today and I also tried DX12 and Vulkan instead of DX11 (using Unity 2019.3.0f)

    So same test as before with DSR on DX12:
    Having 3 projectors (11520 x 2400) on 1 GPU, empty project, gives about the same 700+FPS
    Dividing that across 2 GPUs, it got better: it sometimes dropped to between 90 and 170FPS, but that's way better than 25

    Going to all 6 projectors but with a canvas of only 11520 x 2500, so just 100 pixels on the 3 projectors, also got better, staying around 170FPS and dipping to about 80 every now and then

    Going over to Vulkan with the 11520 x 2500 test and... ~650FPS solid...
    Doing the entire 11520 x 4800 and still pushing 330FPS (hot day so reached thermal limits)
    Keep in mind it's basically a blank canvas but it does show that it's not suffering from a lack of PCI-e bandwidth.

    Now that I know that Vulkan does something magical that helps, maybe this also works in Unity 2019. And would the separately activated displays under Vulkan create a bottleneck again? I'll have to test that, I guess.
    We're continuing to test, with real 4K screens instead of Nvidia DSR tricks, and a test with actual content. Maybe even add a 3rd graphics card for more pixels to push?
     
  25. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,678
    Unity doesn't treat multiple displays differently on DX11 vs DX12 vs Vulkan... On DX11 and DX12, for instance, Unity just creates a DXGI swapchain for each of the "displays", whereas if you just make a single window bigger, it will use a single swapchain. I would guess the performance issues are caused or alleviated depending on what the graphics driver does with that.