
HDRP rendering with Vulkan API crashes on Ubuntu

Discussion in 'Linux' started by deemolover, May 8, 2022.

  1. deemolover

    Joined: Jan 17, 2020
    Posts: 5
    Hi, I think I've run into an issue when running an HDRP environment on Linux for ML-Agents training. The version info is as follows:
    • The HDRP version is 10.2.2 and the ML-Agents version is 2.0.1;
    • The environment is built with Unity 2020.3.10f1 on Windows 10, targeting Linux;
    • The environment is used for RL training on Ubuntu 16.04;
    • The command /usr/bin/vulkaninfo reports Vulkan version 1.1.73.
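For anyone collecting the same version info on several training machines, a minimal Python sketch for reading the Vulkan version programmatically. It assumes vulkaninfo prints a line of the form "Vulkan Instance Version: 1.1.73"; the exact wording differs between vulkaninfo releases, so the regular expression is an assumption. `parse_vulkan_version` and `query_vulkan_version` are hypothetical helper names, not part of Unity or ML-Agents.

```python
from __future__ import annotations

import re
import subprocess
from typing import Optional

# Assumed output line, e.g. "Vulkan Instance Version: 1.1.73".
# vulkaninfo's wording differs between releases, so this pattern is a guess.
_VERSION_RE = re.compile(r"Vulkan Instance Version:\s*(\d+)\.(\d+)\.(\d+)")


def parse_vulkan_version(text: str) -> Optional[tuple[int, int, int]]:
    """Return (major, minor, patch) from vulkaninfo output, or None if absent."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def query_vulkan_version(path: str = "/usr/bin/vulkaninfo") -> Optional[tuple[int, int, int]]:
    """Run vulkaninfo and parse the instance version from its standard output."""
    result = subprocess.run([path], capture_output=True, text=True, check=False)
    return parse_vulkan_version(result.stdout)


if __name__ == "__main__":
    sample = "VULKAN INFO\n\nVulkan Instance Version: 1.1.73\n"
    print(parse_vulkan_version(sample))  # (1, 1, 73)
```

Returning None instead of raising keeps the helper usable on machines where vulkaninfo prints a different layout.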
    The environment occasionally crashes with SIGSEGV during training, and the crashes seem to become less frequent when we reduce the computational load (for example, by letting the agent repeat a single action instead of moving around, colliding with, and seeing lots of objects). We use visual observations, so it's not a headless build and rendering is required.

    Here are the key traces from Player.log (sorry, I don't know how to get proper code highlighting). I have also attached the complete log below:

    Code (csharp):
    =================================================================
    Got a SIGSEGV while executing native code. This usually indicates
    a fatal error in the mono runtime or one of the native libraries
    used by your application.
    =================================================================

    Caught fatal signal - signo:11 code:1 errno:0 addr:0x4
    Obtained 18 stack frames.
    #0  0x007f4ca20ee390 in funlockfile
    #1  0x007f4ca1e82acc in __nss_passwd_lookup
    #2  0x007f4ca39be147 in dense_hashtable<std::pair<vk::DescriptorSetKey const, VkDescriptorSet_T*>, vk::DescriptorSetKey, core::hash<vk::DescriptorSetKey>, GfxDoubleCache<vk::DescriptorSetKey, VkDescriptorSet_T*, core::hash<vk::DescriptorSetKey>, vk::DescKeyEqualTo, GfxDoubleCacheConcurrencyPolicy::LocklessGet, vk::DescKeyEmptyDeleteGenerator, (MemLabelIdentifier)28>::SelectKey, vk::DescKeyEqualTo, stl_allocator<std::pair<vk::DescriptorSetKey const, VkDescriptorSet_T*>, (MemLabelIdentifier)28, 16> >::find_position_with_hash(vk::DescriptorSetKey const&, unsigned long) const
    #3  0x007f4ca39b5701 in vk::DescriptorSetLayout::GetDescriptorSet(vk::DescriptorSetUpdater const&, VkDescriptorUpdateTemplate_T*, vk::DescriptorSetKey const&)
    #4  0x007f4ca39911fd in vk::DescriptorState::Bind(VkDevice_T*, vk::CommandBuffer*, vk::GpuProgramBase&)
    #5  0x007f4ca395b336 in GfxDeviceVKBase::DrawBuffersBatchMode(GfxBatchHeader const&)
    #6  0x007f4ca3f57eaf in ScriptableBatchRenderer::RenderMultipleMeshes(RenderNodeQueue const&, RenderMultipleData const&, VertexInputMasks)
    #7  0x007f4ca3f59acf in ScriptableBatchRenderer::Flush(BatchBreakCause)
    #8  0x007f4ca3f6a49b in ScriptableRenderLoopDrawSRPBatcher(ScriptableLoopObjectData const*, unsigned long, SharedRendererScene const&, DrawRenderersCommand const&, ShaderPassContext&, GfxDevice&, int)
    #9  0x007f4ca3f6c078 in ScriptableRenderLoopDrawDispatch(ScriptableLoopObjectData const*, unsigned long, SharedRendererScene const&, DrawRenderersCommand const&, ShaderPassContext&, GfxDevice&, int)
    #10 0x007f4ca3f723f0 in ScriptableRenderLoopJob(GfxDeviceAsyncCommand::ArgScratch*, GfxDeviceAsyncCommand::Arg const*)
    #11 0x007f4ca396ca64 in GfxDeviceVKBase::ExecuteAsyncSetup(GfxDeviceAsyncCommand*)
    #12 0x007f4ca3f90a01 in JobQueue::Exec(JobInfo*, long long, int, bool)
    #13 0x007f4ca3f91085 in JobQueue::ProcessJobs(JobQueue::ThreadInfo*, void*)
    #14 0x007f4ca3f90142 in JobQueue::WorkLoop(void*)
    #15 0x007f4ca4203ed8 in Thread::RunThreadWrapper(void*)
    #16 0x007f4ca20e46ba in start_thread
    #17 0x007f4ca1e1a51d in clone
    I'm not familiar with rendering, let alone Vulkan, so I'm not sure this is the right place to post. I'm happy to provide further debug info if required. Thanks for any ideas on this issue!
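To check whether a crash in another Player.log goes through the same Vulkan descriptor-set path as this trace, it can help to extract the frame symbols programmatically. Here is a minimal Python sketch. It assumes frames follow the "#N 0xADDRESS in symbol" layout shown in the trace. The helper names and the two-frame signature are hypothetical illustrations, not something Unity provides. A match only suggests the crash follows the same code path; it does not prove it is the same bug.

```python
from __future__ import annotations

import re

# Matches frames such as "#3  0x007f4ca39b5701 in vk::DescriptorSetLayout::GetDescriptorSet(...)".
_FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(.+)")

# Hypothetical signature: two frames from the Vulkan descriptor-set path in this trace.
DESCRIPTOR_SET_SIGNATURE = (
    "vk::DescriptorSetLayout::GetDescriptorSet",
    "vk::DescriptorState::Bind",
)


def parse_frames(log_text: str) -> list[str]:
    """Return the symbol of every '#N 0x... in symbol' line, in log order."""
    frames = []
    for line in log_text.splitlines():
        match = _FRAME_RE.search(line)
        if match:
            frames.append(match.group(2).strip())
    return frames


def matches_signature(frames: list[str], signature=DESCRIPTOR_SET_SIGNATURE) -> bool:
    """True if each signature entry is the start of at least one frame symbol."""
    return all(any(frame.startswith(entry) for frame in frames) for entry in signature)


if __name__ == "__main__":
    sample = (
        "#0  0x007f4ca20ee390 in funlockfile\n"
        "#3  0x007f4ca39b5701 in vk::DescriptorSetLayout::GetDescriptorSet(...)\n"
        "#4  0x007f4ca39911fd in vk::DescriptorState::Bind(...)\n"
    )
    print(matches_signature(parse_frames(sample)))  # True
```

Matching on symbol prefixes rather than addresses matters because addresses change between builds and runs, while the symbol names stay stable for a given Unity version.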
     

    Attached Files:

  2. florianpenzkofer

    Unity Technologies

    Joined: Sep 2, 2014
    Posts: 479
    A crash with that callstack was fixed a while ago. Please try upgrading to a newer version of Unity 2020.3.

    If I remember correctly, disabling Graphics Jobs should also avoid this crash.
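The thread does not say which 2020.3 patch release contains the fix. 2020.3.34f1 is simply the version the original poster later reported as stable. With that caveat, here is a small Python sketch for checking a build's editor version against such a reference. It assumes version strings of the form 2020.3.10f1 (year.minor.patch, a release-type letter, and a build number). Only the a/b/f suffixes are handled, and `KNOWN_STABLE` is a hypothetical constant, not a documented fix version.

```python
from __future__ import annotations

import re

# Version strings like "2020.3.10f1": year.minor.patch + release letter + build number.
# Only a (alpha), b (beta) and f (final) are handled; other suffixes raise ValueError.
_UNITY_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([abf])(\d+)$")
_STAGE_RANK = {"a": 0, "b": 1, "f": 2}

# Hypothetical reference: the version reported as stable in this thread.
KNOWN_STABLE = "2020.3.34f1"


def parse_unity_version(version: str) -> tuple[int, int, int, int, int]:
    """Turn '2020.3.10f1' into a tuple that sorts in release order."""
    match = _UNITY_VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized Unity version: {version!r}")
    year, minor, patch, stage, build = match.groups()
    return (int(year), int(minor), int(patch), _STAGE_RANK[stage], int(build))


def is_older_than(version: str, reference: str = KNOWN_STABLE) -> bool:
    """True if `version` sorts strictly before `reference`."""
    return parse_unity_version(version) < parse_unity_version(reference)


if __name__ == "__main__":
    print(is_older_than("2020.3.10f1"))  # True
```

The numeric fields are parsed because a plain string comparison gets these versions wrong: as strings, "2020.3.10f1" sorts before "2020.3.9f1".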
     
  3. nrader95

    Joined: Aug 19, 2019
    Posts: 24
    What about out-of-video-memory situations?
    HDRP on Linux silently crashes in such cases. Is that ever going to be fixed?
     
  4. deemolover

    Joined: Jan 17, 2020
    Posts: 5
    Thanks for the information! We tried disabling Graphics Jobs in Player Settings with 2020.3.10f1, but that didn't help, so we updated to 2020.3.34f1 and kept Graphics Jobs enabled. So far we haven't encountered any issues, and it looks pretty stable.
     
  5. florianpenzkofer

    Unity Technologies

    Joined: Sep 2, 2014
    Posts: 479
    We made improvements to GPU memory handling (for cases like https://issuetracker.unity3d.com/is...g-shader-graph-sample-project-with-vulkan-api).
    The improvements mainly consist of falling back to host memory when there is no more free GPU memory, throttling larger resource uploads, trimming the scratch memory buffer, etc.
    So far we do not move resources between GPU and host memory when GPU memory is freed up.
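To make the described fallback policy concrete, here is a toy Python model of it. Allocations go to GPU memory while there is room, fall back to host memory when there is not, and stay in host memory after GPU memory is freed. This is an illustration under stated assumptions, not Unity's Vulkan allocator. Sizes are abstract units, `ToyAllocator` is a hypothetical name, and upload throttling and scratch-buffer trimming are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyAllocator:
    """Toy model (not Unity's code): prefer GPU memory, fall back to host memory
    when the GPU heap is full, and never migrate a resource back afterwards."""

    gpu_capacity: int
    gpu_used: int = 0
    placement: dict[str, str] = field(default_factory=dict)
    sizes: dict[str, int] = field(default_factory=dict)

    def allocate(self, name: str, size: int) -> str:
        if self.gpu_used + size <= self.gpu_capacity:
            self.gpu_used += size
            where = "gpu"
        else:
            where = "host"  # fall back instead of failing the allocation
        self.placement[name] = where
        self.sizes[name] = size
        return where

    def free(self, name: str) -> None:
        if self.placement.pop(name) == "gpu":
            self.gpu_used -= self.sizes[name]
        del self.sizes[name]
        # No migration pass: host-resident resources stay in host memory
        # even though GPU memory may now be free.


if __name__ == "__main__":
    alloc = ToyAllocator(gpu_capacity=100)
    alloc.allocate("albedo", 80)   # fits in GPU memory
    alloc.allocate("normals", 40)  # only 20 units left, so it goes to host memory
    alloc.free("albedo")
    print(alloc.placement["normals"])  # host
```

The last step shows the limitation mentioned in the reply: after "albedo" is freed, the GPU has room again, but "normals" keeps running from slower host memory.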