
Question Aborted (core dumped) while training the 3DBall example.

Discussion in 'ML-Agents' started by SkeVinci, May 8, 2022.

  1. SkeVinci

    Joined: May 8, 2022. Posts: 6
    I have just started using ML-Agents. I ran the command
    mlagents-learn config/ppo/3DBall.yaml --run-id=3DBallTest --force
    to train the 3DBall example and pressed the Play button. Training appeared to work for a moment and then froze. I had to kill the Unity Editor from the task manager and restart the computer; without a restart, training does not even get that far when I repeat the steps above.
    How can I solve this problem and train 3DBall successfully? Thanks!
    Here is the terminal output:

    [INFO] 3DBall. Step: 12000. Time Elapsed: 21.158 s. Mean Reward: 1.206. Std of Reward: 0.743. Training.
    terminate called after throwing an instance of 'c10::Error'
    what(): HIP error: hipErrorNoDevice
    Exception raised from deviceCount at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:98 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fd7f6991d12 in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #1: <unknown function> + 0x57d4f1 (0x7fd7f738c4f1 in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_hip.so)
    frame #2: torch::autograd::Engine::start_device_threads() + 0x442 (0x7fd828a1f252 in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
    frame #3: <unknown function> + 0x124df (0x7fd865e3f4df in /lib/x86_64-linux-gnu/libpthread.so.0)
    frame #4: torch::autograd::Engine::initialize_device_threads_pool() + 0xd5 (0x7fd828a1c785 in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
    frame #5: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>) + 0x2f (0x7fd828a24faf in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
    frame #6: torch::autograd::python::pythonEngine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>) + 0x3c (0x7fd837328ddc in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #7: torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0xacd (0x7fd828a2446d in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
    frame #8: torch::autograd::python::pythonEngine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0x4e (0x7fd837328bde in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #9: THPEngine_run_backward(THPEngine*, _object*, _object*) + 0xe3f (0x7fd837329caf in /home/skevinci/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #10: PyCFunction_Call + 0x59 (0x5f5e79 in /usr/bin/python3)
    frame #11: _PyObject_MakeTpCall + 0x296 (0x5f6a46 in /usr/bin/python3)
    frame #12: _PyEval_EvalFrameDefault + 0x623d (0x570f1d in /usr/bin/python3)
    frame #13: _PyEval_EvalCodeWithName + 0x26a (0x5696da in /usr/bin/python3)
    frame #14: _PyFunction_Vectorcall + 0x393 (0x5f6403 in /usr/bin/python3)
    frame #15: _PyEval_EvalFrameDefault + 0x5706 (0x5703e6 in /usr/bin/python3)
    frame #16: _PyEval_EvalCodeWithName + 0x26a (0x5696da in /usr/bin/python3)
    frame #17: _PyFunction_Vectorcall + 0x393 (0x5f6403 in /usr/bin/python3)
    frame #18: _PyEval_EvalFrameDefault + 0x900 (0x56b5e0 in /usr/bin/python3)
    frame #19: _PyEval_EvalCodeWithName + 0x26a (0x5696da in /usr/bin/python3)
    frame #20: _PyFunction_Vectorcall + 0x393 (0x5f6403 in /usr/bin/python3)
    frame #21: PyObject_Call + 0x62 (0x5f55f2 in /usr/bin/python3)
    frame #22: _PyEval_EvalFrameDefault + 0x1f1b (0x56cbfb in /usr/bin/python3)
    frame #23: _PyEval_EvalCodeWithName + 0x26a (0x5696da in /usr/bin/python3)
    frame #24: _PyFunction_Vectorcall + 0x393 (0x5f6403 in /usr/bin/python3)
    frame #25: _PyEval_EvalFrameDefault + 0x900 (0x56b5e0 in /usr/bin/python3)
    frame #26: _PyFunction_Vectorcall + 0x1b6 (0x5f6226 in /usr/bin/python3)
    frame #27: _PyEval_EvalFrameDefault + 0x900 (0x56b5e0 in /usr/bin/python3)
    frame #28: _PyFunction_Vectorcall + 0x1b6 (0x5f6226 in /usr/bin/python3)
    frame #29: _PyEval_EvalFrameDefault + 0x900 (0x56b5e0 in /usr/bin/python3)
    frame #30: _PyFunction_Vectorcall + 0x1b6 (0x5f6226 in /usr/bin/python3)
    frame #31: PyObject_Call + 0x62 (0x5f55f2 in /usr/bin/python3)
    frame #32: _PyEval_EvalFrameDefault + 0x1f1b (0x56cbfb in /usr/bin/python3)
    frame #33: _PyEval_EvalCodeWithName + 0x26a (0x5696da in /usr/bin/python3)
    frame #34: _PyFunction_Vectorcall + 0x393 (0x5f6403 in /usr/bin/python3)
    frame #35: _PyEval_EvalFrameDefault + 0x900 (0x56b5e0 in /usr/bin/python3)
    frame #36: _PyFunction_Vectorcall + 0x1b6 (0x5f6226 in /usr/bin/python3)
    frame #37: PyObject_Call + 0x62 (0x5f55f2 in /usr/bin/python3)
    frame #38: _PyEval_EvalFrameDefault + 0x1f1b (0x56cbfb in /usr/bin/python3)
    frame #39: _PyEval_EvalCodeWithName + 0x26a (0x5696da in /usr/bin/python3)
    frame #40: _PyFunction_Vectorcall + 0x393 (0x5f6403 in /usr/bin/python3)
    frame #41: _PyEval_EvalFrameDefault + 0x900 (0x56b5e0 in /usr/bin/python3)
    frame #42: _PyFunction_Vectorcall + 0x1b6 (0x5f6226 in /usr/bin/python3)
    frame #43: _PyEval_EvalFrameDefault + 0x71e (0x56b3fe in /usr/bin/python3)
    frame #44: _PyFunction_Vectorcall + 0x1b6 (0x5f6226 in /usr/bin/python3)
    frame #45: _PyEval_EvalFrameDefault + 0x71e (0x56b3fe in /usr/bin/python3)
    frame #46: _PyFunction_Vectorcall + 0x1b6 (0x5f6226 in /usr/bin/python3)
    frame #47: _PyEval_EvalFrameDefault + 0x71e (0x56b3fe in /usr/bin/python3)
    frame #48: _PyEval_EvalCodeWithName + 0x26a (0x5696da in /usr/bin/python3)
    frame #49: PyEval_EvalCode + 0x27 (0x68db17 in /usr/bin/python3)
    frame #50: /usr/bin/python3() [0x67eeb1]
    frame #51: /usr/bin/python3() [0x67ef2f]
    frame #52: /usr/bin/python3() [0x67efd1]
    frame #53: PyRun_SimpleFileExFlags + 0x197 (0x67f377 in /usr/bin/python3)
    frame #54: Py_RunMain + 0x212 (0x6b7902 in /usr/bin/python3)
    frame #55: Py_BytesMain + 0x2d (0x6b7c8d in /usr/bin/python3)
    frame #56: __libc_start_main + 0xf3 (0x7fd865e77083 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #57: _start + 0x2e (0x5fb12e in /usr/bin/python3)
    Aborted (core dumped)
     
  2. korzen303

    Joined: Oct 2, 2012. Posts: 223
    Hi @SkeVinci, any luck with this? I have exactly the same problem on Ubuntu 22.04. Cheers
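
A note on reading the trace in the first post: `HIP error: hipErrorNoDevice` is raised from `libtorch_hip.so`, which suggests the installed PyTorch is a ROCm (AMD) build that cannot find a HIP-capable GPU once the first backward pass starts. The following is a minimal sketch for checking which backend the installed PyTorch targets, not a confirmed fix. It assumes `torch.cuda.is_available()` returns `False` rather than aborting on such an install, and `describe_torch_build` and `diagnose` are hypothetical helper names made up for this example.

```python
import importlib.util


def describe_torch_build():
    """Collect facts about the installed PyTorch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # torch.version.hip is a version string on ROCm builds, None otherwise.
        "hip": getattr(torch.version, "hip", None),
        # torch.version.cuda is a version string on CUDA builds, None otherwise.
        "cuda": getattr(torch.version, "cuda", None),
        # ROCm builds expose HIP devices through the torch.cuda API as well.
        # Assumption: this returns False instead of aborting when no device exists.
        "device_available": torch.cuda.is_available(),
    }


def diagnose(info):
    """Turn the collected facts into a one-line hint (hypothetical helper)."""
    if not info.get("installed"):
        return "PyTorch is not installed in this Python environment."
    if info.get("device_available"):
        return "PyTorch can see a GPU; the build matches the hardware."
    if info.get("hip"):
        return "ROCm (HIP) build of PyTorch, but no HIP device is visible."
    if info.get("cuda"):
        return "CUDA build of PyTorch, but no CUDA device is visible."
    return "CPU-only build of PyTorch."


if __name__ == "__main__":
    print(diagnose(describe_torch_build()))
```

If the hint reports a ROCm build with no visible device, one thing to try is replacing it with a PyTorch build that matches the machine (CPU-only, or CUDA for an NVIDIA card). That is a guess based on the trace, not something confirmed in this thread.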