
Bug: Setting the torch device results in a runtime error

Discussion in 'ML-Agents' started by ChillX, Feb 2, 2022.

  1. ChillX

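    Setting the torch device to cuda:1 with either of the following settings produces the runtime error in the stack trace.

    Command line argument:
    --torch-device="cuda:1"

    YAML config file settings:
    torch_settings:
      device: cuda:1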
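Before digging into the trainer, it can help to rule out the simplest failure, namely that `cuda:1` is not visible to PyTorch at all. The following is a minimal pre-flight sketch, assuming only standard PyTorch calls; `check_device` is a hypothetical helper, not part of ML-Agents.

```python
import torch


def check_device(device_str: str) -> torch.device:
    """Fail early if a requested CUDA device index is not visible to PyTorch."""
    device = torch.device(device_str)  # parses "cpu", "cuda", "cuda:1", ...
    if device.type == "cuda":
        count = torch.cuda.device_count()  # 0 when CUDA is unavailable
        # Treat a bare "cuda" as index 0, the default current device.
        index = 0 if device.index is None else device.index
        if index >= count:
            raise RuntimeError(
                f"{device_str} requested, but only {count} CUDA device(s) are visible"
            )
    return device
```

For example, `check_device("cuda:1")` raises on a machine with a single GPU. Indices count only the GPUs visible to the process, so filtering with `CUDA_VISIBLE_DEVICES` renumbers them.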


    Python stack trace:

    Traceback (most recent call last):
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 176, in start_learning
        n_steps = self.advance(env_manager)
      File "d:\cxmlunity\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 234, in advance
        new_step_infos = env_manager.get_steps()
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\env_manager.py", line 124, in get_steps
        new_step_infos = self._step()
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 408, in _step
        self._queue_steps()
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 302, in _queue_steps
        env_action_info = self._take_step(env_worker.previous_step)
      File "d:\cxmlunity\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 544, in _take_step
        step_tuple[0], last_step.worker_id
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\policy\torch_policy.py", line 204, in get_action
        run_out = self.evaluate(decision_requests, global_agent_ids)
      File "d:\cxmlunity\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\policy\torch_policy.py", line 171, in evaluate
        tensor_obs, masks=masks, memories=memories
      File "d:\cxmlunity\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
        return func(*args, **kwargs)
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\policy\torch_policy.py", line 133, in sample_actions
        obs, masks, memories, seq_len
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\torch\networks.py", line 639, in get_action_and_stats
        inputs, memories=memories, sequence_length=sequence_length
      File "C:\Users\username\.conda\envs\UnityML\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\torch\networks.py", line 245, in forward
        encoding = self._body_endoder(encoded_self)
      File "C:\Users\username\.conda\envs\UnityML\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "d:\cxmlunity\ml-agents\ml-agents\mlagents\trainers\torch\layers.py", line 169, in forward
        return self.seq_layers(input_tensor)
      File "C:\Users\username\.conda\envs\UnityML\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "C:\Users\username\.conda\envs\UnityML\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
        input = module(input)
      File "C:\Users\username\.conda\envs\UnityML\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "C:\Users\username\.conda\envs\UnityML\lib\site-packages\torch\nn\modules\linear.py", line 93, in forward
        return F.linear(input, self.weight, self.bias)
      File "C:\Users\username\.conda\envs\UnityML\lib\site-packages\torch\nn\functional.py", line 1690, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: Expected tensor for 'out' to have the same device as tensor for argument #2 'mat1'; but device 1 does not equal 0 (while checking arguments for addmm)
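The error means `torch.addmm` received tensors on two different GPUs (device 1 and device 0). One plausible cause, not confirmed in this thread, is that the network is placed on `cuda:1` while some tensors are created on PyTorch's current CUDA device, which stays at device 0 unless `torch.cuda.set_device` is called. A minimal diagnostic sketch, assuming plain PyTorch; `find_device_mismatches` is a hypothetical helper, not an ML-Agents API:

```python
from typing import List, Union

import torch
from torch import nn


def find_device_mismatches(
    module: nn.Module, expected: Union[str, torch.device]
) -> List[str]:
    """List parameters/buffers of `module` that are not on the `expected` device."""
    if isinstance(expected, str):
        expected = torch.device(expected)
    mismatches = []
    named = list(module.named_parameters()) + list(module.named_buffers())
    for name, tensor in named:
        # CUDA tensors always report an index (e.g. cuda:1), so `expected`
        # should include one too: torch.device("cuda") != torch.device("cuda:0").
        if tensor.device != expected:
            mismatches.append(f"{name}: {tensor.device}")
    return mismatches
```

For example, `find_device_mismatches(nn.Linear(2, 3), "cuda:1")` returns `["weight: cpu", "bias: cpu"]`, because a freshly built layer lives on the CPU. Input tensors can be checked the same way through their `.device` attribute.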
  2. ChillX

    The workaround I'm using is to edit ml-agents\mlagents\torch_utils\torch.py
    and hardcode the torch device with the following three edits:

    1) Comment out:
           _device = torch.device("cpu")

       Replace it with:
           _device = torch.device("cuda:1")
           torch.cuda.set_device(1)

    2) In the function set_torch_config, hardcode the CUDA device below the if statement that checks torch_settings.device:
           device_str = "cuda:1"
           _device = torch.device(device_str)

    3) In def default_device(), hardcode the CUDA device:
           # return _device
           return torch.device("cuda:1")

    With these three edits the CUDA device is hardcoded to cuda:1, except in threaded mode.
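The same idea can be sketched without hardcoding the index: parse the configured device string and, if it names a GPU, also make that GPU PyTorch's current CUDA device so implicitly created CUDA tensors land on it. This is not the ML-Agents source; `set_torch_device` is a hypothetical stand-alone function that leaves out anything else the real `set_torch_config` does, and it has not been tested in threaded mode.

```python
from typing import Optional

import torch

# Module-level default, mirroring the `_device` global edited above.
_device = torch.device("cpu")


def set_torch_device(device_str: Optional[str] = None) -> torch.device:
    """Select the torch device and, for CUDA, make it the current device."""
    global _device
    if device_str is None:
        device_str = "cuda" if torch.cuda.is_available() else "cpu"
    _device = torch.device(device_str)
    if _device.type == "cuda":
        # Without this, tensors created with no explicit index default to
        # the current device, which is cuda:0 unless changed.
        index = (
            _device.index if _device.index is not None else torch.cuda.current_device()
        )
        torch.cuda.set_device(index)
        _device = torch.device("cuda", index)
    return _device


def default_device() -> torch.device:
    """Return the device chosen by the last call to set_torch_device."""
    return _device
```

With this, `set_torch_device("cuda:1")` replaces the three hardcoded edits for single-threaded runs. The CUDA current device is tracked per host thread, so calling `torch.cuda.set_device` once at startup may not carry over to trainer threads, which could explain why threaded mode remains the exception.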