How to use example environments in Python?

Discussion in 'ML-Agents' started by plussun, Aug 10, 2020.

  1. plussun

    Joined: Nov 22, 2019
    Posts: 1
    Hello!
    I am learning to use ML-Agents along with PyTorch to study RL algorithms.
    I followed the Google Colab tutorial "ML-Agents Q-Learning with GridWorld", and the script works well.
    https://colab.research.google.com/drive/1nkOztXzU91MHEbuQ1T9GnynYdL_LRsHG#scrollTo=pbVXrmEsLXDt
    However, when I connect this code to the GridWorld scene open in my Unity Editor, changing only
    env = default_registry["GridWorld"].make()

    to
    env = UnityEnvironment(file_name=None)

    it works for the first few steps, and then an error occurs:

    GridWorld environment created.
    Training step 1 reward -0.9999999776482582
    Training step 2 reward -0.7777777603930898
    Training step 3 reward -0.7777777579095628
    Training step 4 reward -0.9999999776482582
    Training step 5 reward -0.9999999776482582


    ---------------------------------------------------------------------------
    KeyError Traceback (most recent call last)
    <ipython-input-20-959c72c29382> in <module>
    34
    35 for n in range(NUM_TRAINING_STEPS):
    ---> 36 new_exp,_ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
    37 random.shuffle(experiences)
    38 if len(experiences) > BUFFER_SIZE:

    <ipython-input-19-34a59cce85fb> in generate_trajectories(env, q_net, buffer_size, epsilon)
    54 # Create its last experience (is last because the Agent terminated)
    55 last_experience = Experience(
    ---> 56 obs=dict_last_obs_from_agent[agent_id_terminated].copy(),
    57 reward=terminal_steps[agent_id_terminated].reward,
    58 done=not terminal_steps[agent_id_terminated].interrupted,

    KeyError: 1


    I noticed that at the beginning of each step, after env.reset() is called, env.get_steps() shouldn't return anything, yet with the example GridWorld environment it still returns steps even after a reset.
    I suspect the example scene has a script that keeps making steps, but I can't find where to turn it off so that it becomes a pure environment to train against. Does anyone know how to use these example environments with Python?
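    (For reference, a minimal way to check what get_steps() returns right after a reset; this is only a sketch using the standard mlagents_envs API, and getting the behavior name may differ slightly between ml-agents releases:)
    Code (Python):
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name=None)  # press Play in the Editor once this starts waiting
    env.reset()
    behavior_name = list(env.behavior_specs)[0]
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print("agents requesting a decision:", list(decision_steps.agent_id))
    print("agents already terminated:", list(terminal_steps.agent_id))
    env.close()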

    Here is some of the code:
    Code (Python):
    from mlagents_envs.registry import default_registry
    from mlagents_envs.environment import UnityEnvironment
    import matplotlib.pyplot as plt
    import random
    import time
    import torch
    from typing import List
    %matplotlib inline

    # VisualQNetwork, Trainer, Buffer and Experience are defined in earlier
    # cells of the Colab notebook.

    # Create the GridWorld Environment from the registry
    #env = default_registry["GridWorld"].make()
    env = UnityEnvironment(file_name=None)
    print("GridWorld environment created.")

    # Create a new Q-Network.
    qnet = VisualQNetwork((64, 84, 3), 126, 5)

    experiences: Buffer = []
    optim = torch.optim.Adam(qnet.parameters(), lr=0.001)

    cumulative_rewards: List[float] = []

    # The number of training steps that will be performed
    NUM_TRAINING_STEPS = 70
    # The number of experiences to collect per training step
    NUM_NEW_EXP = 1000
    # The maximum size of the Buffer
    BUFFER_SIZE = 10000

    for n in range(NUM_TRAINING_STEPS):
      new_exp, _ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
      random.shuffle(experiences)
      if len(experiences) > BUFFER_SIZE:
        experiences = experiences[:BUFFER_SIZE]
      experiences.extend(new_exp)
      Trainer.update_q_net(qnet, optim, experiences, 5)
      _, rewards = Trainer.generate_trajectories(env, qnet, 100, epsilon=0)
      cumulative_rewards.append(rewards)
      print("Training step ", n+1, "\treward ", rewards)

    env.close()

    # Show the training graph
    plt.plot(range(NUM_TRAINING_STEPS), cumulative_rewards)
     
  2. anupambhatnagar

    Unity Technologies

    Joined: Jun 22, 2017
    Posts: 4
    It is possible for an Agent to be terminated and request a decision at the same time: if it dies, its reset is called and then a decision is requested immediately.
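    In that case the terminated agent id may not have a previously stored observation yet, which is presumably what produces the KeyError above. Here is a sketch of a stepping loop that tolerates this case; the names are illustrative, not the notebook's exact code (with file_name=None, press Play in the Editor once the script starts waiting):
    Code (Python):
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name=None)
    env.reset()
    behavior_name = list(env.behavior_specs)[0]

    last_obs = {}  # agent_id -> observation from that agent's last decision step

    for _ in range(200):
        decision_steps, terminal_steps = env.get_steps(behavior_name)

        # An agent id can show up in terminal_steps without a decision step
        # having been seen for it before (terminated and re-spawned in the
        # same step), so guard the lookup instead of assuming it exists.
        for agent_id in terminal_steps:
            if agent_id in last_obs:
                obs = last_obs.pop(agent_id)
                reward = terminal_steps[agent_id].reward
                # ...store (obs, reward, done=True) in the replay buffer here...

        # Remember the latest observation of every agent requesting a decision.
        for agent_id in decision_steps:
            last_obs[agent_id] = decision_steps[agent_id].obs[0]

        # A real trainer would call env.set_actions(...) here with its policy's
        # actions; stepping without setting actions sends default actions.
        env.step()

    env.close()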