
How to use example environments in Python?

Discussion in 'ML-Agents' started by plussun, Aug 10, 2020.

  1. plussun

    Joined:
    Nov 22, 2019
    Posts:
    1
Hello!
I am learning to use ML-Agents together with PyTorch to study RL algorithms.
I followed the Google Colab tutorial "ML-Agents Q-Learning with GridWorld", and the script works well.
https://colab.research.google.com/drive/1nkOztXzU91MHEbuQ1T9GnynYdL_LRsHG#scrollTo=pbVXrmEsLXDt
However, when I connect this code to my Unity Editor, changing only the line

env = default_registry["GridWorld"].make()

to

env = UnityEnvironment(file_name=None)
With that change, it works for the first few steps, and then an error occurs, like this:

    GridWorld environment created.
    Training step 1 reward -0.9999999776482582
    Training step 2 reward -0.7777777603930898
    Training step 3 reward -0.7777777579095628
    Training step 4 reward -0.9999999776482582
    Training step 5 reward -0.9999999776482582


    ---------------------------------------------------------------------------
    KeyError Traceback (most recent call last)
    <ipython-input-20-959c72c29382> in <module>
    34
    35 for n in range(NUM_TRAINING_STEPS):
    ---> 36 new_exp,_ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
    37 random.shuffle(experiences)
    38 if len(experiences) > BUFFER_SIZE:

    <ipython-input-19-34a59cce85fb> in generate_trajectories(env, q_net, buffer_size, epsilon)
    54 # Create its last experience (is last because the Agent terminated)
    55 last_experience = Experience(
    ---> 56 obs=dict_last_obs_from_agent[agent_id_terminated].copy(),
    57 reward=terminal_steps[agent_id_terminated].reward,
    58 done=not terminal_steps[agent_id_terminated].interrupted,

    KeyError: 1


I find that at the beginning of each step, right after env.reset() is called, env.get_steps() shouldn't return anything; but when using the example GridWorld environment, it still returns steps even after a reset.
I suspect this is because the example environment has a script that keeps requesting steps, but I can't find where to turn it off to make it a pure environment for training. Does anyone know how to use these example environments with Python?
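
For reference, here is the minimal check I use right after a reset (a sketch assuming the standard mlagents_envs low-level API; GridWorld has a single behavior):

Code (Python):
env.reset()
behavior_name = list(env.behavior_specs)[0]
decision_steps, terminal_steps = env.get_steps(behavior_name)
# Right after a reset I would expect no terminal steps; decision_steps
# lists the agents that are currently requesting an action.
print(len(decision_steps), "deciding,", len(terminal_steps), "terminated")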

Here is some of the code:
Code (Python):
from mlagents_envs.registry import default_registry
from mlagents_envs.environment import UnityEnvironment
import matplotlib.pyplot as plt
import random
import time
import torch
from typing import List
%matplotlib inline

# VisualQNetwork, Trainer, Experience and the Buffer alias are defined
# in the earlier cells of the Colab notebook.

# Create the GridWorld Environment from the registry
#env = default_registry["GridWorld"].make()
env = UnityEnvironment(file_name=None)
print("GridWorld environment created.")

# Create a new Q-Network.
qnet = VisualQNetwork((64, 84, 3), 126, 5)

experiences: Buffer = []
optim = torch.optim.Adam(qnet.parameters(), lr=0.001)

cumulative_rewards: List[float] = []

# The number of training steps that will be performed
NUM_TRAINING_STEPS = 70
# The number of experiences to collect per training step
NUM_NEW_EXP = 1000
# The maximum size of the Buffer
BUFFER_SIZE = 10000

for n in range(NUM_TRAINING_STEPS):
  new_exp, _ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
  random.shuffle(experiences)
  if len(experiences) > BUFFER_SIZE:
    experiences = experiences[:BUFFER_SIZE]
  experiences.extend(new_exp)
  Trainer.update_q_net(qnet, optim, experiences, 5)
  _, rewards = Trainer.generate_trajectories(env, qnet, 100, epsilon=0)
  cumulative_rewards.append(rewards)
  print("Training step ", n + 1, "\treward ", rewards)

env.close()

# Show the training graph
plt.plot(range(NUM_TRAINING_STEPS), cumulative_rewards)
     
  2. anupambhatnagar

    Unity Technologies

    Joined:
    Jun 22, 2017
    Posts:
    4
It is possible for an Agent to be terminated and request a decision at the same time: if it dies, reset is called, and then a decision is requested immediately.
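
In that case the same agent id can appear in both terminal_steps and decision_steps of a single get_steps() call, so the trajectory loop has to tolerate a terminated agent for which no previous observation was stored yet. A rough sketch of the guard (the dict and variable names follow the Colab's generate_trajectories; the Experience below is reduced to the fields visible in your traceback):

Code (Python):
from typing import Dict, List, NamedTuple
import numpy as np

class Experience(NamedTuple):
    # Reduced to the fields visible in the traceback; the Colab's
    # Experience also stores the action and the next observation.
    obs: np.ndarray
    reward: float
    done: bool

def close_terminated_agents(terminal_steps, dict_last_obs_from_agent: Dict) -> List[Experience]:
    """Build the final experience for each terminated agent, skipping any
    agent that terminated before an observation was stored for it
    (for example, an agent that dies immediately after a reset)."""
    experiences = []
    for agent_id_terminated in terminal_steps:
        if agent_id_terminated not in dict_last_obs_from_agent:
            continue  # this unguarded lookup is what raised KeyError: 1
        experiences.append(Experience(
            obs=dict_last_obs_from_agent.pop(agent_id_terminated).copy(),
            reward=terminal_steps[agent_id_terminated].reward,
            done=not terminal_steps[agent_id_terminated].interrupted,
        ))
    return experiences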