Need help with Unity PPO reinforcement learning!

Discussion in 'ML-Agents' started by hiepg2, May 14, 2021.

  1. hiepg2

    hiepg2

    Joined:
    Dec 16, 2020
    Posts:
    7
    I'm learning how PPO reinforcement learning works when training a model. Specifically, I'm looking at the Hummingbird example from Immersive Limit, at this link:
    https://learn.unity.com/course/ml-agents-hummingbirds?uv=2019.3

    I have a few questions and hope someone can help me with them:
    1. How do the three scripts Flower, FlowerArea and HummingBirdAgent transfer data (input, output, processing)?
    2. How does PPO reinforcement learning apply to this example?
    3. After training with PyTorch, I can't use my new .onnx file for the agent; it says "is_continous_action" is missing, which doesn't happen when the original .nn file is used.
     


  2. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    What version of the ML-Agents package and Python are you using? You might be on an incompatible version combination; you can see the compatible versions on the main ML-Agents page.
     
  3. hiepg2

    hiepg2

    Joined:
    Dec 16, 2020
    Posts:
    7
    So I have to use the same version that trained the model in order to retrain it?
     
  4. hiepg2

    hiepg2

    Joined:
    Dec 16, 2020
    Posts:
    7
    By the way, I installed the latest mlagents and Python 3.7.
     
  5. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    Yeah, if you have the latest mlagents, you should use the latest C# package. There was a big version jump with the 2.0.0-pre3 package, which makes it incompatible with prior versions of the Python code.
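
    If you want to double-check which versions you actually have installed on the Python side, a quick sketch like this will print them (using pkg_resources, which ships with setuptools):

    import pkg_resources

    # Print the installed versions of the two ML-Agents Python packages
    for name in ("mlagents", "mlagents-envs"):
        print(name, pkg_resources.get_distribution(name).version)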
     
  6. hiepg2

    hiepg2

    Joined:
    Dec 16, 2020
    Posts:
    7
    Thank you. I have another question: to build a new model, you must have a dataset for training, but I don't see any code for this learning process anywhere in Unity (the HummingBirdAgent script). Is it in Python?

    I have done some digging in the mlagents Python API package but haven't found anything yet.
     
  7. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    In RL, your dataset is generated on the fly as the agent interacts with the environment. It's different from supervised learning.
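
    To make that concrete, here is a toy sketch of how the "dataset" gets built while the agent runs (schematic only; ToyEnv is a made-up stand-in for a real Unity environment, not actual ML-Agents code):

    import random

    # Made-up stand-in environment: the state is a counter, the episode
    # ends once it reaches 10, and action 1 earns a reward.
    class ToyEnv:
        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            self.state += action
            reward = 1.0 if action == 1 else 0.0
            done = self.state >= 10
            return self.state, reward, done

    env = ToyEnv()
    policy = lambda obs: random.choice([0, 1])  # stand-in for the neural net

    buffer = []  # this is the "dataset" -- it is built on the fly
    obs = env.reset()
    for _ in range(100):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        buffer.append((obs, action, reward))  # one experience sample
        obs = env.reset() if done else next_obs
    # a trainer like PPO updates the policy from `buffer`, then discards it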
     
  8. hiepg2

    hiepg2

    Joined:
    Dec 16, 2020
    Posts:
    7
    Can you show me the code that generates the dataset? I'm desperately looking for it to finish my final exam. My teacher keeps asking about it but I can't find it anywhere :(
     
  9. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    There is no "dataset". You should watch some of the ML-Agents YouTube video tutorials to understand the difference, I think.
     
  10. hiepg2

    hiepg2

    Joined:
    Dec 16, 2020
    Posts:
    7
    I've seen many ML tutorial videos and understand that there is no "dataset", but according to my teacher, somewhere in the Python API the things it collects (observations, rewards) must be used to choose the agent's actions. I haven't found the code showing that transformation (observations, rewards -> actions) anywhere in Python. I think it's kind of hard to find.
     
  11. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Are you trying to work from Unity directly (C#)? Maybe you haven't looked at the Python API (not the Gym wrapper).
    Look at the docs > Colab examples > Example 1:

    for episode in range(3):
        env.reset()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        tracked_agent = -1  # -1 indicates not yet tracking
        done = False  # For the tracked_agent
        episode_rewards = 0  # For the tracked_agent
        while not done:
            # Track the first agent we see if not tracking
            # Note: len(decision_steps) = number of agents that requested a decision
            if tracked_agent == -1 and len(decision_steps) >= 1:
                tracked_agent = decision_steps.agent_id[0]

            # Generate an action for all agents
            action = spec.action_spec.random_action(len(decision_steps))

            # Set the actions
            env.set_actions(behavior_name, action)

            # Move the simulation forward
            env.step()

            # Get the new simulation results
            decision_steps, terminal_steps = env.get_steps(behavior_name)
            if tracked_agent in decision_steps:  # The agent requested a decision
                episode_rewards += decision_steps[tracked_agent].reward
            if tracked_agent in terminal_steps:  # The agent terminated its episode
                episode_rewards += terminal_steps[tracked_agent].reward
                done = True
        print(f"Total rewards for episode {episode} is {episode_rewards}")


    Here you can see that the actions are chosen at random and fed to the agents via env.set_actions().

    At each step you can check the observations through decision_steps.obs.

    For a more complicated example, see Example 2, which uses a custom neural network. (I'm also stuck at, or slowly advancing through, that step; I'm currently trying to modify the custom network to produce multiple outputs and train my own environment.) Let me know if this helped.
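
    And to connect this to your dataset question: in Example 2, the random_action() call above is what gets replaced by a neural network that maps observations to actions. Schematically, with a made-up tiny network (this is a sketch, not the actual ML-Agents model):

    import torch

    # Hypothetical tiny policy network: observation vector in, action scores out
    policy = torch.nn.Sequential(
        torch.nn.Linear(10, 64),  # 10 = example observation size
        torch.nn.ReLU(),
        torch.nn.Linear(64, 3),   # 3 = example number of discrete actions
    )

    obs = torch.rand(1, 10)             # stands in for decision_steps.obs
    action = policy(obs).argmax(dim=1)  # observations -> chosen action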
     
  12. hiepg2

    hiepg2

    Joined:
    Dec 16, 2020
    Posts:
    7
    Thank you. I did look in the code of the Python API, in trainer.py and optimizer.py (directory: ml-agents\mlagents\trainers\ppo), but I didn't check the file you mention.
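
    From my digging so far, it looks like the update in optimizer.py boils down to PPO's clipped surrogate objective, roughly like this sketch (schematic, not the actual ML-Agents source):

    import torch

    def ppo_policy_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
        ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new / pi_old
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
        # maximizing the clipped objective = minimizing its negation
        return -torch.min(unclipped, clipped).mean()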
     
  13. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Please also check your inbox; I sent you a private message.