Path planning with PPO

Discussion in 'ML-Agents' started by dani_kal, Apr 29, 2020.

  1. dani_kal

    Joined:
    Mar 25, 2020
    Posts:
    52
    Hello everyone!

    After the training period we have a trained neural network.
    In my case, I have an agent that navigates from an initial point to a goal point.
    How can I export only the single path (the positions from the start to the goal) with the maximum reward?

    Thank you in advance!
     
  2. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @dani_kal,
    Do I understand correctly that you want to have the trained neural network do the pathfinding over and over and only export the path with the maximum reward? What kind of path is it? Would another algorithm suffice for this task such as A*?
     
  3. dani_kal

    Joined:
    Mar 25, 2020
    Posts:
    52
    Good evening!
    Thank you for your answer!
    Specifically, I have trained an agent to go from an initial point to a final point.

    Let's say that the agent starts at position (-5,-5) and its goal is at (5,5), with obstacles between them.
    After training, the agent knows how to navigate from the start point to the goal point.
    I want to extract this path (the path with the maximum reward).
    It would be something like [(-5,-5), (-4,-4), (-4,-3), ..., (5,5)]: the positions the agent visited from the beginning to the end of the episode with the maximum reward.
    My question is: do I have to save the positions and rewards throughout the whole training and then pick the maximum-reward episode to get the optimal path?
    Or is there another way to get it?

    Thank you again!
     
  4. Claytonious

    Joined:
    Feb 16, 2009
    Posts:
    908
    Once your agent is trained to follow the optimal path, you could then run the agent in inference mode with your own script that records his actions.
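
    As a rough illustration of that recording idea, here is a minimal sketch of the bookkeeping only. It is written in Python for brevity and does not call any ML-Agents API; in a Unity project the same logic would live in your own C# script. The names `Episode`, `BestPathRecorder`, `record_step`, and `end_episode` are hypothetical, and the sample positions and rewards are made-up values standing in for what the agent would report during inference.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Position = Tuple[float, float]


@dataclass
class Episode:
    """Positions visited in one episode and the reward accumulated along the way."""
    positions: List[Position] = field(default_factory=list)
    total_reward: float = 0.0


class BestPathRecorder:
    """Hypothetical helper: keeps the episode with the highest cumulative reward."""

    def __init__(self) -> None:
        self._current = Episode()
        self.best: Optional[Episode] = None

    def record_step(self, position: Position, reward: float) -> None:
        # Call once per agent step with the agent's position and step reward.
        self._current.positions.append(position)
        self._current.total_reward += reward

    def end_episode(self) -> None:
        # Call when the episode ends; keep it only if it beats the best so far.
        if self._current.positions and (
            self.best is None
            or self._current.total_reward > self.best.total_reward
        ):
            self.best = self._current
        self._current = Episode()


# Made-up data standing in for two inference episodes (position, step reward).
sample_episodes = [
    [((-5.0, -5.0), -0.01), ((-4.0, -4.0), -0.01), ((5.0, 5.0), 1.0)],
    [((-5.0, -5.0), -0.01), ((-5.0, -4.0), -0.01),
     ((-4.0, -4.0), -0.01), ((5.0, 5.0), 1.0)],
]

recorder = BestPathRecorder()
for steps in sample_episodes:
    for position, reward in steps:
        recorder.record_step(position, reward)
    recorder.end_episode()

best_path = recorder.best.positions
print(best_path)
```

    If the trained policy and the environment are both deterministic, a single inference episode should already give the path, and the comparison across episodes is only needed when the runs can differ.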
     
  5. dani_kal

    Joined:
    Mar 25, 2020
    Posts:
    52
    OK! Thank you very much!