
Q-Learning in Python API - Error in Collab

Discussion in 'ML-Agents' started by ademord, May 31, 2021.

  1. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Hello
    I am using the code from the 2nd Colab (Q-Learning). In my environment, the episode ends when the agent falls off the platform / touches a wall.

    The QNetwork then receives a vector of observations of shape [0, n_observations], which produces a vector of actions of shape 0 as well (and this causes an error further down in the Colab).

    How can I continue so that the agent still gets the negative reward for failing the episode?
     
  2. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    It is possible that when all the agents die, there are only termination signals and no decision requests. In that case, you should not try to use the QNetwork, since there are no agents that need an action.
    I would add an "if len(decision_steps) == 0" check and not try to select actions when it is true. If no agent requested a decision and the environment is just signaling that it is done, you do not need to make a call to env.set_actions.
     
    ademord likes this.
  3. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Thank you for your reply!

    Side question then: performance-wise, should a QNetwork behave similarly to a PPOTrainer? Do you know if there would be any radical differences or things I should be aware of?
     
  4. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    In an environment as simple as GridWorld, I expect both PPO and Q-Learning to solve it perfectly. In more complex environments, this will not be the case. The PPOTrainer in ML-Agents has a lot of features that vanilla Q-Learning does not have, so I would expect PPO to perform better.
     
    ademord likes this.
  5. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Thank you! In the future I will try to put together a table comparing all the different algorithms and when each one is most appropriate. If you know of such a resource and could point me towards it, I would highly appreciate it.