
Q-Learning in Python API - Error in Collab

Discussion in 'ML-Agents' started by ademord, May 31, 2021.

  1. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Hello
    I am using the code from the 2nd Colab (Q-Learning). In my environment, the episode ends when the agent falls off the platform / touches a wall.

    The QNetwork then receives a vector of observations of shape [0, n_observations], which produces a vector of actions of shape 0 as well (and this causes an error further down in the Colab).

    How can I continue so that the agent still gets the negative reward for failing the episode?
     
  2. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    It is possible that when all the agents die, there are only termination signals and no decision requests. In that case, you should not try to use the QNetwork, since there are no agents that need an action.
    I would add an "if len(decision_steps) == 0" check and not try to select actions when it is true. If no agent requested a decision and the environment is just signaling that it is done, you do not need to make a call to env.set_actions.
     
    ademord likes this.
  3. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Thank you for your reply!

    Side question then: performance-wise, should a QNetwork behave similarly to a PPOTrainer? Do you know if there would be any radical differences or things I should be aware of?
     
  4. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    In an environment as simple as GridWorld, I expect both PPO and Q-Learning to solve it perfectly. In more complex environments, this will not be the case. The PPOTrainer in ML-Agents has a lot of features that vanilla Q-Learning does not have, so I would expect PPO to perform better.
     
    ademord likes this.
  5. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Thank you! In the future I will try to put together a table comparing all the different algorithms and when each one is most appropriate. If you know of such a resource and could point me towards it, I would highly appreciate it.