Training with bad demonstrations

MiguelCoK · Apr 12, 2020

Hi:
I have a self-play environment with 2 agents competing against each other. Both agents share the same brain, i.e., both have the same behaviorName in the BehaviorParameters component. I'm using imitation learning, so I'm recording some demonstrations. These are my questions:

If I use a DemonstrationRecorder component on only one of those agents, does the resulting demo file also contain the observations and actions from the other agent (since both have the same behaviorName)?

Is it a good idea to feed the imitation learning process poor-performance demonstrations (demonstrations of an agent losing all games) so that the agent learns what is wrong?

To record good demos I need another person to play with me, because I don't have enough hands to play both agents. However, having another person play with me is not possible right now. So what I want to do is train without imitation first, then record demonstrations playing against that (not so good) agent, and then train again with imitation. Maybe I could repeat this process, reloading the previously trained model on each new iteration. I want to know if that is a good approach, and whether there is another way to obtain good demos for imitation learning.

andrewcoh_unity · Apr 13, 2020

1. Only the agent with the DemonstrationRecorder will have its demonstrations recorded.
2. Unfortunately, while this is intuitive, it would be counterproductive given the imitation learning algorithms we use. GAIL and BC encourage an agent to do exactly what is done in the provided demonstrations. Providing bad demonstrations would only encourage the agent to do bad things!
3. You can train the agents using self-play and then change the behavior type of one agent to 'inference only' in the behavior parameters script. You can control the other agent via the Heuristic() function while recording demos.
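
To make point 2 concrete, here is a toy sketch of why cloning bad demonstrations reproduces bad behavior. It is not how ML-Agents implements BC or GAIL; it is a tabular stand-in that simply copies the most frequently demonstrated action per state. The state names, action names, and the clone_policy helper are all hypothetical.

```python
from collections import Counter, defaultdict


def clone_policy(demos):
    """Toy tabular behavioral cloning (an illustration, not the ML-Agents code).

    For each observed state, the cloned policy picks the action that the
    demonstrator chose most often in that state.
    """
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical game: the paddle should move toward the ball.
good_demos = [("ball_left", "move_left"), ("ball_right", "move_right")] * 5
# A losing demonstrator that always moves away from the ball.
bad_demos = [("ball_left", "move_right"), ("ball_right", "move_left")] * 5

good_policy = clone_policy(good_demos)
bad_policy = clone_policy(bad_demos)

print("cloned from good demos:", good_policy)
print("cloned from bad demos: ", bad_policy)
```

The policy cloned from the losing demonstrations moves away from the ball in every state: nothing in the demonstrations marks those actions as mistakes, so the learner imitates them rather than avoiding them.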

MiguelCoK · Apr 16, 2020

Ok, thank you very much
