Training with bad demonstrations

Discussion in 'ML-Agents' started by MiguelCoK, Apr 12, 2020.

  1. MiguelCoK

    Joined:
    Aug 22, 2017
    Posts:
    20
    Hi,
    I have a self-play environment with two agents competing against each other. Both agents share the same brain, i.e., both have the same behaviorName in the BehaviorParameters component. I'm using imitation learning, so I'm recording some demonstrations. These are my questions:
    1. If I use a DemonstrationRecorder component on only one of those agents, does the resulting demo file also contain the observations and actions of the other agent (since both have the same behaviorName)?
    2. Is it a good idea to feed the imitation learning process poor demonstrations (demonstrations of an agent losing every game) so that the agent learns what not to do?
    3. To record good demos I need another person to play with me, because I don't have enough hands to control both agents. However, that is not possible right now. So what I want to do is train without imitation first, then record demonstrations while playing against that (not so good) agent, and then train again with imitation. Maybe I could repeat this process, reloading the previously trained model on each new iteration. Is that a good approach? And is there another way to obtain good demos for imitation learning?
     
  2. andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    1. Only the agent with the DemonstrationRecorder will have its demonstrations recorded.
    2. Unfortunately, while this is intuitive, it would be counterproductive given the imitation learning algorithms we use. GAIL and BC encourage an agent to do exactly what is done in the provided demonstrations. Providing bad demonstrations would only encourage the agent to do bad things!
    3. You can train the agents using self-play and then change the behavior type of one agent to 'inference only' in the behavior parameters script. You can control the other agent via the Heuristic() function while recording demos.
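
To illustrate point 2, here is a minimal sketch in Python of the idea behind behavioral cloning. It is a toy tabular version, not the ML-Agents implementation, and the state and action names are hypothetical. The cloned policy simply reproduces whatever action the demonstrations show most often for each state, so losing demonstrations yield a losing policy.

```python
from collections import Counter, defaultdict


def clone_policy(demonstrations):
    """Toy tabular behavioral cloning.

    For each state, pick the action that appears most often in the
    demonstrations. There is no notion of "good" or "bad" here: the
    policy only imitates what it was shown.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical demonstrations as (state, action) pairs.
good_demos = [("ball_left", "move_left"), ("ball_right", "move_right")] * 5
bad_demos = [("ball_left", "move_right"), ("ball_right", "move_left")] * 5

good_policy = clone_policy(good_demos)
bad_policy = clone_policy(bad_demos)

print(good_policy["ball_left"])  # prints move_left
print(bad_policy["ball_left"])   # prints move_right (the losing move is copied)
```

GAIL works differently internally (it trains a discriminator), but as noted above it likewise rewards behavior that resembles the demonstrations, so the same caveat applies.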
     
  3. MiguelCoK

    Joined:
    Aug 22, 2017
    Posts:
    20
    Ok, thank you very much