How to record a "good demo" for imitation learning?

Discussion in 'ML-Agents' started by igsrd3unitytest, Mar 17, 2020.

  1. igsrd3unitytest

    Joined:
    Mar 9, 2020
    Posts:
    5
    I trained a PushBlock sample model with GAIL, using a demo file I recorded myself. But the agent just hung around and never pushed the block into the green area. A model trained from Unity's ExpertPush.demo behaved normally, so the difference must be the way I play the game. Has anyone succeeded in training this sample with a self-made demo? Any tips? :confused:
     
  2. ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    Usually the key is "more" demos - ExpertPush.demo has quite a few. Also, try turning off `use_actions` (i.e. set `use_actions: false`) in the GAIL config, and if that isn't sufficient, turn off `behavioral_cloning` as well. I'd also try increasing the stack size in the Behavior Parameters in the Unity editor.

    One of the issues is that in human demos for the cube environments, there tends to be a lot of stopping (i.e. stop and turn to face the cube, then go). So the agent learns to imitate the stopping, but because the policy has no memory, it will just stop forever. The changes above should make it less sensitive to this.
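    For reference, here's a minimal sketch of what those changes look like in the trainer config YAML (assuming the 0.14/0.15-era format; the demo path is a placeholder for your own recorded file):

    ```yaml
    PushBlock:
      # ...other PPO hyperparameters left as in the sample config...
      reward_signals:
        extrinsic:
          strength: 1.0
          gamma: 0.99
        gail:
          strength: 0.01
          gamma: 0.99
          encoding_size: 128
          demo_path: Demos/MyPushBlock.demo  # placeholder: your self-recorded demo
          use_actions: false                 # imitate states only, not stop-and-go actions
      # If that still isn't enough, remove (or comment out) the BC block entirely:
      # behavioral_cloning:
      #   demo_path: Demos/MyPushBlock.demo
      #   strength: 0.5
      #   steps: 150000
    ```

    The stack size isn't set in the YAML - it's the Stacked Vectors field under Vector Observation in the Behavior Parameters component in the editor.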
     
    igsrd3unitytest likes this.
  3. igsrd3unitytest

    Joined:
    Mar 9, 2020
    Posts:
    5
    Thanks for the reply! I'll try these tips. :)
     
  4. AndrewGeroge

    Joined:
    Jul 19, 2019
    Posts:
    3
    I'm also stuck a bit. I have a scene like the Pyramids example, but a bit more complicated. The main task for the agent is to reach the target by crossing the entire arena diagonally. There are obstacles in the arena, so the agent has to find a way to the target while avoiding them. The agent and the target have constant coordinates. I also use ray perception and several rewards and penalties: the agent is punished for any collision with a wall or other obstacle, penalized over time, and rewarded for getting closer to the target and for reaching it.
    So I recorded several demo episodes for this scene (~10) and then started training with the same parameters as the Pyramids example... and no result. I ran training on 9 cloned arenas for 8 hours and the agent still cannot reach the target in a fair manner; it often gets stuck between a wall and a tree, etc. I tried increasing the number of layers in the network, but it did not help. I suspect there should be some balance between the rewards/penalties and the IL signal - like more demonstrations, weaker rewards. Any ideas will be appreciated, thanks! I can add more details if anybody is interested in helping.
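    To make the balance question concrete: as I understand it, the weighting between IL and the hand-made rewards is set via the `strength` values under `reward_signals` in the trainer config. A sketch of what I mean (following the Pyramids example's layout; the behavior name, demo path, and values are placeholders):

    ```yaml
    MyArenaAgent:                        # placeholder behavior name
      # ...PPO hyperparameters as in the Pyramids example...
      reward_signals:
        extrinsic:
          strength: 1.0                  # the hand-shaped rewards/penalties
          gamma: 0.99
        curiosity:
          strength: 0.02                 # exploration bonus, as in Pyramids
          gamma: 0.99
          encoding_size: 256
        gail:
          strength: 0.01                 # raise this to lean harder on the demos
          gamma: 0.99
          encoding_size: 128
          demo_path: Demos/MyArena.demo  # placeholder: the ~10 recorded episodes
    ```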
     
  5. LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    I had this happen when I forgot to add the desired tag to the ray sensor, so the agent was simply blind.
     
  6. thomasphifer

    Joined:
    Mar 26, 2016
    Posts:
    13
    I have another question on this topic.

    I am recording a demo for a very long and difficult task (even for a human to do), so I sometimes fail the task. Is it OK to record some failure episodes? Or do I need to start all over even if I have a single failure?

    The majority of the episodes in my demo recording are successful - 90%+.

    Could my trained Brain ever reach a 100% success rate with this demo? Or would it make the same mistakes I did and only ever reach 90% success?
     
    Last edited: Aug 31, 2020
  7. celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    As long as the demonstration contains the Agent getting a negative reward for failure, it should be fine.