Using demonstrations for training - episodes, rewards and experience are 0 (Bug/Intended?)

Discussion in 'ML-Agents' started by noobnewbier, Mar 7, 2020.

  1. noobnewbier

    Joined:
    May 24, 2017
    Posts:
    6
    I am unsure whether this is a bug or intended behaviour. My environment is set up slightly differently from the demo, so I am posting here in case that is what is causing the issue. Setting up my environment the way the demo does would require a significant amount of work, as I am trying to integrate it with a game.

    Environment:
    TensorFlow Version : 2.0.1
    Unity Version : 2018.4.9f1
    ML-Agents Version : 0.13.1


    The problem is that I am able to train my agent, but when I try to add demo files for imitation learning, the demo files don't look quite right, as shown in the attached snippet: episodes, rewards and experience are all 0.

    Capture.PNG
    However, as the following reward graph shows, my rewards are clearly set up correctly. I also made sure that my reward function is called and assigns rewards accordingly.
    part-of-cumulative-reward.PNG


    I suspect the issue might be that I destroy the agent whenever it is "done", after the AgentOnDone method is called. This is due to the current architecture of the software: destroying the agent completely once it is done makes many other operations significantly easier.
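
    To illustrate the suspicion above, here is a toy Python sketch. It is not ML-Agents code, and I have not confirmed that the demonstration recorder works this way. ToyDemoRecorder, its header layout and the run helper are all hypothetical. The assumption is that a recorder writes a zeroed summary header first and only fills in the real totals when it is closed, so an owner that is destroyed before the close step would leave the zeros in place even though rewards were recorded.

```python
import struct
from io import BytesIO


class ToyDemoRecorder:
    """Hypothetical recorder: writes a zeroed header up front, patches it in close()."""

    # Assumed header layout: episode count, step count, total reward.
    HEADER = struct.Struct("<iif")

    def __init__(self, stream):
        self.stream = stream
        self.episodes = 0
        self.steps = 0
        self.total_reward = 0.0
        # Placeholder header, all zeros, written before any experience.
        self.stream.write(self.HEADER.pack(0, 0, 0.0))

    def record(self, reward, done):
        # Experience is appended immediately; totals are only kept in memory.
        self.steps += 1
        self.total_reward += reward
        self.stream.write(struct.pack("<f", reward))
        if done:
            self.episodes += 1

    def close(self):
        # Only here does the header receive the real totals.
        self.stream.seek(0)
        self.stream.write(
            self.HEADER.pack(self.episodes, self.steps, self.total_reward)
        )


def read_header(stream):
    """Return (episodes, steps, total_reward) as stored in the header."""
    stream.seek(0)
    return ToyDemoRecorder.HEADER.unpack(stream.read(ToyDemoRecorder.HEADER.size))


def run(close_recorder):
    """Record one two-step episode, optionally skipping close()."""
    stream = BytesIO()
    recorder = ToyDemoRecorder(stream)
    for reward, done in [(0.5, False), (1.0, True)]:
        recorder.record(reward, done)
    if close_recorder:
        recorder.close()
    # Otherwise the owner was "destroyed" and close() never ran.
    return read_header(stream)


if __name__ == "__main__":
    print("closed properly:", run(True))
    print("destroyed early:", run(False))
```

    If the real recorder behaves similarly, the recorded experience itself could still be in the file, and only the summary shown in the inspector would be wrong. That is a guess on my side.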

    Although I could rewrite my code to mimic what the demo does, I would be very grateful if I could avoid this. Hence the question: is this intended? And will it cause any issues when training with the demo files?
     
  2. TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,810
    We'll pass this over to the team to review. Which version of C# and Python are you using here?
     
  3. noobnewbier

    Joined:
    May 24, 2017
    Posts:
    6
    @TreyK-47
    Thanks for helping

    C# version : 7.3
    Python version : 3.6.9
    Capture.PNG

    In case you are wondering why the reward fluctuates: I am using curriculum learning, and the drop in the graph is where the level changes.