
Time horizon setting?

Discussion in 'ML-Agents' started by msh8912, Apr 14, 2020.

  1. msh8912

    msh8912

    Joined:
    Jan 3, 2019
    Posts:
    1
    My agents are, strangely, not learning well in the environment I made in Unity.

    I think the problem comes from the time horizon. In my environment it takes 1000 time steps for an agent to receive a reward of 1.0, and every step matters until the agent reaches the reward.

    I keep all 1000 time steps of an episode and use a discount factor of 0.99. I think the 1.0 reward, discounted over 1000 steps, shrinks to an almost meaningless value, so the agent can't make good decisions.
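
    (Just to make my reasoning concrete: this is the plain gamma**t discounting math, nothing specific to ML-Agents.)

    Code (Python):
    # How much of a 1.0 reward delivered 1000 steps in the future survives
    # plain gamma**t discounting?
    for gamma in (0.99, 0.995, 0.999):
        discounted = gamma ** 1000
        print(f"gamma = {gamma}: 1.0 reward 1000 steps away is worth {discounted:.1e}")
    # gamma = 0.99  -> ~4.3e-05  (essentially invisible to the agent)
    # gamma = 0.995 -> ~6.7e-03
    # gamma = 0.999 -> ~3.7e-01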

    So I increased the discount factor and tested again, but it keeps failing.

    What approaches should I use to deal with this problem?
     
    Last edited: Apr 15, 2020
  2. MarkTension

    MarkTension

    Joined:
    Aug 17, 2019
    Posts:
    43
    What is your current time_horizon setting? If you think that's the cause, maybe first test it by building a version of the environment with fewer steps and checking whether you get the desired behavior? You might find that something else plays a role.

    You could also try giving intermediate rewards, or training with curriculum learning in your environment, if that's possible. There's a rough sketch of the intermediate-reward idea below.
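
    For the intermediate rewards, a dense "progress" term on top of the sparse 1.0 goal reward is the usual trick. This is only an illustrative sketch: the reward scales and the distance arguments are made-up placeholders, and in ML-Agents you would add the equivalent per-step reward from the agent's C# script (AddReward) rather than in Python.

    Code (Python):
    # Illustrative reward shaping, not the ML-Agents API.
    def step_reward(prev_distance, new_distance, reached_goal):
        reward = 0.0
        # small dense reward for moving closer to the goal
        reward += 0.001 * (prev_distance - new_distance)
        # tiny time penalty so the agent doesn't stall (optional)
        reward -= 0.0005
        if reached_goal:
            reward += 1.0  # keep the original sparse reward at the end
        return reward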