Time horizon setting?

Discussion in 'ML-Agents' started by msh8912, Apr 14, 2020.

  1. msh8912


    Jan 3, 2019
    My agents are not learning well in an environment I built with Unity.

    I suspect the cause is the time horizon. In my environment, it takes 1000 time steps for an agent to receive a reward of 1.0, and every step matters until the agent reaches that reward.

    I store all 1000 time steps and use a discount factor of 0.99. I think the 1.0 reward, discounted over 1000 steps, shrinks to a negligible value, so the agent can't make good decisions.
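
    A quick way to check that intuition is to compute how much weight a reward 1000 steps away keeps under a given discount factor. This is only a minimal sketch of the arithmetic, not ML-Agents code; the function names are made up for illustration, and 1 / (1 - gamma) is the usual rule-of-thumb "effective horizon", not a setting in the trainer.

```python
def discounted_weight(gamma: float, steps: int) -> float:
    """Weight that a reward received `steps` steps in the future keeps today."""
    return gamma ** steps


def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb number of steps the agent effectively looks ahead."""
    return 1.0 / (1.0 - gamma)


def gamma_for_weight(target_weight: float, steps: int) -> float:
    """Discount factor at which a reward `steps` away keeps `target_weight`."""
    return target_weight ** (1.0 / steps)


if __name__ == "__main__":
    for gamma in (0.99, 0.995, 0.999):
        print(
            f"gamma={gamma}: weight after 1000 steps = "
            f"{discounted_weight(gamma, 1000):.6f}, "
            f"effective horizon = {effective_horizon(gamma):.0f} steps"
        )
    # Discount factor needed for the final reward to keep half its value.
    print(f"gamma for weight 0.5 at 1000 steps: {gamma_for_weight(0.5, 1000):.6f}")
```

    With gamma = 0.99 the reward is scaled by roughly 4.3e-05, while gamma = 0.999 keeps about 0.37 of it, so the discount factor has to be very close to 1 before a reward 1000 steps away carries a meaningful signal.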

    So I increased the discount factor and tested again, but training still fails.

    What approaches should I use to deal with this problem?
    Last edited: Apr 15, 2020
  2. MarkTension


    Aug 17, 2019
    What is your current time horizon setting? If you suspect the horizon is the cause, first test that with a version of the environment that needs fewer steps, and work up from there once you get the desired behavior. You might find that something else is playing a role.

    You could also try giving intermediate rewards, or training with curriculum learning if your environment allows it.
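
    As one possible shape for intermediate rewards, here is a minimal sketch using potential-based reward shaping. That technique is my assumption, not something specified above, and it is not an ML-Agents API. The `potential` function (negative remaining distance to the goal), the 1000-step episode, and all names are hypothetical. In a real environment the potential would come from whatever progress measure the task offers.

```python
def potential(step: int, total_steps: int) -> float:
    """Hypothetical progress measure: negative fraction of the task remaining."""
    return -(total_steps - step) / total_steps


def shaped_reward(env_reward: float, phi_before: float, phi_after: float,
                  gamma: float = 0.99) -> float:
    """Environment reward plus the shaping term gamma * phi(s') - phi(s)."""
    return env_reward + gamma * phi_after - phi_before


def episode_rewards(total_steps: int = 1000, gamma: float = 0.99) -> list:
    """Per-step rewards for an episode that makes steady progress to the goal."""
    rewards = []
    for t in range(total_steps):
        # Sparse reward from the original setup: 1.0 only on the final step.
        env_reward = 1.0 if t == total_steps - 1 else 0.0
        rewards.append(
            shaped_reward(
                env_reward,
                potential(t, total_steps),
                potential(t + 1, total_steps),
                gamma,
            )
        )
    return rewards


if __name__ == "__main__":
    rewards = episode_rewards()
    print(f"first step reward: {rewards[0]:.5f}")
    print(f"last step reward:  {rewards[-1]:.5f}")
```

    The point of the sketch is that every step now carries a small signal instead of only the last one; whether this helps in a given environment is something to test, ideally alongside the shorter-episode experiment.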