GAIL & RL balance

Discussion in 'ML-Agents' started by AndrewGeroge, Mar 27, 2020.

  1. AndrewGeroge

    AndrewGeroge

    Joined:
    Jul 19, 2019
    Posts:
    3
    I'm new to this topic, so sorry if I'm asking silly questions.
    When I built a scene similar to the Pyramids example and recorded several demos, I confirmed that the demos helped the agent learn faster. In the Pyramids example there is basically one reward for reaching the goal and a small penalty that scales with the number of steps.

    However, when I added more obstacles to the scene and new rewards/penalties (like a reward for getting closer to the goal and a penalty for colliding with an obstacle), I noticed that the demos stopped helping at all.
    Does anybody know the best practice for using demos? Is there some balance to strike between rewards and demos?
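    To make it concrete, here is a rough sketch of the kind of shaping I mean. It is not my exact code: the class name, the tags, and the reward values are made up, and the exact Agent API depends on the ML-Agents version.

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;   // older ML-Agents versions use "using MLAgents;" instead

    // Hypothetical agent with the extra shaping rewards described above.
    public class ObstacleCourseAgent : Agent
    {
        public Transform goal;      // assigned in the Inspector
        float previousDistance;

        public override void OnEpisodeBegin()
        {
            previousDistance = Vector3.Distance(transform.position, goal.position);
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            // ... apply movement from vectorAction here ...

            // Dense shaping: small reward for getting closer to the goal.
            float distance = Vector3.Distance(transform.position, goal.position);
            AddReward(0.01f * (previousDistance - distance));
            previousDistance = distance;

            // Small per-step penalty, as in the Pyramids example (assumes MaxStep > 0).
            AddReward(-1f / MaxStep);
        }

        void OnCollisionEnter(Collision collision)
        {
            if (collision.gameObject.CompareTag("Obstacle"))
            {
                AddReward(-0.1f);   // penalty for hitting an obstacle
            }
            else if (collision.gameObject.CompareTag("Goal"))
            {
                AddReward(1.0f);    // the original sparse success reward
                EndEpisode();
            }
        }
    }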
     
  2. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @AndrewGeroge,
    Shaping reward functions is one of the most difficult parts of creating an environment. For the Pyramids example, the sparse reward function allows the Agent to be "curious" and try to discover what it needs to do in order to succeed. Modifying this reward function may have unintended consequences. The trade-off with sparse reward functions is that agents usually take longer to train with them than with denser ones. It seems like you are seeing unintended behavior from a denser reward function, which is somewhat expected.
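    As for balancing the signals: in ML-Agents, the weight of the RL (extrinsic) reward versus the demo-based signals (GAIL and behavioral cloning) is set by the strength values in the trainer config, roughly as sketched below. The behavior name, demo path, and numbers are only placeholders, loosely based on the Pyramids GAIL example, and the exact YAML layout depends on your ML-Agents version.

    Code (YAML):
    MyBehavior:                    # placeholder, must match the Behavior Name on your agent
        trainer: ppo
        max_steps: 1.0e7
        time_horizon: 128
        batch_size: 128
        buffer_size: 2048
        behavioral_cloning:
            demo_path: ./Demos/MyExpert.demo    # placeholder path to your recorded demo
            strength: 0.5
            steps: 150000
        reward_signals:
            extrinsic:             # the reward you define in the environment
                strength: 1.0
                gamma: 0.99
            curiosity:
                strength: 0.02
                gamma: 0.99
            gail:                  # demo-based reward
                strength: 0.01
                gamma: 0.99
                demo_path: ./Demos/MyExpert.demo

    Lowering the gail and behavioral_cloning strengths relative to extrinsic makes the demos more of a hint than a constraint, which tends to matter more once you add your own dense shaping rewards on top.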