More control over updating the experience buffer.

Discussion in 'ML-Agents' started by EternalMe, Jun 28, 2022.

  1. EternalMe


    Sep 12, 2014
    From what I understand, a sequence of actions is added to the experience buffer when the episode ends or `time_horizon` is reached.
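
A minimal sketch of that chunking in plain Python, to make the assumption concrete. This is not ML-Agents code: `split_into_trajectories` is a hypothetical helper, and it only models "hand over a chunk when the episode ends or `time_horizon` steps have been collected".

```python
from typing import List


def split_into_trajectories(episode_length: int, time_horizon: int) -> List[range]:
    """Split step indices 0..episode_length-1 into chunks of at most time_horizon steps.

    Hypothetical illustration only; the real trainer's bookkeeping may differ.
    """
    if time_horizon <= 0:
        raise ValueError("time_horizon must be positive")
    chunks = []
    start = 0
    while start < episode_length:
        # A chunk closes when time_horizon steps are collected or the episode ends.
        end = min(start + time_horizon, episode_length)
        chunks.append(range(start, end))
        start = end
    return chunks


# A 10-step episode with time_horizon=4 yields chunks of 4, 4 and 2 steps.
chunks = split_into_trajectories(episode_length=10, time_horizon=4)
print([list(c) for c in chunks])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```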

    RL still suffers from the "credit assignment problem", right? If 70% of the actions in a failed sequence were good and 30% were not, it doesn't distinguish between them: the probability of repeating is lowered for all 100% of them (or, as I would say, they are all banned).
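
The concern can be illustrated with plain discounted returns. The sketch below assumes a single penalty at the final step and a discount factor `gamma`; the function name and the numbers are made up for illustration, and an actual trainer such as PPO additionally compares returns against a learned value estimate, which this sketch leaves out.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three neutral steps followed by a failure penalty at the last step.
rewards = [0.0, 0.0, 0.0, -1.0]
print(discounted_returns(rewards, gamma=0.5))  # [-0.125, -0.25, -0.5, -1.0]
```

Every step before the failure ends up with a negative return, scaled only by its distance from the failure, regardless of whether that particular step was a good or a bad decision.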

    If I lower `time_horizon` to a very small value, that can create similar or opposite problems, such as not capturing the whole failing sequence. Or let's say the agent does something bad that dooms it to fail at some later point, and afterwards it actually does good things to try to get out of the situation. It still fails, so those last, actually good actions are banned. And so on.

    For my specific case, I can actually look back at the whole episode and roughly figure out which parts were bad and which were good. Theoretically, I could then add them correctly to the experience buffer. This leads to my main question: is there an interface to do this? Something like

    addExperience(stepFrom, stepTo, reward)
    or similar? Or a workaround?
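
To make the request concrete, here is a sketch of the semantics such a call might have, written in plain Python. Everything in it is hypothetical: `EpisodeRecorder`, `record_step` and `add_experience` are illustrative names, not part of ML-Agents. The idea is to keep the per-step rewards of an episode in memory and adjust a range of steps after the fact, before anything is handed to the trainer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeRecorder:
    """Hypothetical per-episode reward log that allows retroactive edits."""

    rewards: List[float] = field(default_factory=list)

    def record_step(self, reward: float = 0.0) -> int:
        """Store the reward for one step and return that step's index."""
        self.rewards.append(reward)
        return len(self.rewards) - 1

    def add_experience(self, step_from: int, step_to: int, reward: float) -> None:
        """Add `reward` to every step in the inclusive range [step_from, step_to]."""
        if not (0 <= step_from <= step_to < len(self.rewards)):
            raise IndexError("step range outside the recorded episode")
        for t in range(step_from, step_to + 1):
            self.rewards[t] += reward


recorder = EpisodeRecorder()
for _ in range(6):
    recorder.record_step()

# After reviewing the episode: steps 0-1 were the mistake, steps 2-5 the recovery.
recorder.add_experience(0, 1, -1.0)
recorder.add_experience(2, 5, 0.5)
print(recorder.rewards)  # [-1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
```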