More control over updating the experience buffer.

Discussion in 'ML-Agents' started by EternalMe, Jun 28, 2022.

  1. EternalMe


    Sep 12, 2014
    From what I understand, a sequence of actions is added to the experience buffer when the episode ends or `time_horizon` is reached.

    RL still suffers from the "credit assignment problem", right? If 70% of the actions in a sequence were good and 30% were not, it doesn't care: when the sequence fails, the probability of repeating them is lowered for all 100% (or, as I would say, they get banned).

    If I lower `time_horizon` to some very small value, this can create similar or opposite problems, such as not capturing the whole failing sequence. Or let's say the agent does something bad that dooms it to fail at some later point, but afterwards it actually does good stuff to get out of the situation. It still fails, so the last, actually good actions are banned. And so on.
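To make the concern concrete, here is a minimal sketch in plain Python. It is not how ML-Agents computes anything; the 10-step episode, the `discounted_returns` helper and the segment length of 5 are all made up for illustration. It computes plain discounted returns for an episode whose only reward is a penalty on the last step, once over the whole episode and once with the episode cut into short segments where nothing is carried across the cut.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates everything that follows it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 10-step episode: no reward until a -1 penalty on the final step.
episode_rewards = [0.0] * 9 + [-1.0]

# Whole episode as one sequence: every step inherits part of the penalty.
full_returns = discounted_returns(episode_rewards)

# Same episode cut into segments of 5 steps (assumption: no value is
# carried across the cut), so the first segment never sees the failure.
segment_length = 5
segmented_returns = []
for start in range(0, len(episode_rewards), segment_length):
    segment = episode_rewards[start:start + segment_length]
    segmented_returns += discounted_returns(segment)
```

In the full version every step ends up with a negative return, good actions included. In the segmented version the first five steps get exactly zero, so a bad action made there would go unpunished. As far as I understand, the real trainer uses a value estimate when `time_horizon` cuts an episode, so the hard cut above is the worst case rather than the actual behaviour.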

    In my specific case, I can actually look back at the whole episode and more or less figure out the bad and the good parts. Theoretically, I could then add them to the experience buffer with the correct rewards. This leads to my main question: is there an interface to do this? Something like

    addExperience(stepFrom, stepTo, reward)
    or similar? Or a workaround?
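To show the kind of behaviour I have in mind, here is a rough sketch in plain Python. None of this is ML-Agents API: `EpisodeLog`, `record_step` and `add_experience` are hypothetical names, and the log is just a list of per-step rewards that would be edited after the episode and before it is handed to the trainer.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeLog:
    """Hypothetical per-episode record of step rewards (not an ML-Agents type)."""
    rewards: list = field(default_factory=list)

    def record_step(self, reward=0.0):
        """Store the reward seen at this step and return the step index."""
        self.rewards.append(reward)
        return len(self.rewards) - 1

    def add_experience(self, step_from, step_to, reward):
        """Add `reward` to every step in the inclusive range [step_from, step_to]."""
        if not 0 <= step_from <= step_to < len(self.rewards):
            raise IndexError("step range is outside the recorded episode")
        for t in range(step_from, step_to + 1):
            self.rewards[t] += reward


# Record a hypothetical 10-step episode with no rewards during play.
log = EpisodeLog()
for _ in range(10):
    log.record_step()

# After the episode: penalise the early mistake, reward the late recovery.
log.add_experience(0, 2, -1.0)
log.add_experience(7, 9, 0.5)
```

The point is that the rewards are assigned per range after looking at the whole episode, instead of one final reward being spread over every action in the sequence.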