
More control over updating the experience buffer.

Discussion in 'ML-Agents' started by EternalMe, Jun 28, 2022.

  1. EternalMe (Joined: Sep 12, 2014. Posts: 183.)

    From what I understand, a sequence of actions is added to the experience buffer when the episode ends or when `time_horizon` is reached.

    RL still suffers from the "credit assignment problem", right? If 70% of the actions in a sequence were good and 30% were not, the trainer doesn't care: the probability of repeating them is lowered for all 100% (or, as I'd put it, they get banned).

    If I lower `time_horizon` to some very small value, that can create similar or opposite problems, such as not capturing the whole failing sequence. Or, let's say the agent does something bad that dooms it to fail at some later point, and afterwards it actually does good things to try to get out of the situation. It still fails, so the last actions, which were actually good, get banned. And so on.
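To illustrate both concerns, here is a minimal Python sketch using plain discounted returns. It is not how ML-Agents computes its training signal: the function names and the `gamma` value are assumptions for illustration, and it deliberately leaves out the value-function baseline and the bootstrap estimate that a real PPO-style trainer would normally apply when a trajectory is cut at `time_horizon`.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates everything that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def chunked_returns(rewards: List[float], time_horizon: int,
                    gamma: float = 0.99) -> List[float]:
    """Returns computed separately per chunk of `time_horizon` steps.

    Assumption: no bootstrap value is added at the cut, so a chunk
    only "sees" the rewards that fall inside it.
    """
    out: List[float] = []
    for start in range(0, len(rewards), time_horizon):
        out.extend(discounted_returns(rewards[start:start + time_horizon], gamma))
    return out


# A 10-step episode whose only signal is a failure penalty on the last step.
episode = [0.0] * 9 + [-1.0]

# Whole episode at once: every step, good or bad, gets a negative return.
full = discounted_returns(episode)

# Cut into chunks of 5: the first chunk carries no signal at all, so the
# early mistake that caused the failure would go unpunished in this sketch.
chunked = chunked_returns(episode, time_horizon=5)
```

With the whole episode in one piece, the penalty leaks back onto every step. With the short cut and no bootstrap, the early steps carry no signal and only the last few take the blame.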

    For my specific case, I can actually look back over the whole episode and more or less figure out which parts were bad and which were good. In theory, I could then add them to the experience buffer with the correct rewards. That leads to my main question: is there an interface for doing this? Something like

    addExperience(stepFrom, stepTo, reward)
    or similar? Or is there a workaround?
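
To make the request concrete, here is a rough Python sketch of the kind of range-based interface being asked about. Everything in it is hypothetical: `EpisodeRecorder`, `record`, `add_experience` and `transitions` are names made up for illustration and are not part of ML-Agents. It only shows the idea of labelling step ranges of a finished episode after the fact, before the data would go to a trainer.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class EpisodeRecorder:
    """Hypothetical helper (not an ML-Agents class) that stores one episode
    and lets rewards be assigned to step ranges in hindsight."""

    observations: List[Any] = field(default_factory=list)
    actions: List[Any] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)

    def record(self, observation: Any, action: Any) -> int:
        """Store one step with a zero reward and return its step index."""
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(0.0)
        return len(self.rewards) - 1

    def add_experience(self, step_from: int, step_to: int, reward: float) -> None:
        """Add `reward` to every step in the inclusive range [step_from, step_to]."""
        if not 0 <= step_from <= step_to < len(self.rewards):
            raise IndexError("step range is outside the recorded episode")
        for t in range(step_from, step_to + 1):
            self.rewards[t] += reward

    def transitions(self) -> List[Tuple[Any, Any, float]]:
        """Return (observation, action, reward) tuples for the whole episode."""
        return list(zip(self.observations, self.actions, self.rewards))


# Record a 10-step episode with placeholder observations and actions.
recorder = EpisodeRecorder()
for step in range(10):
    recorder.record(observation=f"obs{step}", action=step % 2)

# Looking back: steps 0-6 were fine, steps 7-9 caused the failure.
recorder.add_experience(0, 6, 0.1)
recorder.add_experience(7, 9, -0.1)

labelled = recorder.transitions()
```

Whether an on-policy trainer could consume data relabelled this way is a separate question. The sketch only pins down what "add a reward for steps from A to B" would mean.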