
Question When and how often do Agents collect observations (with CollectObservations)?

Discussion in 'ML-Agents' started by macsimilian, Aug 7, 2023.

  1. macsimilian

    Joined: Sep 19, 2020
    Posts: 19
    Are the agents monitoring the state of observations at every frame? Or only right before giving a decision? Or something else?
     
  2. smallg2023

    Joined: Sep 2, 2018
    Posts: 102
    Every FixedUpdate, by default.
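    As a minimal sketch of where that default comes from (assuming the standard ML-Agents package; configuring these components from code rather than the Inspector is just for illustration): the Academy advances once per FixedUpdate, and a DecisionRequester asks for a decision, and therefore a CollectObservations call, every DecisionPeriod Academy steps.

    Code (CSharp):
    using Unity.MLAgents;
    using UnityEngine;

    // Assumes an Agent component already exists on this GameObject.
    public class DefaultCadenceSetup : MonoBehaviour
    {
        void Awake()
        {
            // Normally configured in the Inspector; done in code here for illustration.
            var requester = gameObject.AddComponent<DecisionRequester>();
            requester.DecisionPeriod = 5;                 // request a decision every 5 Academy steps
            requester.TakeActionsBetweenDecisions = true; // repeat the last action on the steps in between
        }
    }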
     
  3. macsimilian

    Joined: Sep 19, 2020
    Posts: 19
    Is there any way to change that, or make the observation times manual? My game is a turn-based board game, so its state doesn't change that often; it changes incrementally, at the specific moments when players/agents decide to move cards, pieces, etc. It isn't time- or tick-based at all. Ideally I'd like to be able to call a function that makes the agents take observations when I know the board has changed...

    EDIT: Found a (potential) solution in this thread:
     
    Last edited: Aug 9, 2023
  4. Luke-Houlihan

    Joined: Jun 26, 2007
    Posts: 303
    Yup, that's your answer. If you turn off automatic stepping and invoke the EnvironmentStep() method manually, you'll get the behavior you're looking for.
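    A minimal sketch of that approach, assuming the standard ML-Agents Academy and Agent APIs (the class and method names below are placeholders, not from this thread):

    Code (CSharp):
    using Unity.MLAgents;
    using UnityEngine;

    public class TurnBasedStepper : MonoBehaviour
    {
        [SerializeField] Agent boardAgent; // your agent, wired up in the Inspector

        void Awake()
        {
            // Stop the Academy from stepping on every FixedUpdate.
            Academy.Instance.AutomaticSteppingEnabled = false;
        }

        // Call this from your game logic whenever the board actually changes.
        public void OnBoardChanged()
        {
            boardAgent.RequestDecision();       // queue a decision for this agent
            Academy.Instance.EnvironmentStep(); // collect observations and run the model now
        }
    }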
     
  5. macsimilian

    Joined: Sep 19, 2020
    Posts: 19
    Is there any point in making observations other than directly before requesting a decision (or after giving a reward)? Should I do an environment step whenever the board changes, or only just before the agent makes a decision? I ask because by default there is no memory (and I shouldn't need memory for this game - everything can be decided by looking at a single state), so wouldn't any observation not taken directly before the decision be a waste?

    Or does the agent actually benefit from seeing more granular changes in the state even though it's not making a decision each time?

    Also, how does this apply to ML-Agents projects in general? With the default FixedUpdate observations and a DecisionRequester, observations and decisions could easily fall out of cadence. In that case, would the agent only use the observations that happened to be made right before its decision?
     
    Last edited: Aug 11, 2023
  6. Luke-Houlihan

    Joined: Jun 26, 2007
    Posts: 303
    If there's no memory task and no stacked observations, then there's likely not much benefit to giving the agent observations between actions.

    It can sometimes benefit the model updates if the observations are novel within a batch_size of experiences, but it's not something I would worry much about. Memory tasks are obviously a different situation.

    Observations and decisions don't need to be taken in a specific cadence, because the model generates actions for every observation loop regardless of whether you request them. If you don't query for actions on a given step, they're simply thrown away and the environment isn't updated with those actions.

    Basically: on every environment step, observations are sent to the model and action outputs are generated. If you don't query for those actions and apply them to the environment, they're discarded and have no effect on training.
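    A sketch of that pattern (the agent and method names are placeholders): request a decision only at your actual decision points, and apply actions only in OnActionReceived, so the per-step actions you never asked for can't touch the environment.

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    public class CardGameAgent : Agent
    {
        public override void CollectObservations(VectorSensor sensor)
        {
            // Describe the current board snapshot here, e.g.
            // sensor.AddObservation(...);
        }

        // Called by your turn manager when it is this agent's turn.
        public void TakeTurn() => RequestDecision();

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Apply the chosen move to the board. Actions generated on steps
            // where no decision was requested never reach this method.
        }
    }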
     
  7. macsimilian

    Joined: Sep 19, 2020
    Posts: 19
    Thank you for your answer! What exactly is the difference between using stacked observations and relying on the batch_size, and when would you use one over the other? I think my agents could benefit from seeing more incremental changes in the state, since they would see the rate at which certain variables change. And yes, each observation would be novel.

    Interesting - I rely heavily on masking. Most decisions aren't even possible most of the time, since they're masked, so the decisions being generated would also be largely invalid. I guess that's probably not something to worry about, though.
     
  8. Luke-Houlihan

    Joined: Jun 26, 2007
    Posts: 303
    Stacked observations are just the observations from the previous n steps (n being the number of stacked obs you've set) sent to the model along with the current step. It functions as a rudimentary memory; however, it isn't scalable beyond toy problems with low numbers of observations.
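    For illustration, the stacking count is the "Stacked Vectors" field on the agent's Behavior Parameters component; setting it from code as below is only a sketch of what the setting means, not how it's normally configured:

    Code (CSharp):
    using Unity.MLAgents.Policies;
    using UnityEngine;

    public class StackingSetup : MonoBehaviour
    {
        void Awake()
        {
            var behavior = GetComponent<BehaviorParameters>();
            // With 3 stacked vectors the model sees the last 3 observation
            // snapshots concatenated, enough to infer how fast values change.
            behavior.BrainParameters.NumStackedVectorObservations = 3;
        }
    }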

    The batch_size parameter can't be used directly to effect any kind of memory; I mentioned it in the context of how the agent can benefit from seeing more observations whether or not an action is used. More easily understood changes in the experience buffer can help minimally by speeding up training, but in most environments the effect will be negligible.

    I'm not sure about the ML-Agents implementation in particular, but usually masked actions are just resampled until a non-masked action is returned. The model starts by choosing random actions anyway, and as it learns, the stochastic policy will avoid those actions on its own because they never produce rewards (they are thrown away or resampled).
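    For reference, a minimal sketch of how an agent declares its mask in the current ML-Agents API (the branch and index values are placeholders):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    public class MaskedBoardAgent : Agent
    {
        public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
        {
            // Disable action 2 on branch 0 when it's illegal in the current
            // board state; the policy can't sample it for this decision.
            actionMask.SetActionEnabled(0, 2, false);
        }
    }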