Question Good principles for setting up actions and rewards

Discussion in 'ML-Agents' started by dewet99, Mar 3, 2023.

  1. dewet99


    Jun 13, 2021
    Hi there,

What are good principles for exactly where and when to give rewards with regard to actions? In my project, I have the following:

In the OnActionReceived() function, I call TakeDiscreteAction(), which tells the agent which actions to take. Then, still in OnActionReceived(), I call several different functions that give rewards based on flags that should be set by now.

    For example, I have a pressure plate. In the pressure plate's OnCollisionEnter(), I get a reference to the agent that collided with the plate via the Collision passed to OnCollisionEnter(). The agent has a public bool named hasPressedButton, which is set to true when the agent steps on the pressure plate. Then, in the agent's OnActionReceived(), I call a function RewardButtonPress(), which gives a reward via "if (hasPressedButton) { AddReward(...); }" and then resets the flag to false. The Unity-side logic all works; I can see that AddReward() is called as soon as the agent steps on the pressure plate. Here comes the caveat, though:
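    To make the setup concrete, here is a minimal sketch of the pressure plate and the flag-based reward as described above. The class names PressurePlate and MyAgent, and the reward magnitude, are illustrative assumptions; hasPressedButton and RewardButtonPress() are the names from the post.

    Code (CSharp):
    ```csharp
    using UnityEngine;
    using Unity.MLAgents;

    // Sketch only: sets the agent's flag when it lands on the plate.
    public class PressurePlate : MonoBehaviour
    {
        private void OnCollisionEnter(Collision collision)
        {
            // Grab the agent that stepped on the plate via the Collision.
            var agent = collision.gameObject.GetComponent<MyAgent>();
            if (agent != null)
            {
                agent.hasPressedButton = true;
            }
        }
    }

    public class MyAgent : Agent
    {
        public bool hasPressedButton;

        // Called from OnActionReceived(): pay out once, then consume the flag.
        private void RewardButtonPress()
        {
            if (hasPressedButton)
            {
                AddReward(0.1f); // example magnitude, not from the post
                hasPressedButton = false;
            }
        }
    }
    ```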

    I use an external algorithm, implemented myself, that interacts with the environment via the Python API. My question is: will the correct (action, observation, reward) tuple be passed to the Python environment? Because, if I understand correctly, OnActionReceived() is called every step, independently of FixedUpdate(), whereas the OnCollisionEnter() flags are set in either Update() or FixedUpdate(), and FixedUpdate() != Step. I'll demonstrate my thinking below with a few lines of code:

    Code (CSharp):
    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        TakeDiscreteAction(actionBuffers);
        currentEpisodeStep += 1;

        // Give reward for the actions taken:
        // Give reward for button pressed
        RewardButtonPress();
    }
    Now, in the above code, will the sequence of events be as follows:

    1. OnActionReceived() is called.
    2. TakeDiscreteAction() is called.
    3. The agent takes the actions. For argument's sake, let's say the action it took caused it to stand on the pressure plate (I call it a button; same thing).
    4. The button's OnCollisionEnter() is called and sets the agent's hasPressedButton flag to true.
    5. RewardButtonPress() is called and gives a reward because hasPressedButton is true.

    Similarly, for other actions: will the actions be taken and the relevant flags be set BEFORE the rest of OnActionReceived() runs, or not?
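    One way to pin the ordering down, rather than rely on it implicitly, is to step the Academy manually from FixedUpdate(). This is a sketch, assuming a recent ML-Agents release (where Academy.Instance.AutomaticSteppingEnabled and Academy.Instance.EnvironmentStep() are available), not a claim about how the post's project is set up:

    Code (CSharp):
    ```csharp
    using UnityEngine;
    using Unity.MLAgents;

    // Sketch: step the Academy by hand so the Academy step (which drives
    // OnActionReceived) happens at a known point relative to the physics
    // step that fires OnCollisionEnter.
    public class ManualAcademyStepper : MonoBehaviour
    {
        private void Awake()
        {
            // Stop the Academy from stepping itself automatically.
            Academy.Instance.AutomaticSteppingEnabled = false;
        }

        private void FixedUpdate()
        {
            // By the time FixedUpdate runs, the physics callbacks (including
            // OnCollisionEnter) from the previous physics step have already
            // completed, so flags such as hasPressedButton reflect the last
            // actions taken before rewards are collected here.
            Academy.Instance.EnvironmentStep();
        }
    }
    ```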

    I hope my question is clear, and sorry for the wall of text.
  2. hughperkins


    Dec 3, 2022
    That is a wall of text. And not very clear. What is clear is that you are trying to use Python outside of the standard ML-Agents approach, and it's very possible that you'd find it easier to skip ML-Agents altogether, use a JSON-RPC layer to control Unity from Python, and implement the RL yourself in Python, e.g. using Stable Baselines3 or similar.

    I have a video where I describe such a JSON-RPC layer, built over Chili, AustinHarris JsonRpc, and a couple of other libraries: 1 Control Unity from Python WITHOUT mlagents - YouTube
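    To make the idea concrete, here is a minimal, duck-typed sketch of what the Python side of such a setup might look like. Everything here is an assumption for illustration: the class name UnityRpcEnv, the _rpc_call transport (stubbed with random data in place of a real JSON-RPC round trip to Unity), and the reset/step shapes, which mimic the Gym convention that libraries like Stable Baselines3 expect.

    ```python
    import random

    # Sketch: Unity is controlled over an RPC layer, and the RL loop lives
    # entirely in Python. _rpc_call is a stand-in for a real JSON-RPC client.

    class UnityRpcEnv:
        """Gym-style environment backed by a (stubbed) RPC link to Unity."""

        def __init__(self, n_actions=4, obs_size=8):
            self.n_actions = n_actions
            self.obs_size = obs_size

        def _rpc_call(self, method, *args):
            # A real implementation would serialize method/args to Unity and
            # read back observation, reward, and done. Here we fake it.
            obs = [random.random() for _ in range(self.obs_size)]
            reward = 1.0 if method == "step" and args and args[0] == 0 else 0.0
            done = False
            return obs, reward, done

        def reset(self):
            obs, _, _ = self._rpc_call("reset")
            return obs

        def step(self, action):
            assert 0 <= action < self.n_actions
            obs, reward, done = self._rpc_call("step", action)
            return obs, reward, done, {}

    # Hand-rolled control loop; this is where an algorithm such as Stable
    # Baselines3's PPO would otherwise slot in.
    env = UnityRpcEnv()
    obs = env.reset()
    total_reward = 0.0
    for _ in range(10):
        action = random.randrange(env.n_actions)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    ```

    The point of keeping the env class duck-typed to the Gym reset/step interface is that you can later swap the random-action loop for an off-the-shelf learner without touching the Unity transport.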
  3. dewet99


    Jun 13, 2021
    I'll have a look at the video you linked; that might actually be preferable to using ML-Agents. Thanks :)