Regarding env.get_steps(behavior_name)

Discussion in 'ML-Agents' started by camigord, Oct 14, 2020.

  1. camigord

    Joined: May 29, 2019
    Posts: 7

    Hi,
    I am having some trouble understanding how to step the environment in the latest release. I am currently using the 3DBall scenario, and I do not understand what sometimes happens after calling env.get_steps(behavior_name):

    Code (Python):
    # Queue the actions for every behavior before advancing the simulation.
    for behavior_name in behavior_names:
        env.set_actions(behavior_name, actions[behavior_name])

    env.step()

    # Read back what each behavior reported after the step.
    for behavior_name in behavior_names:
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        print(len(decision_steps), len(terminal_steps))
    After a couple of steps, one of the agents in the scene terminates and I receive a single entry inside terminal_steps but none inside decision_steps.

    How am I supposed to get the information for the remaining agents?
    Thanks!
     
  2. vincentpierre

    Joined: May 5, 2017
    Posts: 160

    Hi

    In the 3DBall environment, there are 12 agents requesting decisions every 5 fixed updates. If one of the Agents drops the ball between two decision requests, before any of the others, it will NOT wait for the other Agents to request decisions before signalling to Python that it dropped the ball. This means that Python will receive 1 entry in terminal_steps and 0 in decision_steps (since one agent terminated and the others have not requested a decision yet). The information is not lost: in Python, you can look at the relevant data of the terminated agent, then call env.step to advance the simulation (until another agent terminates or requests a decision) and call env.get_steps again to retrieve the new data.

    In ML-Agents, data is communicated to Python either when an Agent requests a decision or terminates but not in between. Agents are not required to request decisions or terminate in sync.

    Here is an illustration of what is going on in Unity and Python (n/a means that no message was exchanged at all):
    Code (Text):
    Unity :
    agent 1       : decision |     |     |             |     |     | termination & decision |
    agent 2       : decision |     |     |             |     |     | decision               |
    agent 3       : decision |     |     | termination |     |     | decision               |

    Python :
    env.get_steps : (3, 0)   | n/a | n/a | (0, 1)      | n/a | n/a | (3, 1)                 |

    This means that in Python, if you call
    Code (Python):
    env.step()
    decision, terminal = env.get_steps(behavior_name)
    print((len(decision), len(terminal)))
    3 times, you will see:
    Code (Text):
    (3, 0)
    (0, 1)
    (3, 1)
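
The timing described above can be reproduced with a toy model in plain Python. This is only a sketch and does not use ML-Agents at all: `message_timeline` and its parameters are hypothetical names, and it assumes every agent shares one decision period and that a terminated agent resets immediately and stays on the shared schedule.

```python
from typing import Dict, List, Tuple


def message_timeline(
    num_agents: int,
    decision_period: int,
    terminations: Dict[int, int],
    num_updates: int,
) -> List[Tuple[int, int]]:
    """Return one (len(decision_steps), len(terminal_steps)) pair per message.

    `terminations` maps a fixed-update index to the number of agents whose
    episode ends on that update. Updates with no decision request and no
    termination send nothing to Python, so they produce no entry.
    """
    timeline = []
    for update in range(num_updates):
        # Assumption: all agents share one period, and a terminated agent
        # resets immediately and stays on the shared decision schedule.
        n_decision = num_agents if update % decision_period == 0 else 0
        n_terminal = terminations.get(update, 0)
        if n_decision or n_terminal:
            timeline.append((n_decision, n_terminal))
    return timeline


# 3 agents, decisions every 5 fixed updates; one agent drops the ball on
# update 3 and another on update 5.
print(message_timeline(3, 5, {3: 1, 5: 1}, num_updates=6))
# prints [(3, 0), (0, 1), (3, 1)]
```

Each entry corresponds to one env.step / env.get_steps round trip on the Python side; the fixed updates that produce no entry are the "n/a" columns in the illustration.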
     
  3. camigord

    Thanks a lot, that clarifies a lot.

    I have an additional question which probably does not apply to the 3DBall environment, but which may be relevant for me later on.

    In the 3DBall scenario, is there a way to receive fewer than 12 agents (and more than 0, of course) inside decision_steps? What I mean is: are there any cases where agents request decisions at different frequencies? In the examples above, even when an agent terminates, it still resets in time to send information back together with the other agents.

    Thanks again for the help
     
  4. vincentpierre

    You are perfectly right: Agents are not required to request decisions or terminate in sync.
    If you have 3 agents with a decision requester set to request a decision every 5 fixed updates and 2 agents with a decision requester set to request a decision every 7 fixed updates, you will receive decision steps as follows:
    after 5 steps : 3 agents
    after 7 steps : 2 agents
    after 10 steps : 3 agents
    after 14 steps : 2 agents
    after 15 steps : 3 agents
    after 20 steps : 3 agents again
    after 21 steps : 2 agents
    ...
    after 35 steps : 3 + 2 agents (35 divisible by 5 and 7)
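
The schedule above can be checked with a few lines of plain Python. This is a sketch under simplifying assumptions, not ML-Agents code: `decision_schedule` is a hypothetical helper, all agents are assumed to start counting from the same fixed update, and no agent terminates along the way.

```python
from typing import Dict, List, Tuple


def decision_schedule(periods: Dict[int, int], num_updates: int) -> List[Tuple[int, int]]:
    """Return (fixed_update, agent_count) for each update with a decision request.

    `periods` maps a decision period to the number of agents using it.
    Assumption: all agents start counting together and never terminate.
    """
    schedule = []
    for update in range(1, num_updates + 1):
        # An agent requests a decision whenever the update is a multiple of its period.
        count = sum(n for period, n in periods.items() if update % period == 0)
        if count:
            schedule.append((update, count))
    return schedule


# 3 agents with period 5 and 2 agents with period 7, over 35 fixed updates.
schedule = decision_schedule({5: 3, 7: 2}, 35)
for update, count in schedule:
    print(f"after {update} steps : {count} agents")
```

The last line printed is for update 35, where both groups line up and 5 agents appear in decision_steps at once.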

    Note that if you do not use a decision requester and manually request decisions with "Agent.RequestDecision" then you can have much more complex scenarios.
     
  5. camigord

    That clarifies a lot. Thanks for taking the time.
     
  6. monishmedhavi

    Joined: Feb 7, 2020
    Posts: 2

    vincentpierre said:
    Code (Text):
    Unity :
    agent 1       : decision |     |     |             |     |     | termination & decision |
    agent 2       : decision |     |     |             |     |     | decision               |
    agent 3       : decision |     |     | termination |     |     | decision               |

    Python :
    env.get_steps : (3, 0)   | n/a | n/a | (0, 1)      | n/a | n/a | (3, 1)                 |

    Can you explain what "termination & decision" for agent 1 means? I understand that Agents are not required to request decisions or terminate in sync, which is why there was only a termination, (0, 1), and no decision request when agent 3 terminated. But what does (3, 1) mean?
     
  7. camigord

    (3, 1) means that agent 1 terminated (you can find its last observation and final reward inside terminal_steps) and then reset in time to return a new observation (the first observation of a new episode, inside decision_steps).
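
One practical consequence is the order in which the two results should be processed when collecting per-agent episodes. The sketch below is an illustration only: plain dicts keyed by agent id stand in for the real decision_steps and terminal_steps objects, and `record_steps`, `open_episodes` and `finished_episodes` are hypothetical names, not part of ML-Agents.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified record: (observation, reward).
Step = Tuple[str, float]


def record_steps(
    open_episodes: Dict[int, List[Step]],
    finished_episodes: List[List[Step]],
    decision: Dict[int, Step],
    terminal: Dict[int, Step],
) -> None:
    """Fold one get_steps-style result into per-agent episode buffers."""
    # Handle terminations first: the terminal entry is the last step of the
    # episode that was already in progress for that agent.
    for agent_id, step in terminal.items():
        episode = open_episodes.pop(agent_id, [])
        episode.append(step)
        finished_episodes.append(episode)
    # Then handle decisions: for an agent that also appears in `terminal`,
    # this entry is the first observation of its next episode.
    for agent_id, step in decision.items():
        open_episodes.setdefault(agent_id, []).append(step)


open_episodes: Dict[int, List[Step]] = {}
finished_episodes: List[List[Step]] = []

# First message, (3, 0): all three agents request a decision.
record_steps(
    open_episodes,
    finished_episodes,
    decision={1: ("obs_a", 0.1), 2: ("obs_b", 0.1), 3: ("obs_c", 0.1)},
    terminal={},
)
# Later message, (3, 1): agent 1 terminates and immediately starts again.
record_steps(
    open_episodes,
    finished_episodes,
    decision={1: ("first_obs_new_episode", 0.0), 2: ("obs_d", 0.1), 3: ("obs_e", 0.1)},
    terminal={1: ("last_obs", -1.0)},
)
```

Processing the terminal entries before the decision entries keeps the final observation in the old episode and the fresh observation in the new one, which matters exactly in the (3, 1) case where the same agent id appears in both.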
     
  8. vincentpierre

    You are exactly right!
     
  9. monishmedhavi

    Thanks, now it's clear.