Question Problem about decision steps and terminal steps in mlagents_envs

Discussion in 'ML-Agents' started by Dream_Surpass, Mar 3, 2023.

  1. Dream_Surpass

    Dream_Surpass

    Joined:
    Dec 2, 2022
    Posts:
    10
    I used a custom RL algorithm to train agents through mlagents_envs, following the tutorials at https://github.com/Unity-Technologies/ml-agents/tree/develop/colab.

    With multiple agents under one Behavior Name, I found that when one agent terminates (that is, it appears in terminal steps), decision steps contains nothing. Following the tutorial code, in this step we pass the empty obs to the network and get empty actions back, then step the env forward. Is this normal?

    When we pass an empty action to the env and call the step function, does the env actually step forward? If it does, there might be a bug here. First, many agents may take heuristic actions in this step. Moreover, if one agent terminates, then several other agents terminate one after another, and the first terminated agent terminates again during these steps, we may never create the dict for this agent, because the dict is only created for agents in decision_steps.
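
To make the bookkeeping concern above concrete, here is a minimal sketch of per-agent trajectory tracking that does not assume an agent was seen in decision steps before it shows up in terminal steps. It does not import mlagents_envs: `StepBatch` and `record` are hypothetical names, and `StepBatch` is only a stand-in that mirrors the `agent_id` and `reward` fields of the real `DecisionSteps` / `TerminalSteps` objects. Handling terminal agents before decision agents is a design choice of this sketch, not something taken from the tutorial.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StepBatch:
    """Hypothetical stand-in for DecisionSteps / TerminalSteps.

    Mirrors only the per-agent ids and rewards; observations are omitted.
    """
    agent_id: List[int] = field(default_factory=list)
    reward: List[float] = field(default_factory=list)


def record(
    open_eps: Dict[int, List[float]],
    finished: List[Tuple[int, List[float]]],
    decision: StepBatch,
    terminal: StepBatch,
) -> List[int]:
    """Update per-agent reward histories; return the ids that need an action."""
    # Terminal agents first: close the episode even if no entry was ever
    # opened for this agent (pop falls back to an empty history).
    for aid, r in zip(terminal.agent_id, terminal.reward):
        rewards = open_eps.pop(aid, [])
        rewards.append(r)
        finished.append((aid, rewards))
    # Decision agents: create the entry on first sight, then append.
    for aid, r in zip(decision.agent_id, decision.reward):
        open_eps.setdefault(aid, []).append(r)
    return list(decision.agent_id)


if __name__ == "__main__":
    open_eps: Dict[int, List[float]] = {}
    finished: List[Tuple[int, List[float]]] = []
    record(open_eps, finished, StepBatch([0, 1], [0.0, 0.0]), StepBatch())
    # Agent 0 terminates while nobody requests a decision.
    need_action = record(open_eps, finished, StepBatch(), StepBatch([0], [1.0]))
    print(need_action, finished, open_eps)
```

When the returned list is empty, the caller can skip the network forward pass for that step instead of feeding it an empty observation batch. Whether the environment should still be stepped in that case is the open question of this thread and is not answered by the sketch.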

    upload_2023-3-3_10-11-15.png


    Any ideas? Thanks a lot.
     
  2. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    178
    That's why I invented Peaceful Pie: mlagents is agent-centric, not episode-centric. If you wish to run episodes over multiple agents, the LLAPI etc. won't tell you when the episode is finished. You cannot distinguish between two cases: an agent terminates, two agents spawn, and the episode continues; versus an agent terminates, the episode ends, the next episode starts, and two agents spawn into that new episode.

    See my signature for Peaceful Pie.
     
  3. Dream_Surpass

    Dream_Surpass

    Joined:
    Dec 2, 2022
    Posts:
    10
    Thanks for your suggestion. I will try Peaceful Pie later.