
Some advice and suggestions (Python status, MaxStep vs MaxEnvironmentSteps, and ray sensor usage)

Discussion in 'ML-Agents' started by nathan60107, Apr 22, 2021.

  1. nathan60107

    I'm using ML-Agents Release 15, and I'd like to offer some advice about ML-Agents.
    1. During training, we may want to change the environment according to the training status. For example, once the agent can finish mission 1, we add mission 2 to the environment. But it seems you have to use a side channel to do that right now. It would be better if users could access this information through some built-in class, for example:

      Academy.Instance.episode // get how many episodes the Python trainer has finished
      Academy.Instance.avgReward // get the current average reward

      All of the information shown in the Python console should be available on the Unity side. Or is there some way to get it now without a side channel? (A rough sketch of what the side-channel approach looks like is after this list.)
    2. In the Soccer example, there is a MaxStep in AgentSoccer.cs, which is set to 3000 in the Inspector.
      [Inspector screenshot: Max Step = 3000]
      There is also a MaxEnvironmentSteps in SoccerEnvController.cs, which is set to 5000 in the Inspector.
      [Inspector screenshot: Max Environment Steps = 5000]
      In my test (fixing the ball in place so that nobody can score), the agents start a new episode once they pass 3000 steps, but the environment is only reset after 5000 steps. So if no goal is scored within 5000 steps, an agent's first episode covers steps 0-2999 and its second episode covers steps 3000-4999 (a simplified sketch of how I understand the two limits is after this list). Why does the example use this setting? Would it be better to let the agent episodes and the environment reset at the same time, and why? That would be a useful reference for all users when setting up their own training process.
    3. I'm using the ray sensor, and this is the only part of the document that mentions it. I know the usage is simply to add a RayPerceptionSensorComponent3D to the agent without changing the Space Size of the vector observation (a minimal example is after this list), but that is not stated in the document (or did I just miss it?). It would be better if the document explained how to use the ray sensor.
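    For point 1, here is what the side-channel workaround looks like as a minimal sketch on the Unity side. It assumes you also write a custom Python side channel that sends the episode count and average reward (the trainer does not send these by default), and the channel GUID, class names, and message layout below are only examples:

      using System;
      using UnityEngine;
      using Unity.MLAgents.SideChannels;

      // Receives training status pushed from a custom Python side channel.
      // The message layout (episode count, then average reward) is an
      // assumption and must match whatever the Python side writes.
      public class TrainingStatusSideChannel : SideChannel
      {
          public int Episode { get; private set; }
          public float AvgReward { get; private set; }

          public TrainingStatusSideChannel()
          {
              // Must match the UUID used by the Python side channel.
              ChannelId = new Guid("2c137f30-a1d7-4d5b-b4f9-2f3c6b6d7a10");
          }

          protected override void OnMessageReceived(IncomingMessage msg)
          {
              Episode = msg.ReadInt32();
              AvgReward = msg.ReadFloat32();
          }
      }

      // Register the channel on some scene object before training starts.
      public class TrainingStatusRegistrar : MonoBehaviour
      {
          public static TrainingStatusSideChannel Channel;

          void Awake()
          {
              Channel = new TrainingStatusSideChannel();
              SideChannelManager.RegisterSideChannel(Channel);
          }

          void OnDestroy()
          {
              if (Channel != null)
              {
                  SideChannelManager.UnregisterSideChannel(Channel);
              }
          }
      }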
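    For point 2, this is a simplified sketch of how I understand the two limits interacting; it is not the real SoccerEnvController code, and the class and field names are only illustrative. Each agent's MaxStep (3000 in the Inspector) ends that agent's own episode, while the controller below independently resets the whole scene after MaxEnvironmentSteps (5000):

      using Unity.MLAgents;
      using UnityEngine;

      public class EnvControllerSketch : MonoBehaviour
      {
          public int MaxEnvironmentSteps = 5000;
          public Agent[] Agents;   // the agents in this environment instance

          int m_StepCount;

          void FixedUpdate()
          {
              m_StepCount += 1;
              if (m_StepCount >= MaxEnvironmentSteps)
              {
                  foreach (var agent in Agents)
                  {
                      // End the episode without treating the time-out as the
                      // agent's own failure.
                      agent.EpisodeInterrupted();
                  }
                  ResetScene();
              }
          }

          void ResetScene()
          {
              m_StepCount = 0;
              // Move the ball and the players back to their start positions here.
          }
      }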
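    For point 3, this is the minimal setup I ended up with: a RayPerceptionSensorComponent3D is attached to the agent's GameObject in the Inspector (detectable tags, rays per direction, ray length and so on are configured there), and Space Size in Behavior Parameters only counts the observations added by hand in CollectObservations(). The class name here is just an example:

      using Unity.MLAgents;
      using Unity.MLAgents.Sensors;

      // The ray observations are appended automatically by the attached
      // RayPerceptionSensorComponent3D; they are not counted in Space Size.
      public class RaySensorAgentSketch : Agent
      {
          public override void CollectObservations(VectorSensor sensor)
          {
              // Two hand-written observations -> Space Size = 2.
              sensor.AddObservation(transform.localPosition.x);
              sensor.AddObservation(transform.localPosition.z);
          }
      }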
    Finally, thank you for creating this convenient tool for both game creators and ML researchers.
     
  2. celion_unity

    1. This is a good feature request, but not something that's easy to add right now. In the meantime, I would recommend using the existing curriculum feature; there's an example of this in the WallJump example's config file, and the values are read here at training time. A rough sketch of the Unity side of that is after this list.
    2. I'll look into this. Might just be a bug in the example.
    3. It's mentioned earlier in the same page (although there's a grammatical error): "NOTE: you do not need to adjust the Space Size in the Agent's Behavior Parameters when using an ISensor SensorComponents." I'll fix the wording (since I think I'm the one that wrote it in the first place).
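    On the Unity side of the curriculum approach, the pattern is just to read the curriculum-controlled parameter at the start of each episode, roughly like this. The class and parameter names are only examples, and the parameter name has to match the environment_parameters entry in your trainer config:

      using Unity.MLAgents;

      public class MissionAgentSketch : Agent
      {
          float m_MissionStage;

          public override void OnEpisodeBegin()
          {
              // "mission_stage" is an example name; the curriculum in the
              // trainer config decides which lesson value gets sent.
              m_MissionStage = Academy.Instance.EnvironmentParameters.GetWithDefault("mission_stage", 1.0f);

              if (m_MissionStage >= 2.0f)
              {
                  // Enable the mission 2 objects here.
              }
          }
      }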
     
    nathan60107 likes this.
  3. celion_unity

    nathan60107 likes this.
  4. nathan60107

    Thank you for fixing it so quickly.
    About 3: I think the note should be repeated for every single component. In my case, I first checked the Table of Contents and jumped straight to "Raycast Observations" by clicking it. I only read that part of the doc, so I never saw the note above it. Learning-Environment-Design-Agents.md works somewhat like a dictionary: users look up the component they are interested in and may not read all of it. That is why I think the note needs to be added to every component.
     
  5. nathan60107

    @celion_unity
    A few more small wording issues:
    1. In extrinsic -> strength, the recommended setting is given as a range of 1.00, which is a single value, not a range.
    2. In curiosity -> strength, the recommended range is 0.001-0.1, but the default is 1.0. Isn't it strange that the default is outside the recommended range? The same issue appears in rnd -> strength.
    3. We need more detail about the difference between curiosity and RND. To users they look the same.
    4. In self-play, the docs say "If your environment contains multiple agents that are divided into teams, you can leverage our self-play training option by providing these configurations for each Behavior". A typical example is soccer, where both teams have the same objective: kick the ball into the opponent's goal. But if the two teams have different objectives, is self-play still suitable? For example, team A tries to catch team B's members, and team B tries to avoid being caught. Because the two teams use different networks to decide their behavior, I don't think that is what self-play means.
      If I'm right, the docs should say: "it is only self-play if both teams use the same model."
      But "asymmetric games" are also mentioned under self-play, so it seems both cases count as self-play.
    5. Following the previous question: in the SoccerTwos example all agents use the SoccerTwos model, while StrikersVsGoalie uses different models for the different colored players. So SoccerTwos is self-play, while StrikersVsGoalie is not. I think you could provide the training configs so that users can see how the models were trained. They would be a good reference for setting the config options that are not mentioned in Making a New Learning Environment and are only described in Training Configuration File.
     
    Last edited: Apr 27, 2021
  6. nathan60107

    @celion_unity
    Just a reminder: I didn't tag you in #5, so you may not have received a notification.