Question: The details of the self-play algorithm implementation

Discussion in 'ML-Agents' started by Dream_Surpass, Jun 27, 2023.

  1. Dream_Surpass

    Joined: Dec 2, 2022 · Posts: 18
    I want to know the details of the self-play algorithm implementation in ML-Agents. Which paper should I refer to?

    Some self-play papers describe using two memory pools: one for supervised learning and the other for RL training. Does this exist in ML-Agents' self-play?

    Also, how does the buffer in ML-Agents' self-play work? For example, suppose side 1 collects 2048 trajectories and saves them in the buffer, and then the training side switches. Does the data collected by side 1 need to be cleared from the buffer, or should I use a separate buffer for each side's agents?
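
    For reference, I'm assuming the switching behavior I describe is governed by the self_play section of the trainer config. A sketch of what I mean (the behavior name is a placeholder and the values are just illustrative, based on the documented options):

    Code (YAML):

        behaviors:
          Soccer:                  # hypothetical behavior name
            trainer_type: ppo
            self_play:
              save_steps: 50000    # steps between snapshots of the learning policy
              team_change: 200000  # steps between switching which team is learning
              swap_steps: 2000     # steps between swapping the opponent's snapshot
              window: 10           # sample opponents from the 10 latest snapshots
              play_against_latest_model_ratio: 0.5  # chance of facing the latest policy
              initial_elo: 1200.0  # starting ELO used to track training progress

    But I can't tell from these options what happens to the buffered trajectories at each switch.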

    Could anyone tell me about this? Thanks a lot.
     
  2. Luke-Houlihan

    Joined: Jun 26, 2007 · Posts: 303
  3. ice_creamer

    Joined: Jul 28, 2022 · Posts: 34