Resolved Buffer.Shuffle - Performance-Gains for "sequence-length" of 1?

Discussion in 'ML-Agents' started by unity_-DoCqyPS6-iU3A, Apr 12, 2021.

  1. unity_-DoCqyPS6-iU3A

    Joined:
    Aug 18, 2018
    Posts:
    26
    Hello everyone,

    I was looking at the timers.json file from my runs, specifically at the "_update_policy" section.

    "_update_policy" only has 1 child-entry ("TorchPPOOptimizer.update"), and that only accounts for around 33% of the used time. So I also added the "@timed"-keyword to the "shuffle"-method in trainers/buffer.py to see where the rest of the time was needed.

    To my surprise, the shuffle method used up the rest of the time.

    For me that just doesn't feel right.
    Optimizing a policy with multiple layers and hundreds of units takes less time than *shuffling* an array?

    Of course, the shuffle method does more than just randomize: it keeps blocks of "sequence_length" entries together.

    But it looks like "sequence_length" is always "1" for me. Maybe it's different when using LSTMs?
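
    For illustration only (my own sketch, not the trainer's code): with sequence_length > 1 the shuffle permutes whole blocks so that consecutive timesteps stay together (which is what an LSTM needs), while with sequence_length == 1 every block is a single entry and it degenerates into a plain element shuffle.

    Code (Python):
    import numpy as np

    data = np.arange(6)  # six timesteps: [0, 1, 2, 3, 4, 5]

    # sequence_length == 2: permute the three 2-step blocks, keeping pairs intact.
    seq_len = 2
    blocks = np.arange(len(data) // seq_len)  # block indices [0, 1, 2]
    np.random.shuffle(blocks)
    shuffled = np.concatenate([data[b * seq_len:(b + 1) * seq_len] for b in blocks])
    # e.g. [4, 5, 0, 1, 2, 3] -- consecutive pairs stay together

    # sequence_length == 1: every "block" is a single entry, so this is just
    # an ordinary shuffle of all elements.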

    Maybe the code can be changed to differentiate between two situations:
    - use the current code for shuffling an array with sequence_length > 1,
    - and ditch the for-loop and shuffle in place (is that the right term?) for all other cases; see the sketch below.
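
    A minimal sketch of that special case (hypothetical: the name shuffle_fields and the plain dict stand in for AgentBuffer's internal storage, and the real method shuffles several fields with one consistent ordering):

    Code (Python):
    import numpy as np
    from typing import Dict, List

    def shuffle_fields(fields: Dict[str, List[np.ndarray]], sequence_length: int) -> None:
        """Shuffle all fields with the same ordering (illustrative sketch only)."""
        n = len(next(iter(fields.values())))
        if sequence_length == 1:
            # Fast path: draw one permutation of all indices and apply it per field,
            # skipping the block-slicing loop entirely.
            perm = np.random.permutation(n)
            for key, values in fields.items():
                fields[key] = [values[i] for i in perm]
        else:
            # Block path: permute whole sequence_length blocks, keeping each
            # block's entries in order (what the current code does).
            block_perm = np.random.permutation(n // sequence_length)
            for key, values in fields.items():
                fields[key] = [
                    values[b * sequence_length + j]
                    for b in block_perm
                    for j in range(sequence_length)
                ]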

    Or is randomizing ~180,000 entries really that expensive, so that this wouldn't make a difference?


    (I'm currently trying to find performance gains before tuning hyperparameters. For this test: 130 agents across 2 environments, time_horizon 600, buffer_size 180,000.)

    Code (text):
    Version information:
      ml-agents: 0.24.0,
      ml-agents-envs: 0.24.0,
      Communicator API: 1.4.0,
      PyTorch: 1.7.1+cu110
    Code (JSON):
    "trainer_advance": {
        "total": 466.6255257000179,
        "count": 63893,
        "self": 10.226360500029728,
        "children": {
            "process_trajectory": {
                "total": 109.03346759998814,
                "count": 63893,
                "self": 108.47671139998826,
                "children": {
                    "RLTrainer._checkpoint": {
                        "total": 0.5567561999998816,
                        "count": 4,
                        "self": 0.5567561999998816
                    }
                }
            },
            "_update_policy": {
                "total": 347.3656976,
                "count": 22,
                "self": 10.945687599998791,
                "children": {
                    "AgentBuffer.shuffle": {
                        "total": 219.47304630000073,
                        "count": 66,
                        "self": 219.47304630000073
                    },
                    "TorchPPOOptimizer.update": {
                        "total": 116.94696370000048,
                        "count": 1008,
                        "self": 116.94696370000048
                    }
                }
            }
        }
    }
    Code (Python):
    @timed
    def shuffle(
        self, sequence_length: int, key_list: List[AgentBufferKey] = None
    ) -> None:
        """
        Shuffles the fields in key_list in a consistent way: The reordering will
        be the same across fields.
        :param key_list: The fields that must be shuffled.
        """
        if key_list is None:
            key_list = list(self._fields.keys())
        if not self.check_length(key_list):
            raise BufferException(
                "Unable to shuffle if the fields are not of same length"
            )
        s = np.arange(len(self[key_list[0]]) // sequence_length)
        np.random.shuffle(s)
        for key in key_list:
            tmp: List[np.ndarray] = []
            for i in s:
                tmp += self[key][i * sequence_length : (i + 1) * sequence_length]
            self[key][:] = tmp
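
    To get a feel for the cost, here is a standalone micro-benchmark (my own sketch, not ml-agents code; the 8-element array per timestep is just an assumption) comparing the element-by-element list concatenation above with a single vectorized take over ~180,000 entries:

    Code (Python):
    import time
    import numpy as np

    n = 180_000
    entries = [np.random.rand(8) for _ in range(n)]  # one small array per timestep

    # One permutation of all indices, like the sequence_length == 1 case above.
    order = np.arange(n)
    np.random.shuffle(order)

    # Loop-based shuffle: extend a Python list one slice at a time.
    start = time.perf_counter()
    tmp = []
    for i in order:
        tmp += entries[i : i + 1]
    loop_time = time.perf_counter() - start

    # Vectorized alternative: stack once, then apply the permutation in one indexing op.
    stacked = np.stack(entries)
    start = time.perf_counter()
    shuffled = stacked[order]
    vec_time = time.perf_counter() - start

    print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.3f}s")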
     
  2. unity_-DoCqyPS6-iU3A

    Joined:
    Aug 18, 2018
    Posts:
    26