
Question LSTM Unity ML-Agents

Discussion in 'ML-Agents' started by TulioMMo, Jan 28, 2022.

  1. TulioMMo

    TulioMMo

    Joined:
    Dec 30, 2020
    Posts:
    29
    Hi Everyone,

    I am trying to use LSTM in my project (using PPO), but I have a couple of questions.

    1st: Are there any suggestions for choosing an appropriate sequence_length and memory_size with respect to time_horizon? In the Hallway example memory_size = 128, but time_horizon is only 64... is that usual?

    2nd: Is there an article exploring the technical (and theoretical) differences between "sequence_length" and "memory_size" in more depth? There is an explanation in https://github.com/Unity-Technologi...hanced-agents-using-recurrent-neural-networks, but it says that both "memory_size" and "sequence_length" determine how much information from the past is remembered by the agent... that is a little ambiguous to me, since "memory_size" > "sequence_length".

    Thanks for the help :)
     
  2. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    sequence_length := How many steps over which to collect the memory information
    memory_size := How many floats worth of information to remember.

    Hallway:
      hyperparameters:
        batch_size: 128
        buffer_size: 1024
      memory:
        sequence_length: 64
        memory_size: 128
      time_horizon: 64

    This means: while training with a batch size of 128, gather 128 floats of memory information over the past 64 steps. Sequences covering fewer than 64 steps are zero padded. During training, episode step data is added to the buffer in 64-step blocks, so each batch always contains at least 64 steps of data.
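
    Roughly, the sequence cutting and padding works like the sketch below. This is my own illustration of the idea, not the actual ML-Agents trainer code; the function name and sizes are made up:

    import numpy as np

    SEQUENCE_LENGTH = 64  # sequence_length from the config

    def to_sequences(episode_obs):
        # Cut one episode's observations into sequence_length chunks,
        # zero padding the last chunk if the episode does not divide evenly.
        sequences = []
        for start in range(0, len(episode_obs), SEQUENCE_LENGTH):
            chunk = episode_obs[start:start + SEQUENCE_LENGTH]
            if len(chunk) < SEQUENCE_LENGTH:
                pad = np.zeros((SEQUENCE_LENGTH - len(chunk),) + chunk.shape[1:], dtype=chunk.dtype)
                chunk = np.concatenate([chunk, pad], axis=0)
            sequences.append(chunk)
        return sequences

    # 100 steps of a 3-float observation -> two 64-step sequences,
    # the second one padded with 28 rows of zeros
    episode = np.random.rand(100, 3).astype(np.float32)
    print([s.shape for s in to_sequences(episode)])  # [(64, 3), (64, 3)]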

    It's all explained here: https://github.com/Unity-Technologi...e_19_docs/docs/Training-Configuration-File.md

    As for the best value for sequence_length, that is explained here: https://github.com/Michaelwolf95/Hierarchical-ML-agents/blob/master/docs/Feature-Memory.md
     
    TulioMMo likes this.
  3. TulioMMo

    TulioMMo

    Joined:
    Dec 30, 2020
    Posts:
    29
    Thank you very much! I had misunderstood the concept of "memory_size".

    I have some other questions, though...

    1) Assume an example where I have an observation vector of size 3, without stacking: s_t = [x, y, z], where x, y and z are float variables.

    Then, if sequence_length: 64, would I require a memory_size of 64*3 = 192 to store all the memory from the previous 64 steps? I assume that if memory_size is smaller, some information from the sequence would be lost?

    2) Does the LSTM replace the actor and critic neural networks, outputting the state-value function and a softmax over the actions to take? Or does it only replace the critic?

    Thanks once again
     
  4. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    LSTM is not a simple memory system like stacking.

    ML-Agents uses torch.nn.LSTM; see ml-agents/ml-agents/mlagents/trainers/torch/layers.py:

    """
    Creates a torch.nn.LSTM and initializes its weights and biases. Provides a
    forget_bias offset like is done in TensorFlow.
    """
    lstm = torch.nn.LSTM(input_size, hidden_size, num_layers, batch_first=batch_first)


    Next, in the module's forward() pass:


    lstm_out, hidden_out = self.lstm(input_tensor, hidden)
    output_mem = torch.cat(hidden_out, dim=-1)


    The observations from the agent are fed into the LSTM layers. Each layer has a hidden state and output values. The hidden state is passed from the first LSTM layer to the next, and to the next, but is never output back to the rest of the NN. The final output value is what is passed to the other layers of the NN as a second input, concatenated with the observations. The hidden state is preserved by passing it back and forth between Unity and ML-Agents. The output is what is used by the rest of the agent's NN.
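
    To make that data flow concrete, here is a minimal PyTorch sketch of the pattern. This is my own illustration with made-up sizes and variable names, not the ML-Agents source (ML-Agents wires this up internally in its network classes):

    import torch

    obs_size = 6                     # observations per step (example value)
    memory_size = 128                # memory_size from the config
    hidden_size = memory_size // 2   # the memory carries both hidden and cell state
    seq_len, batch = 64, 1

    lstm = torch.nn.LSTM(obs_size, hidden_size, num_layers=1, batch_first=True)

    obs_seq = torch.randn(batch, seq_len, obs_size)   # one 64-step observation sequence
    h0 = torch.zeros(1, batch, hidden_size)           # previous memory (zeros at episode start)
    c0 = torch.zeros(1, batch, hidden_size)

    lstm_out, (h_n, c_n) = lstm(obs_seq, (h0, c0))
    memory_out = torch.cat([h_n, c_n], dim=-1)        # carried forward to the next decision

    print(lstm_out.shape)    # torch.Size([1, 64, 64])  -> fed to the rest of the policy NN
    print(memory_out.shape)  # torch.Size([1, 1, 128])  -> the agent's "memory"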

    To fully understand LSTM you have to go through docs like these two:
    https://medium.com/@kangeugine/long-short-term-memory-lstm-concept-cb3283934359
    https://towardsdatascience.com/lstm-neural-network-the-basic-concept-a9ba225616f7

    But knowing how LSTM works has not helped me in any way with respect to using ML-Agents. I studied it only because I was previously doing custom NN implementations.

    So think of it this way: if sequence_length is 64, then over those 64 steps the LSTM layer will choose what it wants to remember and what it wants to forget / ignore.

    In other words:
    With Stacking we are explicitly feeding the neural network with the full set of observations of the previous X steps.

    With LSTM we are letting the NN choose which bits of the previous states that it wants to remember. The NN's choices are going to be dependent on which observations helped it most with getting a higher reward.

    If you think about it, not all of the observation data in the previous state history is going to be useful for the current action; only some bits will be. For example, in hide and seek, "did I check behind that tree in the previous 64 steps" is probably more useful than the distance from the agent to the map boundary over the last 64 steps. Or, as a human, I choose to remember that I made an appointment yesterday for a meeting today because it is of value to me, but I do not remember how many birds were on the porch because that is of no value to me.

    Stacking is brute force memory which has everything.
    LSTM is smart memory which has to be trained, and until it's trained it is stupid memory.

    Or as a human:
    Stacking is photographic memory of everything both useful and useless
    LSTM is normal memory where we only remember useful things
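
    In terms of what the network actually sees each step, the difference looks roughly like this (a sketch with made-up sizes, just to show the shapes):

    obs_size = 6

    # Stacking: the policy input is simply the last N raw observations concatenated.
    stack_size = 3
    stacked_input_size = obs_size * stack_size   # 18 floats of raw history, useful or not

    # LSTM: the policy input stays at obs_size, plus a learned memory vector whose
    # contents the network decides for itself during training.
    memory_size = 128
    lstm_input_size = obs_size                   # only the current step's observation
    lstm_memory_size = memory_size               # 128 floats summarizing the useful past

    print(stacked_input_size, lstm_input_size, lstm_memory_size)  # 18 6 128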

    So what's the catch? The catch is that it takes a while for the LSTM layers to learn what is useful and what is not. And until they learn what is useful, the memory is of little or no value to the rest of the agent's NN. Also, if the environment is drastically changed, for example with curriculum learning, then the LSTM might have to learn some things all over again.

    One question to ask before using stacking or LSTM is: does memory actually help the agent? We might think it helps, but it may be entirely useless. For example, in a game of chess memory serves no value. One way to determine this is as follows.
    Do a training run with Curiosity and no stacking or LSTM until learning plateaus.
    Then do a training run with Curiosity and stacking 3 frames but no LSTM, for the same number of steps as the previous run (a config sketch for these two runs is shown below).

    If learning speed / performance with stacking is worse, then memory serves no value to the simulation. What I've found is that on simulations where memory serves no value, if you enable Curiosity and stacking then learning goes all haywire: agents often chase a negative reward instead of trying to get a higher reward, or at least do far worse than with stacking disabled.
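
    As a rough illustration, the two comparison runs might be configured like this. This is only a sketch; the behavior name and values are placeholders, not a recommended setup, and the stacking itself is set in Unity (Behavior Parameters > Vector Observation > Stacked Vectors), not in the YAML:

    # Run 1: curiosity, no stacking, no LSTM
    behaviors:
      MyAgent:
        trainer_type: ppo
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99
          curiosity:
            strength: 0.02
            gamma: 0.99
        time_horizon: 64
        max_steps: 2.0e6

    # Run 2: identical YAML, run for the same number of steps, but with
    # Stacked Vectors = 3 on the agent's Behavior Parameters component.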

    It's an incremental process. Maybe start with the minimum feature set and get the model to learn something. Then start adding features and testing to see what is helpful and what is not. Every game / simulation is different; what works in one simulation may not work in another.
     
    Last edited: Feb 3, 2022
  5. TulioMMo

    TulioMMo

    Joined:
    Dec 30, 2020
    Posts:
    29
    Thank you very much for the thorough answer and links!! I'll run some experiments with your suggestions.
     
  6. ice_creamer

    ice_creamer

    Joined:
    Jul 28, 2022
    Posts:
    33
    Hello, if I use LSTM and want to use it to predict something to add as an observation (AddObservation), what should I do? How can I get this value?
    Thank you!