Training simulation stopped after 7 hours of training for unknown reasons

Discussion in 'ML-Agents' started by ammad99, Dec 21, 2020.

  1. ammad99

    Joined:
    Nov 13, 2020
    Posts:
    14
    Hello Everyone!

    I am using ML-Agents version 1.6.0 and Unity version 2019.4.14f1. My training stopped automatically. A screenshot of the problem follows:
    upload_2020-12-21_10-22-35.png

    I don't know why the training stopped. I would be glad if someone could help me with this issue.

    Thanks!
     
  2. ammad99

    Joined:
    Nov 13, 2020
    Posts:
    14
    Some more screenshots to help trace the problem:
    upload_2020-12-21_10-29-30.png
     
  3. ammad99

    Joined:
    Nov 13, 2020
    Posts:
    14
    My hyperparameters:
    behaviors:
      CarBehavior:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 10240
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 64
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.98
            strength: 1.0
        keep_checkpoints: 5
        checkpoint_interval: 500000
        max_steps: 5000000
        time_horizon: 64
        summary_freq: 50000
        threaded: true
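
    As a quick sanity check on the numbers in this config, here is a minimal Python sketch. It is not part of the original post or of ML-Agents itself: the `summarize` helper and its key names are made up for illustration, and the counts are rough estimates derived only from the values listed above.

```python
# Values copied from the CarBehavior config above.
config = {
    "batch_size": 1024,
    "buffer_size": 10240,
    "max_steps": 5_000_000,
    "checkpoint_interval": 500_000,
    "keep_checkpoints": 5,
    "summary_freq": 50_000,
}


def summarize(cfg):
    """Hypothetical helper: derive a few counts implied by the settings."""
    checkpoints = cfg["max_steps"] // cfg["checkpoint_interval"]
    return {
        # buffer_size is usually chosen as a multiple of batch_size.
        "buffer_is_multiple_of_batch": cfg["buffer_size"] % cfg["batch_size"] == 0,
        "minibatches_per_buffer": cfg["buffer_size"] // cfg["batch_size"],
        # Approximate number of checkpoints over a full run.
        "checkpoints_written": checkpoints,
        "checkpoints_kept": min(cfg["keep_checkpoints"], checkpoints),
        # Approximate number of summary lines over a full run.
        "summaries_logged": cfg["max_steps"] // cfg["summary_freq"],
    }


summary = summarize(config)
for name, value in summary.items():
    print(f"{name}: {value}")
```

    With these values the buffer holds 10 minibatches, and a full run would write roughly 10 checkpoints (keeping 5) and about 100 summaries.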

    Continuous actions:
    Accelerate, mapped to values from 0 to 1
    Brake, mapped to values from -1 to 0

    Goal:
    Maintain a distance from the car next to the agent

    Observation:
    Distance to the next car
     
    Last edited: Dec 21, 2020
  4. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @ammad99,
    Did you see anything in the Unity Editor or Player log? Did your environment crash? Could you provide more info for us, please?
    Cheers,
    Chris
     
  5. ammad99

    Joined:
    Nov 13, 2020
    Posts:
    14
    Hi @christophergoy, many thanks for your message. Please see the following screenshots, which may help in tracing the problem.

    In my console window I got this:
    upload_2020-12-22_10-11-36.png

    Although I stopped the simulation, my Task Manager still shows these numbers for memory:
    upload_2020-12-22_10-12-27.png
    upload_2020-12-22_10-17-47.png

    What I observed was that my environment got stuck in the middle of the simulation: my agent was at its mean position, whereas the moving car was stuck somewhere in the middle of the road (not at its mean position).

    I should also mention that I am trying to train only one agent.
    At the moment Unity is stuck, so I have to force-close it.

    Previously I was using ml-agents 0.21 together with ML-Agents 1.6 in Unity, but I then changed the latter to 1.5 because I thought 1.6 might not be compatible with ml-agents 0.21 (TensorFlow). The problem is still there :(

    Any help in this regard would be highly appreciated, because I don't know what the problem is.

    Maybe it's a memory issue, but I don't know how to solve it :(
     
    Last edited: Dec 22, 2020
  6. ammad99

    Joined:
    Nov 13, 2020
    Posts:
    14
    Digging more into the problem, I found that when the training gets stuck, pressing the "ESC" key helps, as long as it is pressed before the error appears saying that the environment took too long to respond.
     
  7. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    My 2 cents:
    32 GB of RAM should be enough to run almost any homemade sim. Looking at your memory graph, it seems like your code is accumulating data without ever discarding it, leading to the "out of memory" message.
    Maybe a list became too big? Huge arrays stored in multiple copies? GameObjects being deactivated instead of destroyed, so that they accumulate over time?
    I suggest checking that everything is properly initialized between episodes.
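
    To illustrate the kind of accumulation described above, here is a minimal, language-agnostic sketch written in Python. It is not the original poster's code: the `EpisodeState` class, its fields, and the `run` function are hypothetical stand-ins. In a Unity ML-Agents project the equivalent cleanup would normally live in the agent's C# `OnEpisodeBegin()` override.

```python
class EpisodeState:
    """Hypothetical per-episode bookkeeping; all names are illustrative."""

    def __init__(self):
        self.distance_log = []  # one entry appended per step
        self.spawned = []       # stands in for objects created during an episode

    def step(self, distance):
        self.distance_log.append(distance)
        self.spawned.append(object())

    def reset(self):
        # Drop everything from the previous episode instead of keeping it.
        self.distance_log.clear()
        self.spawned.clear()


def run(episodes, steps, reset_between_episodes):
    """Simulate several episodes and return how many log entries remain."""
    state = EpisodeState()
    for _ in range(episodes):
        if reset_between_episodes:
            state.reset()
        for t in range(steps):
            state.step(float(t))
    return len(state.distance_log)


leaky = run(episodes=100, steps=64, reset_between_episodes=False)
clean = run(episodes=100, steps=64, reset_between_episodes=True)
print(leaky, clean)  # 6400 64
```

    Without the reset, the stored data grows with the total number of steps (6400 entries here); with it, the data stays bounded by one episode (64 entries). Over a multi-hour run, the first pattern is the sort of thing that could produce a steadily climbing memory graph.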