Simulation of an autonomous Rock Breaker

Discussion in 'ML-Agents' started by Kjelle69, Nov 20, 2021.

  1. Kjelle69
     Joined: Nov 9, 2016
     Posts: 53
    I have created a simulation scene of an industrial stationary Rock Breaker. I managed to apply ML-Agents control, with the mission for the machine to position the hammer onto a rock, ready for the breaking action.

    When the rock is stationary, it takes only about 100k episodes before the machine can find the correct position. The arm has four hydraulic cylinders, which are controlled by the ML-Agent (a simplified sketch of that mapping follows below).
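
    A simplified sketch of that mapping, assuming one continuous action per cylinder (the HydraulicCylinder type is an illustrative stand-in, not my actual component):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using UnityEngine;

    // Illustrative stand-in for whatever drives a cylinder in the scene.
    public class HydraulicCylinder : MonoBehaviour
    {
        public void SetExtensionSpeed(float speed)
        {
            // In the real rig this would drive a joint or physics actuator.
        }
    }

    public class ControlScript : Agent
    {
        public HydraulicCylinder[] cylinders;  // the four cylinders of the arm
        public float maxSpeed = 1f;

        public override void OnActionReceived(ActionBuffers actions)
        {
            for (int i = 0; i < cylinders.Length; i++)
            {
                // One continuous action in [-1, 1] per cylinder: extend or retract.
                float command = actions.ContinuousActions[i];
                cylinders[i].SetExtensionSpeed(command * maxSpeed);
            }
        }
    }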



    The next level is to move the rock around, and here the training fails. I can't get the results to converge, so my guess is that I have to optimize the parametrization somehow. Do you have any suggestions on where to begin?

    GAIL maybe, where I create an imitation scenario, or something else?
     

    Attached Files:

    Last edited: Nov 20, 2021
  2. gft-ai
     Joined: Jan 12, 2021
     Posts: 44
    It would be better to provide some more information about your environment setup, such as the observations and reward structure, before anyone can suggest anything useful for you.
     
    Kjelle69 likes this.
  3. Kjelle69
     Joined: Nov 9, 2016
     Posts: 53
    When the tip of the hammer touches the white sphere on the rock the reward is +30; if it touches the rock on the side it gets +2. If it touches the floor or anything else that it's not supposed to touch, it's a -5 penalty. I used the standard config file at first, but later I changed it to allow more episodes.
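
    In code, that scheme is roughly the following (a sketch with assumed tags; my actual collision handling differs in the details):

    Code (CSharp):
    using Unity.MLAgents;
    using UnityEngine;

    public class RockBreakerAgent : Agent
    {
        private void OnCollisionEnter(Collision collision)
        {
            if (collision.gameObject.CompareTag("TargetSphere"))   // assumed tag
                AddReward(30f);   // hammer tip on the white sphere
            else if (collision.gameObject.CompareTag("Rock"))      // assumed tag
                AddReward(2f);    // touched the rock on the side
            else
                AddReward(-5f);   // floor or anything else it should not touch
        }
    }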



    behaviors:
      ControlScript:
        trainer_type: ppo
        hyperparameters:
          batch_size: 10
          buffer_size: 100
          learning_rate: 3.0e-4
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 256
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 80000000
        time_horizon: 64
        summary_freq: 10000
     
  4. gft-ai
     Joined: Jan 12, 2021
     Posts: 44
    Cool, thanks.

    Your reward structure seems to be way outside the recommended range, but in my experience, as long as the rewards clearly indicate what the desired behaviour is and is not, it seems to be fairly forgiving towards the agent learning that behaviour. However, what I have learned is that the observations you give the agent matter even more. For example, just to get you started: how does your agent know where the rock is? Is it using raycasts? Or are you giving it the position of the rock? Is it in a global frame or relative to the agent? I think the easiest way to go about it is to first think 'how would I, as a human, learn to achieve this goal, and what kind of information would I need to do so?' Then, once you have a good starting point and the agent does better than it does now, you can think about optimising the parameters, such as better reward functions and the training configuration.

    When you are at that stage, you can fine-tune the parameters according to the documentation here: https://github.com/gzrjzcx/ML-agents/blob/master/docs/Training-PPO.md

    Hopefully this gets you started and is helpful in some way. I also hope someone else will add to what I have written here to help you further. But if you have more specific questions or comments, I will do my best to help and respond here too.
     
    Kjelle69 likes this.
  5. Kjelle69
     Joined: Nov 9, 2016
     Posts: 53
    Thanks for your answer.
    The observations are so far limited to the local positions of the rock (targetTransform) and the hammer (hammerTransform); those are set in local mode in order to be able to duplicate the scene in training mode. (Maybe these should be in world coordinates in order for the model to see how they are positioned relative to each other?)

    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(targetTransform.localPosition);
        sensor.AddObservation(hammerTransform.localPosition);
    }
    I have not yet implemented any raycasts or camera sensors, but I would definitely like to do so, since that's the probable sensor setup in the real case (3D radar sensors or lidars in order to model the rocks). A human would need to see the rock in order to position the hammer right. (That's the way the process looks today with our remotely controlled Rock Breakers.) I will read the documentation and search for examples that can inspire further development.

    Oh, I also added GAIL to the configuration now. I did close to 200 demonstration sessions, which were recorded into a demo file. It works better but is still not converging towards a solution.
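
    The demo file itself comes from ML-Agents' DemonstrationRecorder component. It is normally just added to the Agent in the Inspector, but as a sketch it can also be set up from code (directory and name here match the demo_path in the config below):

    Code (CSharp):
    using Unity.MLAgents.Demonstrations;
    using UnityEngine;

    public class DemoSetup : MonoBehaviour
    {
        private void Awake()
        {
            // Record manual/heuristic play into Assets/Demo/Skutknackare.demo.
            var recorder = gameObject.AddComponent<DemonstrationRecorder>();
            recorder.Record = true;
            recorder.DemonstrationName = "Skutknackare";
            recorder.DemonstrationDirectory = "Assets/Demo";
        }
    }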

    Config file with the GAIL part activated:

    Code (YAML):
    behaviors:
      ControlScript:
        trainer_type: ppo
        hyperparameters:
          batch_size: 10
          buffer_size: 100
          learning_rate: 3.0e-4
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 256
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          gail:
            gamma: 0.99
            strength: 0.01
            network_settings:
              normalize: false
              hidden_units: 128
              num_layers: 2
              vis_encode_type: simple
            learning_rate: 0.0003
            use_actions: false
            use_vail: false
            demo_path: D:\UnityProjects\SkutknackareML\Assets\Demo\Skutknackare.demo
        max_steps: 80000000
        time_horizon: 64
        summary_freq: 10000
     

    Attached Files:

  6. gft-ai
     Joined: Jan 12, 2021
     Posts: 44
    One thing I can suggest is changing the observations from the localPosition of the rock and the agent to the rock's position relative to the agent (hammer). You can easily achieve this by using InverseTransformPoint and/or InverseTransformDirection. If you are not so familiar with these methods, you can read up on them (a simple Google search will give you the documentation).
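
    In code that could look something like this (a sketch reusing the transform names from your earlier post, not tested):

    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        // Rock position expressed in the hammer's local frame.
        sensor.AddObservation(hammerTransform.InverseTransformPoint(targetTransform.position));

        // Direction to the rock, also in the hammer's frame.
        Vector3 toRock = (targetTransform.position - hammerTransform.position).normalized;
        sensor.AddObservation(hammerTransform.InverseTransformDirection(toRock));
    }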

    After that, you might also decide to optimise the reward function as well, to promote the agent reaching the goal faster and more efficiently.

    GAIL can also be useful, but from my experience it is quite tricky to get the configuration set up just right, and I haven't had much success (or better results) with it. Bear in mind I didn't dig really deep into it, so maybe you can get some really good results. Please let me know in that case :)

    Also, something to consider: in Unity, the PPO network draws outputs from a probability distribution, which results in different outputs given the same input. In simulation this does not matter so much, as the agent will eventually achieve the goal. But if you are planning to control real machinery, you need to keep this in mind, as the outputs from the model could be oscillating from one extreme to the other frequently. This is also the problem I am currently facing, and I am planning to use the low-level Python API to run my own custom deterministic models. You can also let me know if you face such problems later on :)
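
    If the oscillation becomes a problem before you get to a deterministic model, one simple mitigation (my own suggestion, not something from the ML-Agents docs) is to low-pass filter the actions before they reach the actuators:

    Code (CSharp):
    // Exponential moving average over the policy outputs, one slot per cylinder.
    private readonly float[] smoothed = new float[4];
    private const float Alpha = 0.2f;  // lower = smoother but laggier response

    private float Smooth(int i, float rawAction)
    {
        smoothed[i] = Alpha * rawAction + (1f - Alpha) * smoothed[i];
        return smoothed[i];
    }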

    I hope this helped; let me know how you progress. Good luck!
     
    Kjelle69 likes this.
  7. Kjelle69
     Joined: Nov 9, 2016
     Posts: 53
    I changed both the hammer's and the rock's observations to global coordinates. I did some GAIL training, starting off with a demo file. I added a 3D Ray Perception Sensor. I also calculated the distance between the hammer and the rock, and gave a small reward when the distance decreased and vice versa when it increased. Almost 40,000,000 steps in, the reward curve still increases and it finds the goal. Very fascinating indeed. There are some dips in the reward curve, but the general direction is up.
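
    The distance shaping looks roughly like this (a sketch; the 0.01 scale factor is illustrative, not my literal value):

    Code (CSharp):
    private float previousDistance;

    public override void OnEpisodeBegin()
    {
        previousDistance = Vector3.Distance(hammerTransform.position, targetTransform.position);
    }

    private void FixedUpdate()
    {
        float distance = Vector3.Distance(hammerTransform.position, targetTransform.position);
        AddReward(0.01f * (previousDistance - distance));  // + when closing in, - when drifting away
        previousDistance = distance;
    }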
     

    Attached Files:

  8. Kjelle69
     Joined: Nov 9, 2016
     Posts: 53
    After a lot of work, tests and studies, I finally managed to train the Rock Breaker model with the moving goal. After each failure I tried to add more functionality to the Machine Learning Agent in order to help the model find the solution. I added a 3D Ray Perception Sensor (similar to lidar) and also used GAIL (Generative Adversarial Imitation Learning) in order to succeed. I also added an algorithm which gives rewards and penalties depending on the distance, magnitude and direction of the goal's and target's transforms (position and rotation).

    And... at last, the slowly updating cumulative reward in TensorBoard got a steady upward trend! After 40 million steps it finally managed to find the goal (rock) regardless of its position and rotation. The video shows the standalone simulation running, using the neural network created from the training sessions. This is at the same time a bit frightening yet very fascinating.

    As you can observe in this video, the model sometimes fails, but since the cumulative reward still had an upward trend, resumed training will improve the neural network even more.

    So, what is the next level for the Rock Breaker model to solve, the next degree of difficulty? Suggestions and comments are welcome.

     
    gft-ai and PutridEx like this.
  9. gft-ai
     Joined: Jan 12, 2021
     Posts: 44
    Congratulations!

    I think you can add some variation, such as varying the initial position of the rock breaker body, changing the size of the rock, and making the tip of the rock breaker target a smaller point on the rock! That would be very interesting to see!
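
    That kind of per-episode variation could look something like this (an illustrative sketch; the ranges are made up and would need tuning to the scene):

    Code (CSharp):
    public override void OnEpisodeBegin()
    {
        // Random rock position and heading around the breaker.
        targetTransform.localPosition = new Vector3(Random.Range(-3f, 3f), 0.5f, Random.Range(-3f, 3f));
        targetTransform.localRotation = Quaternion.Euler(0f, Random.Range(0f, 360f), 0f);

        // Vary the rock size so the policy cannot overfit to one scale.
        targetTransform.localScale = Vector3.one * Random.Range(0.5f, 1.5f);
    }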
     
    Kjelle69 likes this.