Hi. I am a Spanish university student doing my final degree project with ML-Agents, and I would like to ask for some help improving it. I am trying to train an agent with ML-Agents (release 12, package v1.7.2) that produces a defense animation in a 3D fighting videogame. The project has the following setup:

Character defender: a fully rigged character with colliders on different body parts (head, chest, legs and arms) that holds a sword (the agent of this project). If the agent collides with any of these colliders, we set a reward of -1 and end the episode.

Sword defender (Agent): a sword with a Rigidbody and a collider (plus all the usual agent components). Its objective is to collide with the enemy sword, imitating a "real" clash of swords, and thus defend the defender. To measure this, each sword has a child object placed right at its middle. Using "collision.GetContact(0).point" inside "OnCollisionEnter", I calculate the distance from each sword's child object to the contact point. The shorter this distance, the greater the reward.

Enemy: a fully rigged character holding a sword that plays an attack animation. This sword has a collider. While the animation plays, if the enemy sword collides with any collider of the character defender, we set a reward of -1 and end the episode.

Also, to simulate the defender moving the sword, I use inverse kinematics. If the agent's position goes beyond certain hard-coded distances, we set a reward of -1 and end the episode. This also prevents the agent from heading straight for the enemy sword.

Actions: the agent can move "freely" on all axes and rotate around X and Z (not around Y, because that would only spin the sword on itself). These are discrete actions, so the defender can move and rotate the sword at the same time.
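For clarity, here is a minimal sketch of how the contact-point reward described above could be implemented. The field names (swordCenter, maxClashDistance) and the "EnemySword" tag are placeholders of mine, not taken from the project, and the clamping scheme is just one reasonable choice:

    using UnityEngine;
    using Unity.MLAgents;

    public class SwordDefenderAgent : Agent
    {
        [SerializeField] Transform swordCenter;       // child object placed at the middle of the blade
        [SerializeField] float maxClashDistance = 0.5f; // distance at which the clash reward falls to 0

        void OnCollisionEnter(Collision collision)
        {
            if (collision.gameObject.CompareTag("EnemySword"))
            {
                // Point where the two swords touched.
                Vector3 contact = collision.GetContact(0).point;
                float dist = Vector3.Distance(swordCenter.position, contact);

                // Closer to the middle of the blade => larger reward, clamped to [0, 1].
                SetReward(Mathf.Clamp01(1f - dist / maxClashDistance));
                EndEpisode();
            }
        }
    }

A dense, bounded reward like this (rather than a raw distance) tends to be easier for PPO to optimise.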
Observations: currently, the variables observed by the agent are: position, rotation and velocity of the defender's sword (the agent); the position of each body part of the defender that has a collider; and the position of the enemy sword. The space size of this vector is 97, and Stacked Vectors is set to 5.

Configuration (.yaml):

behaviors:
  DefendSamurai:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 640
      num_layers: 4
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 2000000
    time_horizon: 2048
    summary_freq: 50000
    threaded: true

Training: the enemy has a list of animations. At the start of an episode, I check which animation comes next and move the enemy character into position to perform it. The resulting neural network therefore has to learn weights that stop the enemy sword in all of the animations.

Results: obviously, if I am writing this thread it is because the results are not as expected. The mean reward is low and the agent's sword barely ever collides with the enemy sword. I have attached two Tensorboard screenshots in case they are useful for analysing what is happening.

So... what can I do to improve this? I have a couple of ideas that might bring me closer to a solution. Firstly, the observations: it may be that the position of the enemy sword alone is not enough and the agent needs more information. Secondly, the .yaml file: I have only a brief understanding of it, and maybe changing some of the configuration parameters would make it easier for the agent to learn. If you are willing to help, do not hesitate to ask me anything about the project that is unclear or not explained in this thread. And if you have read this far, thank you for your time; I hope you can give me a hand with this.
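For reference, this is roughly how the observation list described above might look in code. The field names are mine, not from the actual script; note also that feeding the enemy sword's position relative to the agent (last line) is a common alternative to raw world positions when normalize is false:

    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;

    public class DefendSamuraiAgent : Agent
    {
        [SerializeField] Rigidbody swordBody;    // the defender's sword (the agent)
        [SerializeField] Transform[] bodyParts;  // head, chest, legs, arms
        [SerializeField] Transform enemySword;

        public override void CollectObservations(VectorSensor sensor)
        {
            // Agent sword: position, rotation (quaternion, 4 values) and velocity.
            sensor.AddObservation(swordBody.position);
            sensor.AddObservation(swordBody.rotation);
            sensor.AddObservation(swordBody.velocity);

            // Defender body parts that carry colliders.
            foreach (var part in bodyParts)
                sensor.AddObservation(part.position);

            // Enemy sword, expressed relative to the agent so the
            // network does not have to subtract world positions itself.
            sensor.AddObservation(enemySword.position - swordBody.position);
        }
    }

With this layout the vector size must match the Space Size set on the Behavior Parameters component (97 in the project), multiplied internally by the 5 stacked vectors.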
I will be eternally grateful. Kind regards.