Resolved Unable to solve problem

GamerLordMat · Aug 13, 2021

Hello everyone,

I want to do a very “simple” thing: a (2D) rigid body should apply torque so that its rotation matches a target rotation. The starting rotation, the initial angular velocity, and the target rotation are randomized.

Basically, I want to make a PID controller, which really isn’t witchcraft.
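For reference, the classical baseline mentioned here fits in a few lines. This is only a minimal sketch and not code from the project: the class name AnglePid, its method names, and the gains are all assumptions chosen for illustration. It wraps the angle error to the shortest signed difference before applying the usual proportional, integral, and derivative terms.

```csharp
using System;

// Hypothetical PID controller on a wrapped angle error (degrees in, torque command out).
public sealed class AnglePid
{
    private readonly double kp, ki, kd;
    private double integral;
    private double previousError;
    private bool hasPrevious;

    public AnglePid(double kp, double ki, double kd)
    {
        this.kp = kp;
        this.ki = ki;
        this.kd = kd;
    }

    // Shortest signed difference from current to target, in degrees, in (-180, 180].
    public static double DeltaAngle(double currentDeg, double targetDeg)
    {
        double d = (targetDeg - currentDeg) % 360.0;
        if (d > 180.0) d -= 360.0;
        if (d <= -180.0) d += 360.0;
        return d;
    }

    // One control step; dt is the time since the previous step in seconds.
    public double Step(double currentDeg, double targetDeg, double dt)
    {
        double error = DeltaAngle(currentDeg, targetDeg);
        integral += error * dt;
        // No derivative term on the very first step, since there is no previous error yet.
        double derivative = hasPrevious ? (error - previousError) / dt : 0.0;
        previousError = error;
        hasPrevious = true;
        return kp * error + ki * integral + kd * derivative;
    }
}
```

In Unity, Mathf.DeltaAngle performs the same wrap, and the returned value would be passed to something like rb.AddTorque once per FixedUpdate. A controller like this can also serve as a sanity check for what the trained agent should roughly achieve.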

And it doesn’t work. I tried a lot of reward strategies, and none really converged. The best one was simply the dot product between the target and current rotation, but the training was unstable. In theory that reward should lead to optimal behaviour, because the optimal strategy is to reach the goal as fast as possible and stay there.

(Really, everything that I would consider useful didn’t work with machine learning.)

I also tried all the best practices (normalizing inputs, keeping rewards not too high or too low, etc.).

I think there is a really easy answer to this problem, because I solved it once while working on another project (but didn’t save the code after making some later changes ☹).

Does anyone have an idea how to solve this primitive problem with ML-Agents? Maybe some hints on the choice of hyperparameters?

This problem is important for all kinds of physically driven AI (say, a drone that tries to stabilize itself, a turret that tries to follow a target, or in my case a leg for a funny robot).

It seems like a challenge to me, so please help. I have been working on this for 3 days straight without getting a good solution.

Thank you in advance!

PS: What I also tried:

- Penalizing velocity (-Abs(Velocity.magnitude)). Results were OK, but it could lead to slow movement when not

- Comparing the current distance to the goal with the previous one and giving Reward = (lastDistance - currentDistance). Worked fine, but worse than just the dot product.

- Giving a reward only when it hits the goal +/- some delta (didn’t work at all).

- Applying a sigmoid function to the dot product (didn’t work).

- Initializing every agent with the same target, or initializing every agent individually (also didn’t work).
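To make the reward variants in this list easier to compare, here is a minimal, engine-independent sketch of four of them as pure functions. The class and method names are hypothetical, the error is assumed to be the shortest signed angle to the target in degrees, and the division by 180 and by the maximum angular velocity is an added normalization that the post does not specify.

```csharp
using System;

// Hypothetical helper collecting the reward ideas from the list above.
public static class RewardVariants
{
    // Dense alignment reward: the dot product of the two unit direction vectors
    // is cos(error), rescaled here from [-1, 1] to [0, 1].
    public static double Alignment(double errorDeg)
    {
        return (Math.Cos(errorDeg * Math.PI / 180.0) + 1.0) * 0.5;
    }

    // Progress reward: positive when the agent moved closer since the last step.
    public static double Progress(double lastErrorDeg, double currentErrorDeg)
    {
        return (Math.Abs(lastErrorDeg) - Math.Abs(currentErrorDeg)) / 180.0;
    }

    // Sparse reward: 1 only inside the tolerance band around the target.
    public static double Sparse(double errorDeg, double toleranceDeg)
    {
        return Math.Abs(errorDeg) <= toleranceDeg ? 1.0 : 0.0;
    }

    // Velocity penalty: always non-positive, scaled by the maximum angular velocity.
    public static double VelocityPenalty(double angularVelocity, double maxAngularVelocity)
    {
        return -Math.Abs(angularVelocity) / maxAngularVelocity;
    }
}
```

Writing the rewards as functions of the wrapped angle error makes it easy to check by hand what each variant pays out at a given error before starting a training run.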

The main code:

Code (CSharp):

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public class LegAgent : Agent
{
    public GameObject visual;
    public Quaternion targetRotation;
    public Mananger manager;

    Rigidbody rb;
    HingeJoint hJoint;

    // Used below but not declared in the original post; declared here so the class compiles.
    float delta;
    Quaternion lastRotation;

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Single continuous action, scaled to a torque about the x-axis.
        float f = actions.ContinuousActions[0];
        rb.AddTorque(Vector3.right * manager.legStrenght * f);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Because I rotate only on the x-axis, only the x and w quaternion components are changing.
        Quaternion q1 = Quaternion.Euler(hJoint.angle, 0, 0).normalized;
        sensor.AddObservation(new Vector2(q1.x, q1.w));

        Quaternion q2 = targetRotation.normalized;
        sensor.AddObservation(new Vector2(q2.x, q2.w));

        sensor.AddObservation(rb.angularVelocity.x / rb.maxAngularVelocity);

        // Rotation of the body relative to the target. The original passed Euler angles to
        // Quaternion.FromToRotation, which expects direction vectors.
        Quaternion q3 = (Quaternion.Inverse(targetRotation) * transform.rotation).normalized;
        sensor.AddObservation(new Vector2(q3.x, q3.w));
    }

    private void FixedUpdate()
    {
        // Dense reward every physics step: quaternion dot product mapped from [-1, 1] to [0, 1].
        float dot = (Quaternion.Dot(targetRotation, transform.rotation) + 1) * 0.5f;
        AddReward(dot);
    }

    public override void OnEpisodeBegin()
    {
        // Selects the setup; read from the environment parameter "lecture" (truncated to an int).
        int i = (int)Academy.Instance.EnvironmentParameters.GetWithDefault("lecture", 0.1f);
        switch (i)
        {
            case 0:
                // Same random values for all agents (drawn by the manager).
                rb.velocity = Vector3.zero;
                rb.angularVelocity = new Vector3(manager.randomFloat, 0, 0);
                delta = manager.delta;
                // The original called rb.rotation.SetEulerAngles(...), which only changes a copy.
                rb.rotation = Quaternion.Euler(manager.randomFloat2, 0, 0);
                targetRotation = Quaternion.Euler(manager.randomFloat3, 0, 0);
                break;
            case 1:
                // Random input for each agent.
                rb.velocity = Vector3.zero;
                rb.angularVelocity = new Vector3(Random.Range(-100, 100), 0, 0);
                delta = 2;
                rb.rotation = Quaternion.Euler(Random.Range(-180, 180), 0, 0);
                targetRotation = Quaternion.Euler(Random.Range(-180, 180), 0, 0);
                // Rotation from the current orientation to the target.
                lastRotation = Quaternion.Inverse(transform.rotation) * targetRotation;
                break;
            case 2:
            case 3:
                // Cases 2 and 3 were identical in the original: random pose, zero starting velocity.
                rb.velocity = Vector3.zero;
                rb.angularVelocity = Vector3.zero;
                delta = 2;
                rb.rotation = Quaternion.Euler(Random.Range(-180, 180), 0, 0);
                targetRotation = Quaternion.Euler(Random.Range(-180, 180), 0, 0);
                break;
        }
        //targetRotation = -160;
    }

    // Start is called before the first frame update
    void Start()
    {
        hJoint = GetComponent<HingeJoint>();
        rb = GetComponent<Rigidbody>();
        rb.maxAngularVelocity = 300;
        manager = FindObjectOfType<Mananger>();
    }
}
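A side note on the reward computed in FixedUpdate above: for rotations about a single axis, the quaternion dot product equals the cosine of half the difference between the two Euler angles, and a quaternion and its negation describe the same orientation. The following engine-independent sketch (hypothetical names, not Unity API) shows how two nearly aligned orientations on opposite sides of the ±180° seam can receive a reward close to zero, and how taking the absolute value of the dot product avoids this. Whether this affected the original training runs is not known.

```csharp
using System;

// Hypothetical stand-in for quaternions that rotate about the x-axis only.
public static class HingeQuaternionDot
{
    // Rotation by angleDeg about x: (x, y, z, w) = (sin(a/2), 0, 0, cos(a/2)).
    public static (double x, double w) AboutX(double angleDeg)
    {
        double half = angleDeg * Math.PI / 360.0;
        return (Math.Sin(half), Math.Cos(half));
    }

    // Quaternion dot product; the y and z components are zero and drop out.
    public static double Dot((double x, double w) a, (double x, double w) b)
    {
        return a.x * b.x + a.w * b.w;
    }

    // Same mapping as in the post: (dot + 1) / 2.
    public static double OriginalReward(double currentDeg, double targetDeg)
    {
        return (Dot(AboutX(currentDeg), AboutX(targetDeg)) + 1.0) * 0.5;
    }

    // Sign-independent variant: q and -q give the same reward.
    public static double SignSafeReward(double currentDeg, double targetDeg)
    {
        return Math.Abs(Dot(AboutX(currentDeg), AboutX(targetDeg)));
    }
}
```

With a current angle of 170° and a target of -170°, the two orientations are only 20° apart, yet OriginalReward returns a value below 0.01, while SignSafeReward returns about 0.98.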

The manager code:

Code (CSharp):

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Mananger : MonoBehaviour
{
    public float delta;
    public float legStrenght;
    public Quaternion randomQuaternion1;
    public Quaternion randomQuaternion2;
    public float randomFloat;
    public float randomFloat2;
    public float randomFloat3;
    public float min;
    public float max;
    public bool locked;

    // FixedUpdate is called once per physics step.
    void FixedUpdate()
    {
        // Draw new shared random values unless they are locked.
        if (!locked)
        {
            randomQuaternion1 = Random.rotation;
            randomQuaternion2 = Random.rotation;
            randomFloat = Random.Range(min, max);
            randomFloat2 = Random.Range(-180, 180);
            randomFloat3 = Random.Range(-180, 180);
        }
    }
}

GamerLordMat · Aug 19, 2021

So, I solved it!

To sum up my project: I wanted a pointer/object to point in the direction of a target, or, said differently, to match the rotation of my object to a target object's rotation.

The difficulty lies in changing the angular velocity by applying torque, which is a pretty hard problem to solve because gravity acts unevenly on the object depending on its position on the circle, and because I added a random starting velocity in each episode.

The solution:

First, a crucial basic thing I missed: my observation inputs were not normalized, and the corresponding hyperparameter (normalize in the ML-Agents trainer configuration) was not set to true. Normalizing the input already solved it.

Second, the best results I got were with the following observations and reward:

Observations:

- targetVector (basically the target rotation expressed as a point on the unit circle)

- the current pointing direction vector (transform.up)

- (targetVector - transform.up).normalized (this works because you can interpret the vectors as positions lying on a unit circle)

- Vector3.Angle(targetVector, transform.up) * (Mathf.Deg2Rad * lengthOfVectors) (An angle in radians times the radius is the distance along the circle between two points on it. You could leave out the whole second factor if you normalize: dividing by 2 * Pi is equivalent to dividing the angle in degrees by 360. But it shows that you can think in terms of distances even though the problem is about angles.)

- rigidbody.angularVelocity.x / rigidbody.maxAngularVelocity
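The observation list above can be sketched without the engine as follows. This is an illustration under stated assumptions, not the project's code: PointerObservations and its methods are hypothetical names, directions are 2D unit vectors (so lengthOfVectors is 1), and angles are passed in degrees. In Unity the two directions would be targetVector and transform.up, and each value would be passed to sensor.AddObservation.

```csharp
using System;

// Hypothetical, engine-independent version of the observation vector.
public static class PointerObservations
{
    // Unit vector on the circle for an angle in degrees.
    public static (double x, double y) Direction(double angleDeg)
    {
        double r = angleDeg * Math.PI / 180.0;
        return (Math.Cos(r), Math.Sin(r));
    }

    // Unsigned angle between two unit vectors, in radians, in [0, pi].
    public static double AngleBetween((double x, double y) a, (double x, double y) b)
    {
        double dot = Math.Max(-1.0, Math.Min(1.0, a.x * b.x + a.y * b.y));
        return Math.Acos(dot);
    }

    // Returns: target (2), current (2), normalized difference (2), arc length (1), scaled velocity (1).
    public static double[] Build(double targetDeg, double currentDeg,
                                 double angularVelocity, double maxAngularVelocity)
    {
        var t = Direction(targetDeg);
        var c = Direction(currentDeg);

        // Direction from the current point to the target point on the circle.
        double dx = t.x - c.x, dy = t.y - c.y;
        double len = Math.Sqrt(dx * dx + dy * dy);
        if (len > 1e-9) { dx /= len; dy /= len; } else { dx = 0.0; dy = 0.0; }

        // Arc length on the unit circle: angle in radians times a radius of 1.
        double arc = AngleBetween(t, c);

        return new[] { t.x, t.y, c.x, c.y, dx, dy, arc, angularVelocity / maxAngularVelocity };
    }
}
```

Every entry except the arc length already lies in [-1, 1]; the arc length lies in [0, Pi], which the normalization setting mentioned earlier would take care of.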

Reward:

float delta = 0.03f;

// Reward only when the arc length between target and current direction is below delta.
if (Vector3.Angle(targetVector, transform.up) * (Mathf.Deg2Rad * lengthOfVectors) < delta)
{
    AddReward(1f);
}

So, I apologize if that was explained in too much detail, but I was frustrated at not being able to solve this. I hope it helps someone.

If you have any suggestions for improvement or criticism, please leave a comment!
