Train agent to push enemies off the platform

Discussion in 'ML-Agents' started by Kozaki2, Mar 22, 2020.

  1. Kozaki2

    Kozaki2

    Joined:
    Apr 8, 2019
    Posts:
    47
    Hi, my training area looks like this
    upload_2020-3-22_13-24-43.png

    I want to train that blue agent to push off red enemies from the platform. Each enemy has tag "Enemy" that is passed to the Ray Perception Sensor 3D component of agent. The inspector of my agent looks like this
    upload_2020-3-22_13-27-48.png

    Unfortunately I can't see any progress during training. My agent doesn't seem to move toward any enemy; he just walks in random(?) directions.

    My BotAgent script rewards:
    1. Each step: -1 / maxStep, which is -0.0002f
    2. -1 when the agent falls off the platform
    3. 5 when the agent pushes all enemies off

    Is something wrong with my configuration? Can I do something better?
     
  2. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,776
    I am not using ML-Agents, but I have used a different ML algorithm.
    I would add more reward conditions:
    • One for getting close to an enemy.
    • A second for pushing/touching an enemy. (The first condition may already cover this, in which case this one isn't needed.)
    • Another for pushing an enemy toward the nearest edge.
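    The first of those conditions could be sketched roughly like this inside the agent's per-step callback (a sketch only, not ML-Agents-specific: `_enemies` is assumed to be the agent's collection of enemy GameObjects, and the 0.001f scale is a guess that would need tuning so the shaping reward doesn't dominate the task reward):

    Code (CSharp):
        // Hypothetical proximity shaping: small per-step bonus for being
        // near the closest enemy still on the platform.
        var nearest = _enemies
            .Where(e => e.transform.localPosition.y > -1)   // ignore fallen enemies
            .OrderBy(e => Vector3.Distance(transform.position, e.transform.position))
            .FirstOrDefault();
        if (nearest != null)
        {
            float dist = Vector3.Distance(transform.position, nearest.transform.position);
            AddReward(0.001f / (1f + dist));                // assumed scale; tune as needed
        }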
     
  3. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    What are the vector observations that you're using? I assume the position of the agent on the platform?

    Following up on the ideas from @Antypodish, giving a small reward each time the agent moves the enemy towards the edge might help. You could also try just a single enemy to try to simplify things.

    Other than that, I don't see anything obviously wrong. How long are you letting it train? Can you paste some of the trainer output here?
     
  4. Kozaki2

    Kozaki2

    Joined:
    Apr 8, 2019
    Posts:
    47
    Hi, thanks for the tips. I now give a 0.1f reward for colliding with enemies, but I can't see any difference. At the beginning the agent makes a lot of collisions, but over time the number of collisions gets smaller.

    My vector observations are the agent's position, so I changed the space size to 3. Maybe there is something wrong with my bot agent script:
    Code (CSharp):
    using System.Collections.Generic;
    using System.Linq;
    using MLAgents;
    using MLAgents.Sensors;
    using UnityEngine;
    using Random = UnityEngine.Random;

    public class BotAgent : Agent
    {
        public float speed = 5.0f;

        private Rigidbody _rigidBody;
        private Vector3 _startPosition;
        private Quaternion _startRotation;
        private IEnumerable<GameObject> _enemies;

        private void Start()
        {
            _rigidBody = GetComponent<Rigidbody>();
            _startPosition = transform.localPosition;
            _startRotation = transform.localRotation;
            _enemies = transform.parent.FindChildsWithTag("Enemy").Where(enemy => enemy != gameObject);
        }

        public override void OnEpisodeBegin()
        {
            if (transform.localPosition.y < -1)
            {
                _rigidBody.angularVelocity = Vector3.zero;
                _rigidBody.velocity = Vector3.zero;
                transform.localPosition = _startPosition;
                transform.localRotation = _startRotation;
            }

            foreach (var enemy in _enemies)
            {
                enemy.transform.localPosition = new Vector3(Random.value * 8 - 4, 2, Random.value * 8 - 4);
                enemy.GetComponent<Rigidbody>().velocity = Vector3.zero;
                enemy.GetComponent<Rigidbody>().angularVelocity = Vector3.zero;
            }
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(transform.localPosition);
        }

        public override void OnActionReceived(float[] act)
        {
            var controlSignal = Vector3.zero;
            controlSignal.x = act[0];
            controlSignal.z = act[1];

            if (controlSignal != Vector3.zero)
            {
                transform.rotation = Quaternion.LookRotation(controlSignal);
            }

            _rigidBody.MovePosition(transform.position + transform.forward * (speed * Time.deltaTime));

            var enemies = _enemies.Where(enemy => enemy.transform.localPosition.y > -1);
            if (!enemies.Any())
            {
                Debug.Log("Training: 5.0f reward");
                SetReward(5f);
                EndEpisode();
            }

            if (transform.localPosition.y < -1)
            {
                Debug.Log("Training: -1.0f reward");
                SetReward(-1);
                EndEpisode();
            }

            AddReward(-0.0002f);
        }

        private void OnCollisionEnter(Collision other)
        {
            if (!other.gameObject.CompareTag("Enemy")) return;

            Debug.Log("Training: 0.1f reward");
            AddReward(0.1f);
        }
    }
    Output of training: https://pastebin.com/A7e3m21D
     
  5. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    Giving the reward in OnCollisionEnter is probably not going to encourage the behavior you want. You want to do something like this (for each enemy):
    * save the initial distance of the enemy to the edge as minDistanceToEdge
    * at each step, get the distance from the enemy to the edge as currentDistanceToEdge
    * if currentDistanceToEdge < minDistanceToEdge, give the agent a reward based on (minDistanceToEdge - currentDistanceToEdge) and set minDistanceToEdge to currentDistanceToEdge.

    That way, the Agent only gets rewarded for moving the enemies in the proper direction (they can't keep moving the enemy back and forth to get a reward).
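    The steps above could be sketched like this (assumptions: a square platform centred at the local origin with half-size 5, matching the ±4 spawn range in the posted script; `DistanceToEdge`, `RewardEdgeProgress`, the dictionary, and the 0.1f scale are all hypothetical names/values to tune):

    Code (CSharp):
        // Per-enemy "best distance to edge so far", reset in OnEpisodeBegin.
        private readonly Dictionary<GameObject, float> _minDistanceToEdge =
            new Dictionary<GameObject, float>();

        private static float DistanceToEdge(Vector3 localPos)
        {
            const float halfSize = 5f; // assumption: platform spans +/-5 on x and z
            float dx = halfSize - Mathf.Abs(localPos.x);
            float dz = halfSize - Mathf.Abs(localPos.z);
            return Mathf.Min(dx, dz);  // distance to the nearest edge
        }

        // In OnEpisodeBegin, for each enemy:
        //     _minDistanceToEdge[enemy] = DistanceToEdge(enemy.transform.localPosition);

        // In OnActionReceived, for each enemy still on the platform:
        private void RewardEdgeProgress(GameObject enemy)
        {
            float current = DistanceToEdge(enemy.transform.localPosition);
            if (current < _minDistanceToEdge[enemy])
            {
                // Reward only new progress toward the edge, so the agent can't
                // farm reward by shoving the enemy back and forth.
                AddReward(0.1f * (_minDistanceToEdge[enemy] - current)); // scale is a guess
                _minDistanceToEdge[enemy] = current;
            }
        }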

    Looking at your logs:
    1) 300 seconds isn't that long, you might need to wait longer before you see the behavior that you want.
    2) How many Agents and environments are you training at once? If it's only 1 Agent and 1 environment, then something might have happened to it between steps 40000 and 50000. Since your Agent.maxSteps is 5000 but summary_freq is 10000, you should always complete an episode unless something weird happened to your Agent (like getting disabled). If you have more than 1 Agent or environment, ignore this (since summary_freq refers to the total number of Agent steps)
     
  6. Kozaki2

    Kozaki2

    Joined:
    Apr 8, 2019
    Posts:
    47
    The reward for pushing enemies toward the edge didn't work. The agents make a lot of collisions, but only at the beginning of training. I ran some tests on the "PushBlock" example:
    • I changed the vector action space type from discrete to continuous and the vector action space size to 2
    • I modified the "move" method to the one from my script
    After these changes the agents are unable to learn, so maybe the problem is the continuous action space? Maybe it should be set to Discrete, but I don't know how to achieve my movement function with such an action type.
    Code (CSharp):
        public override void OnActionReceived(float[] act)
        {
            var controlSignal = Vector3.zero;
            controlSignal.x = act[0];
            controlSignal.z = act[1];

            if (controlSignal != Vector3.zero)
            {
                transform.rotation = Quaternion.LookRotation(controlSignal);
            }

            _rigidBody.MovePosition(transform.position + transform.forward * (speed * Time.deltaTime));
            //(...)
        }