Question: In a multi-agent environment, agents don't get rewards and observations, or end the episode, at the right time

daishiqin1996 · Nov 17, 2022

Hi

I can't control the execution order of multiple agents. Rewards and observations get added before the information they depend on has been updated, and I can't end the episode for all agents at the same time, right when the game is over. So I'm looking for some kind of "lock" to make sure one function runs only after another has finished.

However, ML-Agents is driven by the Academy, so I can't just write a script that runs the functions in order. I tried setting Academy.Instance.AutomaticSteppingEnabled = false, but it makes the environment slow, and disabling it only when a character dies doesn't work. I also tried using a lock, but that didn't help either.

Below is my whole debugging process. The problem is not solved yet.

I made a "Hide and Seek" game with ML-Agents. The setup is very simple. There are 2 teams: hiders and seekers. When a hider collides with one or more seekers, it is deactivated. After all of the hiders have been caught, all the agents should end their episode.

Code (CSharp):

using System;
using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;
using UnityEngine.AI;
using UnityEngine.InputSystem;

/// <summary>
/// Base class for game agents
/// </summary>
public class GameAgent : Agent
{
    //Set input for players
    public InputAction moveInput;
    public InputAction dirInput;

    [HideInInspector] public float mapSize;

    //Player's parameters
    public float moveSpeed = 0.5f;
    public float rotateSpeed = 200f;

    //If in training or inference mode
    public bool trainingMode;

    //If true, meaning the agent is still activated
    public bool alive;

    //If true, deactivate the hider on the next step
    private bool hiderDestroyFlag;

    //Player's position and rotation on the last step
    private Vector3 lastPosition;
    private Quaternion lastRotation;

    //Player spawner as the parent of all players
    private PlayerSpawner playerSpawner;

    public List<bool> detected;

    //private Color originalColor;

    //Steps to freeze seekers, so hiders have preparation time
    private int stepLeftToFreeze;

    /// <summary>
    /// Disable inputs when agent is destroyed.
    /// </summary>
    private void OnDestroy()
    {
        moveInput.Disable();
        dirInput.Disable();
    }

    public void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Seeker") && gameObject.CompareTag("Hider"))
        {
            //Flag the hider as caught; the penalty is added later in CollectObservations
            hiderDestroyFlag = true;

            //Turn its camera to black when a hider is caught
            var camera = transform.Find("Eye").Find("Camera").GetComponent<Camera>();
            camera.clearFlags = CameraClearFlags.SolidColor;
            camera.backgroundColor = Color.black;
            camera.cullingMask = 0;
        }

        //Todo: Add the reward at the same time as hider getting caught
        if (collision.gameObject.CompareTag("Hider") && gameObject.CompareTag("Seeker"))
        {
            //Add reward when catching a hider
            AddReward(1);
            //print("Caught");
        }
    }

    /// <summary>
    /// Initialize ML-agent.
    /// </summary>
    public override void Initialize()
    {
        //Enable inputs
        moveInput.Enable();
        dirInput.Enable();

        playerSpawner = FindObjectOfType<PlayerSpawner>();

        //Get map size
        var terrainAndRockSetting = FindObjectOfType<TerrainAndRockSetting>();
        mapSize = terrainAndRockSetting.CalculateMapSize() / 2;

        //Set the MaxStep as 5000 in training mode, 0 (inf) in inference mode
        MaxStep = trainingMode ? 5000 : 0;
    }

    /// <summary>
    /// Heuristic control, where W: go forward, S: go backward, A: turn left, D: turn right.
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var discreteActionsOut = actionsOut.DiscreteActions;
        discreteActionsOut[0] = (int)moveInput.ReadValue<float>();
        discreteActionsOut[1] = (int)dirInput.ReadValue<float>();
    }

    /// <summary>
    /// Initialize player when episode begins
    /// </summary>
    public override void OnEpisodeBegin()
    {
        stepLeftToFreeze = playerSpawner.numStepToFreeze;
        alive = true;
        gameObject.transform.GetChild(0).gameObject.SetActive(true);
        gameObject.transform.GetChild(1).gameObject.SetActive(true);
        gameObject.layer = LayerMask.NameToLayer(gameObject.tag);
        gameObject.GetComponent<Collider>().enabled = true;
        PlayerSpawner.ResetCamera(gameObject.transform);
        playerSpawner.RelocatePlayer(gameObject.transform);
        GetComponent<Rigidbody>().velocity = Vector3.zero;
        GetComponent<Rigidbody>().angularVelocity = Vector3.zero;
    }

    /// <summary>
    /// Collect observations
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        //Deactivate hiders when caught
        if (gameObject.CompareTag("Hider") && hiderDestroyFlag)
        {
            AddReward(-1);
            hiderDestroyFlag = false;
            alive = false;
            gameObject.transform.GetChild(0).gameObject.SetActive(false);
            gameObject.transform.GetChild(1).gameObject.SetActive(false);
            gameObject.GetComponent<Collider>().enabled = false;
            gameObject.layer = LayerMask.NameToLayer("Ignore Raycast");
        }

        sensor.AddObservation(alive);
        sensor.AddObservation(PlayerSpawner.CountActiveNumHider(transform.parent.gameObject));

        //Time penalty for seekers on each step
        if (gameObject.CompareTag("Seeker")) AddReward(-0.1f);

        //Add reward for surviving each step
        if (gameObject.CompareTag("Hider") && alive)
            AddReward(0.1f);
    }

    /// <summary>
    /// Update agent's status when action is received.
    /// </summary>
    /// <param name="actionBuffers"></param>
    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        //transform.Find("Body").GetComponent<Renderer>().material.color = originalColor;
        if (detected.Count > 0)
        {
            //transform.Find("Body").GetComponent<Renderer>().material.color = Color.yellow;
            detected.Clear();
        }

        if (gameObject.CompareTag("Seeker") && stepLeftToFreeze > 0)
        {
            stepLeftToFreeze--;
            return;
        }

        if (alive)
            MoveAgent(actionBuffers.DiscreteActions);
    }

    /// <summary>
    /// Move agent by control.
    /// </summary>
    /// <param name="act"></param>
    public virtual void MoveAgent(ActionSegment<int> act)
    {
        var dirToGo = Vector3.zero;
        var rotateDir = Vector3.zero;
        var flag = false;

        dirToGo = transform.forward * act[0];
        rotateDir = Vector3.up * act[1];

        transform.Rotate(rotateDir, Time.deltaTime * rotateSpeed);
        GetComponent<Rigidbody>().velocity = dirToGo * moveSpeed;

        if (act[0] != 0)
        {
            GetComponent<PlaceObjectsToSurface>().StartPlacing(moveSpeed * dirToGo, true, false);
        }
    }
}

I first tried calling EndEpisode() in each agent's script. However, when the agents count the remaining hiders, some of them do so after other agents have already ended their episode and begun a new one. So I created an AgentManager.cs to end the episode for all agents.

Code (CSharp):

public class AgentManager : MonoBehaviour
{
    public bool ifEndEpisode;

    private PlayerSpawner playerSpawner;
    private Transform[] players;

    private void Awake()
    {
        playerSpawner = FindObjectOfType<PlayerSpawner>();
        players = new Transform[playerSpawner.playerSpawner.transform.childCount];
        for (var i = 0; i < playerSpawner.playerSpawner.transform.childCount; i++)
        {
            players[i] = playerSpawner.playerSpawner.transform.GetChild(i);
        }
    }

    // Update is called once per frame
    void Update()
    {
        if (PlayerSpawner.CountActiveNumHider(playerSpawner.playerSpawner) == 0)
        {
            for (var i = 0; i < players.Length; i++)
            {
                players[i].GetComponent<GameAgent>().EndEpisode();
            }
        }
    }
}

It doesn't have the counting issue, but there are always redundant observations and actions after the hiders have died, meaning the episode ends with a delay.
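One possible way around the polling delay, sketched here under assumptions rather than as a confirmed fix: let the hider report the catch to the manager directly from OnCollisionEnter, so rewards and EndEpisode() happen in that same call instead of on a later Update(). CatchCoordinator, ReportCaught and the two agent lists are hypothetical names, and rewarding every seeker (rather than only the one that collided) is an assumption; only AddReward() and EndEpisode() come from ML-Agents.

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using UnityEngine;

// Hypothetical manager: agents report catches to it instead of being polled in Update().
public class CatchCoordinator : MonoBehaviour
{
    // Assumed to be filled in the Inspector or by the spawner.
    public List<Agent> hiders = new List<Agent>();
    public List<Agent> seekers = new List<Agent>();

    // Hiders caught so far in the current episode.
    private readonly HashSet<Agent> caught = new HashSet<Agent>();

    // Meant to be called from the hider's OnCollisionEnter when it touches a seeker.
    public void ReportCaught(Agent hider)
    {
        // Ignore duplicate reports, e.g. two seekers touching the same hider.
        if (!caught.Add(hider)) return;

        hider.AddReward(-1f);
        foreach (var seeker in seekers) seeker.AddReward(1f);

        // Some hiders are still free: keep the episode running.
        if (caught.Count < hiders.Count) return;

        // Last hider caught: end every agent's episode within this same call.
        caught.Clear();
        foreach (var h in hiders) h.EndEpisode();
        foreach (var s in seekers) s.EndEpisode();
    }
}
```

With this shape, the hider would call something like coordinator.ReportCaught(this) inside its collision check, and the per-frame counting in Update() would no longer be needed. Whether this removes every redundant step depends on when the Academy steps relative to the physics callbacks in the project, so it is worth verifying with logging.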

I think I have to stop auto stepping by setting Academy.Instance.AutomaticSteppingEnabled = false and then call EnvironmentStep() manually. However, the game becomes very slow, and I'm really confused.

Then I tried to disable auto stepping only when a hider is caught, because I think the order in ML-Agents is: request action, process action, make observation, and add rewards. (Please let me know if I'm wrong.)

Code (CSharp):

using System;
using System.Collections;
using System.Collections.Generic;
using Unity.MLAgents;
using UnityEngine;

public class AgentManager : MonoBehaviour
{
    private PlayerSpawner playerSpawner;
    private List<GameAgent> hiders;
    private List<GameAgent> seekers;

    public bool[] aliveFlag;

    private void Awake()
    {
        playerSpawner = FindObjectOfType<PlayerSpawner>();
        seekers = new List<GameAgent>();
        hiders = new List<GameAgent>();

        for (var i = 0; i < playerSpawner.playerSpawner.transform.childCount; i++)
        {
            if (playerSpawner.playerSpawner.transform.GetChild(i).CompareTag("Seeker"))
            {
                seekers.Add(playerSpawner.playerSpawner.transform.GetChild(i).GetComponent<GameAgent>());
            }
            else
            {
                hiders.Add(playerSpawner.playerSpawner.transform.GetChild(i).GetComponent<GameAgent>());
            }
        }

        aliveFlag = new bool[hiders.Count];
        ResetAliveFlag();
    }

    private void ResetAliveFlag()
    {
        for (var i = 0; i < hiders.Count; i++)
        {
            aliveFlag[i] = true;
        }
    }

    // FixedUpdate is called once per physics step
    void FixedUpdate()
    {
        for (var i = 0; i < hiders.Count; i++)
        {
            //A hider that was alive on the last check has just been caught
            if (hiders[i].alive != aliveFlag[i] && !hiders[i].alive)
            {
                for (var j = 0; j < seekers.Count; j++)
                {
                    seekers[j].SetReward(1f);
                    //Skip adding time-consuming reward if a hider is caught
                    seekers[j].skipReward = true;
                }

                hiders[i].SetReward(-1f);
            }

            aliveFlag[i] = hiders[i].alive;
        }

        if (CountActiveNumHider() == 0)
        {
            for (var i = 0; i < hiders.Count; i++)
            {
                hiders[i].EndEpisode();
            }

            for (var i = 0; i < seekers.Count; i++)
            {
                seekers[i].EndEpisode();
            }

            ResetAliveFlag();
        }

        Academy.Instance.AutomaticSteppingEnabled = true;
    }

    public int CountActiveNumHider()
    {
        if (aliveFlag.Length == 0)
            return 0;

        var numHider = 0;
        foreach (var flag in aliveFlag)
            if (flag)
                numHider++;

        return numHider;
    }
}

I also changed the agent's script: it now sets alive = false in OnCollisionEnter, and sets Academy.Instance.AutomaticSteppingEnabled = false when the agent is not alive.

Code (CSharp):

public void OnCollisionEnter(Collision collision)
{
    if (collision.gameObject.CompareTag("Seeker") && gameObject.CompareTag("Hider"))
    {
        //Turn its camera to black when a hider is caught
        var camera = transform.Find("Camera").GetComponent<Camera>();
        camera.clearFlags = CameraClearFlags.SolidColor;
        camera.backgroundColor = Color.black;
        camera.cullingMask = 0;

        alive = false;
        skipReward = true;

        gameObject.transform.Find("Body").gameObject.SetActive(false);
        gameObject.transform.Find("Eye").gameObject.SetActive(false);
        gameObject.GetComponent<Collider>().enabled = false;
        gameObject.layer = LayerMask.NameToLayer("Ignore Raycast");
    }
}

public override void CollectObservations(VectorSensor sensor)
{
    if (!alive) Academy.Instance.AutomaticSteppingEnabled = false;

    sensor.AddObservation(alive);
    sensor.AddObservation(agentManager.CountActiveNumHider());

    //Add reward for surviving each step
    if (gameObject.CompareTag("Hider") && alive)
        AddReward(0.1f);

    //Only add time-consuming reward when no hiders is caught by all seekers
    if (gameObject.CompareTag("Seeker") && !skipReward)
    {
        AddReward(-0.1f);
    }

    //Reset skip to false if just skip
    //Todo: might be problematic when seekers catch hiders in both consecutive observations
    if (gameObject.CompareTag("Seeker") && skipReward)
        skipReward = false;

    step++;
}

This time, when a hider gets the penalty for being caught, the next state still shows it as alive. So the environment apparently keeps stepping even after I disable auto stepping.

Then I tried to use a threading lock.

Code (CSharp):

public void OnCollisionEnter(Collision collision)
{
    if (collision.gameObject.CompareTag("Seeker") && gameObject.CompareTag("Hider"))
    {
        lock (balanceLock)
        {
            //Turn its camera to black when a hider is caught
            var camera = transform.Find("Camera").GetComponent<Camera>();
            camera.clearFlags = CameraClearFlags.SolidColor;
            camera.backgroundColor = Color.black;
            camera.cullingMask = 0;

            alive = false;
            skipReward = true;

            gameObject.transform.Find("Body").gameObject.SetActive(false);
            gameObject.transform.Find("Eye").gameObject.SetActive(false);
            gameObject.GetComponent<Collider>().enabled = false;
            gameObject.layer = LayerMask.NameToLayer("Ignore Raycast");
        }
    }
}

public override void CollectObservations(VectorSensor sensor)
{
    lock (balanceLock)
    {
        if (!alive) Academy.Instance.AutomaticSteppingEnabled = false;

        sensor.AddObservation(alive);
        sensor.AddObservation(agentManager.CountActiveNumHider());

        //Add reward for surviving each step
        if (gameObject.CompareTag("Hider") && alive)
            AddReward(0.1f);

        //Only add time-consuming reward when no hiders is caught by all seekers
        if (gameObject.CompareTag("Seeker") && !skipReward)
        {
            AddReward(-0.1f);
        }

        //Reset skip to false if just skip
        //Todo: might be problematic when seekers catch hiders in both consecutive observations
        if (gameObject.CompareTag("Seeker") && skipReward)
            skipReward = false;

        step++;
    }
}

However, it still doesn't work.
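A likely reason the lock changes nothing, assuming both OnCollisionEnter and CollectObservations run on Unity's main thread (as MonoBehaviour messages and the Academy step normally do): a C# lock only makes other threads wait, and it is reentrant for the thread that already holds it. The small Unity-free sketch below illustrates this. LockDemo and its methods are made-up names that stand in for the two callbacks.

```csharp
using System.Threading;

public static class LockDemo
{
    private static readonly object balanceLock = new object();

    // Stands in for one Unity callback (e.g. OnCollisionEnter).
    // Returns true if the lock could be taken without waiting.
    public static bool RunCallback()
    {
        // A zero timeout means "do not wait": this fails only if ANOTHER thread holds the lock.
        if (!Monitor.TryEnter(balanceLock, 0)) return false;
        try
        {
            return true;
        }
        finally
        {
            Monitor.Exit(balanceLock);
        }
    }

    // Stands in for a callback that runs while the same thread already holds the lock.
    // The inner acquisition still succeeds at once, because lock is reentrant per thread.
    public static bool RunNestedCallback()
    {
        lock (balanceLock)
        {
            return RunCallback();
        }
    }
}
```

On a single thread, LockDemo.RunCallback() and LockDemo.RunNestedCallback() return true every time, so nothing ever waits. A lock therefore cannot decide which of two main-thread callbacks runs first; that order comes from Unity's own update loop and the Academy.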

Does anyone have any idea? Any help would be much appreciated.

fraa1197 · Nov 17, 2022

Multiple agents are working for you? For me only one ever moves. Is there a certain setting you have to turn on?

daishiqin1996 · Nov 17, 2022

fraa1197 said: ↑

Multiple agents are working for you? For me only one ever moves. Is there a certain setting you have to turn on?

I'm not sure what happened to you. For me I just created a script for each agent, then it works

fraa1197 · Nov 17, 2022

daishiqin1996 said: ↑

I'm not sure what happened to you. For me I just created a script for each agent, then it works

I guess I made a mistake.
¯\_(ツ)_/¯
