Chess game? Do I have to EndEpisode() after I SetReward? No examples of Chess with ML? :(

Discussion in 'ML-Agents' started by markashburner, Jan 23, 2021.

  1. markashburner
     Joined: Aug 14, 2015
     Posts: 212
    Hi, I'm trying to integrate ML-Agents with a chess game I've developed.

    I'm just not sure what I should do after every move the agent makes. After every move, should I SetReward and then EndEpisode? Or can I SetReward or AddReward after every move and only call EndEpisode at the end of the game, when it's either a win for the agent or a draw?

    Basically, what I'm asking is: do I have to call EndEpisode() after I SetReward?
     
  2. celion_unity
     Joined: Jun 12, 2019
     Posts: 289
    The only difference between AddReward and SetReward is what happens if you call them multiple times in the same step: AddReward accumulates, while SetReward overwrites the step's reward. So in general, it's fine to call SetReward on a step and not call EndEpisode that step.

    For zero-sum games like chess, you should treat each game as an episode, and give +1 reward to the winner and -1 to the loser (and then end the episode).

    You could also try giving small rewards (using AddReward) for capturing pieces, such as 0.01 for a pawn, 0.02 for a knight, etc. This wouldn't affect the Elo rating, but it might teach the agent that capturing pieces is valuable (it also might not help, or might actually hurt the agent in the long run).
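    In code, the zero-sum scheme described above might look something like this (a minimal sketch; `whiteAgent`, `blackAgent`, and the game-over callback are assumed names for your own project, not ML-Agents API):

    ```csharp
    using Unity.MLAgents;

    public class ChessRewards
    {
        // Assumed fields: the two agents playing on this board.
        public Agent whiteAgent;
        public Agent blackAgent;

        // Call this once, when the game actually ends -- not after every move.
        public void OnGameOver(bool whiteWon, bool isDraw)
        {
            if (isDraw)
            {
                whiteAgent.SetReward(0f);
                blackAgent.SetReward(0f);
            }
            else
            {
                whiteAgent.SetReward(whiteWon ? 1f : -1f);
                blackAgent.SetReward(whiteWon ? -1f : 1f);
            }
            // The episode ends only here, at the end of the game.
            whiteAgent.EndEpisode();
            blackAgent.EndEpisode();
        }
    }
    ```

    SetReward is used for the result so that it overwrites any small shaping reward added earlier in that final step; during the game you'd use AddReward for the capture bonuses.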
     
  3. markashburner

    Yeah, that's roughly what I had. I used AddReward(0.1) for a pawn and so on for capturing pieces, and then AddReward(+1) for winning a game. I only ever called SetReward when penalising an agent for making the same move too many times. It worked OK, but the agent could never work out how to checkmate.

    So then I changed the capture reward to 0.01 for a pawn, etc. Honestly, I'm close to giving up. There's barely anyone to help out with machine learning, and I'm not getting any good results. I was getting decent results when I used AddReward only and never called SetReward; the agent was OK at taking pieces, but not great at the endgame and working out checkmate.

    Basically, all I did was gather all the legal moves the agent could make, give the agent one discrete branch with a branch size of 120, and then mask all the moves it couldn't make.
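    For reference, the masking step described here looks roughly like this with the `IDiscreteActionMask` API used elsewhere in this thread (a sketch; `legalMoves` and the branch size of 120 are taken from the post, and would come from your own move generator):

    ```csharp
    using System.Linq;
    using Unity.MLAgents.Actuators;

    public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
    {
        const int branchSize = 120;
        // Actions 0..legalMoves.Count-1 map to real moves this turn;
        // every index above that is illegal and gets masked off.
        var illegal = Enumerable.Range(legalMoves.Count, branchSize - legalMoves.Count);
        actionMask.WriteMask(0, illegal);
    }
    ```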

    To be honest, there isn't a single clear example of Unity machine learning with board games, which is ridiculous. I always thought the best way to learn machine learning was with board games, so I'm surprised Unity hasn't provided any examples, and there's virtually no one to help you out. Machine learning is so exciting, but barely anyone in the Unity community is willing to help with it.

    Anyways this is my config file.

    Code (YAML):
    ---
    behaviors:
      ChessMLAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 32
          buffer_size: 2048
          learning_rate: 3.0e-4
          beta: 5.0e-4
          epsilon: 0.3
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: constant
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 3
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 10000000
        time_horizon: 32
        summary_freq: 1000
        threaded: true
        self_play:
          save_steps: 2000
          team_change: 10000
          swap_steps: 1000
          window: 30
          play_against_latest_model_ratio: 0.5
          initial_elo: 1200.0

    This has been an extremely frustrating endeavour, with no clear examples from Unity on how to set up a simple board game like chess with machine learning. None whatsoever.
     
  4. markashburner
    So do you reckon it would work better if I added no reward when the agent takes a piece, and only called SetReward when the agent wins a game? I haven't tried that yet. But to be honest, my enthusiasm for Unity ML-Agents has been shattered. I've been working on this for three or four days now and haven't achieved any significant results; the agent is subpar, and a conventional chess algorithm has been significantly better. Really disappointed with Unity ML-Agents.
     
  5. markashburner
    Code (CSharp):
    using System.Collections.Generic;
    using System.Linq;
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    public class ChessMLAgent : Agent
    {
        public ChessAgentManager manager;
        public cgChessBoardScript board;
        public bool isWhite;
        public byte searchDepthWeak = 4;
        public byte searchDepthStrong = 4;
        public byte searchDepthEndGame = 4;
        public int branchSize;
        public cgSimpleMove currentMove;
        public List<cgSquareScript> squares = new List<cgSquareScript>();
        public List<cgSimpleMove> legalMoves = new List<cgSimpleMove>();
        public List<int> branchMask = new List<int>();
        public List<int> impossibleMoves = new List<int>();

        public override void OnEpisodeBegin()
        {
            legalMoves = board._abstractBoard.findStrictLegalMoves(isWhite);
        }

        public override void Heuristic(in ActionBuffers actionsOut)
        {
            if (manager.mode == MLBoardMode.MLAgentVsMLAgent)
            {
                Debug.Log("Heuristic Action number is " + actionsOut.DiscreteActions[0]);
                currentMove = legalMoves[0];
            }
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Identical for both colours: one-hot encode each legal-move index.
            for (int i = 0; i < legalMoves.Count; i++)
            {
                sensor.AddOneHotObservation(i, legalMoves.Count);
            }
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            if (manager.mode != MLBoardMode.MLAgentVsMLAgent) return;

            if (impossibleMoves.Count != branchSize)
            {
                if (actions.DiscreteActions[0] < legalMoves.Count)
                    currentMove = legalMoves[actions.DiscreteActions[0]];
            }
            else
            {
                Debug.Log("Board " + manager.agentManagerId + " All moves masked under OnActionReceived!");
                ResetPieceCounters();
                ResolveGameEnd();
            }
        }

        public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
        {
            branchMask.Clear();
            legalMoves = board._abstractBoard.findStrictLegalMoves(isWhite);
            int alpha = int.MinValue;
            int beta = int.MaxValue;

            if (legalMoves.Count == 0) return;

            // Score every legal move with alpha-beta so the list can be pruned.
            for (int i = 0; i < legalMoves.Count; i++)
            {
                cgSimpleMove possibleMove = legalMoves[i];
                byte depth = cgValueModifiers.AlphaBeta_Strong_Delineation < possibleMove.positionalVal
                    ? searchDepthStrong
                    : searchDepthWeak;
                if (legalMoves.Count < 10) depth = searchDepthEndGame;

                legalMoves[i].val = board.getEngine._alfaBeta(board._abstractBoard, legalMoves[i], depth, alpha, beta, false);
            }

            // Keep only the 60 best-scoring moves so they fit in the branch.
            if (legalMoves.Count > 60)
            {
                legalMoves = legalMoves.OrderByDescending(x => x.val).Take(60).ToList();
            }

            for (int i = 0; i < legalMoves.Count; i++)
            {
                branchMask.Add(i);
            }

            int movesAbove = branchSize - branchMask.Count;
            impossibleMoves.Clear();
            for (int i = 0; i < movesAbove; i++)
            {
                impossibleMoves.Add(branchMask.Count + i);
            }

            if (impossibleMoves.Count == branchSize)
            {
                Debug.Log("Board " + manager.agentManagerId + " All moves masked!");
                ResetPieceCounters();
                ResolveGameEnd();
            }
            else
            {
                actionMask.WriteMask(0, GetMovementSize());
            }
        }

        private IEnumerable<int> GetMovementSize()
        {
            int movesAbove = branchSize - branchMask.Count;
            impossibleMoves.Clear();
            for (int i = 0; i < movesAbove; i++)
            {
                impossibleMoves.Add(branchMask.Count + i);
            }
            return impossibleMoves.ToArray();
        }

        private void ResetPieceCounters()
        {
            for (int i = 0; i < board.chessPieces.Count; i++)
            {
                var piece = board.chessPieces[i].GetComponent<cgChessPieceScript>();
                piece.sqaureMoves.Clear();
                piece.numberOfMovesMade = 0;
            }
        }

        private void ResolveGameEnd()
        {
            bool isChecked = board._abstractBoard.isChecked(isWhite);

            if (isChecked)
            {
                if (isWhite)
                {
                    AddReward(-1f);
                    manager.Agent2.AddReward(1f);
                    board._whiteWins += 1;
                    board.whiteWin.text = "White Games won: " + board._whiteWins;
                    EndEpisode();
                    manager.Agent2.EndEpisode();
                }
                else
                {
                    AddReward(1f);
                    manager.Agent1.AddReward(1f);
                    board._blackWins += 1;
                    board.blackWin.text = "Black Games won: " + board._blackWins;
                    EndEpisode();
                    manager.Agent1.EndEpisode();
                }
            }
            else
            {
                board.ResetBoard();
                manager.Agent1.EndEpisode();
                manager.Agent2.EndEpisode();
            }
        }
    }
    This is my Chess Agent script
     

  6. markashburner
    Code (CSharp):
    // Shaped rewards when a move captures a piece (roughly classical piece values / 10).
    cgChessPieceScript captured = _getPieceOn(_abstractBoard.SquareNames[move.to]);
    if (captured != null && !(move is cgCastlingMove))
    {
        switch (captured.type)
        {
            case cgChessPieceScript.Type.BlackPawn:
                whiteAgent.AddReward(0.1f);
                blackAgent.AddReward(-0.1f);
                break;
            case cgChessPieceScript.Type.BlackKnight:
            case cgChessPieceScript.Type.BlackBishop:
                whiteAgent.AddReward(0.3f);
                blackAgent.AddReward(-0.3f);
                break;
            case cgChessPieceScript.Type.BlackRook:
                whiteAgent.AddReward(0.5f);
                blackAgent.AddReward(-0.5f);
                break;
            case cgChessPieceScript.Type.BlackQueen:
                whiteAgent.AddReward(0.9f);
                blackAgent.AddReward(-0.9f);
                break;
            case cgChessPieceScript.Type.WhitePawn:
                whiteAgent.AddReward(-0.1f);
                blackAgent.AddReward(0.1f);
                break;
            case cgChessPieceScript.Type.WhiteKnight:
            case cgChessPieceScript.Type.WhiteBishop:
                whiteAgent.AddReward(-0.3f);
                blackAgent.AddReward(0.3f);
                break;
            case cgChessPieceScript.Type.WhiteRook:
                whiteAgent.AddReward(-0.5f);
                blackAgent.AddReward(0.5f);
                break;
            case cgChessPieceScript.Type.WhiteQueen:
                whiteAgent.AddReward(-0.9f);
                blackAgent.AddReward(0.9f);
                break;
            case cgChessPieceScript.Type.BlackKing:
                Debug.Log("Black King piece was taken!");
                for (int i = 0; i < chessPieces.Count; i++)
                {
                    var piece = chessPieces[i].GetComponent<cgChessPieceScript>();
                    piece.sqaureMoves.Clear();
                    piece.numberOfMovesMade = 0;
                }
                _abstractBoard.revert();
                return;
            case cgChessPieceScript.Type.WhiteKing:
                Debug.Log("White King piece was taken!");
                for (int i = 0; i < chessPieces.Count; i++)
                {
                    var piece = chessPieces[i].GetComponent<cgChessPieceScript>();
                    piece.sqaureMoves.Clear();
                    piece.numberOfMovesMade = 0;
                }
                _abstractBoard.revert();
                return;
        }
        _setDeadPiece(captured);
    }
     
  7. markashburner
    Code (CSharp):
    public void _gameMLOver(bool whiteWins, bool blackWins, ChessMLAgent whiteAgent, ChessMLAgent blackAgent, bool movedTooManyTimes, bool isWhite)
    {
        string gameOverString = "Game Over. ";

        if (whiteWins)
        {
            whitePlayer.AddReward(1f);
            blackPlayer.AddReward(-1f);
            _whiteWins += 1;
            whiteWin.text = "White Games won: " + _whiteWins;

            whitePlayer.EndEpisode();
            blackPlayer.EndEpisode();
            ResetBoard();
            // gameOverString = "White Wins!";
        }
        else if (blackWins)
        {
            blackPlayer.AddReward(1f);
            whitePlayer.AddReward(-1f);
            _blackWins += 1;
            blackWin.text = "Black Games won: " + _blackWins;

            whitePlayer.EndEpisode();
            blackPlayer.EndEpisode();
            ResetBoard();
        }
        else
        {
            _draws += 1;
            draws.text = "Draws: " + _draws;

            if (!movedTooManyTimes)
            {
                gameOverString = "It's a draw!";
                whiteAgent.AddReward(-0.75f);
                blackAgent.AddReward(-0.75f);
                whiteAgent.EndEpisode();
                blackAgent.EndEpisode();
                ResetBoard();
            }
            else
            {
                // The side that repeated moves too often takes the full -1 penalty.
                if (isWhite)
                {
                    whiteAgent.SetReward(-1f);
                    blackAgent.AddReward(-0.25f);
                }
                else
                {
                    whiteAgent.AddReward(-0.25f);
                    blackAgent.SetReward(-1f);
                }
                whiteAgent.EndEpisode();
                blackAgent.EndEpisode();
                ResetBoard();
            }

            for (int i = 0; i < chessPieces.Count; i++)
            {
                if (chessPieces[i] != null)
                {
                    var piece = chessPieces[i].GetComponent<cgChessPieceScript>();
                    piece.sqaureMoves.Clear();
                    piece.numberOfMovesMade = 0;
                }
            }
        }
    }
     
  8. celion_unity
    This isn't quite what I said. With what you describe, an agent could finish with a higher reward than its opponent by capturing pieces, but still lose the game. The self-play system treats the agent with the higher reward as the winner, so you'd be "encouraging" it to grab a bunch of pieces instead of necessarily winning. With smaller rewards for capturing, it learns that capturing pieces is better than not capturing pieces, but winning is still the most important thing.

    I wouldn't recommend penalising repeated moves like that; if the agent actually repeats moves enough to trigger a draw, end the match and give both sides 0.
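    That could be as small as the following sketch (reusing the agent and board names from the code earlier in the thread):

    ```csharp
    // Draw by repetition (or any other draw): end the game, neither side gets credit.
    void OnDraw()
    {
        whiteAgent.SetReward(0f);
        blackAgent.SetReward(0f);
        whiteAgent.EndEpisode();
        blackAgent.EndEpisode();
        board.ResetBoard();
    }
    ```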

    The algorithms that we use don't know how to "look ahead" for things like this.

    It's not exactly a board game, but we do have an example of a Match-3 game in ML-Agents.

    I'm sorry if anything in our documentation gave this impression. Reinforcement learning (which is most of what ML-Agents is) is good for continuous environments where the "rules" for going from one step to the next aren't always known, which is the case in a lot of games and robotics. RL can also be used for board games (for example, AlphaGo, AlphaZero, and MuZero). But AlphaGo and AlphaZero need to be told the "rules" of the game so that they can simulate possible future states; MuZero doesn't need the rules, but we haven't tried to implement it, and I'm unsure how much more computing power it would take.

    If you're interested in learning more "traditional" Game AI (not machine learning based), something like Monte Carlo Tree Search (MCTS) is a very powerful technique, and (if I understand correctly) it's also part of the basis for AlphaGo and AlphaZero.
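    To make the shape of MCTS concrete, here is a compact UCT sketch on a toy take-away game (players alternately remove 1 or 2 stones; whoever takes the last stone wins). It is plain C# with no Unity dependency, and the game, class, and method names are all made up for illustration; a chess version would swap in real move generation and board state:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // One node per game state reached in the search tree.
    class Node
    {
        public int Stones;                      // stones left after the move leading here
        public int MoveFromParent;              // 1 or 2 (0 for the root)
        public Node Parent;
        public List<Node> Children = new List<Node>();
        public int Visits;
        public double Wins;                     // from the perspective of the player who just moved

        public bool Terminal => Stones == 0;
        public bool FullyExpanded => Children.Count == Math.Min(2, Stones);
    }

    static class Mcts
    {
        // Returns the move (1 or 2) that UCT considers best from `stones`.
        public static int Search(int stones, int iterations, Random rng)
        {
            var root = new Node { Stones = stones };
            for (int it = 0; it < iterations; it++)
            {
                // 1. Selection: walk down by UCB1 until an expandable or terminal node.
                Node node = root;
                while (!node.Terminal && node.FullyExpanded)
                {
                    Node parent = node;
                    node = parent.Children.OrderByDescending(c =>
                        c.Wins / c.Visits + Math.Sqrt(2.0 * Math.Log(parent.Visits) / c.Visits)).First();
                }

                // 2. Expansion: add one untried move (1 first, then 2).
                if (!node.Terminal)
                {
                    int move = node.Children.Count + 1;
                    var child = new Node { Stones = node.Stones - move, MoveFromParent = move, Parent = node };
                    node.Children.Add(child);
                    node = child;
                }

                // 3. Simulation: random playout, tracking which side takes the last stone.
                int remaining = node.Stones;
                bool nodePlayerWins = node.Terminal;   // terminal here means node's player took the last stone
                bool moverIsNodePlayer = false;        // the opponent moves next
                while (remaining > 0)
                {
                    remaining -= 1 + rng.Next(Math.Min(2, remaining));
                    if (remaining == 0) nodePlayerWins = moverIsNodePlayer;
                    moverIsNodePlayer = !moverIsNodePlayer;
                }

                // 4. Backpropagation: flip the win credit at every level.
                bool win = nodePlayerWins;
                for (Node n = node; n.Parent != null; n = n.Parent)
                {
                    n.Visits++;
                    if (win) n.Wins++;
                    win = !win;
                }
                root.Visits++;
            }
            // The most-visited child is the recommended move.
            return root.Children.OrderByDescending(c => c.Visits).First().MoveFromParent;
        }
    }
    ```

    The four phases (selection by UCB1, expansion, random playout, backpropagation) are the same ones a chess MCTS would use; only the game rules and move generation change.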
     
  9. markashburner
    Thanks heaps for the reply. I'll look into teaching it the rules of chess, then.
     
  10. ThadJunior
      Joined: Jun 6, 2020
      Posts: 1
    Hi @markashburner ,

    I'm looking into using ML-Agents for a chess game. However, I get the feeling that 64 squares with up to 64 destinations per move, different behaviour for each piece, a huge number of board situations, and a small chance of winning compared to stalemate, threefold repetition, etc. make this a very difficult environment in which to get an AI to win via reinforcement learning.

    Could you share how your project is going?
    I’m very interested in your progress/achievements.