Need help trying to solve travelling salesman problem with ML-Agents

Discussion in 'ML-Agents' started by jokis125, Mar 14, 2022.

  1. jokis125

    jokis125

    Joined:
    Jul 2, 2015
    Posts:
    4
    Hello everyone! I'm trying to solve the traveling salesman problem with ML-Agents. But I seem to be stuck.

    My plan was to:
    1. Generate data in Vector2 Array
    2. Instantiate gameobjects in the hierarchy (for visualization purposes)
    3. Brute force a solution to the problem through a simple algorithm
    4. Train the agent by comparing its solutions to the brute-force solution (and rewarding it accordingly)
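    Step 3 (the brute-force baseline) can be sketched language-agnostically. A minimal Python version, assuming an open path rather than a closed tour (the post doesn't say which), with illustrative names:

    ```python
    from itertools import permutations
    import math

    def route_length(route):
        # Sum of straight-line distances between consecutive stops (open path)
        return sum(math.dist(route[i], route[i + 1]) for i in range(len(route) - 1))

    def brute_force_route(stops):
        # Try every ordering of the stops and keep the shortest one.
        # Fine for 5 stops (120 permutations); factorial blow-up beyond ~10.
        return min(permutations(stops), key=route_length)
    ```

    With 5 stops this is cheap enough to run once per episode as the training target.
    
    
    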

    Details about the agent design:
    1. Observation vector would have Space Size of 10 (Trying with 5 stops first, so X and Y coordinates for every stop)
    2. Actions would have 3 Discrete Branches
      1. Swap from (size 5, because 5 stops)
      2. Swap to (size 5, because 5 stops)
      3. Stop swapping (size 2, 0 for continue, 1 for stopping)
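    The three branches map onto a tour like this; a Python sketch of the action semantics (names are illustrative, not from the post):

    ```python
    def apply_action(tour, swap_from, swap_to, stop):
        # Branches 0 and 1 each pick an index in [0, 5); swapping an index
        # with itself is a no-op. Branch 2 is the stop flag (1 = stop swapping).
        tour[swap_from], tour[swap_to] = tour[swap_to], tour[swap_from]
        return stop == 1
    ```
    
    
    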
    I've implemented all of this, but for some reason, when I'm trying to train the agent, it stops doing anything after the 1st episode. :(

    Episode Initialization looks like this:
    Code (CSharp):
        public override void OnEpisodeBegin()
        {
            CleanBoard(); // Cleans gameobjects from the hierarchy that were generated with DrawBoard() and left over from the last episode of training
            GenerateBoard(); // Generates data (Vector2[])
            DrawBoard(); // Generates GameObjects in the hierarchy for visual representation
            bestBoard = new Vector2[stops];
            bestBoardtemp = new Vector2[stops];
            board.CopyTo(bestBoard, 0); // Shallow copy (should be deep, but for testing purposes it's fine)
            board.CopyTo(bestBoardtemp, 0); // Shallow copy (should be deep, but for testing purposes it's fine)
            BruteForceRoute(0, bestBoardtemp); // Brute force way of solving the problem (for training purposes)

            // This is probably wrong
            for (var i = 0; i < maxSwapCount; i++)
            {
                RequestDecision();
            }
        }
    Observation collection:
    Code (CSharp):
        public override void CollectObservations(VectorSensor sensor)
        {
            for (var i = 0; i < board.Length; i++)
            {
                sensor.AddObservation(board[i]);
            }
        }
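    A general ML-Agents tip that applies here: networks tend to train better when observations are normalized to roughly [-1, 1] rather than raw positions. A Python sketch of the idea (`board_size` is an assumed coordinate bound, not from the post):

    ```python
    def normalize_board(board, board_size=10.0):
        # Map coordinates from [0, board_size] into [-1, 1] before
        # adding them as observations
        return [(2 * x / board_size - 1, 2 * y / board_size - 1) for x, y in board]
    ```
    
    
    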

    OnActionReceived:

    Code (CSharp):
        public override void OnActionReceived(ActionBuffers actionBuffers)
        {
            // Swap x with y elements. Stop episode if agent thinks it's done
            int x = actionBuffers.DiscreteActions[0];
            int y = actionBuffers.DiscreteActions[1];
            int stop = actionBuffers.DiscreteActions[2];

            (bestBoard[x], bestBoard[y]) = (bestBoard[y], bestBoard[x]);

            DrawSolution();
            if (stop == 1)
            {
                newDist = MeasureRoute(bestBoard); // measures the route distance
                // assuming that 1000 is longest possible path. ALSO DOESN'T WORK PROPERLY BECAUSE COPYTO = SHALLOW COPY. Will fix later
                SetReward(Mathf.InverseLerp(1000, bestResult, newDist));
                Debug.Log(GetCumulativeReward());
                EndEpisode();
            }
        }
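    For reference, `Mathf.InverseLerp(a, b, v)` returns the clamped position of `v` between `a` and `b`, so `SetReward(Mathf.InverseLerp(1000, bestResult, newDist))` gives 1 when the agent's distance matches the brute-force optimum and 0 when it is at the assumed worst case of 1000. A Python equivalent:

    ```python
    def inverse_lerp(a, b, value):
        # Clamped normalized position of value between a and b,
        # matching Unity's Mathf.InverseLerp (returns 0 when a == b)
        if a == b:
            return 0.0
        t = (value - a) / (b - a)
        return max(0.0, min(1.0, t))

    # Reward shaping from the post: worst assumed distance 1000,
    # best distance from the brute-force solver
    # reward = inverse_lerp(1000, best_result, new_dist)
    ```
    
    
    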
    I'm sure that I'm missing something obvious, would be very grateful for any help with this!
     
  2. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    One of the agent's actions is to end the episode. Not a good idea.
    The agent will try ending the episode as a shortcut to a higher reward, and once it does that, it has no way out.

    An agent should never be able to end an episode on its own, except by doing something stupid like jumping off a bridge or walking into lava. In each of those cases it's a losing scenario with a -1 reward, so the agent is unlikely to try it intentionally.

    Instead of letting the agent end the episode, set Max Step to some reasonable value and let the agent get the best score it can within those steps, ending early only when it solves the traveling salesman problem before Max Step is reached.

    So if you have a time penalty (or a penalty for each move), the agent will try to solve it in the fewest moves possible.
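    The effect of a step cap plus a per-move penalty can be sketched outside Unity (`run_episode`, `try_one_swap`, and the penalty value are all hypothetical, just to show the scoring pressure):

    ```python
    def run_episode(try_one_swap, max_steps=50, step_penalty=0.01):
        # The episode ends either when the route is solved or when
        # max_steps runs out; the per-step penalty makes shorter
        # solutions score higher.
        total = 0.0
        for _ in range(max_steps):
            total -= step_penalty
            if try_one_swap():  # returns True once the route is solved
                total += 1.0
                break
        return total
    ```

    Solving in 3 swaps then scores 1.0 - 3 * 0.01 = 0.97 while solving in 10 swaps scores 0.90, so fewer moves means more reward.
    
    
    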
     
  3. jokis125

    jokis125

    Joined:
    Jul 2, 2015
    Posts:
    4
    Thank you for responding, I'll try to implement that! But I'm still stuck on the agent not restarting properly after the first episode. From debugging, it seems the code never reaches OnActionReceived() after the first episode. Any idea why that might be the case?
     
  4. Haneferd

    Haneferd

    Joined:
    Feb 3, 2018
    Posts:
    36
    Did you remember to add a Decision Requester component to your Agent GameObject? (screenshot of the component attached)
     
  5. jokis125

    jokis125

    Joined:
    Jul 2, 2015
    Posts:
    4
    Yup, It's there!
     
  6. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    After each episode, destroy and recreate the agent. Maybe just make a prefab of your agent and its stuff, then destroy it and instantiate a new copy each episode.

    Easier than figuring out what's missing here.
     
  7. jokis125

    jokis125

    Joined:
    Jul 2, 2015
    Posts:
    4
    Will try that. Thank you very much!