Question POCA based Agents, need help with training

Discussion in 'ML-Agents' started by utkarshdwivedi3997, Apr 28, 2024.

  1. utkarshdwivedi3997
    Hi, we're trying out ML-Agents for our game Fling to the Finish, simply to see its viability right now. I've run many training experiments with different scenarios, and I feel that feedback from the Unity ML-Agents team will help us move forward in the right direction. This will be a relatively lengthy post, so please bear with me!


    Fling is a game where a team of two spherical characters is tethered together by a rope. The two characters can roll around individually, but the rope's forces always affect them. Their goal is to get through an obstacle course. There is also a race mode, where multiple teams of two race to reach the finish line first.

    The rope is made of 13 configurable joints.
    Player movement is physics-based (we use AddForce and AddTorque to roll the characters).
    Other player controls are:
    Jump - Tap to jump, hold to jump higher / longer
    Stick - A character can stick to any surface while holding a button
    Fling - A character can "pull" or fling their partner towards them using the rope by tapping the Fling button

    So an example "obstacle" in the game would be a high wall. The characters alternate: one sticks to the wall and flings their partner up above them, then they switch roles and repeat until they get on top of the wall.

    What my goal with ML Agents is

    I want to train agents that can meet the following criteria:

    1. Learn the controls, and get to the finish line as a team. (Both characters in a team = agents)
    2. Learn to team up with a real human (Only one character in a team = agent)
    3. Learn to win as a team (hence MA-POCA)

    The goal is to create bots that players can race with, and against.

    What I've tried so far

    I've looked at the Crawler and the two MA-POCA samples that ship with Unity ML-Agents and used them as a basis for Fling's agents. I give them the following information:


    1. Agent position, velocity
    2. Partner's position, velocity
    3. Next checkpoint position
    4. Forward vector of team camera (camera forward affects move horizontal and vertical inputs)
    5. Average team velocity (agent + partner velocity average)
    6. Goal / Target Velocity of the team (what average direction and max speed should the team be at each step)
    7. Is agent grounded (can jump?)
    8. How long has the agent been jumping for (float)
    9. How long has the agent been providing jump input for (float, this is not the same as 8!)
    10. Is partner grounded?
    11. How long has the partner been jumping for
    12. How long has the partner been providing jump input for
    13. Is stuck to a surface?
    14. How long has the agent been holding the stick input for (float)
    15. Is partner stuck to a surface?
    16. How long has the partner been holding the stick input for (float)

    Currently, there is no information given about the rope, at all. This may be a problem, but I am not sure yet.
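    Flattened out, those 16 observations look roughly like this (a Python sketch; the field names and dictionary layout are just my shorthand, and in Unity this would live in the agent's CollectObservations):

```python
import numpy as np

def build_observation(agent, partner, checkpoint_pos, cam_forward, target_vel):
    """Flatten the observation list above into one float vector.
    `agent` and `partner` are dicts with hypothetical keys; positions,
    velocities, and directions are 3D, timers are single floats."""
    avg_vel = (agent["velocity"] + partner["velocity"]) / 2.0
    obs = np.concatenate([
        agent["position"], agent["velocity"],        # 1. agent position, velocity
        partner["position"], partner["velocity"],    # 2. partner position, velocity
        checkpoint_pos,                              # 3. next checkpoint position
        cam_forward,                                 # 4. team camera forward
        avg_vel,                                     # 5. average team velocity
        target_vel,                                  # 6. goal / target team velocity
        [float(agent["grounded"]),                   # 7. is agent grounded
         agent["jump_time"],                         # 8. time spent jumping
         agent["jump_input_time"]],                  # 9. time holding jump input
        [float(partner["grounded"]),                 # 10-12. same for partner
         partner["jump_time"],
         partner["jump_input_time"]],
        [float(agent["stuck"]),                      # 13. is agent stuck
         agent["stick_input_time"]],                 # 14. time holding stick input
        [float(partner["stuck"]),                    # 15-16. same for partner
         partner["stick_input_time"]],
    ]).astype(np.float32)
    return obs
```

    With 3D vectors for items 1-6, that comes out to a 34-float vector per agent.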


    The action space is:

    Continuous: 1) Horizontal movement, left and right [-1, +1]
    2) Vertical movement, forward and backward [-1, +1]

    Discrete: 1) Jump? 0 or 1
    2) Stick? 0 or 1
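    The action mapping itself is simple: movement is camera-relative, and the two discrete branches are held/not-held flags. A minimal Python sketch (function and parameter names are my shorthand, not the actual game code):

```python
def apply_actions(continuous, discrete, cam_forward, cam_right, move_force):
    """Map 2 continuous + 2 discrete actions to player inputs.
    Horizontal input moves along the camera's right vector, vertical
    input along its forward vector (both 3D lists)."""
    h, v = continuous                       # each in [-1, +1]
    move = [h * cam_right[i] + v * cam_forward[i] for i in range(3)]
    jump_held = discrete[0] == 1            # hold to jump higher / longer
    stick_held = discrete[1] == 1           # hold to stick to a surface
    return [m * move_force for m in move], jump_held, stick_held
```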


    I have a RayPerceptionSensor with 5 rays casting outward from each agent's center, slanted slightly downward. This helps them track the ground, and in my testing BEFORE I implemented jump training, the agents learned to reach the goal while avoiding falling off edges. However, jumping creates issues: the raycasts don't hit anything while the agent is mid-air, so it has no information about where it will land. What I am currently trying is multiple RayPerceptionSensors pointing in all 3D directions (about 25 rays), but I will want to optimize this. I have also experimented with letting the agents control the length and tilt of the rays through additional continuous actions, so that they decide where the rays should point, but this didn't yield great results, presumably because it would need much more training time.
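    For the all-around version, one way to spread ~25 rays evenly is a Fibonacci-sphere distribution rather than hand-placing them. A Python sketch of just the direction math (Unity's RayPerceptionSensorComponent3D only fans rays in a flat arc with a vertical offset, so using these directions would need several components or a custom sensor):

```python
import math

def fibonacci_sphere_rays(n=25):
    """Generate n unit vectors roughly evenly spread over a sphere,
    usable as ray directions for all-around perception."""
    rays = []
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n         # height from ~+1 down to ~-1
        r = math.sqrt(max(0.0, 1.0 - y * y))  # ring radius at that height
        theta = golden_angle * i              # rotate each ring by the golden angle
        rays.append((r * math.cos(theta), y, r * math.sin(theta)))
    return rays
```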

    In a real game scenario this would have to be heavily optimized, which is why I'm not even attempting training based on camera renders. We support 4-player split-screen, which would make camera-based perception a performance nightmare. So I'm trying to figure out simpler alternatives.

    Rewards are given like this:

    1. Group reward for hitting a checkpoint (1.0f / numCheckpoints)
    2. Group reward for crossing the finish line (1.0f constant reward); ends the episode
    3. Group reward for matching the average team velocity to the average target team velocity (sigmoid curve, currently exactly the same as in the Crawler example)
    4. Group reward for falling off edges and respawning (-0.3f constant reward)
    5. Group reward at every step (a very small negative value, -1.0f / MaxSteps)
    6. Group reward for not reaching the finish line within MaxSteps (-1.0f constant reward); ends the episode
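    To make the magnitudes concrete, rewards 1, 3, and 5 look roughly like this (a Python sketch; the constants are placeholders, and the velocity-match curve here is just a stand-in shape for the Crawler example's matching-speed reward):

```python
import math

NUM_CHECKPOINTS = 10   # placeholder course size
MAX_STEPS = 5000       # placeholder episode length

def checkpoint_reward():
    # 1. per-checkpoint group reward; all checkpoints together sum to 1.0
    return 1.0 / NUM_CHECKPOINTS

def step_penalty():
    # 5. tiny existential penalty; over a full episode it sums to -1.0
    return -1.0 / MAX_STEPS

def velocity_match_reward(avg_vel, target_vel, max_speed):
    # 3. shaping reward in [0, 1]: 1.0 when the team's average velocity
    # matches the target, falling off smoothly with the mismatch
    delta = math.dist(avg_vel, target_vel)
    x = min(delta / max_speed, 1.0)
    return (1.0 - x * x) ** 2
```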

    I also have an individual reward for each of the 6 categories mentioned above, given only to the individual agent, and I can configure whether each reward goes to just the agent, just the group, or both.


    As you can see, there is no reward for jumping or sticking at the right moment, as I want the agents to figure that out on their own. Since jumping is unrestricted whenever a character is grounded, being able to jump at all times should hopefully teach the agents that jumping is fine, whereas sticking slows them down, so they should only stick when they really need to. In my training, using only the group rewards seems to help them learn to jump and move very quickly toward the goals, but they're struggling to avoid falling off platforms.

    One thing I am trying right now is curriculum training, but it seems that ML-Agents does not yet support using the group reward as a `measure` for the thresholds that advance to the next lesson, so I'm using only the per-agent rewards as the measure. The curriculum is set up so that the first lesson has no jumps or walls; the agents simply have to get through a path. Subsequent lessons activate walls of increasing height, to teach them jumping.
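    For reference, my curriculum has this shape in the trainer config's `environment_parameters` section (the parameter and behavior names here are placeholders; note that `completion_criteria`'s `measure` only accepts the per-agent `reward` or `progress`, which is exactly the group-reward limitation I ran into):

```yaml
environment_parameters:
  wall_height:
    curriculum:
      - name: FlatPath            # lesson 0: no walls or jumps
        completion_criteria:
          measure: reward         # per-agent cumulative reward only
          behavior: FlingAgent    # placeholder behavior name
          threshold: 0.7
          min_lesson_length: 100
        value: 0.0
      - name: LowWalls
        completion_criteria:
          measure: reward
          behavior: FlingAgent
          threshold: 0.8
          min_lesson_length: 100
        value: 1.5
      - name: HighWalls
        value: 4.0                # final lesson needs no completion criteria
```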

    I haven't even tried to teach them flinging yet, as that is a very involved mechanic: the fling direction is NOT communicated to the player in any way, but is calculated from two vectors, one pointing from the player to the middle rope joint, and the other pointing from one player to the other. What matters, though, is that the more stretched the rope is, the stronger the fling force. So players sometimes have to move BACKWARD to stretch the rope while their partner is stuck to a surface, and then get flung by their partner. This works actively AGAINST the velocity-match reward that I give the group for moving toward the goal faster. I am very lost on how to approach this problem.
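    To make the geometry concrete, here is a Python sketch of my mental model of the fling calculation (the 50/50 blend of the two vectors and the linear stretch scaling are simplifications, not the game's exact formula):

```python
import numpy as np

def fling(player_pos, partner_pos, mid_joint_pos, rest_length, max_force):
    """Sketch of the fling mechanic: direction blends (a) player -> middle
    rope joint and (b) player -> partner, and the force scales with how
    stretched the rope is. Returns the force vector applied to the partner."""
    to_mid = mid_joint_pos - player_pos
    to_partner = partner_pos - player_pos
    direction = (to_mid / np.linalg.norm(to_mid)
                 + to_partner / np.linalg.norm(to_partner))
    direction /= np.linalg.norm(direction)
    # stretch factor: 0 when the rope is at rest length, 1 when fully taut,
    # so an unstretched rope produces no fling at all
    stretch = np.clip(np.linalg.norm(to_partner) / rest_length - 1.0, 0.0, 1.0)
    return direction * max_force * stretch
```

    The key property the sketch captures is that moving BACKWARD (increasing the partner distance) raises `stretch`, and therefore the fling force.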


    Any ideas on what I can try to test flinging and improve the agents' perception? Should I provide videos of what the training currently looks like? How would you approach this problem? I appreciate all the help!