
How to encourage overtaking in racing mlagent

Discussion in 'ML-Agents' started by hasseyg, Apr 5, 2021.

  1. hasseyg

    hasseyg

    Joined:
    Nov 16, 2013
    Posts:
    68
    Hi, I have successfully got my ML-Agent to drive around the track the way I want it to. However, after running inference with multiple agents on the same track, I realised they would collide with each other while trying to move into the center of the track, so I needed to add opponent avoidance. The agent uses a mixture of rays and spline information to navigate around the track, so I added another set of rays casting out across the front 180 degrees, set to detect only the car physics layer and not the track. I am using 220 hidden units and 3 layers.
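
    For reference, the new opponent sensor is set up roughly like this (a minimal sketch only; in practice the values are set in the Inspector, and the "Car" tag and layer names are just placeholders for mine):

    Code (CSharp):
    // Minimal sketch of the extra opponent-detection ray sensor described above.
    // These values are normally set in the Inspector; the "Car" tag and layer
    // are placeholders.
    using System.Collections.Generic;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class OpponentSensorSetup : MonoBehaviour
    {
        void Awake()
        {
            var sensor = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
            sensor.SensorName = "OpponentRays";
            sensor.DetectableTags = new List<string> { "Car" };
            sensor.RaysPerDirection = 5;                    // 11 rays in total
            sensor.MaxRayDegrees = 90f;                     // +/- 90 = front 180 degrees
            sensor.RayLength = 30f;                         // tune to track scale
            sensor.RayLayerMask = LayerMask.GetMask("Car"); // ignore the track geometry
        }
    }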

    For anyone with experience in solving this problem, I want to ask: is this the correct approach, and do you think the number of hidden units and layers needs to change? I ask because adding these extra rays is affecting the trained model negatively; it now collides with the track walls, whereas before it never touched them. Also, the agents do not seem to want to overtake each other and just stay behind the car in front.

    thanks
     
    Luke-Houlihan likes this.
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    379
    In my experience, penalizing agents for being overtaken makes them a bit more aggressive, and they actively try to block others. Rewarding them for overtaking others is trickier, because agents will likely exploit this by overtaking, falling back, overtaking again, etc. I wouldn't expect too much overtaking in general, though, if all agents have equivalent driving skills.
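
    If you want to experiment with it anyway, a rough sketch of the penalty side could look like the following. GetRacePosition() is a hypothetical helper returning the car's current place in the field (1 = leader), e.g. computed from lap count and progress along the track spline:

    Code (CSharp):
    // Rough sketch only. GetRacePosition() is a hypothetical helper that returns
    // this car's current place in the field (1 = leader), e.g. from lap count
    // plus distance along the track spline.
    using Unity.MLAgents;

    public class RacingAgent : Agent
    {
        int m_PreviousPosition;

        public override void OnEpisodeBegin()
        {
            m_PreviousPosition = GetRacePosition();
        }

        void FixedUpdate()
        {
            int position = GetRacePosition();

            // Small penalty whenever this car loses a place. A symmetric bonus
            // for gaining a place is the part that can be farmed by repeatedly
            // dropping back and re-passing the same car.
            if (position > m_PreviousPosition)
            {
                AddReward(-0.1f);
            }
            m_PreviousPosition = position;
        }

        int GetRacePosition()
        {
            return 1; // placeholder, computed elsewhere in a real setup
        }
    }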
     
    Luke-Houlihan likes this.
  3. hasseyg

    hasseyg

    Joined:
    Nov 16, 2013
    Posts:
    68
    Ok, I suppose in my case having them overtake is not a necessity and the priority should just be on avoiding collisions. I think I will try increasing the hidden units and layers, as adding more observations may require a more complex brain.

    thanks
     
  4. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    267
    Hi @hasseyg, this is a really interesting question! @mbaske already gave a good answer so I'll try to just add to it.

    In complex cases I try to break down the different types of training conceptually, to come up with a plan for getting the behavior I want. I use the term 'mechanistic' to describe behavior that is basic or directly related to the mechanics of the environment/game, and 'strategic' to describe behavior that involves planning, or, well, strategic thinking. For example, if you wanted to walk across a room, the movements your legs and arms make to get you to your destination would be mechanistic behaviors, while picking the point you want to end up at and the path you take to get there would be strategic. The distinction is important because mechanistic behaviors tend to be short term and need some reward sculpting, while strategic behaviors shouldn't be shaped by reward and must instead emerge from optimization or competition.

    To break down your racing training, I would say anything involved with driving the car around the track is mechanistic (accelerating, braking, turning, collision avoidance, and so on), while some of the behavior you're looking for is strategic (completing laps, positioning, overtaking). To explain why you shouldn't reward strategic behaviors directly, imagine you had an agent that places first in 10% of races, second in 30%, third in 50%, and crashes in the remaining 10%. That's a fairly skilled agent. By adding an incentive to overtake and/or a punishment for being overtaken, you may end up with an agent that places first in 20% of races but crashes in the other 80%. This is because the agents will fight to overtake each other and defend against being overtaken more aggressively, as an artificially imposed strategy, instead of adopting strategies that may be more beneficial, such as driving more conservatively or taking sparser, calculated positional risks.

    As for the number of hidden units and layers, that can only be answered with testing: if the policy isn't training you may need to add more, and if it's overfitting you may need to take some away.

    In order to reduce some complexity here, you can re-use the same ray casts to detect both tags, though that depends on the orientation of the existing casts. I would try to consolidate them into one component that detects both but uses fewer rays overall, roughly as sketched below.
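
    As a rough sketch (assuming the standard RayPerceptionSensorComponent3D, with placeholder tag/layer names; normally you would configure this in the Inspector rather than in code):

    Code (CSharp):
    // Sketch of one consolidated ray bank instead of two separate ones.
    // Tag and layer names are placeholders; assumes a RayPerceptionSensorComponent3D
    // already exists on the same GameObject.
    using System.Collections.Generic;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class ConsolidatedRays : MonoBehaviour
    {
        void Awake()
        {
            var sensor = GetComponent<RayPerceptionSensorComponent3D>();

            // One ray bank that reports both wall hits and car hits.
            sensor.DetectableTags = new List<string> { "Track", "Car" };
            sensor.RayLayerMask = LayerMask.GetMask("Track", "Car");
            sensor.RaysPerDirection = 7;  // fewer rays overall than two separate banks
            sensor.MaxRayDegrees = 90f;
        }
    }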

    Hope this helps!
     
  5. hasseyg

    hasseyg

    Joined:
    Nov 16, 2013
    Posts:
    68
    Hi, thanks for the help. I have tested lots of different setups since my last post and have settled on using the rays to detect both the track and the cars (detectable tags), as you suggested. I have also removed the rewards I initially had for staying in the center of the track and facing the direction of the spline. This has resulted in some overtaking: when a car in front is slower, the car behind naturally goes around it.

    However, with all the different setups I have tried, I keep running into the same problem when I come to run inference. When the cars start from the starting grid, the cars behind the front positions immediately go backwards or sideways, and this results in crashes. This is obviously because they detect a car in front of them and have not had enough training starting from formation on the grid. During training the cars initially start off this way, but they gradually get dispersed as they finish their episodes and start again from the beginning of the track, this time not in formation.

    Is it possible that once an agent has finished its episode, it can be paused somehow until all the other agents have also finished, so that they can all start again from the starting grid together? That way they should get plenty of training for this situation. I ask because I am not aware of a way to do this; when an episode finishes, OnEpisodeBegin seems to just be called automatically.
     
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    379
    Please see this thread https://forum.unity.com/threads/how-to-stop-agent-from-automatically-respawning.1089658/
     
  7. hasseyg

    hasseyg

    Joined:
    Nov 16, 2013
    Posts:
    68
    Hi, ok thanks, so it is possible then. I was previously ending the episode when OnCollisionEnter was called or when 90 seconds had elapsed. What I have done now is allow the agent to continue after a collision instead, so that they all finish after the 90 seconds. This ensures they all start again at the same time on every run, and therefore get the necessary amount of starting-grid experience.
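
    In case it helps anyone else, the change boils down to something like this (a sketch only; ResetToGridSlot() is a placeholder for however the car gets put back on its grid position):

    Code (CSharp):
    // Sketch of the fixed-length episode approach: collisions are penalized but no
    // longer end the episode, and every agent uses the same MaxStep, so all cars
    // reset to the grid at the same time.
    using Unity.MLAgents;
    using UnityEngine;

    public class RaceCarAgent : Agent
    {
        // Set MaxStep to the same value on every agent in the Inspector,
        // e.g. roughly 90 s * 50 fixed updates per second = 4500 steps.

        public override void OnEpisodeBegin()
        {
            ResetToGridSlot();
        }

        void OnCollisionEnter(Collision collision)
        {
            // Penalize the contact instead of calling EndEpisode(), so the car
            // keeps driving until the shared episode length runs out.
            AddReward(-0.5f);
        }

        void ResetToGridSlot()
        {
            // Placeholder: restore the car's grid position and zero its velocity.
        }
    }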

    thanks
     