
Question: Is it possible to train Food Collector agents as a single agent, like a community?

Discussion in 'ML-Agents' started by Hsgngr, Jun 14, 2020.

  1. Hsgngr

    Hsgngr

    Joined:
    Dec 28, 2015
    Posts:
    61
    I have a task that can be done as a community. Every agent should avoid the others. This is the first rule; then, with curriculum learning, the task complexity will be increased. The Food Collector environment fits my needs perfectly if I can treat the agents as a single agent, since they should act as a community (e.g. 100 of them together). Is it possible? If so, can somebody show me the way to enlightenment? Cheers.
     
  2. vincentgao88

    vincentgao88

    Unity Technologies

    Joined:
    Feb 7, 2018
    Posts:
    21
  3. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    Do you mean a reward that is shared among all the agents in a single environment? That would be a very crude implementation of the idea behind centralized training with decentralized execution. I have a strong feeling it would not work that easily (especially for 100 agents).
     
  4. Hsgngr

    Hsgngr

    Joined:
    Dec 28, 2015
    Posts:
    61
    @andrzej_

    Hmm, I would like to teach the agents how to do social distancing. I thought it would be simpler to do this with a single agent. Maybe I'm wrong; what do you suggest for such a task?
     
  5. Kirsch

    Kirsch

    Joined:
    Dec 7, 2013
    Posts:
    1
    Add a negative reward for distance < X?
    You could create a list of agent transforms, fill it at initialization, and then in Update cycle through the distances to apply the reward rule. I have the opposite task (a reward for each ally closer than X to the agent) and it works pretty well.
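    The suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name ProximityPenalty, the fields minDistance and penalty, their default values, and the assumption that all agents are children of this object are hypothetical. It uses FixedUpdate rather than Update so that the penalty is applied once per physics step, and it relies on Agent.AddReward from the Unity.MLAgents namespace.

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using UnityEngine;

// Hypothetical helper: penalizes every pair of agents that are closer than minDistance.
public class ProximityPenalty : MonoBehaviour
{
    [SerializeField] float minDistance = 2f;  // the "X" threshold (assumed value)
    [SerializeField] float penalty = -0.01f;  // reward added per close pair, per step (assumed value)

    readonly List<Agent> agents = new List<Agent>();

    void Start()
    {
        // Fill the list once; assumes the agents are children of this GameObject.
        agents.AddRange(GetComponentsInChildren<Agent>());
    }

    void FixedUpdate()
    {
        // Compare squared distances to avoid a square root per pair.
        float minSqr = minDistance * minDistance;
        for (int i = 0; i < agents.Count; i++)
        {
            for (int j = i + 1; j < agents.Count; j++)
            {
                Vector3 offset = agents[i].transform.position - agents[j].transform.position;
                if (offset.sqrMagnitude < minSqr)
                {
                    // Both agents in the pair are too close, so both are penalized.
                    agents[i].AddReward(penalty);
                    agents[j].AddReward(penalty);
                }
            }
        }
    }
}
```

    With 100 agents this checks 4,950 pairs per step, which should be manageable; for much larger groups a physics overlap query per agent may scale better.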
     
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Sounds like what you're after is related to boids / flocking behaviour. You wouldn't need ML here, since it's pretty straightforward to implement. The code below starts with a random distribution of agents/boids and then moves them incrementally until they're more or less evenly spaced.

    If you want to solve this with reinforcement learning, you could still loop through the colliders and calculate a centroid, and then set a reward proportional to -centroid.magnitude, because you want to optimize for equidistant positions (centroid.magnitude = 0).

    Code (CSharp):
    using UnityEngine;

    // Note: Unity expects each MonoBehaviour in its own file, named after the class.

    // Attach to the boid prefab, which needs a Collider so OverlapSphere can find it.
    public class BoidPrefab : MonoBehaviour
    {
        [SerializeField] float detectionRadius = 5;
        [SerializeField] float speedMult = 5;

        void Update()
        {
            Vector3 pos = transform.position;
            Vector3 centroid = Vector3.zero;
            // Includes this boid's own collider, which contributes a zero offset.
            Collider[] colliders = Physics.OverlapSphere(pos, detectionRadius);
            if (colliders.Length == 0)
            {
                return; // nothing detected; avoid dividing by zero
            }
            foreach (Collider c in colliders)
            {
                centroid += c.transform.position - pos;
            }
            // Mean offset to the detected colliders.
            centroid /= colliders.Length;
            // Step away from the local centroid.
            transform.Translate(-centroid * Time.deltaTime * speedMult);
        }
    }

    // Spawns n boids at random positions inside a sphere of radius spawnRadius.
    public class BoidTest : MonoBehaviour
    {
        [SerializeField] int n = 100;
        [SerializeField] float spawnRadius = 10;
        [SerializeField] GameObject boidPrefab;

        void Start()
        {
            for (int i = 0; i < n; i++)
            {
                Instantiate(boidPrefab, Random.insideUnitSphere * spawnRadius, Quaternion.identity, transform);
            }
        }
    }
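
    For the reinforcement-learning variant mentioned above (a reward proportional to -centroid.magnitude), a minimal sketch might look like the following. It is an assumption-laden illustration rather than tested code: the class name SpacingAgent and the rewardScale field are hypothetical, the agent's own collider is skipped so that only neighbours count, and observations and actions are left out entirely.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical agent: rewarded for keeping the mean offset to its neighbours near zero.
public class SpacingAgent : Agent
{
    [SerializeField] float detectionRadius = 5f;
    [SerializeField] float rewardScale = 0.01f; // assumed scaling to keep rewards small

    void FixedUpdate()
    {
        Vector3 pos = transform.position;
        Vector3 centroid = Vector3.zero;
        int neighbours = 0;

        foreach (Collider c in Physics.OverlapSphere(pos, detectionRadius))
        {
            if (c.transform == transform)
            {
                continue; // skip this agent's own collider
            }
            centroid += c.transform.position - pos;
            neighbours++;
        }

        if (neighbours == 0)
        {
            return; // no neighbours in range, so no spacing signal this step
        }

        centroid /= neighbours;
        // The reward is at most zero and is largest when the neighbours surround the agent evenly.
        AddReward(-centroid.magnitude * rewardScale);
    }
}
```

    The reward is zero only when the mean offset vanishes, so the policy is pushed towards the same equidistant layout that the hand-written boid rule produces.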
     
    Last edited: Jun 18, 2020
  7. Hsgngr

    Hsgngr

    Joined:
    Dec 28, 2015
    Posts:
    61
    This is awesome. Yes, it is much easier to find the optimal distance without RL, but I would like to add some difficulties: for example, the agents need to go to a market to refill, and they need to wait for each other in the market queue while keeping their distance.