
Question: Evolutionary approach vs. "intelligent design" for an agent?

Discussion in 'ML-Agents' started by seyyedmahdi69, Jan 3, 2021.

  1. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    Hi. I apologize if this isn't really a "machine learning" question, but since I doubt I can ask it anywhere else, here goes:

    So I have this agent (the blue ball) that should collide with the green ball for a reward. It should also avoid colliding with the red balls (which move in random directions), or else it receives a punishment and dies. The catch is that all the elements in the training area spawn at random positions across the two separate sections, except for the boxes, which spawn in equal numbers at random locations in each section. When the prize is in the same section as the agent, there isn't much problem avoiding the enemies and reaching the prize. But when they are in different sections, I'd like the agent to push a box towards the middle wall, jump onto it, and jump over into the other section to collide with the prize, which has proved to be tough. The agent even gets a little suicidal when it can't find the prize.



    This could be done by giving some reward for touching the boxes, some reward for pushing them up against the inner wall, and some reward for jumping off a box.

    However, I don't want to spoon-feed all the necessary actions to the agent through a carefully designed reward system; I'd like the agent to "learn" to push the boxes, jump on them, etc. The problem is that after a few million steps there doesn't seem to be any progress in the scenario where the agent is not in the same section as the prize, and every time the agent changes sections, it's just an accident.

    The observations provided to the agent are:
    1- the agent's local position
    2- a child transform with two Ray Perception Sensor 3D components, each recognizing 5-6 tags.

    Reward conditions:
    1- touching the prize: 10f

    Punishment conditions:
    1- touching the enemy: -1f
    2- running out of time: -3f
    3- running out of jumps: -5f
    4- falling off the outer walls: -1f
    5- -0.025f for every decision
    6- -0.01f for every jump

    Episode ends when:
    1- agent touches enemy
    2- agent touches prize
    3- agent's y position is a negative value
    4- a certain amount of time passes without any of the above happening (a rough sketch of how these are wired up is below).
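
    In code, these conditions boil down to roughly the following (a simplified sketch, not my exact script; the field, constant, and tag names are just illustrative):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Simplified sketch of the reward / termination wiring described above.
    public class BallAgent : Agent
    {
        private const float EpisodeTimeLimit = 60f;  // illustrative time limit
        private float _timer;

        public override void OnEpisodeBegin()
        {
            _timer = 0f;
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            AddReward(-0.025f);                       // small cost for every decision
            _timer += Time.deltaTime;

            if (transform.localPosition.y < 0f) { AddReward(-1f); EndEpisode(); }  // fell off the outer walls
            if (_timer > EpisodeTimeLimit)      { AddReward(-3f); EndEpisode(); }  // ran out of time
            // jumping adds -0.01f per jump, and -5f when the jumps run out
        }

        public void OnCollisionEnter(Collision collision)
        {
            if (collision.transform.CompareTag("Prize")) { AddReward(10f); EndEpisode(); } // touched the prize
            if (collision.transform.CompareTag("Enemy")) { AddReward(-1f); EndEpisode(); } // touched an enemy
        }
    }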

    So is it:
    1- impossible for the agent to come up with the idea of pushing the boxes and jumping on and off them by itself (to achieve the goal), so I should map it out through the reward system?
    OR
    2- given enough steps, the agent may / will learn it by itself through trial and error?

    -Thanks.
     
  2. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    This is definitely possible to learn (see our WallJump example https://github.com/Unity-Technologi...cs/Learning-Environment-Examples.md#wall-jump for behavior that does exactly this), but it will certainly be challenging and probably unreliable with just raycasts and without a curriculum (we use a curriculum in WallJump).

    As a proof of concept, though, you can simplify the learning problem by providing the agent with the coordinates of the goal sphere, which I think should be possible to learn without a curriculum but may require a lot of samples.
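
    In ML-Agents terms that just means adding the goal's coordinates in CollectObservations, roughly like this (a sketch; the goal field is a placeholder for whatever Transform holds the target sphere):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    // Sketch only: "goal" is assumed to reference the target sphere's Transform.
    public class BallAgent : Agent
    {
        public Transform goal;

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(transform.localPosition);
            // give the agent the goal's coordinates directly instead of relying on raycasts alone
            sensor.AddObservation(goal.localPosition);
        }
    }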
     
    seyyedmahdi69 likes this.
  3. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    With that reward structure I'm not surprised the agent gets "suicidal". I can't help but think of the Meeseeks from Rick and Morty when I see some of the negative rewards being thrown around in projects here, lol.

    Anyway, like most things, you'll probably want a mixture of both intelligent design and letting the policy infer its own relationships, just as a person is neither wholly nature nor nurture but a complicated mixture of the two.

    For your problem I would recommend using a curriculum and starting with just the agent, the target, one box, and the divided room. When an agent isn't randomly stumbling upon a solution in a reasonable amount of time it needs some help, and the best way to provide that is usually to simplify the problem and reintroduce complexity slowly after it figures things out. A rough sketch of what that can look like in the trainer config is below.
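
    Structurally, a curriculum over something like the number of enemies might look roughly like this (the parameter name, behavior name, and thresholds are placeholders, not anything from your project):

    Code (YAML):
    environment_parameters:
      enemy_count:
        curriculum:
          - name: NoEnemies              # start with the simplest version of the task
            completion_criteria:
              measure: reward
              behavior: MyBehavior
              threshold: 5.0
            value: 0.0
          - name: TwoEnemies             # reintroduce complexity once the simple case trains
            value: 2.0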
     
    seyyedmahdi69 likes this.
  4. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    Thank you very much for your input. I would have preferred the agent to actually find the target without knowing its position, but worst case scenario, I'll give it the coordinates of the target. Also, could you please point me towards some more information about developing a curriculum for this problem? It's a very new subject to me.
     
  5. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    Thank you for your input.
    I remember that episode, lol. But are you suggesting that my reward structure isn't the best for this problem (or is just overall bad)? If that's the case, please tell me how I can improve the reward system.
     
  6. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    Not at all, I was just trying to be funny ;). I wouldn't consider any reward structure that trains to be bad; this one could just use a little optimizing.

    There are two important things to consider when using negative rewards:

    1. Reward scale - The actual amount of a reward (positive or negative) doesn't matter; what matters is its relation to the other rewards.

    In your punishment conditions you are telling the agent that running out of time is three times worse than running into an enemy. The logical behavior for the agent to learn in this scenario (when it can't find the target) is to find the nearest enemy and suicide on it just before time runs out. Now consider the next negative reward: running out of jumps is 5x(!) as bad as running into an enemy. This, on top of the small penalty tacked onto each individual jump, will very quickly convince the agent that jumping is never a good idea. I don't think those are the outcomes you were going for.

    2. Negative rewards are over-expressed in the early stages of training a policy.

    When an agent first starts training it explores the environment using random actions. Random actions are very unlikely to ever be correct in solving a problem of any complexity, so the agent is guaranteed to get an over-representation of the negative rewards. In practical terms this is a good thing, since negative rewards are tied to things we don't want the agent doing. The problem arises when the agent (intelligently) learns avoidance behaviors that we couldn't have or didn't anticipate. In your case the agent learns very early to never jump, much earlier than it would ever stumble randomly upon the fact that it can (a) move the crate, (b) jump on the crate, and (c) jump over the gap. I like to shorten this explanation to "negative rewards promote aversion, whether you want it or not."

    OK, so knowing that, how can we improve the reward structure to have a chance of seeing the behavior you're looking for? I would start by setting a negative reward limit and using it for any behavior that resets the agent (I use -1). Essentially a reset is a loss, so no matter how the agent got reset, they are all likely equally bad; we don't want the agent actively seeking out one loss to prevent another, worse loss. Then I would look into adding an incentive to explore the space more: a curiosity reward would fit this use case, or the RND reward if you're using a newer version and PyTorch.
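
    The "one penalty for any reset" idea is just this (a sketch; the helper name and the specific conditions are illustrative):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Sketch: every losing outcome gets the same capped penalty, so the agent
    // has no reason to prefer one kind of failure over another.
    public class BallAgent : Agent
    {
        private void FailEpisode()
        {
            AddReward(-1f);
            EndEpisode();
        }

        public void OnCollisionEnter(Collision collision)
        {
            if (collision.transform.CompareTag("Enemy")) FailEpisode();  // touched an enemy
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            if (transform.localPosition.y < 0f) FailEpisode();  // fell off
            if (StepCount >= MaxStep - 1) FailEpisode();        // ran out of time
        }
    }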

    If that still doesn't do the trick, your options are some combination of: breaking the problem into smaller chunks and creating a curriculum (good option, but complex); giving the agent small positive rewards for things like pushing boxes, jumping on them, and getting to the other side (decent option, but it will probably have unintended side effects); or just raising max steps until the agent can randomly stumble upon the solution (bad option, brittle).
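
    If you do try the shaping route, keep the nudges far smaller than the terminal reward so they can't dominate it. Something like this (tag names and amounts are placeholders, not your exact project):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Sketch of the shaping-reward option; tags and values are illustrative only.
    public class ShapedBallAgent : Agent
    {
        private bool _reachedOtherSection;   // hypothetical flag for "made it across"

        public void OnCollisionEnter(Collision collision)
        {
            // small nudge for interacting with a box at all
            if (collision.transform.CompareTag("Box"))
                AddReward(0.02f);

            // slightly bigger nudge the first time the agent reaches the other section
            if (collision.transform.CompareTag("Ground2") && !_reachedOtherSection)
            {
                _reachedOtherSection = true;
                AddReward(0.1f);
            }
        }
    }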

    Let me know if you need clarification on anything, hope this helps.
     
    mbaske and seyyedmahdi69 like this.
  7. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    Thank you for the time and effort :)
    This amount of information just blew up in my face, haha. I feel like I don't know anything at this point, lol. Wow, so much to learn. As for the reward, I think you're right; I only noticed some of the flaws in the reward system after posting this thread and re-reading what I had written, lol.

    As for jumping, what you say makes sense and the agent should feel like it's a bad idea to jump, but ironically, it doesn't; it keeps jumping until it runs out! That's a very curious case to me, but I hope making some overall changes to the reward system will set me on the right path.

    And as for the reward types and curriculum, I literally hadn't heard of them until last night, so I have a lot of homework to do learning them and trying to implement them.
     
  8. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    UPDATE:

    So I have:
    1- Adjusted and possibly optimized the reward system.
    2- Added a curiosity reward as follows:
    Code (YAML):
    curiosity:
      strength: 0.02
      gamma: 0.995
      encoding_size: 256
      learning_rate: 3.0e-4
    3- Set up a curriculum as follows:
    Code (YAML):
    environment_parameters:
      Inner_wall_height:
        curriculum:
          - name: lesson0 # The '-' is important as this is a list
            completion_criteria:
              measure: progress
              behavior: SmartBall
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 0.2
            value:
              sampler_type: uniform
              sampler_parameters:
                min_value: 0.0
                max_value: 1.0
          - name: lesson1 # This is the start of the second lesson
            completion_criteria:
              measure: progress
              behavior: SmartBall
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 0.6
              require_reset: true
            value:
              sampler_type: uniform
              sampler_parameters:
                min_value: 4.0
                max_value: 7.0
          - name: lesson3
            value: 8.0
    I am adjusting the inner wall height based on the Inner_wall_height parameter through the curriculum. However, there seems to be a problem: not much learning is going on even though the agent can basically jump between the rooms, and the agent just ignores the target. Should I disable the curiosity reward now that I have a curriculum in place, or is there another bug in my code?
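
    For reference, on the C# side I read and apply the parameter each episode roughly like this (a simplified sketch; the other scale values are just from my scene setup):

    Code (CSharp):
    using Unity.MLAgents;
    using UnityEngine;

    // Simplified sketch of how the Inner_wall_height curriculum parameter gets applied.
    public class SmartBallAgent : Agent
    {
        public GameObject innerWall;

        public override void OnEpisodeBegin()
        {
            float wallHeight = Academy.Instance.EnvironmentParameters.GetWithDefault("Inner_wall_height", 1f);
            innerWall.transform.localScale = new Vector3(30f, wallHeight, 0.5f);
        }
    }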
     
  9. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    Yeah, I would remove the curiosity reward until you're sure you need it. The curiosity module is itself another deep network and will need its own hyperparameter tuning, so it's best to try it only after altering the environment/reward structure fails to train the desired behavior. Remember to make only one change at a time, so that when training gets worse you know what to blame (and conversely, when it gets better you know what improved it).
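
    For example, trimming the reward_signals section back to just the extrinsic reward would look roughly like this (the values shown are placeholders):

    Code (YAML):
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0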
     
    seyyedmahdi69 likes this.
  10. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    @Luke-Houlihan thank you very much for your input and for opening my eyes to the things I need to learn.

    UPDATE SINCE MY LAST POST:
    I am almost certain I am making a lot of mistakes in implementing curriculum learning, to the point where I completely destroy the learning and decision-making process.
    If you are reading this, could you please:

    1- Point me towards materials (text- or video-based) that would teach me how to use curriculum learning (other than the ML-Agents GitHub repo, of course, lol... that's pretty... not useful for noobs)? Because as I understand it, that is the only method that will let me thoroughly teach the agent to solve the problem.

    2- If you are curious and have some free time / are bored, see if you can make sense of my mess and spot what I am doing wrong or how this problem can be solved. I am uploading a zip file containing the project assets (YAML files included in the Scripts folder). This project was built using ML-Agents 1.0.6.
     

    Attached Files:

  11. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    @seyyedmahdi69 Great, I was going to suggest just putting your project up; taking a look at it now...

    Unfortunately there aren't really any tutorials or materials directly related to Unity ML-Agents other than the Unity example environments (or similar simplified walkthroughs). Keep in mind that this is cutting-edge technology that is actively being built and researched, which means you'll have to learn some foundational theory and apply it to complex cases yourself. There isn't going to be much beginner-friendly material because, frankly, no one can say they definitively know the answers yet.

    That said, I can point you toward some good learning material to develop a foundation in reinforcement learning.

    A great article by OpenAI that is extremely applicable to your use case - https://openai.com/blog/emergent-tool-use/
    (The OpenAI blog is a gold mine; check out anything related to reinforcement learning)

    An approachable college-style course on RL by DeepMind & UCL -


    A great intro textbook for reinforcement learning - http://incompleteideas.net/book/the-book.html
    (Purchase a copy if you can afford it; otherwise just click "Full PDF", it's the whole text for free)
     
  12. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    As before, it is a great pleasure to learn from you. Please let me know if you can make heads or tails of my project. Thanks.
     
  13. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303


    I had a few minutes today and got some good results by simplifying the problem. With no enemies and a short wall that only requires a jump, the agent is able to consistently achieve the goal. You'll notice the agent is also consistently overshooting the target; this is because the agent's velocity is not being observed, only the target's location relative to the agent.

    Next I'll use this as a base to build up the more complicated behaviors - using boxes (hard problem), followed by avoiding enemies (easy problem).
     
    seyyedmahdi69 likes this.
  14. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    This is a serious result, man. I didn't get there myself. So, as I understand it, you passed the target's location in as an observation, is that correct? If so, how much do you think it simplifies the problem? Originally I wanted the agent to stumble upon the target, but if that makes the problem too difficult, I can pass on it.
     
  15. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    I don't actually use the goal's position as an observation; I use the goal's position relative to the agent's reference frame.

    sensor.AddObservation(this.transform.InverseTransformPoint(goal.transform.position));


    This is like the difference between giving someone directions (left at the stoplight then right at the blue sign) and just telling them the destination (go to the gas station).

    This greatly simplifies the problem compared to searching for it via raycast hits, although I'm fairly confident we'll be able to train without it later.

    Here's the agent script with my changes - I tried to put "CHANGED" next to anything I altered.

    Code (CSharp):
    using System.Linq;
    using System.Collections.Generic;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class PlayerAI : Agent
    {
        public Transform goal;
        public GameObject box1;
        public GameObject enemy;
        public GameObject boundLimits;
        public GameObject innerWall;
        private float _disstanceToTheGround;

        public bool _isGrounded;

        private Rigidbody _rPlayer;
        private Rigidbody _rGoal;

        public float timer;
        public int _playerLives = 1;
        public float speed = 800f;
        public int _jumpsLeft = 10;

        private List<GameObject> _enemies = new List<GameObject>();
        private List<Rigidbody> _rEnemies = new List<Rigidbody>();
        private List<GameObject> _boxes1 = new List<GameObject>();
        private List<GameObject> _boxes2 = new List<GameObject>();

        public int roundtimer = 0;
        private float _enemySpeed = 100f;
        public int _enemyCount = 0;
        public int _boxCount = 1;

        private int[] _degrees = { 90, 180, 270 };
        private int[] _forceIntervals = { 2, 8, 7 };
        private int[] _forceDirections = { 0, 1, -1 };
        private string[] tagsToSpawnOn = { "Ground1", "Ground2" };
        private string _playerSpawnedOn;

        private Bounds _areaBounds;
        private float _xlimits;
        private float _zlimits;

        EnvironmentParameters resetParams;

        public override void Initialize()
        {
            resetParams = Academy.Instance.EnvironmentParameters;

            // CHANGED: A curriculum should drive only one parameter, here I've removed the scaling and we'll rely on just moving along the y-axis
            // innerWall.transform.localScale = new Vector3(30f, resetParams.GetWithDefault("Inner_wall_height", 1f), 0.5f);
            innerWall.transform.localPosition = new Vector3(0, resetParams.GetWithDefault("Inner_wall_height", 0.2f), 0f);
            //transform.Rotate(0, _degrees[Random.Range(0, _degrees.Length)], 0);
            _areaBounds = boundLimits.GetComponent<Collider>().bounds;
            _disstanceToTheGround = GetComponent<Collider>().bounds.extents.y;

            _xlimits = _areaBounds.extents.x;
            _zlimits = _areaBounds.extents.z;

            _rPlayer = GetComponent<Rigidbody>();
            _rGoal = goal.GetComponent<Rigidbody>();

            transform.localPosition = EmptyRandPosition(transform.name, _xlimits, _zlimits, 1, 2, tagsToSpawnOn);
            goal.localPosition = EmptyRandPosition(goal.transform.name, _xlimits, _zlimits, 1, 2, tagsToSpawnOn);
            //Debug.Log($"prize location: {prize.transform.localPosition}");

            _rPlayer.velocity = Vector3.zero;
            _rPlayer.angularVelocity = Vector3.zero;

            _rGoal.velocity = Vector3.zero;
            _rGoal.angularVelocity = Vector3.zero;
            //Debug.Log($"{_xlimits} {_zlimits}");

            for (int i = 0; i < _enemyCount; i++)
            {
                GameObject enemyClone = Instantiate(enemy, new Vector3(0, 0, 0), Quaternion.identity);
                enemyClone.transform.parent = transform.parent;
                enemyClone.transform.localPosition = EmptyRandPosition(enemyClone.transform.name, _xlimits, _zlimits, 1, 2, tagsToSpawnOn);

                _enemies.Add(enemyClone);
                _rEnemies.Add(enemyClone.GetComponent<Rigidbody>());
                //Debug.Log($"Enemy created within {_xlimits} {_zlimits}.");
            }

            for (int i = 0; i < _boxCount; i++)
            {
                GameObject boxClone = Instantiate(box1, new Vector3(0, 0, 0), Quaternion.identity);
                boxClone.transform.parent = transform.parent;
                boxClone.transform.localPosition = EmptyRandPosition(boxClone.transform.name, _xlimits, _zlimits, 1, 2, new string[] { "Ground1" });

                _boxes1.Add(boxClone);
            }

            for (int i = 0; i < _boxCount; i++)
            {
                GameObject boxClone = Instantiate(box1, new Vector3(0, 0, 0), Quaternion.identity);
                boxClone.transform.parent = transform.parent;
                boxClone.transform.localPosition = EmptyRandPosition(boxClone.transform.name, _xlimits, _zlimits, 1, 2, new string[] { "Ground2" });

                _boxes2.Add(boxClone);
            }
        }

        public override void OnEpisodeBegin()
        {
            // CHANGED: See above
            // innerWall.transform.localScale = new Vector3(30f, resetParams.GetWithDefault("Inner_wall_height", 1f), 0.5f);
            innerWall.transform.localPosition = new Vector3(0, resetParams.GetWithDefault("Inner_wall_height", 0.2f), 0f);
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(transform.localPosition);

            // CHANGED: An agent can't learn about a punishment if it can't ever see why it's happening
            sensor.AddObservation(_jumpsLeft);
            sensor.AddObservation(_isGrounded);

            // CHANGED: Just seeing if we can make this easier for the agent
            // sensor.AddObservation(goal.localPosition);
            sensor.AddObservation(this.transform.InverseTransformPoint(goal.transform.position));
            sensor.AddObservation(_rPlayer.velocity);
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            _isGrounded = Physics.Raycast(transform.position, Vector3.down, _disstanceToTheGround + 0.001f);
            //Debug.Log($" is grounded {_isGrounded}");
            //Debug.Log($"Jumps left: {_jumpsLeft}");
            timer += Time.deltaTime;
            // CHANGED: AddReward is correct here, we want all punishments to count
            AddReward(-0.0005f);

            MoveAgentVertical(vectorAction[0]);
            MoveAgentHorizontal(vectorAction[1]);
            // CHANGED: Jump & move work better as concurrent acts, games like Mario allow you to change direction while midair
            // because it's intuitive.
            JumpAgent(vectorAction[2]);

            if (transform.localPosition.y < 0)
            {
                AddReward(-1f);
                EndEpisode();
                ResetScene();
                //Debug.Log("Player fell");
            }

            // CHANGED: 1000 seconds is really long, using max steps instead and lowered for my sanity
            // if (timer > 1000)
            // {
            //     SetReward(-1f);
            //     EndEpisode();
            //     ResetScene();
            //     //Debug.Log("Time ran out");
            // }

            // CHANGED:
            if (StepCount == MaxStep - 1)
            {
                AddReward(-1f);
                EndEpisode();
                ResetScene();
            }

            if (_playerLives < 1)
            {
                EndEpisode();
                ResetScene();
                //Debug.Log("Player died!");
            }

            // if (_jumpsLeft < 1)
            // {
            //     AddReward(-1f);
            //     EndEpisode();
            //     ResetScene();
            //     //Debug.Log("Player ran out of jumps");
            // }

            AddForceToEnemies();
        }

        public void OnCollisionEnter(Collision collision)
        {
            if (collision.transform.tag == "Enemy")
            {
                AddReward(-1f);
                _playerLives -= 1;
                //Debug.Log($"Lives at {_playerLives}");
            }

            if (collision.transform.tag == "Prize")
            {
                // CHANGED: Reward amount
                AddReward(1f);
                EndEpisode();
                ResetScene();
                Debug.Log("Prize touched!");
            }

            // if (tagsToSpawnOn.Contains(collision.transform.tag) && collision.transform.tag != _playerSpawnedOn)
            // {
            //     Debug.Log($"Player changed ground from {_playerSpawnedOn} to {collision.transform.tag}");
            // }
        }

        public void JumpAgent(float act)
        {
            switch (act)
            {
                // No jump
                case 0:
                    break;
                // Jump
                case 1:
                    if (_jumpsLeft > 0 && _isGrounded)
                    {
                        _isGrounded = false;
                        // _jumpsLeft -= 1;
                        //Debug.Log("Player jumped");
                        // CHANGED: AddReward is correct here, we want all punishments to count
                        // AddReward(-0.001f);
                        _rPlayer.AddForce(new Vector3(0, 20f, 0) * Time.fixedDeltaTime * speed, ForceMode.Force);
                    }
                    else if (_jumpsLeft <= 0 && _isGrounded)
                    {
                        AddReward(-1f);
                        EndEpisode();
                        ResetScene();
                    }

                    break;
            }
        }

        // CHANGED: Splitting out horizontal and lateral movement for better agent control (along with jump)
        public void MoveAgentVertical(float act)
        {
            switch (act)
            {
                case 0:
                    break;
                case 1:
                    _rPlayer.AddForce(Vector3.forward * Time.fixedDeltaTime * speed, ForceMode.Force);
                    break;
                case 2:
                    _rPlayer.AddForce(Vector3.back * Time.fixedDeltaTime * speed, ForceMode.Force);
                    break;
            }
        }

        public void MoveAgentHorizontal(float act)
        {
            switch (act)
            {
                case 0:
                    break;
                case 1:
                    _rPlayer.AddForce(Vector3.right * Time.fixedDeltaTime * speed, ForceMode.Force);
                    break;
                case 2:
                    _rPlayer.AddForce(Vector3.left * Time.fixedDeltaTime * speed, ForceMode.Force);
                    break;
            }
        }

        // public void MoveAgent(float act)
        // {
        //     Vector3 controlSignal = Vector3.zero;
        //     Vector3 rotateSignal = Vector3.zero;
        //     switch (act)
        //     {
        //         case 0:
        //             controlSignal.x = -1.5f;
        //             break;
        //         case 1:
        //             controlSignal.x = 0;
        //             break;
        //         case 2:
        //             controlSignal.x = 1.5f;
        //             break;
        //         case 3:
        //             controlSignal.z = -1.5f;
        //             break;
        //         case 4:
        //             controlSignal.z = 0f;
        //             break;
        //         case 5:
        //             controlSignal.z = 1.5f;
        //             break;
        //         // case 6:
        //         //     if (_jumpsLeft > 0 && _isGrounded == true)
        //         //     {
        //         //         controlSignal.y = 10f;
        //         //         _jumpsLeft -= 1;
        //         //         //Debug.Log("Player jumped");
        //         //         SetReward(-0.00001f);
        //         //     }
        //         //     break;
        //
        //             //case 6:
        //             //    rotateSignal.y = -1;
        //             //    break;
        //             //case 7:
        //             //    rotateSignal.y = 0;
        //             //    break;
        //             //case 8:
        //             //    rotateSignal.y = 1;
        //             //    break;
        //     }
        //     // if (_isGrounded == true)
        //     // {
        //     _rPlayer.AddForce(controlSignal * Time.fixedDeltaTime * speed, ForceMode.Force);
        //     //transform.Rotate(rotateSignal * Time.fixedDeltaTime * speed);
        //     // }
        // }

        public void AddForceToEnemies()
        {
            timer += Time.fixedDeltaTime;
            roundtimer = Mathf.RoundToInt(timer);
            //Debug.Log(roundtimer);
            var randomInt = _forceIntervals[Random.Range(0, _forceIntervals.Length)];
            //Debug.Log($"timer at {roundtimer}. enemy count: {_rEnemies.Count}. random force interval: {randomInt}");
            foreach (Rigidbody body in _rEnemies)
            {
                if (roundtimer % randomInt == 0)
                {
                    var DirToGO = new Vector3(_forceDirections[Random.Range(0, _forceDirections.Length)], 0,
                    _forceDirections[Random.Range(0, _forceDirections.Length)]);
                    body.AddForce(DirToGO * Time.fixedDeltaTime * _enemySpeed, ForceMode.Impulse);
                }
            }
        }

        public Vector3 EmptyRandPosition(string transformName, float xlimit, float zlimit, float customeY, float obstacleCheckRadius, string[] colsToSpawnOn)
        {
            Vector3 randPos;
            int x = 0;
            while (x < 500)
            {
                x += 1;
                randPos = new Vector3(Random.Range(-xlimit, xlimit), customeY, Random.Range(-zlimit, zlimit));
                //Debug.Log($"Random pos: {randPos.x} {randPos.y} {randPos.z}");
                Collider[] colliders = Physics.OverlapSphere(randPos, obstacleCheckRadius);
                //Debug.Log($"colliders length: {colliders.Length}");
                if (colliders.Length == 1 && colsToSpawnOn.Contains(colliders[0].tag))
                {
                    //Debug.Log($"spawned {transformName} on {colliders[0].tag} on pos: {randPos.x}, {randPos.y}, {randPos.z}");
                    if (transformName == "Player")
                    {
                        _playerSpawnedOn = colliders[0].tag;
                        //Debug.Log($"Player spawned on {_playerSpawnedOn}");
                    }
                    return randPos;
                }
            }
            //Debug.Log($"couldn't find suitable location for {transformName} after {x} tries.");
            return new Vector3(0, customeY, 0);
        }

        public void ResetScene()
        {
            timer = 0;
            _playerLives = 1;
            _jumpsLeft = 150;
            _rPlayer.velocity = Vector3.zero;
            _rPlayer.angularVelocity = Vector3.zero;

            _rGoal.velocity = Vector3.zero;
            _rGoal.angularVelocity = Vector3.zero;
            //transform.Rotate(0, _degrees[Random.Range(0, _degrees.Length)], 0);
            transform.localPosition = EmptyRandPosition(transform.name, _xlimits, _zlimits, 1, 2, tagsToSpawnOn);
            goal.localPosition = EmptyRandPosition(goal.transform.name, _xlimits, _zlimits, 1, 2, tagsToSpawnOn);
            //Debug.Log(prize.transform.localPosition);

            foreach (Rigidbody rEnemy in _rEnemies)
            {
                rEnemy.velocity = Vector3.zero;
                rEnemy.angularVelocity = Vector3.zero;
            }

            foreach (GameObject enemy in _enemies)
            {
                enemy.transform.localPosition = EmptyRandPosition(enemy.transform.name, _xlimits, _zlimits, 1, 2, tagsToSpawnOn);
            }

            foreach (GameObject box in _boxes1)
            {
                box.transform.localPosition = EmptyRandPosition(box.transform.name, _xlimits, _zlimits, 1, 2, new string[] { "Ground1" });
            }

            foreach (GameObject box in _boxes2)
            {
                box.transform.localPosition = EmptyRandPosition(box.transform.name, _xlimits, _zlimits, 1, 2, new string[] { "Ground2" });
            }

            // CHANGED: Redundant
            // timer = 0;
            // roundtimer = 0;
        }
    }
     
    seyyedmahdi69 likes this.
  16. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    Doing even better with velocity observed and some hyperparameter tuning -

     
    seyyedmahdi69 likes this.
  17. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    So far I understand that I was missing some necessary observations, and that you increased the size of the vector action branches and vector observations. I saw the improvements you made in the code and I am grateful. I am anxious to see the final result, lol.
     
  18. seyyedmahdi69

    seyyedmahdi69

    Joined:
    Dec 2, 2020
    Posts:
    25
    @Luke-Houlihan I think the conversation I started didn't reach you, so I'll ask here :D
    Could you kindly upload the modified project files? Thank you very much :)
     
  19. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    @seyyedmahdi69 Yeah, I haven't had a chance to get back to this but I hope it helps you move in the right direction.

    The project may not open for you because I upgraded the Unity version and the ML-Agents version, but the scripts should be fine. The prefabs I'm not sure about.
     

    Attached Files: