
Question: Bomberman agent

Discussion in 'ML-Agents' started by Noxalus, May 17, 2020.

  1. Noxalus


    Joined:
    Jan 9, 2018
    Posts:
    80
    Hello everyone!

    I saw the news announcing the release of the ML-Agents package v1.0 (here), and I would like to try it by implementing a machine-learning (reinforcement learning) AI for a simple Bomberman game.

    I've read most of the documentation and looked at most of the Unity example environments in the repository, but I still have some questions.

    Bomberman is a simple game: there are only 5 meaningful actions, moving in one of 4 directions or planting a bomb (plus doing nothing).

    The learning environment looks like this:

    upload_2020-5-17_15-8-23.png

    Brick walls are destructible, and as a first step, I would like the agent to plant bombs to destroy all the walls without dying. This means the agent has to learn to go near a wall, plant a bomb and escape the explosion before repeating the process.

    If the agent destroys a wall, it gets a positive reward, and if it dies (because of its own bomb), it gets a negative reward. To encourage it to plant bombs as often as possible, I also give it a small reward each time it plants one. This is how I set up my Agent's Initialize method override:

    Code (CSharp):
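    public override void Initialize()
    {
        // Penalty when the agent dies (e.g. caught in its own bomb's explosion)
        _player.OnDeath.AddListener((player) => AddReward(-1f));
        // Reward for each destroyed brick wall
        _player.OnWallDestroy.AddListener((player) => AddReward(0.75f));
        // Small bonus for planting a bomb, to encourage bombing
        _player.OnPlantBomb.AddListener((player) => AddReward(0.1f));
    }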
    The training episode ends when all walls have been destroyed (success) or when the maximum number of steps is reached (handled by the ML-Agents framework).

    But from what I understand, one of the first things to do with an agent is to add "observations" so that it is aware of its environment and can make decisions. Simple float values aren't enough in my case, are they? I would like to give it the entire map, and that could be complicated with classic vector observations. Looking at the Grid example, I think "visual observations" are more suitable. But it's still a bit abstract to me, because I don't see any code in the example that reads them.
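For comparison with visual observations, here is a minimal sketch of how the whole map could be turned into a vector observation by one-hot encoding each cell. The `CellType` enum and `GridObservation` class are hypothetical (the thread doesn't show how the map is stored). In an ML-Agents agent, each resulting float would typically be passed to `sensor.AddObservation` inside `CollectObservations(VectorSensor sensor)`, with the vector Space Size in Behavior Parameters set to the array length.

```csharp
using System;

// Hypothetical cell types: the thread never shows how the map is stored.
public enum CellType { Empty, SolidWall, BrickWall, Bomb, Agent }

public static class GridObservation
{
    // Flattens a width x height grid into a one-hot float vector: every cell
    // contributes one float per CellType, and only its own type is set to 1.
    public static float[] OneHot(CellType[,] grid)
    {
        int width = grid.GetLength(0);
        int height = grid.GetLength(1);
        int typeCount = Enum.GetValues(typeof(CellType)).Length;
        var obs = new float[width * height * typeCount];

        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                int cell = x * height + y;
                obs[cell * typeCount + (int)grid[x, y]] = 1f;
            }
        }
        return obs;
    }
}
```

Assuming a 15x15 map and these five cell types, the vector holds 15 x 15 x 5 = 1125 floats, which illustrates how quickly a full-map vector observation grows.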

    To test my agent before training it, I implemented the Heuristic method like this:

    Code (CSharp):
    // Lets a human drive the agent for testing: arrow keys move, Space plants a bomb.
    // Defaults to Nothing; if several keys are held, the last matching check wins.
    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = (int)PossibleAction.Nothing;
        if (Input.GetKey(KeyCode.RightArrow))
        {
            actionsOut[0] = (int)PossibleAction.Right;
        }
        if (Input.GetKey(KeyCode.UpArrow))
        {
            actionsOut[0] = (int)PossibleAction.Up;
        }
        if (Input.GetKey(KeyCode.LeftArrow))
        {
            actionsOut[0] = (int)PossibleAction.Left;
        }
        if (Input.GetKey(KeyCode.DownArrow))
        {
            actionsOut[0] = (int)PossibleAction.Down;
        }
        if (Input.GetKey(KeyCode.Space))
        {
            actionsOut[0] = (int)PossibleAction.Bomb;
        }
    }
    And the OnActionReceived method like this:

    Code (CSharp):
    // Requires "using System;" at the top of the file for ArgumentException.
    public override void OnActionReceived(float[] vectorAction)
    {
        // With a single discrete branch, vectorAction[0] holds the chosen action index.
        var movement = Vector2.zero;
        var action = (PossibleAction)Mathf.FloorToInt(vectorAction[0]);

        switch (action)
        {
            case PossibleAction.Nothing:
                // do nothing
                break;
            case PossibleAction.Right:
                movement.x = 1;
                break;
            case PossibleAction.Left:
                movement.x = -1;
                break;
            case PossibleAction.Up:
                movement.y = 1;
                break;
            case PossibleAction.Down:
                movement.y = -1;
                break;
            case PossibleAction.Bomb:
                _playerMovement.PlantBomb();
                break;
            default:
                throw new ArgumentException("Invalid action value");
        }

        // Called every decision; a zero vector keeps the agent in place.
        _playerMovement.Move(movement);
    }
    It works as expected: without a model, I can move my agent with the keyboard arrows and plant bombs to destroy walls.

    So the next step is training, and here is my problem: I've installed Python 3.7 with the mlagents package and run the training command, but when I press Play in Unity, my agent doesn't move at all... (time is accelerated, though, so I think the training process itself is running)

    Do you know why? Did I miss something?

    Thank you in advance.

    PS: Here are my agent prefab's components:

    upload_2020-5-17_15-18-50.png
     
    Last edited: May 17, 2020
  2. mbaske


    Joined:
    Dec 31, 2017
    Posts:
    473
    "Behavior Parameters > Vector Action > Branch 0 Size" is set to 1, so the only value vectorAction[0] can ever take in OnActionReceived is 0 (the array is always [0]). Increase the branch size to 6 so the action values can cover all six members of your PossibleAction enum.
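A small sketch of why the branch size matters: a discrete branch of size n can only produce the integers 0 to n-1, so with size 1 the agent can only ever pick action 0. The `PossibleAction` declaration below is a hypothetical reconstruction (the thread never shows it); the assumption that `Nothing` is 0 matches the agent standing still. `ActionSpace` is an illustrative helper, not part of ML-Agents.

```csharp
using System;

// Hypothetical declaration: the thread uses PossibleAction but never shows it.
// Nothing = 0 is an assumption consistent with the agent never moving.
public enum PossibleAction { Nothing, Right, Left, Up, Down, Bomb }

public static class ActionSpace
{
    // Branch 0 must offer one discrete value per enum member.
    public static int RequiredBranchSize() =>
        Enum.GetValues(typeof(PossibleAction)).Length;

    // A discrete branch of size n only ever produces the integers 0..n-1,
    // so this lists the actions the trainer can actually pick.
    public static PossibleAction[] Reachable(int branchSize)
    {
        int count = Math.Min(branchSize, RequiredBranchSize());
        var actions = new PossibleAction[count];
        for (int i = 0; i < count; i++)
        {
            actions[i] = (PossibleAction)i;
        }
        return actions;
    }
}
```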
     
  3. Noxalus

    Thanks a lot, that was it! ;)
     
  4. Noxalus

    I've run a lot of tests, and even after multiple training sessions of 3000+ s (almost 1 h), my agent is still too dumb to destroy more than a single wall without dying...

    I started the training with simple rules:
    • Give a -0.5 reward when the agent dies (and the episode ends)
    • Give 0.1 when the agent destroys a wall
    • Give 0.05 when the agent plants a bomb
    I thought this would be enough for the agent to progress, but that was not the case. Here is what the agent always does:

    r6Pje05vlZ.gif

    So I tried adding more rules:
    • Give a reward of -0.005 for each frame the agent stays on a dangerous cell
    • Give a reward of 0.001 for each frame the agent stays on a safe cell
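The thread doesn't say how a "dangerous cell" is detected. A minimal sketch, assuming classic Bomberman rules (a cross-shaped blast reaching `range` cells in each direction and stopping at the first wall) and a hypothetical `bool[,]` wall map; `DangerMap.IsDangerous` is an illustrative name, not an ML-Agents API.

```csharp
using System.Collections.Generic;

public static class DangerMap
{
    // walls[x, y] == true means the cell is blocked (solid or brick wall).
    // Returns true if `cell` lies in the blast of any bomb in `bombs`.
    public static bool IsDangerous(
        (int x, int y) cell,
        IEnumerable<(int x, int y)> bombs,
        bool[,] walls,
        int range)
    {
        var directions = new (int dx, int dy)[] { (1, 0), (-1, 0), (0, 1), (0, -1) };
        int width = walls.GetLength(0);
        int height = walls.GetLength(1);

        foreach (var bomb in bombs)
        {
            if (bomb == cell) return true; // standing on the bomb itself

            foreach (var (dx, dy) in directions)
            {
                for (int step = 1; step <= range; step++)
                {
                    int x = bomb.x + dx * step;
                    int y = bomb.y + dy * step;
                    if (x < 0 || y < 0 || x >= width || y >= height) break;
                    // A wall stops the blast: cells behind it are safe.
                    if (walls[x, y]) break;
                    if (x == cell.x && y == cell.y) return true;
                }
            }
        }
        return false;
    }
}
```

The per-frame rule would then amount to something like `AddReward(isDangerous ? -0.005f : 0.001f)` wherever that reward is applied.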
    But it's not better at all. Here is the latest log I got:

    Code (cmd):
    2020-05-18 20:23:28 INFO [environment.py:201] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
    2020-05-18 20:23:38 INFO [environment.py:111] Connected to Unity environment with package version 1.0.0-preview and communication version 1.0.0
    2020-05-18 20:23:38 INFO [environment.py:342] Connected new brain:
    MLAIPlayer?team=0
    2020-05-18 20:23:39.085224: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2020-05-18 20:23:39.132653: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14b65b5a440 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-05-18 20:23:39.138668: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2020-05-18 20:23:39.192837: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
    2020-05-18 20:23:39.252891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: GeForce RTX 2080 computeCapability: 7.5
    coreClock: 1.71GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.23GiB/s
    2020-05-18 20:23:39.261904: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
    2020-05-18 20:23:39.267506: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
    2020-05-18 20:23:39.273665: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
    2020-05-18 20:23:39.278761: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
    2020-05-18 20:23:39.285988: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
    2020-05-18 20:23:39.292389: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
    2020-05-18 20:23:39.298523: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
    2020-05-18 20:23:39.303337: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...
    2020-05-18 20:23:39.509592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-05-18 20:23:39.513809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
    2020-05-18 20:23:39.516466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
    2020-05-18 20:23:39.522302: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14b65e02c80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2020-05-18 20:23:39.527797: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
    2020-05-18 20:23:39 WARNING [stats.py:197] events.out.tfevents.1589790467.DESKTOP-LPEC6ST was left over from a previous run. Deleting.
    2020-05-18 20:23:39 WARNING [stats.py:197] events.out.tfevents.1589790467.DESKTOP-LPEC6ST.meta was left over from a previous run. Deleting.
    2020-05-18 20:23:39 INFO [stats.py:130] Hyperparameters for behavior name BombermanAI_MLAIPlayer:
            trainer:        ppo
            batch_size:     32
            beta:   0.005
            buffer_size:    256
            epsilon:        0.2
            hidden_units:   256
            lambd:  0.95
            learning_rate:  0.0003
            learning_rate_schedule: linear
            max_steps:      5000000
            memory_size:    128
            normalize:      False
            num_epoch:      3
            num_layers:     1
            time_horizon:   5
            sequence_length:        64
            summary_freq:   20000
            use_recurrent:  False
            vis_encode_type:        simple
            reward_signals:
              extrinsic:
                strength:   1.0
                gamma:      0.9
            summary_path:   BombermanAI_MLAIPlayer
            model_path:     ./models/BombermanAI/MLAIPlayer
            keep_checkpoints:       5
    2020-05-18 20:23:39.625993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: GeForce RTX 2080 computeCapability: 7.5
    coreClock: 1.71GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.23GiB/s
    2020-05-18 20:23:39.638200: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
    2020-05-18 20:23:39.644507: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
    2020-05-18 20:23:39.651002: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
    2020-05-18 20:23:39.657101: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
    2020-05-18 20:23:39.664442: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
    2020-05-18 20:23:39.672330: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
    2020-05-18 20:23:39.680482: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
    2020-05-18 20:23:39.686136: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...
    2020-05-18 20:23:39.697456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-05-18 20:23:39.702265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
    2020-05-18 20:23:39.704919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
    2020-05-18 20:26:22 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 20000. Time Elapsed: 175.199 s Mean Reward: -0.500. Std of Reward: 0.069. Training.
    2020-05-18 20:29:02 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 40000. Time Elapsed: 335.727 s Mean Reward: -0.518. Std of Reward: 0.030. Training.
    2020-05-18 20:30:23 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 20:31:43 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 60000. Time Elapsed: 496.874 s Mean Reward: -0.507. Std of Reward: 0.027. Training.
    2020-05-18 20:34:24 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 80000. Time Elapsed: 657.731 s Mean Reward: -0.492. Std of Reward: 0.090. Training.
    2020-05-18 20:37:05 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 20:37:05 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 100000. Time Elapsed: 819.105 s Mean Reward: -0.520. Std of Reward: 0.026. Training.
    2020-05-18 20:39:46 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 120000. Time Elapsed: 980.133 s Mean Reward: -0.506. Std of Reward: 0.035. Training.
    2020-05-18 20:42:27 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 140000. Time Elapsed: 1140.756 s Mean Reward: -0.487. Std of Reward: 0.084. Training.
    2020-05-18 20:43:48 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 20:45:08 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 160000. Time Elapsed: 1301.993 s Mean Reward: -0.507. Std of Reward: 0.052. Training.
    2020-05-18 20:47:49 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 180000. Time Elapsed: 1462.484 s Mean Reward: -0.513. Std of Reward: 0.025. Training.
    2020-05-18 20:50:24 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 20:50:24 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 200000. Time Elapsed: 1617.413 s Mean Reward: -0.518. Std of Reward: 0.025. Training.
    2020-05-18 20:52:59 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 220000. Time Elapsed: 1772.531 s Mean Reward: -0.519. Std of Reward: 0.029. Training.
    2020-05-18 20:55:39 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 240000. Time Elapsed: 1932.905 s Mean Reward: -0.511. Std of Reward: 0.034. Training.
    2020-05-18 20:56:59 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 20:58:20 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 260000. Time Elapsed: 2093.836 s Mean Reward: -0.491. Std of Reward: 0.040. Training.
    2020-05-18 21:01:01 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 280000. Time Elapsed: 2254.238 s Mean Reward: -0.486. Std of Reward: 0.034. Training.
    2020-05-18 21:03:39 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 21:03:39 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 300000. Time Elapsed: 2412.942 s Mean Reward: -0.506. Std of Reward: 0.067. Training.
    2020-05-18 21:06:20 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 320000. Time Elapsed: 2573.216 s Mean Reward: -0.480. Std of Reward: 0.043. Training.
    2020-05-18 21:08:59 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 340000. Time Elapsed: 2732.896 s Mean Reward: -0.485. Std of Reward: 0.080. Training.
    2020-05-18 21:10:19 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 21:11:40 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 360000. Time Elapsed: 2893.632 s Mean Reward: -0.495. Std of Reward: 0.037. Training.
    2020-05-18 21:14:24 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 380000. Time Elapsed: 3057.400 s Mean Reward: -0.485. Std of Reward: 0.035. Training.
    2020-05-18 21:17:04 INFO [trainer_controller.py:112] Saved Model
    2020-05-18 21:17:04 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 400000. Time Elapsed: 3217.258 s Mean Reward: -0.447. Std of Reward: 0.176. Training.
    2020-05-18 21:19:46 INFO [stats.py:111] BombermanAI_MLAIPlayer: Step: 420000. Time Elapsed: 3380.051 s Mean Reward: -0.490. Std of Reward: 0.062. Training.
    As the log shows, neither the Mean Reward nor the Std of Reward really changes: the mean reward stays around -0.5 for the whole run.

    I've also tried a lot of new rules to improve it, but without success:
    • Give the agent a positive reward when it moves onto an unvisited cell
    • Give a negative reward when the agent does nothing while no bomb has been placed
    • Give a negative reward if the agent tries to move but can't (walking into a wall)
    • Give a positive reward if the agent keeps moving in the same direction (for continuous movement instead of constantly switching back and forth)
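The first rule above (rewarding unvisited cells) needs per-episode bookkeeping so the bonus is paid only once per cell. A minimal sketch with a hypothetical `ExplorationBonus` class; the bonus value used in the test is arbitrary. In an agent, `Reset()` would be called from `OnEpisodeBegin()` and the returned value passed to `AddReward`.

```csharp
using System.Collections.Generic;

// Hypothetical helper for the "unvisited cell" rule: pays the bonus only the
// first time the agent enters each cell during an episode.
public class ExplorationBonus
{
    private readonly HashSet<(int x, int y)> _visited = new HashSet<(int x, int y)>();
    private readonly float _bonus;

    public ExplorationBonus(float bonus)
    {
        _bonus = bonus;
    }

    public int VisitedCount => _visited.Count;

    // Call when the agent arrives on a cell; returns the reward to add.
    // HashSet.Add returns false if the cell was already visited.
    public float OnEnterCell(int x, int y) => _visited.Add((x, y)) ? _bonus : 0f;

    // Call at the start of each episode so the history starts empty.
    public void Reset() => _visited.Clear();
}
```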

    So I'm a little lost: what am I doing wrong?

    As mentioned above, I use a Render Texture Sensor that renders the entire map (like the main camera) into an 84x84 image:

    upload_2020-5-18_22-21-49.png

    Could this be the source of the problem? Should I use vector observations instead?

    Here is the config file I use for training (based on the Grid example):

    Code (yaml):
    default:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-3
        buffer_size: 10240
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        learning_rate_schedule: linear
        max_steps: 5.0e5
        memory_size: 128
        normalize: false
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 10000
        use_recurrent: false
        vis_encode_type: simple
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99

    MLAIPlayer:
        batch_size: 32
        normalize: false
        num_layers: 1
        hidden_units: 256
        beta: 5.0e-3
        buffer_size: 256
        max_steps: 5000000
        summary_freq: 20000
        time_horizon: 5
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.9
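As a rough illustration of the `gamma` override in the MLAIPlayer section (0.9, versus 0.99 in the default section), this sketch computes how much a reward received k decisions later counts in the discounted return (gamma^k), plus the common rule-of-thumb horizon 1 / (1 - gamma). `Discounting` is a hypothetical helper, and the 20-decision delay in the note below is a made-up example: the thread doesn't say how many decisions a move, plant, escape cycle takes.

```csharp
using System;

public static class Discounting
{
    // Weight of a reward received `steps` decisions in the future
    // in the discounted return: gamma^steps.
    public static double Weight(double gamma, int steps) => Math.Pow(gamma, steps);

    // Rule of thumb for how many decisions ahead the agent effectively "looks".
    public static double EffectiveHorizon(double gamma) => 1.0 / (1.0 - gamma);
}
```

With gamma = 0.9, a reward 20 decisions away is weighted about 0.12, versus about 0.82 with gamma = 0.99; the effective horizons are roughly 10 and 100 decisions respectively.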
    Could you give me some advice on how to train it properly?
     
  5. mbaske

    Render textures don't seem ideal for encoding game state. I've tried to get something like this working myself, but with only very limited success so far. https://forum.unity.com/threads/minimum-visual-observation-size.855487/#post-5655583
    The first thing I'd change is to make the render texture contents clearer and more abstract. Instead of using a camera view*, try encoding grid cell states with color channels: for instance, red could represent a blocked cell, green a free cell, and so on. Set the texture size to a multiple of your grid size. With a 15x15 grid, the texture might be 60x60.
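A minimal sketch of the color-channel idea, assuming a hypothetical `Cell` enum and an arbitrary palette (red = blocked and green = free as suggested, plus made-up colors for bricks, bombs and the agent). Each cell becomes a `scale` x `scale` block of identical pixels in a row-major RGB byte buffer, so a 15x15 grid with `scale = 4` gives a 60x60 image. Getting the bytes into an actual texture is left to the engine side.

```csharp
using System;

// Hypothetical cell states; the real game's entity types may differ.
public enum Cell { Free, Blocked, Brick, Bomb, Agent }

public static class GridTexture
{
    // One RGB color per Cell value, indexed by the enum's integer value.
    // Arbitrary example palette: pure, well-separated colors.
    static readonly byte[][] Palette =
    {
        new byte[] { 0, 255, 0 },   // Free    -> green
        new byte[] { 255, 0, 0 },   // Blocked -> red
        new byte[] { 255, 255, 0 }, // Brick   -> yellow
        new byte[] { 255, 0, 255 }, // Bomb    -> magenta
        new byte[] { 0, 0, 255 },   // Agent   -> blue
    };

    // Writes each grid cell as a scale x scale block into a row-major RGB
    // buffer (3 bytes per pixel, pixel row 0 = grid row 0).
    public static byte[] Encode(Cell[,] grid, int scale)
    {
        int cols = grid.GetLength(0);
        int rows = grid.GetLength(1);
        int width = cols * scale;
        int height = rows * scale;
        var pixels = new byte[width * height * 3];

        for (int py = 0; py < height; py++)
        {
            for (int px = 0; px < width; px++)
            {
                byte[] color = Palette[(int)grid[px / scale, py / scale]];
                int i = (py * width + px) * 3;
                pixels[i] = color[0];
                pixels[i + 1] = color[1];
                pixels[i + 2] = color[2];
            }
        }
        return pixels;
    }
}
```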

    *EDIT: Looking at it again, I did actually use a top-down camera in my environment. But I added alternate grid-cell-sized objects, gave them an unlit material and put them on a layer only visible to that camera. My agent, for instance, had a blue cube attached to it that only the camera sensor could see. It might be a bit crude, but it may be simpler than editing the render texture pixel by pixel. The point is just to make the visual observation as clear as possible and remove any unnecessary information.

    Regarding your additional reward rules: they might be too granular. Please see my post on rewarding behavior vs. rewarding achievements:
    https://forum.unity.com/threads/needed-advise-regarding-rewards-and-signals.888862/#post-5840512
     
    Last edited: May 19, 2020
  6. Noxalus

    Thank you for your answer, @mbaske.

    I will try what you suggest and come back with results! ;)
     
  7. Noxalus

    OK, so I've made an abstract map in the render texture, which was not that easy. I already had a way to get the type of entity at a given cell position, but I couldn't find how to write a pixel color into a RenderTexture, and from what I've read on the web, that's not how they're meant to be used: RenderTextures are written by a camera.

    So instead, I created a Texture2D with the RenderTexture's dimensions (60x60, i.e. 4x4 pixels per entity) and render a quad with that texture on it (using an unlit material). An orthographic camera then renders the quad into the RenderTexture used by the RenderTexture Sensor. It's probably not the best way to do it, but hey, it works... :confused:
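One detail that can trip up the Texture2D approach: as far as I understand Unity's documentation, `Texture2D.SetPixels32` expects a flat array laid out left to right, bottom to top (followed by `Apply()` to upload it). A minimal sketch of that index mapping, assuming the game's grid stores row 0 at the top of the map and packing each color into a `uint` for illustration (in Unity each entry would be a `Color32`). `BottomUpLayout` is a hypothetical helper. With 4x4 pixels per cell, setting the texture's `filterMode` to `FilterMode.Point` should keep the cell blocks sharp.

```csharp
public static class BottomUpLayout
{
    // Flat index of pixel (px, pyFromTop) in a width x height array laid out
    // left to right, bottom to top.
    public static int Index(int px, int pyFromTop, int width, int height) =>
        (height - 1 - pyFromTop) * width + px;

    // Expands per-cell colors (cellColors[col, row], row 0 = top of the map,
    // an assumption) into a bottom-up pixel array, scale x scale pixels per cell.
    public static uint[] Expand(uint[,] cellColors, int scale)
    {
        int cols = cellColors.GetLength(0);
        int rows = cellColors.GetLength(1);
        int width = cols * scale;
        int height = rows * scale;
        var pixels = new uint[width * height];

        for (int py = 0; py < height; py++)
        {
            for (int px = 0; px < width; px++)
            {
                pixels[Index(px, py, width, height)] = cellColors[px / scale, py / scale];
            }
        }
        return pixels;
    }
}
```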

    upload_2020-5-20_0-49-9.png

    Unfortunately, the result is still really bad; it's as if the agent doesn't learn at all.

    Here is the result after 1.5 hours of training:

    upload_2020-5-20_0-50-47.png

    The orange line is the official Grid example (with a single environment), which learns much more effectively.

    I will start new experiments tomorrow, but I expected better. :(
     
  8. MrWetsnow


    Joined:
    Jan 5, 2020
    Posts:
    60
    This might not be very helpful, but I would caution you against making judgements after relatively short training runs. I'm sure you know this, but to reiterate: the system basically starts by feeding random actions into your program. It has to try enough random actions before it starts getting results that make sense.

    In your case, it seems you need your agent to:

    1. Move next to a wall
    2. Place a bomb there
    3. Move away from the wall

    For each of the three, you seem to have 6 different actions the player can take, so the probability of doing all three in a row by chance is 1/6 * 1/6 * 1/6 = 1/216 (perhaps a bit better, since I assume you mask out the movements the agent can't make).
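The arithmetic above as a small sketch: the chance that uniformly random actions produce one specific sequence, and the expected number of attempts before it first appears (the mean of a geometric distribution). `RandomSearch` is a hypothetical helper, and treating each attempt as independent is a simplification.

```csharp
using System;

public static class RandomSearch
{
    // Probability that uniformly random choices produce one specific sequence
    // of `length` actions, each picked from `choices` equally likely options.
    public static double SequenceProbability(int choices, int length) =>
        Math.Pow(1.0 / choices, length);

    // Expected number of independent attempts until that sequence first occurs
    // (a geometric distribution with success probability p has mean 1 / p).
    public static double ExpectedAttempts(int choices, int length) =>
        1.0 / SequenceProbability(choices, length);
}
```

With 6 actions and 3 steps this gives 1/216, or about 216 attempts; if masking left only 4 valid actions per step, it would be 1/64. In practice "move next to a wall" usually takes several decisions on its own, so the real odds are likely lower.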

    And now you have to do this repeatedly, while the area of movement is changing.
     
  9. Noxalus

    Thanks for the reply, @MrWetsnow.

    I thought about what you said, and I'm wondering whether it would be a good idea to stop the agent from choosing a new action until it reaches a new cell. Before starting the ML experiments, I wrote a rule-based AI myself, and the first thing it could do was go to a given cell [x,y]. Between two cells, it's pretty rare to want to do anything other than continue to the targeted cell, right?

    Is it a good idea to limit the agent's actions this much?
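One way the idea could be implemented, offered as a sketch rather than something confirmed in this thread: instead of a `DecisionRequester` component requesting decisions at a fixed interval, the agent calls `RequestDecision()` itself only when it is centered on a cell, so each action covers one cell move. The check below assumes cell centers lie at integer multiples of `cellSize`; `CellAlignment` is a hypothetical helper.

```csharp
using System;

public static class CellAlignment
{
    // True when (x, y) is within `tolerance` of a cell center on both axes.
    // Assumes cell centers sit at integer multiples of cellSize (hypothetical layout).
    public static bool IsAtCellCenter(float x, float y, float cellSize, float tolerance)
    {
        return NearCenter(x, cellSize, tolerance) && NearCenter(y, cellSize, tolerance);
    }

    static bool NearCenter(float v, float cellSize, float tolerance)
    {
        double nearestCenter = Math.Round(v / cellSize) * cellSize;
        return Math.Abs(v - nearestCenter) <= tolerance;
    }
}
```

In the agent, something like `if (CellAlignment.IsAtCellCenter(pos.x, pos.y, 1f, 0.05f)) RequestDecision();` in `FixedUpdate` would request a new action only on arrival at a cell; the agent would then need to keep moving toward its target cell between decisions.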