Help with Roguelike using Curiosity

Discussion in 'ML-Agents' started by TheJarmanitor, May 25, 2021.

  1. TheJarmanitor

    Joined:
    Mar 18, 2018
    Posts:
    20
    I'm teaching an agent to beat a Spelunky-style roguelike. The levels are small but require precision and wall jumping.
    I've added curiosity to the parameters, but the agent isn't taking enough risks: it doesn't jump right or even use all of the actions. I'm certain the problem lies in the parameters. This is what I'm using right now:
    Code (YAML):
    behaviors:
      PlayerAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 256
          buffer_size: 2048
          learning_rate: 3.0e-5
          beta: 5.0e-1
          epsilon: 0.2
          lambd: 0.9
          num_epoch: 5
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 1024
          num_layers: 10
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 0.4
          curiosity:
            strength: 0.9
            gamma: 0.99
            network_settings:
              hidden_units: 512
              num_layers: 5
            learning_rate: 3e-3
        keep_checkpoints: 3
        max_steps: 10000000
        time_horizon: 256
        summary_freq: 20000
    What should be higher/lower/different? What recommendations can you give me in this scenario?
     
  2. ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    Hi,

    It would help us help you if you could describe your game in more detail, including the game mechanics, the task you're trying to solve, your agents' observation/action setup, the reward functions, etc.
     
  3. TheJarmanitor

    Joined:
    Mar 18, 2018
    Posts:
    20
    I'll elaborate on the game.

    It's a roguelike platformer with an emphasis on verticality. It uses the Spelunky approach to dungeon generation (a set of premade rooms with doors, plus an algorithm that creates a main path and then fills in the rest). I have some basic enemies with patrolling AI. The agent needs to learn how to get to the end of the dungeon. At the beginning of each episode a new dungeon is created, and the agent needs precision jumping to move around.
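    To give a rough idea of the generation, here's a simplified sketch of the Spelunky-style main-path walk (not my exact code; the RoomType names and the grid size are just for illustration):
    Code (CSharp):
    // Random-walk a main path through a room grid from a top room to a
    // bottom room; untouched cells stay filler rooms. Premade room
    // templates get stamped into the cells afterwards.
    enum RoomType { Filler, Start, Path, Drop, Exit }

    RoomType[,] GenerateLayout(System.Random rng, int width = 4, int height = 4)
    {
        var grid = new RoomType[width, height]; // every cell defaults to Filler
        int x = rng.Next(width), y = 0;
        grid[x, y] = RoomType.Start;
        while (y < height - 1)
        {
            int dir = rng.Next(5); // 0-1: try left, 2-3: try right, 4: drop down
            if (dir <= 1 && x > 0) x--;
            else if (dir <= 3 && x < width - 1) x++;
            else { y++; grid[x, y] = RoomType.Drop; continue; } // entered from above
            grid[x, y] = RoomType.Path;
        }
        grid[x, y] = RoomType.Exit; // the last room on the path is the exit
        return grid;
    }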

    The agent can move horizontally, start a jump, end a jump midway, attack, wall jump once after touching the ground, and pass through some platforms.

    Each time it moves there's a small penalty, so the agent tries to solve the level as fast as possible, and the same when it jumps, so the agent doesn't hop too much (sketched below, after the combat rewards). The rest of the rewards are:
    Code (CSharp):
    AgentActions(actionBuffers.ContinuousActions);
    if (combat.didDamage) {
        AddReward(0.05f);       // reward for landing a hit
    }
    if (combat.hurt) {
        AddReward(-0.01f);      // penalty for taking damage
    }
    if (combat.isAttacking && !combat.didDamage) {
        AddReward(-0.005f);     // discourage whiffed attacks
    }
    if (combat.died) {
        AddReward(-0.5f);       // dying is the worst outcome
        EndEpisode();
    }
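    The movement/jump penalties mentioned above look roughly like this (illustrative values; isMoving and jumpedThisStep stand in for my actual checks):
    Code (CSharp):
    // Small per-step costs: finish the level quickly and don't hop constantly.
    if (isMoving)       AddReward(-0.0005f); // tiny cost for moving
    if (jumpedThisStep) AddReward(-0.001f);  // slightly larger cost for jumping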
    It's using a Ray Perception Sensor that can detect platform tiles, ground/wall tiles, enemies, and the checkpoint. Besides that, the vector observations I'm collecting are:

    Code (CSharp):
    sensor.AddObservation(player.wallSliding ? 1f : 0f); // check if it's on a wall
    sensor.AddObservation(player.velocity);
    sensor.AddObservation(player.directionalInput);
    For the movement and collisions I'm using raycasts (not sure if you're familiar with Sebastian Lague's platformer tutorial).
     


  4. mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    10 layers of 1024 hidden units each seems a bit excessive for the task. Maybe try simplifying the environment first, in order to see if the agent can actually learn walking and jumping with the given observations and rewards. I would remove the enemies for now and concentrate on movement. You can probably parameterize the dungeon difficulty, so the Wall Jump example should be a good fit in terms of config params and curriculum (sketch below). Just keep it simple to see if things work like you intended, before increasing complexity and adding enemies.
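    For reference, a difficulty curriculum would look something like this in the trainer config (the dungeon_difficulty parameter, lesson names and thresholds are hypothetical, modeled on the Wall Jump example):
    Code (YAML):
    environment_parameters:
      dungeon_difficulty:
        curriculum:
          - name: Easy                 # e.g. mostly flat rooms, little climbing
            completion_criteria:
              measure: progress
              behavior: PlayerAgent
              signal_smoothing: true
              min_lesson_length: 100
              threshold: 0.1
            value: 0.0
          - name: Hard                 # full vertical layouts
            value: 1.0
    Your dungeon generator would then read the current value with Academy.Instance.EnvironmentParameters.GetWithDefault("dungeon_difficulty", 0f).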
     
  5. TheJarmanitor

    Joined:
    Mar 18, 2018
    Posts:
    20
    The enemies aren't causing extra problems right now, because the agent isn't exploring enough to even reach them. Sometimes the agent moves by accident, but mostly it stays in the same place, jumping and moving erratically. That's what I want to improve right now. What should I change in the parameters to fix that, on top of what you just said?

    But the curriculum will probably work for me in the future, so I appreciate it greatly. I don't understand the Wall Jump example too well, but I'll do my best. Thank you.
     
  6. TheJarmanitor

    Joined:
    Mar 18, 2018
    Posts:
    20
    An update on this. I have changed my actions from continuous to discrete. Now there is a branch for horizontal movement, one for vertical movement, and one for jumping. I've been getting better results, but they're still really far from good. My agent sometimes finds the end of the level, but when it hits a dead end it doesn't jump back and try another route. The learning process doesn't look consistent.
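    Roughly, the discrete handler now looks like this (simplified sketch; the controller method names follow Sebastian Lague's tutorial rather than my exact code):
    Code (CSharp):
    // ActionBuffers comes from Unity.MLAgents.Actuators.
    bool wasJumpHeld; // tracks the jump "button" edge between decisions

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        var act = actionBuffers.DiscreteActions;

        // Branch 0: horizontal movement (0 = idle, 1 = left, 2 = right)
        float h = act[0] == 1 ? -1f : act[0] == 2 ? 1f : 0f;
        // Branch 1: vertical movement (0 = idle, 1 = up, 2 = down/drop through)
        float v = act[1] == 1 ? 1f : act[1] == 2 ? -1f : 0f;
        player.SetDirectionalInput(new Vector2(h, v));

        // Branch 2: jump (0 = released, 1 = held); only fire on state changes
        bool jumpHeld = act[2] == 1;
        if (jumpHeld && !wasJumpHeld) player.OnJumpInputDown();
        if (!jumpHeld && wasJumpHeld) player.OnJumpInputUp();
        wasJumpHeld = jumpHeld;
    }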

    Here are my new parameters:
    Code (YAML):
    behaviors:
      PlayerAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 64
          buffer_size: 40960
          learning_rate: 3e-4
          beta: 1e-3
          epsilon: 0.1
          lambd: 0.9
          num_epoch: 5
          learning_rate_schedule: constant
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 5
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 0.7
          curiosity:
            strength: 0.8
            gamma: 0.8
            encoding_size: 256
            learning_rate: 5e-3
        keep_checkpoints: 3
        max_steps: 10000000
        time_horizon: 1024
        summary_freq: 25000
    I tried using memory, but it hasn't worked very well. I also used bigger sensors, but I'm not sure that was the best idea.
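    The memory attempt was roughly this in network_settings (an LSTM on top of the same network; the sizes are typical defaults, not necessarily what I ended up with):
    Code (YAML):
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 5
      memory:
        sequence_length: 64
        memory_size: 128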
     


    Last edited: May 29, 2021