
Autonomous parking problem

Discussion in 'ML-Agents' started by nicoloarena, Feb 25, 2021.

  1. nicoloarena

    Joined:
    Oct 3, 2019
    Posts:
    4
    Hi everyone,
    I'm working on a project in which an agent learns to park in a specific spot. The problem is that the agent never reaches the spot; it keeps moving forward and backward, stuck in a local optimum.

    Here is the environment: (the white slot is the target)

    The agent's position is generated randomly (among the available spots) at the beginning of each episode.
    I use 20 training areas in the scene, so there are 20 agents learning in parallel.

    Here is my configuration:

    Observations (see the C# sketch after this list):
    - transform.localPosition.x; (normalized in [0, 1])
    - transform.localPosition.z; (normalized in [0, 1])
    - transform.localRotation.y; (normalized in [0, 1])
    - Mathf.Abs(rBody.velocity.x * 3.6f / 100);
    - Mathf.Abs(rBody.velocity.z * 3.6f / 100);
    - transform.InverseTransformPoint(target.transform.position).x;
    - transform.InverseTransformPoint(target.transform.position).z;
    - angular difference (y axis) between the car and the parking slot, normalized so that 1 means parallel and 0 means perpendicular;
    - RayPerceptionSensor3D with 6 rays per direction, 180°, detecting obstacles and other cars;
    - RayPerceptionSensor3D with 5 rays per direction, 180°, detecting sidewalks and parking slot.
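
    Roughly in code, the observation collection looks like this (a sketch: the field names, the AreaWidth/AreaLength bounds, and the Awake setup are illustrative rather than my exact script; the two ray sensors are separate components and don't appear here):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class ParkingAgent : Agent
    {
        public Transform target;        // the parking slot
        Rigidbody rBody;
        const float AreaWidth = 40f;    // illustrative normalization bounds
        const float AreaLength = 40f;

        void Awake()
        {
            rBody = GetComponent<Rigidbody>();
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Local position, normalized to [0, 1] over the training area.
            sensor.AddObservation(transform.localPosition.x / AreaWidth);
            sensor.AddObservation(transform.localPosition.z / AreaLength);

            // Yaw, normalized to [0, 1].
            sensor.AddObservation(transform.localEulerAngles.y / 360f);

            // Speed components: m/s -> km/h, scaled down by 100.
            sensor.AddObservation(Mathf.Abs(rBody.velocity.x * 3.6f / 100f));
            sensor.AddObservation(Mathf.Abs(rBody.velocity.z * 3.6f / 100f));

            // Target position in the car's local frame.
            Vector3 toTarget = transform.InverseTransformPoint(target.position);
            sensor.AddObservation(toTarget.x);
            sensor.AddObservation(toTarget.z);

            // Alignment with the slot: 1 = parallel (either way), 0 = perpendicular.
            sensor.AddObservation(Mathf.Abs(Vector3.Dot(transform.forward, target.forward)));
        }
    }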

    Action space (applied as in the sketch below):
    - throttle, continuous in [-1, 1];
    - steering, continuous in [-1, 1];
    - brake, continuous in [0, 1].
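
    The actions are applied along these lines (again a sketch: the force and steering numbers and the drag-based brake are crude stand-ins for my actual driving code):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using UnityEngine;

    public class ParkingAgentActions : Agent
    {
        Rigidbody rBody;

        void Awake()
        {
            rBody = GetComponent<Rigidbody>();
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Three continuous actions, clamped to the ranges listed above.
            float throttle = Mathf.Clamp(actions.ContinuousActions[0], -1f, 1f);
            float steering = Mathf.Clamp(actions.ContinuousActions[1], -1f, 1f);
            float brake    = Mathf.Clamp(actions.ContinuousActions[2],  0f, 1f);

            // Crude rigidbody stand-in for the real wheel-collider driving code.
            rBody.AddForce(transform.forward * throttle * 500f);
            transform.Rotate(0f, steering * 30f * Time.fixedDeltaTime, 0f);
            rBody.drag = Mathf.Lerp(0.1f, 4f, brake);   // braking modeled as extra drag
        }
    }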

    Reward system (see the shaping sketch after this list):
    - -0.001 for every step; (max step of each agent is set to 3000)
    - -0.5 if it collides with sidewalk;
    - -1 if it collides with obstacles or other cars;
    - sqrt(dx^2 + dz^2)/10, where dx and dz are the per-axis differences between the current and the previous step's distance to the target; this small bonus is added only when the agent gets closer;
    - 10 points if it stops in the parking slot (distance.x < 0.5 and distance.z < 0.5);
    - a reward based on the angular difference between the car and the parking slot: 10 points if parallel, 0 if perpendicular;
    - plus the curiosity intrinsic reward (configured in the trainer, see below).
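
    Put together, the reward logic is roughly this (a sketch: the tags, the 0.1 stop-speed threshold, and the scalar distance are shorthand on my part; the real code uses the per-axis deltas above, and curiosity comes from the trainer config rather than from C#):

    Code (CSharp):
    using Unity.MLAgents;
    using UnityEngine;

    public class ParkingAgentRewards : Agent
    {
        public Transform target;
        Rigidbody rBody;
        float previousDistance;

        void Awake()
        {
            rBody = GetComponent<Rigidbody>();
        }

        public override void OnEpisodeBegin()
        {
            previousDistance = Vector3.Distance(transform.position, target.position);
        }

        // Per-step shaping, called once per decision (e.g. from OnActionReceived).
        void AddStepRewards()
        {
            AddReward(-0.001f);   // existential penalty; MaxStep is 3000

            // Progress bonus, added only when the car got closer to the slot.
            float distance = Vector3.Distance(transform.position, target.position);
            if (distance < previousDistance)
                AddReward((previousDistance - distance) / 10f);
            previousDistance = distance;

            CheckParked();
        }

        void OnCollisionEnter(Collision collision)
        {
            if (collision.gameObject.CompareTag("Sidewalk"))        // assumed tag
                AddReward(-0.5f);
            else if (collision.gameObject.CompareTag("Obstacle")    // assumed tags
                  || collision.gameObject.CompareTag("Car"))
                AddReward(-1f);
        }

        void CheckParked()
        {
            // Inside the slot (|dx| < 0.5, |dz| < 0.5) and nearly stopped (assumed threshold).
            Vector3 offset = target.InverseTransformPoint(transform.position);
            if (Mathf.Abs(offset.x) < 0.5f && Mathf.Abs(offset.z) < 0.5f
                && rBody.velocity.magnitude < 0.1f)
            {
                float alignment = Mathf.Abs(Vector3.Dot(transform.forward, target.forward));
                AddReward(10f + 10f * alignment);   // 10 for parking + up to 10 for parallel
                EndEpisode();
            }
        }
    }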

    config.yaml:
    Code (yaml):
    behaviors:
      ParkingAI:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          buffer_size: 8192
          learning_rate: 0.0003
          beta: 0.01
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 3
          memory:
            memory_size: 256
            sequence_length: 512
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:
            gamma: 0.99
            strength: 0.02
            encoding_size: 256
            learning_rate: 0.0003
        keep_checkpoints: 5
        max_steps: 50000000
        time_horizon: 128
        summary_freq: 30000
        threaded: true
    TensorBoard graphs:

    Please let me know what you think about my configuration: is it right and I just have to wait longer (I've seen about 300-400 episodes so far), or do I have to change something? Thank you.
     
    Last edited: Feb 26, 2021
  2. mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Rewarding distance changes can result in oscillating behaviour if the agent figures out that it can maximize rewards by repeatedly moving forward and backward. You could try rewarding the agent for getting closer to the target, but also assign a proportional penalty for moving away from it.
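
    In code, something along these lines (a sketch; previousDistance and the scale factor k are assumed to exist in your agent, with previousDistance reset in OnEpisodeBegin):

    Code (CSharp):
    // Inside the Agent subclass; k is a tunable scale factor.
    void AddSignedProgressReward(float k)
    {
        float distance = Vector3.Distance(transform.position, target.position);
        // Positive when the car got closer, proportionally negative when it moved away.
        AddReward(k * (previousDistance - distance));
        previousDistance = distance;
    }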
     
  3. nicoloarena

    Joined:
    Oct 3, 2019
    Posts:
    4
    Thanks for your quick reply.
    Yes, before using curiosity I had also tried adding a negative reward for moving away, but in that case the agent wouldn't move at all. I've also read in another post here that adding negative rewards for actions I don't want could end up discouraging the agent from moving, because it learns that whenever it moves it has a 50% chance of getting a positive reward and a 50% chance of getting a negative one.
    However, since I'm also using curiosity now, I'll try that, thanks.
    Is there anything else I can change to improve my configuration?
     
  4. celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    289
    You could try keeping track of the "best" distance so far, and only give the reward when that decreases. So something like:
    Code (CSharp):
    // bestDistance is reset each episode (e.g. in OnEpisodeBegin); k scales the reward.
    if (currentDistance < bestDistance)
    {
        agent.AddReward(k * (bestDistance - currentDistance));
        bestDistance = currentDistance;
    }
     
  5. nicoloarena

    Joined:
    Oct 3, 2019
    Posts:
    4
    This is a great idea, thank you.
    I'm applying a lot of changes; I'll let you know the results in the next few days.
    How many steps should I wait, for each run, before I can actually tell whether it's working?