Impossible to solve a simple problem.

Discussion in 'ML-Agents' started by chrisk, May 15, 2021.

  1. chrisk

    Joined: Jan 23, 2009
    Posts: 704
    Hi, I've been working on training a simple turret to point at a target.

    Input (Observation): Turret angle to the target, normalized to [-1, 1]; the turret is located at the origin.
    Output (Action): Simulated turret angle to the target, in [-1, 1]. (I want the Output to be the same as the Input in this test.)

    I give a reward when it points toward the target and a penalty when it doesn't:
    AddReward((20 - aimErrorAbsAngle) / 180);

    aimErrorAbsAngle is the absolute angle difference between the Output angle and the direction to the target.
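
    In code, the agent looks roughly like this (a minimal sketch; the class and field names are just illustrative and the angle math is simplified):

    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using Unity.MLAgents.Actuators;

    // Minimal sketch of the setup described above; names like TurretAgent and target are illustrative.
    public class TurretAgent : Agent
    {
        public Transform target;      // fixed target, turret sits at the origin
        float turretAngle;            // simulated turret heading set from the action, in degrees

        public override void CollectObservations(VectorSensor sensor)
        {
            // Signed yaw angle from the turret to the target, normalized to [-1, 1].
            Vector3 toTarget = target.position - transform.position;
            float targetAngle = Mathf.Atan2(toTarget.x, toTarget.z) * Mathf.Rad2Deg;
            sensor.AddObservation(targetAngle / 180f);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // One continuous action in [-1, 1], interpreted as an absolute angle.
            turretAngle = Mathf.Clamp(actions.ContinuousActions[0], -1f, 1f) * 180f;

            Vector3 toTarget = target.position - transform.position;
            float targetAngle = Mathf.Atan2(toTarget.x, toTarget.z) * Mathf.Rad2Deg;
            float aimErrorAbsAngle = Mathf.Abs(Mathf.DeltaAngle(turretAngle, targetAngle));

            // Positive when within 20 degrees of the target, negative otherwise.
            AddReward((20f - aimErrorAbsAngle) / 180f);
        }
    }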

    That's it! I've run the training for hours and tried many different hyperparameters (PPO), from the defaults to low and high ranges, but the results are all the same. The turret generally points toward the target but it shakes violently, swinging between about -45 and +45 degrees from the target direction.

    I didn't know that (Input value => Output value) would be such a hard problem to solve. Hmm...

    One other thing that puzzles me is that the turret also shakes violently left and right during inference. The target is fixed in place, so the input is always the same. If the input is the same, the output should be the same during inference, right? Where does the randomness come from?

    Thanks.
     
    Last edited: May 15, 2021
  2. mbaske

    Joined: Dec 31, 2017
    Posts: 473
    Some amount of randomness is to be expected. But if it's too much, I'd guess the reward signal isn't clear enough. You can try this: get the inverse normalized angle towards the target (1: facing target, 0: facing opposite direction) and raise it to a power, e.g.
    reward = Mathf.Pow(inverse_normalized_angle, 10) * reward_factor;
    This will cause rewards to drop steeply when the turret is pointing away from the target even a little.
    Also, limited relative movement tends to be less jittery than setting absolute directions. Maybe let the agent rotate the turret by some maximum angle per step. I often set a decision interval of 5 or higher and then interpolate between action values to create smooth motion.
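
    Rough sketch of the interpolation idea (untested, placeholder names; assumes a DecisionRequester with Decision Period = 5 and Take Actions Between Decisions unchecked, so OnActionReceived only fires on decision steps):

    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    public class SmoothTurretAgent : Agent
    {
        const int decisionPeriod = 5;   // must match the DecisionRequester's Decision Period
        float previousAngle;
        float currentAngle;
        int stepsSinceDecision;

        public override void OnActionReceived(ActionBuffers actions)
        {
            // New absolute target angle once per decision; interpolate towards it in between.
            previousAngle = currentAngle;
            currentAngle = actions.ContinuousActions[0] * 180f;
            stepsSinceDecision = 0;
        }

        void FixedUpdate()
        {
            // Blend from the previous action value to the new one over the decision interval.
            stepsSinceDecision = Mathf.Min(stepsSinceDecision + 1, decisionPeriod);
            float t = (float)stepsSinceDecision / decisionPeriod;
            float angle = Mathf.LerpAngle(previousAngle, currentAngle, t);
            transform.rotation = Quaternion.Euler(0f, angle, 0f);
        }
    }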
     
    Last edited: May 15, 2021
  3. chrisk

    Joined: Jan 23, 2009
    Posts: 704
    Hi, that's a good idea. I'm still experimenting, and I've learned that making the problem simple for ML to solve helps more than anything else. There is one other thing I just found out: reward range matters. I thought rewards were relative, but they're not. Keeping the relative proportions the same, a reward range of [0, 100] produces different (and better) results than [0, 1]. I wonder why.
     
  4. ervteng_unity

    Unity Technologies

    Joined: Dec 6, 2018
    Posts: 150
    Increasing the reward dramatically is kind of like increasing the learning rate - you'll get a larger gradient, but less stability. How do the entropy curves (Policy/Entropy) compare between the two runs? You might try a lower beta if the entropy is increasing or not decreasing.
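
    For reference, beta is set under hyperparameters in the trainer config YAML; something like this (the behavior name Turret is a placeholder, and 5.0e-3 is the default):

    behaviors:
      Turret:
        trainer_type: ppo
        hyperparameters:
          learning_rate: 3.0e-4
          beta: 1.0e-3    # lower than the 5.0e-3 default, i.e. a weaker entropy bonus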