Entropy Up and Down

Discussion in 'ML-Agents' started by asad133, Apr 4, 2020.

  1. asad133

    asad133

    Joined:
    Apr 1, 2020
    Posts:
    22
    [Attached image: TensorBoard plot of policy entropy over training steps]
    I am working in an environment where an agent is placed at a random location and must find a target, also placed at a random location, in a square arena. The agent has up-down and left-right movement (combining them, it can move diagonally) and rays to detect objects. The environment is deliberately made large, so curiosity is used.

    No matter what value I set beta to, entropy always looks as above: bouncing up and down while hovering around exactly the same value.

    Please advise as to why this may be the case. It is also very odd that the value it hovers at is only about 1.95.
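    (For reference on that number: with a discrete action space, the per-step policy entropy is -Σ p ln p, which for a near-uniform policy sits at its maximum of ln(number of actions) nats; a single branch of 7 actions, for example, caps at ln 7 ≈ 1.95. A quick sketch — the distributions below are illustrative, not taken from my agent:)

    ```python
    import math

    def policy_entropy(probs):
        """Entropy of a discrete action distribution, in nats."""
        return -sum(p * math.log(p) for p in probs if p > 0)

    # A uniform policy over n actions has the maximum possible entropy, ln(n).
    uniform7 = [1.0 / 7] * 7
    print(round(policy_entropy(uniform7), 3))  # ln(7) ≈ 1.946

    # As the policy commits to one action, entropy drops toward 0.
    peaked = [0.82] + [0.03] * 6
    print(round(policy_entropy(peaked), 3))
    ```

    So a flat entropy curve pinned near the ln(n) ceiling would just mean the policy is staying close to uniform instead of committing to actions.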
     
    Last edited: Apr 4, 2020
  2. asad133

    asad133

    Joined:
    Apr 1, 2020
    Posts:
    22
    I have even simplified the task drastically, with a small floor size and fixed goal and agent start positions, BUT entropy is still behaving strangely. I want to encourage my agent to take different actions, so high entropy at the start is important.

    I am rewarding +1 if the agent finds the target and -1/MaxSteps (MaxSteps = 10000) every timestep.
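    (To spell the scheme out as a sketch, with MaxSteps = 10000 as above:)

    ```python
    MAX_STEPS = 10_000                # MaxSteps from my setup
    STEP_PENALTY = -1.0 / MAX_STEPS   # existential penalty every timestep
    SUCCESS_REWARD = 1.0              # paid once, when the target is found

    def episode_return(steps_taken, found_target):
        """Cumulative reward for one episode under the scheme above."""
        r = STEP_PENALTY * steps_taken
        if found_target:
            r += SUCCESS_REWARD
        return r

    print(episode_return(500, True))         # quick success: 0.95
    print(episode_return(MAX_STEPS, False))  # full timeout: -1.0
    ```

    So a full timeout costs exactly -1, and a fast success nets close to +1.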

    Please advise. This has been immensely frustrating for a while.
     
    Last edited: Apr 4, 2020
  3. asad133

    asad133

    Joined:
    Apr 1, 2020
    Posts:
    22
    For completeness of information, I am using curriculum learning. Maybe there is a problem with the other hyperparameters. Let me know if any further info is required.
     
  4. asad133

    asad133

    Joined:
    Apr 1, 2020
    Posts:
    22
    Last edited: Apr 4, 2020
  5. awjuliani

    awjuliani

    Unity Technologies

    Joined:
    Mar 1, 2017
    Posts:
    69
    Hi asad133,

    Can you try making your buffer and batch sizes smaller? I believe in another thread you shared the values you were using, and they were quite large. If these are too large, learning will progress very slowly, which might be why you don't see your entropy values changing much.
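    One way to try this is to lower `batch_size` and `buffer_size` in the trainer config — for example (illustrative values only, and the behavior name here is a placeholder, not from this project):

    ```yaml
    MyBehavior:          # placeholder behavior name
      trainer: ppo
      batch_size: 128    # should evenly divide buffer_size
      buffer_size: 2048  # a small multiple of batch_size
      beta: 5.0e-3       # entropy regularization strength
    ```

    Keeping `buffer_size` a modest multiple of `batch_size` means policy updates happen more frequently, so you should see entropy start to move sooner.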
     
  6. asad133

    asad133

    Joined:
    Apr 1, 2020
    Posts:
    22
    Last edited: Apr 11, 2020
  7. asad133

    asad133

    Joined:
    Apr 1, 2020
    Posts:
    22
    With regards to buffer and batch size: are they related to the length of a successful episode? I've looked at the PPO docs for help, but what does one base the values on? I have tuned them manually, but this is messy and feels random.