Possible mistake in the reward signal for value estimate

Discussion in 'ML-Agents' started by jhblack6, May 31, 2022.

    I was going through the Python code for the PPO algorithm and noticed that the reward signal key for the value estimates is "returns" rather than "value_estimates" (line 82 of buffer.py in the mlagents/trainers folder).
    Is this intentional? If so, why use the same key for both value estimates and returns?
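
    To make the distinction behind the question concrete, here is a minimal sketch of how the two quantities usually differ in a PPO-style setup: returns are discounted sums of observed rewards, while value estimates are the critic's predictions of those returns. This is not taken from ML-Agents. The function name, the dictionary keys, and all numbers are hypothetical and do not reflect the actual contents of buffer.py.

```python
from typing import Dict, List


def discounted_returns(
    rewards: List[float], gamma: float = 0.99, bootstrap: float = 0.0
) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards from the last step.

    `bootstrap` is the assumed value of the state after the final step
    (0.0 if the episode terminated).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up data for illustration only.
rewards = [1.0, 0.0, 2.0]
critic_predictions = [2.5, 1.8, 1.9]  # what a value network might output

# Hypothetical storage layout: the two quantities live under separate keys.
trajectory: Dict[str, List[float]] = {
    "returns": discounted_returns(rewards, gamma=0.5),
    "value_estimates": critic_predictions,
}

# A typical value loss compares the two: mean squared error between
# the critic's predictions and the returns computed from rewards.
value_loss = sum(
    (v - g) ** 2
    for v, g in zip(trajectory["value_estimates"], trajectory["returns"])
) / len(rewards)
```

    In this sketch the two lists hold different numbers and play different roles in the loss, which is why storing both under a single key would look surprising.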