I'm a little confused about what value is compared against each threshold to advance to the next lesson in the curriculum. From the documentation:

measure - What to measure learning progress, and advancement in lessons by.
    reward - Uses a measure received reward.
    progress - Uses ratio of steps/max_steps.
thresholds (float array) - Points in value of measure where lesson should be increased.

Is the compared reward value the same as the Mean Reward? I also learned recently that this value needs to be between 0 and 1 for both the reward and progress measures; however, that would mean we need to keep the mean reward between 0 and 1. Thanks for helping me clarify this.
The reward measure depends heavily on your environment; it can be any number, since you define the reward signals yourself. The progress measure, however, must be set between 0 and 1, as you said, because it is the ratio steps/max_steps. Note that "steps" means the total number of steps taken so far, and max_steps is the maximum number of training steps that you define in trainer_config.yaml.
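To make the difference concrete, here is a minimal Python sketch of how a threshold check like this could work. It is not the actual ML-Agents code: the function names progress_measure and advance_lesson are hypothetical, and the strict "greater than" comparison is an assumption made for illustration.

```python
from typing import Sequence


def progress_measure(steps: int, max_steps: int) -> float:
    """Ratio of steps taken so far to the configured maximum, capped at 1.0."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    return min(steps / max_steps, 1.0)


def advance_lesson(lesson: int, measure_value: float,
                   thresholds: Sequence[float]) -> int:
    """Return the next lesson index if the current threshold is exceeded.

    Assumption: a lesson advances when the measure is strictly greater
    than the threshold for the current lesson.
    """
    if lesson < len(thresholds) and measure_value > thresholds[lesson]:
        return lesson + 1
    return lesson


# Progress measure: thresholds have to lie between 0 and 1.
progress_lesson = advance_lesson(
    0, progress_measure(30_000, 100_000), [0.25, 0.5, 0.75]
)

# Reward measure: thresholds use whatever scale your reward signals have.
reward_lesson = advance_lesson(0, 12.5, [10.0, 25.0, 40.0])
```

In both calls the measure exceeds the first threshold, so the lesson index moves from 0 to 1. The only difference is the scale of the thresholds: fractions of max_steps for progress, and the units of your own reward for reward.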