(long term) Feature Request: Inference/Performance-Evaluation steps during training

Discussion in 'ML-Agents' started by unity_-DoCqyPS6-iU3A, Sep 17, 2020.

  1. unity_-DoCqyPS6-iU3A

    unity_-DoCqyPS6-iU3A

    Joined:
    Aug 18, 2018
    Posts:
    26
    Hello everyone,

    I've been tweaking my environment and playing with hyperparameters for quite some time, and finally reached a point where I'm moderately confident that training my agent could succeed.

    Pretraining my agent on a more general problem against a "null agent" has finished, and I've taken those weights as the initialization for a ghost-trainer setup.

    I have now started a multi-day training run and check in from time to time to see how my agents are doing.

    The "problem" I'm having is that I can't really tell (except visually) if my agents are improving. Since agent and opponent should both be getting better, the reward stays almost constant. And ELO-Ratings are nice, but may it be for unfamiliarity with the rating-system or because my ELO-Rating is fluctuating quite a bit (probably more debugging required), I don't like that metric very much yet.

    It would be nice to be able to evaluate the performance of my agent on a defined set of conditions from time to time.

    I'm thinking about an extra "evaluation" section in the trainer YAML file (rough sketch after this list) that would define:
    - at which interval (in steps) the evaluation should be run
    - how many episodes or steps should be run for the evaluation (TensorBoard's "smoothing" slider will probably average the different episodes nicely)
    - the environment parameters for the evaluation
    - for the ghost trainer: the path to an NN file? Or maybe even just a switch that says the heuristic brain should be used?

    Multiple parameter sections could be defined, so that the agent is evaluated on multiple sets of parameters. Each set of evaluation results would get its own prefix "evaluation1/", "evaluation2/", ... in TensorBoard.
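    A purely hypothetical sketch of what that could look like; none of these keys exist in the current trainer config, the names are just made up to illustrate the idea:

        evaluation:
          evaluation1:
            interval: 50000            # run an evaluation every N training steps
            episodes: 20               # episodes to average per evaluation run
            environment_parameters:    # env parameters to evaluate against
              grid_size: 10
          evaluation2:
            interval: 50000
            episodes: 20
            environment_parameters:
              grid_size: 20
            opponent: heuristic        # ghost trainer: evaluate against the heuristic brain (or a path to an .nn file)

    Each sub-section would then map to its own prefix in TensorBoard.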

    I can also imagine that this metric would be interesting to watch for curriculum training. I'd expect that as training progresses the agent continuously gets better at the current lesson, but won't yet be at peak performance on the last lesson (the ultimate goal). The agent will probably also get worse on the parameters of older lessons over time.

    Evaluation sets could also indicate overfitting, as the agent will likely get worse at generalizing.

    Maybe I'm the only one who would find this feature useful. Or maybe somebody shares my opinion, but it would take some time to implement.
    But hey, my oldest ML-Agents files here are from March 2018, and I've seen ML-Agents improve with every version. I'm in it for the long run :)
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Are you using the stats recorder?
    https://github.com/Unity-Technologi...ng-Environment-Design.md#recording-statistics
    I've made it a habit to write a whole bunch of custom metrics to TensorBoard, which helps a lot with keeping track of training progress in detail.
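    A minimal sketch of how that looks on the C# side (class name, metric keys and call sites are just placeholders; the StatsRecorder API itself is the one from the linked docs):

        using Unity.MLAgents;

        public class MyAgent : Agent
        {
            StatsRecorder m_Stats;

            public override void Initialize()
            {
                // The Academy owns a single StatsRecorder instance.
                m_Stats = Academy.Instance.StatsRecorder;
            }

            // Call this wherever the values you care about are computed.
            void RecordCustomStats(float distanceToGoal, float collisions)
            {
                // Averaged over the summary period (default aggregation);
                // shows up in TensorBoard under the given key.
                m_Stats.Add("MyEnv/DistanceToGoal", distanceToGoal);

                // Keep only the most recent value instead of averaging.
                m_Stats.Add("MyEnv/Collisions", collisions, StatAggregationMethod.MostRecent);
            }
        }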
     
  3. unity_-DoCqyPS6-iU3A

    unity_-DoCqyPS6-iU3A

    Joined:
    Aug 18, 2018
    Posts:
    26
    Yes!
    I only discovered it recently, but I find it *very* helpful.