Hello y'all,

With their permission, I am currently attempting to train an NN model from forum user mbaske's code (found at https://github.com/mbaske/ml-table-football) that our team can then deploy onto a physical, mechanised foosball table. I apologise if asking for support with ML-Agents 0.14.1 is frowned upon, but as an electrical engineering major with no ML/Unity experience, mbaske's old project is the best kick-off point for us right now, and I hope the issue is one of approach and not a version-specific concern. (Another team member is currently putting together a single-rod solution with the latest version, mirroring the approach of the KIcker team, found at https://www.engineering.com/story/the-kicker-story-foosball-and-deep-reinforcement-learning.)

To run this project, I've created a virtual environment with Python 3.7.7, ML-Agents 0.14.1, and Unity 2019.3.3f.

The issue I am facing is that despite what I believe to be correct reward signalling (the last reward sent in an episode is +1 for a win, 0 for a draw, or -1 for a loss), there is no change in Elo. I made only the following change to mbaske's code to try to ensure this:

[code snippet]

Besides this, the only reward signalling I currently have enabled is "OnGoal": when an agent is scored on, it receives a -1, and when an agent scores, it receives a +1. Beyond that, I have only been adding debug logs and logger messages to try to see what's going on under the hood.

I found the following in /ghost/trainer.py:

[code snippet]

This appears to be where the Elo calculation is done, but the debug flag I put there is never printed, which leads me to believe the "trajectory done" condition is never reached, or is only ever reached at the maximum step. Despite my research I'm not quite sure what a trajectory refers to (I assume it relates to /trajectory.py), but I suspect the issue is not in ML-Agents 0.14.1 itself on the Python side, but rather on the C#/Unity side, i.e. in the code using the library.

Here is an example of a training session from TensorBoard:

[TensorBoard screenshots]

The beginning of some graphs looks really strange to me: there appear to be multiple y values for a given value of x. The cumulative reward also starts negative yet seems to settle at zero, which I don't quite understand; I take this to be a result of the "zero-sum" nature of self-play 1v1 with the same agent, or something along those lines. In these graphs it is also clear that the Elo never changes.

This is an example of the LOGGER output:

[logger output]

Please let me know if there is more useful information I can provide - I would sincerely appreciate help. Also, the model indeed does not seem to be getting "better" at foosball, or at least a 10,000-step trained model seems to go even with the 1,000,000-step model.
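For reference, here is my paraphrase of the block in /ghost/trainer.py where I added the debug flag. This is written from memory and not a verbatim quote, so attribute and function names may differ from the real 0.14.1 source, but as I read it the Elo update only runs for trajectories that genuinely terminated rather than being cut off at max step:

```python
# My paraphrase of the Elo-update path in ML-Agents 0.14.x's
# mlagents/trainers/ghost/trainer.py - not a verbatim quote; exact
# names may differ from the real source.
def _process_trajectory(self, trajectory):
    # The update appears gated on the trajectory having genuinely ended
    # (the C# agent called Done()), not on it being cut off at max_step.
    if trajectory.done_reached and not trajectory.max_step_reached:
        # The sign of the final reward is read as the game result:
        # > 0 -> win (1.0), < 0 -> loss (0.0), exactly 0 -> draw (0.5).
        final_reward = trajectory.steps[-1].reward
        result = 0.5
        if final_reward > 0:
            result = 1.0
        elif final_reward < 0:
            result = 0.0
        self._update_elo(result)  # stand-in for the library's rating update
```

If that reading is right, my flag never printing would mean `done_reached` is never true here, i.e. episodes only ever end by hitting the max step, which matches my suspicion above.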
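And in case my understanding of the Elo maths itself is wrong, this is the textbook Elo update I have been assuming the library implements. A minimal sketch: `elo_change` is my own illustrative function, not ML-Agents code, and the K-factor of 16 is just an example value:

```python
def elo_change(rating_a: float, rating_b: float,
               result: float, k: float = 16.0) -> float:
    """Textbook Elo rating change for player A.

    result: 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k: the K-factor; 16.0 here is purely illustrative.
    """
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    return k * (result - expected_a)

# Two equally rated players drawing produces exactly zero change, so a
# flat Elo curve is also what I'd expect if every episode ended in a draw:
print(elo_change(1200.0, 1200.0, 0.5))  # 0.0
print(elo_change(1200.0, 1200.0, 1.0))  # +8.0
```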