
n00b questions relating to the karting demo

Discussion in 'ML-Agents' started by RR7, Mar 22, 2021.

  1. RR7

    RR7

    Joined:
    Jan 9, 2017
    Posts:
    254
    spent my first day with ML-Agents, and finally got it running (not a nice experience, if i'm honest, i nearly gave up several times), based on the karting demo.

    i'd really really appreciate it if any of you nice peoples could bring me up to speed!

    * how come the latest karting demo doesn't need an academy? the tutorial suggests there should be one.
    * is it right to just add loads of clones of a prefabbed agent in order to speed up training, and at what point does this become counterproductive? my framerate says 60 when i run, but the screen only updates at around 2fps.
    * is it normal if i have 40 agents, that a significant number of them drive the wrong way and seem to keep doing it?
    * my mean reward after 3500 seconds and 3m steps still jolts around between 1200 and 300, does this seem right to you?
    * how long should it take roughly to train a car to go round a track, i mean should it be hours, days, weeks? just wondering!
    * should i be looking for the highest mean reward alongside the lowest std of reward?
    * do we HAVE to go through all that python, pip, pytorch, venv stuff to get this working, is there not a simpler way, at least in windows?
    * do the various ANN assets on the asset store work the same way or do the same job as the karting demo, or are they different ways of doing it altogether?

    i do hang out in the discord but...i dunno it seems kinda dead, i suspect i'm too n00bish to engage with right now, which i can understand.
     
  2. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    The ML-Agents package is under very active development. When the karting demo was made, the Academy may still have been required to be part of your scene; that is no longer the case.

    It depends. More agents feed more samples to the training process, but at some point you hit diminishing returns. It really depends on the hardware you have, the simulation you are running, etc.

    I'm not sure. Intuitively, the agents should start to perform better as they learn. I don't know how your scene is set up, so I can't tell you whether or not this should be expected.

    Again, it depends on the environment. For the karting demo I wouldn't expect this; when I was training it, it began to converge quickly.

    For the karting demo, it should be an hour or less.

    Usually the reward and the variance of the reward will start to "converge" after a period of time. What I mean by that is that the mean reward will start to level off and the standard deviation of the reward will also approach a constant value. This varies from environment to environment. For the karting environment, it should be obvious relatively quickly that your behavior is starting to converge.
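
    You can watch both of those curves while training runs with TensorBoard, something like the commands below (run output lands in the results folder by default; the exact stat names can differ slightly between releases):

        tensorboard --logdir results
        # then open http://localhost:6006 and watch Environment/Cumulative Reward
        # level off and its spread tighten over time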

    Yes, for now this is part of the ml-agents workflow. We have tasks filed to improve this and we are looking to address some of these issues this year.
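
    For reference, the typical setup on Windows currently looks roughly like this (the config filename below is just a placeholder, and exact package versions vary between releases):

        python -m venv venv
        venv\Scripts\activate
        pip install mlagents                 # installs the trainer; PyTorch may need a separate install on some setups
        mlagents-learn config\kart.yaml --run-id=karting01
        # then press Play in the Editor once the trainer says it's waiting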

    I haven't used this Asset from the Asset Store so I can't speak to it. Based on the video, it looks like there is no python involved.
     
  3. RR7

    RR7

    Joined:
    Jan 9, 2017
    Posts:
    254
    thanks for the replies! i left the learning running overnight, and they pretty much all go the right way around now. but the variance is still massive so i think it must be down to the rewards =)
     
  4. RR7

    RR7

    Joined:
    Jan 9, 2017
    Posts:
    254
    okay sorry i have another question:

    i see vids of generations learning from the best of previous generations. it seems to yield quite effective results quickly. however this unity one seems to just keep resetting and trying again, getting better over time. is this just because the generation-based vids are not using unity ml-agents, i.e. they are a totally different thing?
     
  5. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    I’m not sure. We don’t use the term generation much, so it’s probably just a different way to do learning.

    If you need help with your reward functions, you can post them here for us to look at and see if we notice anything.
     
  6. RR7

    RR7

    Joined:
    Jan 9, 2017
    Posts:
    254
    all i changed in the rewards from the karting demo was the checkpoint reward, from 1 to 5, as the little dudes decided to just spin around in an open space for the constant small speed reward rather than go for a checkpoint. maybe 5 was a little high.

    I'm pretty happy with the reward system, in as much as it's making sense how i'd try to influence the agents, and i do kinda love the kerbal space program aspect of setting it up and watching what they do! (i did reward them for not steering, but they decided to just not move in the end and reap the reward per frame without incurring any penalty, which did raise a smile)

    it does feel like, if it takes 12 hours or more for them to learn to go the right way around, i just need to make it more obvious to the agents what's expected of them, and perhaps end an episode after one lap.
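
    roughly the kind of thing i mean (just a simplified sketch, not the actual karting demo script - the class, tag and checkpoint count are made-up placeholders):

        using Unity.MLAgents;
        using UnityEngine;

        public class KartAgentSketch : Agent
        {
            public int checkpointsPerLap = 12;   // placeholder, depends on the track
            int checkpointsPassed;

            void OnTriggerEnter(Collider other)
            {
                if (other.CompareTag("Checkpoint"))
                {
                    checkpointsPassed++;
                    AddReward(1f);               // big, sparse reward for real progress
                                                 // (keep any per-frame speed reward tiny compared to this)

                    if (checkpointsPassed >= checkpointsPerLap)
                    {
                        AddReward(5f);           // lap bonus
                        EndEpisode();            // end after one lap so episodes stay short
                    }
                }
            }

            public override void OnEpisodeBegin()
            {
                checkpointsPassed = 0;
                // reset the kart's position/velocity here
            }
        }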
     
  7. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    @christophergoy I think he's referring to a type of Evolutionary Algorithm. Probably a Genetic Algorithm.

    @RR7_ Genetic algorithms are often used in youtube videos because they are fairly easy to implement and produce nice, explainable results in simple simulations. They are not often used outside of toy problems or research, though, because they are less sample-efficient (they require more iterations) than reinforcement learning algorithms and produce a lot of random dead ends, much like in real life when species go extinct (at least through natural processes, i.e. not caused by humans). Check out their limitations for other reasons as well.

    If you're interested in both approaches, you can actually use a pretty simple evolutionary approach to RL by training a few policies, keeping the best-performing ones, then using something like behavior cloning on a new policy using the top performers (just resuming training works too). For example: train 5 policies, grab the best 2, train 5 more from each of those 2, grab the best 2, rinse and repeat until you're happy with the outcome. Interesting stuff. A rough sketch of that loop with the ml-agents CLI is below.
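
    In ml-agents terms, the "resume from the best run" part maps onto the --initialize-from option, roughly like this (the run ids and config filename are just placeholders):

        # generation 0: train a few runs with different seeds
        mlagents-learn config/kart.yaml --run-id=gen0_a --seed=1
        mlagents-learn config/kart.yaml --run-id=gen0_b --seed=2
        mlagents-learn config/kart.yaml --run-id=gen0_c --seed=3

        # pick the best run by mean reward (e.g. in TensorBoard), then seed
        # the next "generation" of runs from it and repeat
        mlagents-learn config/kart.yaml --run-id=gen1_a --initialize-from=gen0_b --seed=4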
     
    dhyeythumar likes this.
  8. Wolf00007

    Wolf00007

    Joined:
    Jan 26, 2019
    Posts:
    24
    Hey there, I'm also struggling with my Agents going the wrong way on the tracks sometimes. I have noticed this happens more when they hit some sort of a bump on the road or if the road goes up. Did you manage to find a way to teach them not to do that? I remember the problem occurred even in the demo version when I checked it.

    Also, did you train your agents by only adjusting their rewards, or did you change any scripts or add new ones?