
Sharing experience - GAN Neural networks.

Discussion in 'General Discussion' started by neginfinity, Jan 31, 2021.

  1. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,324
    As I mentioned here:
    https://forum.unity.com/threads/neural-network-training-data-and-copyrights.1049135/

I've spent a bit of time messing around with GANs (Generative Adversarial Networks) over the weekend.

    Wanted to share the experience.

I've written an overview of GANs here:
    https://forum.unity.com/threads/ai-morphing-similar-to-art-breeder.1036603/#post-6722686
An introduction can be found, for example, here:
    https://realpython.com/generative-adversarial-networks/

With one caveat: apparently "Python notebooks" have become incredibly popular among data scientists, so every article insists on using them, even though you don't really need them and they pull in annoying dependencies.

The basic idea is that there are two neural networks - one tries to generate a fake image given an array of floating point numbers, the other tries to determine whether an image is real or fake, since it is given a mix of real images and generated ones. That results in unsupervised learning. The "Generator" network ends up effectively mapping the input array of floats to features in the images it produces. However, the meaning of those values is unknown. It is like creating a "character generator" with 512 sliders, where each slider changes some aspect of the character, but you aren't sure which one.
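To make that two-network loop concrete, here's a toy sketch in plain NumPy: the "images" are single numbers drawn from a Gaussian, the generator and discriminator are tiny linear models, and the gradients are written out by hand. All names are mine and this only illustrates the training dynamic, not a practical GAN:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real data": samples from a Gaussian the generator must imitate.
def real_batch(n):
    return rng.normal(4.0, 0.5, n)

# Generator: x_fake = a*z + b  (maps noise z ~ N(0,1) to "data").
# Discriminator: p_real = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr, steps, batch = 0.05, 2000, 64

for _ in range(steps):
    # --- Discriminator step: push d(real) -> 1, d(fake) -> 0 ---
    xr = real_batch(batch)
    z = rng.normal(0.0, 1.0, batch)
    xf = a * z + b
    dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
    grad_w = np.mean((dr - 1.0) * xr) + np.mean(df * xf)
    grad_c = np.mean(dr - 1.0) + np.mean(df)
    w -= lr * grad_w
    c -= lr * grad_c

    # --- Generator step: push d(fake) -> 1 (non-saturating loss) ---
    z = rng.normal(0.0, 1.0, batch)
    xf = a * z + b
    df = sigmoid(w * xf + c)
    grad_a = np.mean((df - 1.0) * w * z)
    grad_b = np.mean((df - 1.0) * w)
    a -= lr * grad_a
    b -= lr * grad_b

fake_mean = np.mean(a * rng.normal(0.0, 1.0, 10000) + b)
print(f"generated mean ~ {fake_mean:.2f} (real mean is 4.0)")
```

After a couple of thousand alternating updates, the generated samples drift toward the real distribution's mean. Real GANs run the same tug-of-war, just with convolutional networks and images instead of scalars.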

A standard tutorial exercise for GANs is generating a sine wave, which is not that useful, but relatively easy to do once you get through the Jupyter-related nonsense about notebooks.

So I went and tried a basic image-based GAN described in the next tutorial on the same site.

And ran into problems. First, a basic GAN is apparently very prone to flatlining, meaning the model eventually reaches a point where it can't learn anymore. It may fail to converge, end up "oscillating" around some state, only to "crash" into a state where the generator produces noise and the discriminator is never fooled by it.

    Step 1:
    upload_2021-2-1_1-7-43.png
    Step 173:
    upload_2021-2-1_1-8-2.png

Which, apparently, is a known problem, and why plain GANs aren't used often.

Poking around, I found an improved algorithm:
    https://github.com/akanimax/BMSG-GAN

The idea here is that instead of letting the generator figure out the right thing and comparing only the final output, multiple layers are compared on both the discriminator and the generator:


Meaning that from the beginning the network works on essentially downscaled images and tries to make them look right. The other popular approach is slowly growing the network from small to large images.
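If I understood the multi-scale idea correctly, each intermediate generator resolution gets a matching "real" target simply by downscaling the training images. A rough sketch of that matching step (the function names and the average-pool downscaler are my own, not taken from the BMSG-GAN code):

```python
import numpy as np

def downscale(img, factor):
    """Average-pool a square image by an integer factor."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def image_pyramid(img, sizes=(4, 8, 16, 32)):
    """Real images are matched at every scale, not just the final one:
    the discriminator sees a downscaled copy for each generator stage."""
    h = img.shape[0]
    return {s: downscale(img, h // s) for s in sizes}

# A 32x32 "real" image and its multi-scale targets.
rng = np.random.default_rng(1)
real = rng.random((32, 32))
pyramid = image_pyramid(real)

for size, img in pyramid.items():
    print(size, img.shape)
```

Because average pooling preserves the global mean, every scale stays a faithful coarse view of the same image, which is what gives the early, low-resolution layers a meaningful target to train against.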

BMSG-GAN seems to be less prone to "flatlining", and since a ready-made script was available, I played with various data and managed to produce this:
    upload_2021-2-1_1-12-6.png
This was created by the described network after letting it chew on some manga for 8 hours.

    Problems.

First, this one is already memory hungry. With 3 gigabytes of video memory I can only make images no larger than 256x256, and fit 8 to 10 images from the training set into GPU memory at once.

Second, this is extremely time consuming. This particular network appears to require a LOT of input images, otherwise it behaves oddly and barely improves. Basically, 300 images is a no-go; 30,000 is good. However, a large number of images means long training times: the image I uploaded took 8 hours and only reached the 29th training epoch.

Lastly... the results are gibberish. I've given it a bit of thought and arrived at the conclusion that this is due to the nature of the network. The input vector alone (128 float values) determines everything in the scene, and it is connected directly to the tiny input layer, which produces only a 4x4 image. What's more, the algorithm of this network enforces treating each neural layer as an image plane, with the input vector connected to the smallest one. That means there's no opportunity to accumulate "hidden state" information or to affect higher layers directly, so unless the input data is already prepared to be mostly uniform (like faces), the network can't generalize, because it literally has no space to store that "hidden state". Which means you can't just throw "photos of all objects in existence" at this one and have it figure out similarities between them.

That led me to look into further algorithms, and I arrived at StyleGAN, which is the one used for "This Person Does Not Exist".
And that was the end of the journey, because although StyleGAN appears to deal with the problem of the generator producing nonsense, it requires powerful hardware and a lot of time.
Basically, the readme in the GitHub implementation of StyleGAN right off the bat recommends using 8 GPUs and at least 14 gigabytes of video memory, and even then training would take 3 days (https://github.com/NVlabs/stylegan). With one GPU it would take 15.

There is an optimized implementation available here:
https://github.com/huangzh13/StyleGAN.pytorch
But that one runs out of memory on my 3 GB GPU even at the 4x4 training stage.

    And that was the end of looking into it.

    Some thoughts:
It feels like training these models will likely be out of reach for many people.
There are tons of interesting datasets. For example, the Flickr faces dataset is available for download here:
https://github.com/NVlabs/ffhq-dataset . This is 30k faces, but it is for non-commercial use only.
There's also the "CelebFaces" dataset (which can only be extracted from Kaggle, as it is otherwise stored on Google Drive and always over its download quota), an anime characters dataset, a 100k objects dataset, and apparently Google holds a huge dataset for image segmentation and object detection. In many cases, however, the data comes in different sizes and varying image quality, and needs to be cropped and scaled to the target resolution.

As a bonus, here's an animated view of the BMSG-GAN learning process:

    Well, hopefully this will be useful to someone.
     
  2. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,514
    Great read. Thanks for sharing!

    It seems to me that if it works on images it could also work on game content, such as certain types of levels. Instead of "real" vs "fake" it'd be "good" vs "bad". Any reasons this is a fundamentally flawed direction?
     
  3. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,324
Well, the first obvious problem is that in order to build such a neural net, you'd need to provide thirty thousand game levels to train it, which doesn't sound like a lot of fun. All the examples performed poorly on small datasets.

I'm also not sure how you'd map a neural net to a game level. The issue here is that the net has a fixed number of outputs, and for images that works, because they're a grid. A 3D game level has a potentially unlimited number of objects in it.

    One term that came up while digging through the stuff is "VAE" or "Variational Autoencoders".

    I didn't read that one too deeply, but it may be worth looking into.
    Articles:
    https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
    https://www.reddit.com/r/MachineLearning/comments/4r3pjy/variational_autoencoders_vae_vs_generative/

It was described here (this site occasionally seems to demand registration to view it):
    https://towardsdatascience.com/generating-images-with-autoencoders-77fd3a8dd368

    And it briefly mentioned that it may be useful for games, although it didn't explain how.
     
  4. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,514
    I wasn't thinking 3D levels. I was thinking stuff like world maps for RPGs, RTS maps, top down shooter levels, that kind of thing.

    Could an input base be built up over time based on user ratings? For instance, design your game so that you can easily filter out "broken" maps, generate lots of maps, and go from there based on user ratings? Potentially also add user generated content to the mix?
     
  5. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,324
    I don't think this is a suitable data type for that.

The BMSG-GAN in my example works, at least partially, as a form of upscaler. For example, the bottom layer looks like this:

    upload_2021-2-1_2-33-27.png
That's a 4x4 pixel image.

From that image, once trained, the net generates this:
    upload_2021-2-1_2-34-1.png

    With each layer it tries to produce plausible enlarged version, and then fills in details.

Basically, given a low-res solution, it tries to hallucinate a high-res one, except it isn't really successful, because the result makes no sense despite being visually similar. And the lowest-resolution solution is produced from an array of noise values.

The reason it can hallucinate a high-res version is that it has seen examples thousands of times. So you need to feed it (tens of) thousands of examples before it can learn anything. The point of engaging with this process is that the training is unsupervised: you throw sample images at it, it chews on them, draws some conclusions, and then figures out how to create something that looks similar.
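The upscale-then-fill-in-details behaviour can be caricatured in a few lines: start from a tiny base image (in the real network, decoded from the noise vector), repeatedly enlarge it, and add detail at each scale. Here the "learned detail" is just random noise, purely to show the shape of the pipeline; everything is hypothetical:

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x upscale - the 'plausible enlargement' each
    stage starts from before filling in detail."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(2)
base = rng.normal(size=(4, 4))   # lowest layer, decoded from the noise vector
img = base
while img.shape[0] < 32:
    img = upsample2x(img)
    img = img + 0.1 * rng.normal(size=img.shape)  # stand-in for learned detail
print(img.shape)
```

In the trained network the "detail" step is of course a learned convolution rather than noise, but the overall flow - tiny image in, progressively refined enlargements out - is the same.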

So in a situation where you're taking in user feedback, you'll be dealing, in my opinion, with something else entirely, most likely a genetic algorithm where the feedback acts as a fitness function. The issue with genetic algorithms is that they take time to reach their goal, and while they're getting there, people will be dealing with the bad decisions made by the system.
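For contrast, a bare-bones genetic algorithm of the kind I mean might look like this. The "user ratings" are faked with a hidden target bit-string so the example is self-contained; in a real game the fitness score would come from player feedback, and every name here is made up:

```python
import random

random.seed(0)

# Toy stand-in for "user ratings": a hidden target the population should
# evolve towards. In a real game this score would come from players.
TARGET = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def evolve(pop_size=30, generations=60, mutation=0.05):
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the best-rated half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(TARGET))
            child = a[:cut] + b[cut:]           # single-point crossover
            child = [1 - g if random.random() < mutation else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best), "/", len(TARGET))
```

Note how many generations of mediocre candidates the population wades through before converging - which is exactly the "players dealing with bad decisions" problem if the ratings come from live users.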

What you COULD do is feed planet Earth into a GAN, and it would then act as a map generator producing plausible locations and heightmaps. You'd need a detailed enough source with height and texture data for the landscape, but that would work.
     
    angrypenguin likes this.
  6. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,580
As far as I am aware, there is a common issue with overtraining NNs, which can ruin previously good results.

I think some set of data would be required to generate maps and content, but I think it is feasible. Potentially even better than the randomness of procedurally generated, often boring features.

    But generally speaking, I don't see why not.
    Just consider simple map as an art, made by NNs. This could be utilized already in games.
     
  7. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,324
I wasn't speaking of overtraining, but of mode collapse.

Basically, at some point this can happen:
    Discriminator: I'm looking for a human face.
    Generator: Here's bluish greenish mess with a red dot in top left corner.
    Discriminator: It is PERFECT.

Meaning the generator will come up with some messy output that always fools the discriminator. As a result, the only thing the generator produces is that one broken image and nothing else. Once it reaches this point, it is unusable.

    It is a bit different from overtraining, where the system memorizes input images.
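For what it's worth, one cheap way to notice this failure: a collapsed generator emits (nearly) identical samples, so the per-pixel spread across a batch drops towards zero. A sketch of that check (my own naming, not from any of the linked repos):

```python
import numpy as np

def batch_diversity(samples):
    """Mean per-pixel std across a batch: near zero means every sample
    is (almost) the same image - a classic sign of mode collapse."""
    return float(np.std(samples, axis=0).mean())

rng = np.random.default_rng(3)
healthy = rng.random((16, 8, 8))                        # varied fake images
collapsed = np.tile(rng.random((1, 8, 8)), (16, 1, 1))  # one image repeated

print(batch_diversity(healthy), batch_diversity(collapsed))
```

Logging a number like this during training won't prevent collapse, but it makes the "crash" visible long before you eyeball the output images.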
     
  8. EternalAmbiguity

    EternalAmbiguity

    Joined:
    Dec 27, 2014
    Posts:
    3,144
    Training on real faces provides a very objective label and "target" for the models to train against. How would that work for a game level? You'd need X number of "objectively" good levels.

    Furthermore, they'd all need to have common hidden variables for the NNs to identify and manipulate. And relating back to the first point, those variables would need to be properly represented in your dataset. If you're training on puzzle levels and the exit is always >100 m from the entrance, your model may not be able to generate scenarios where the exit is very close to the entrance but requires intricate puzzle-solving to access (just as a quick example). This is essentially overfitting - the problem is that your dataset needs to fully encompass the problem space, and how do you do that with something vague like a "good" video game level?

    The two situations are very different. That isn't to say it can't be done, it would simply need a heckuva lot of thought.

Unrelated comment - one benefit of Jupyter notebooks is that they are essentially notebooks; they're not just code. They can contain things like formatted text, and they let you store the result of one operation and tweak a later operation without rerunning everything. Imagine the benefit of that for something like an ML model, where you spend significant amounts of time training and just want to view the results in various ways. It's basically a halfway step between regular code and Excel. I've never really gotten into them myself, but they're very popular with amateur data scientists (not quite the right word - more folks who just dabble in it for their job).
     
    Last edited: Feb 1, 2021
  9. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,324
I understand that; however, after looking into them, I dislike them.

They're web-based and make getting started with programming more complicated. There's also the hidden-state problem, and this guy:

    I do see how they can have appeal for people who don't focus on programming. It is just this is not the kind of tool I'd want to use.
     
    EternalAmbiguity likes this.
  10. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,580
I'll just drop this here:

     
  11. TOES

    TOES

    Joined:
    Jun 23, 2017
    Posts:
    134
Thank you, finally some useful information about this subject after wading through useless academic posts that care nothing for actual implementation or practical concerns. They're all basically patching together tons of boilerplate code with a massive amount of impractical dependencies, trying to look smart by quoting each other's nonsense in an endless loop. o_O
     
    neginfinity likes this.