I'm a game dev student working on MLAgent implementations for simple two-player, turn-based board games. After getting no results at all on 7x6 Connect Four and 4x4 Connect Three, I'm now training a Tic-Tac-Toe agent, but I expect bad results there too. I wonder whether I am doing something wrong or whether this is just not really possible (I have found nothing but similar failed projects on GitHub). If anyone has a good sample project, that would be great!

I have some questions:

- What are good hyperparameters for something like this?
- Is it better to train with self-play, or to train a starting agent and a responding agent separately?
- Is it bad to train two agents at the same time?
- Is it bad to train against a random-move agent?
- Is there any way to use a CNN/RNN for Connect Four instead of the default NN?