Dear all,

I have updated my version of ML-Agents to 0.26.0 and noticed some changes to the configuration file. I have a couple of questions regarding the differences between the first network (see image) and the extra network that appears under "reward_signals":

1. In the case of PPO, does the first network represent the "actor" (policy) network?
2. Does the network under "reward_signals" represent the "critic" network? If so, does the critic parametrize the value function V(s) or the state-action value function Q(s, a)?
3. Are there any links available for the PyTorch implementation?
4. Should both networks have a similar design, i.e., the same number of layers and hidden units?
5. From my understanding, "goal_conditioning_type = hyper" generates yet another neural network. From the image below, I understand that two hyper-networks are being generated (one for each network). Is that correct? If so, what is the configuration (number of layers / hidden units) of these hyper-networks?

Sorry for the long post, and many thanks!
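For context, here is roughly the relevant part of my configuration file (a sketch only; the behavior name and values are placeholders, and I may be misremembering some field names):

```yaml
behaviors:
  MyBehavior:                  # placeholder behavior name
    trainer_type: ppo
    network_settings:          # the "first" network in the image
      hidden_units: 128
      num_layers: 2
      goal_conditioning_type: hyper
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
        network_settings:      # the extra network I am asking about
          hidden_units: 128
          num_layers: 2
```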