Question: Multitask implementation on a real robot?

Discussion in 'ML-Agents' started by lucvt001, Dec 3, 2023.

  1. lucvt001


    Jun 28, 2023
    Hi everyone, I am doing a research project on RL. The project involves training a model for a real-world underwater robot to complete a sequence of tasks: e.g., pass through gate A while avoiding obstacle B, move towards C to perform certain tasks, and then locate objects D, E, and F and hit them. In short, there are many sequential tasks.

    My approach is to have a base model capable of moving to a relative location (a variable observation) given a velocity vector as input. The model's action space is continuous and controls the speed and direction in which the robot moves. After this, I will gradually add rewards relating to tasks A, B, ..., and train a few different models, each focused on a single task. In short, I am trying to train a base model that can bridge the sim-to-real gap in waypoint navigation, and then build on top of it to do more complex tasks using curriculum learning (or other methods).
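    The idea of starting from waypoint navigation and gradually adding task rewards can be sketched roughly as below. This is only an illustration under stated assumptions: the state keys (`prev_distance`, `distance`, `obstacle_clearance`), the 0.5 m clearance threshold, and the `StagedReward` class are all hypothetical names, not part of ML-Agents or of the actual project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A reward term maps a state dictionary to a scalar reward.
# All state keys used below are hypothetical placeholders.
RewardTerm = Callable[[Dict[str, float]], float]


def waypoint_progress(state: Dict[str, float]) -> float:
    """Positive when the robot moved closer to the waypoint this step."""
    return state["prev_distance"] - state["distance"]


def obstacle_penalty(state: Dict[str, float]) -> float:
    """Fixed penalty when closer than 0.5 m to an obstacle (assumed threshold)."""
    return -1.0 if state["obstacle_clearance"] < 0.5 else 0.0


@dataclass
class StagedReward:
    """Sums the reward terms of every stage up to and including the current one."""
    stages: List[List[RewardTerm]]
    stage: int = 0

    def advance(self) -> None:
        # Move to the next stage, staying on the last one once it is reached.
        self.stage = min(self.stage + 1, len(self.stages) - 1)

    def __call__(self, state: Dict[str, float]) -> float:
        active = self.stages[: self.stage + 1]
        return sum(term(state) for terms in active for term in terms)


# Stage 0: waypoint navigation only. Stage 1: also avoid obstacles.
reward = StagedReward(stages=[[waypoint_progress], [obstacle_penalty]])
```

    Calling `reward.advance()` once the base behaviour is good enough switches on the next group of terms without discarding the earlier ones, so the base navigation objective stays in place while new tasks are layered on.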

    I hope to receive some advice related to two questions:
    1) How can I best bridge the sim-to-real gap, especially for an underwater environment? Is there a way to combine PPO with SAC (which is supposedly more suitable for robotics) and collect real-world data to train the model? Or, if I vary the hydrodynamic variables to simulate all feasible motions in a fluid, is PPO training sufficient?
    2) What's the difference between curriculum learning (CL) and stage-by-stage (SbS) training (for lack of a better description)? For CL, I need to write a master curriculum before training starts, and if I need to modify things once training is already under way, I think it's a bit hard (?). In comparison, for SbS training, I would train a model as normal, and then add tasks and modify rewards as the model excels at the original, simpler task. In a sense, I don't need to prepare an overall plan (like a curriculum) before the start of training. Is this possible?
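    The option in question 1 of varying the hydrodynamic variables is usually called domain randomization: new physics parameters are drawn at the start of every episode so the policy cannot overfit to one simulated fluid. A minimal sketch follows, assuming four illustrative parameters. The parameter names and numeric ranges are hypothetical placeholders, not measured values, and the simulator hook that would consume them is not shown.

```python
import random
from dataclasses import dataclass, fields


@dataclass
class HydroParams:
    """Hypothetical hydrodynamic parameters to randomize per episode."""
    drag_coefficient: float
    added_mass_kg: float
    buoyancy_offset_n: float
    current_speed_mps: float


# Assumed (low, high) sampling ranges; real ranges should come from
# measurements or datasheets for the actual vehicle and environment.
RANGES = {
    "drag_coefficient": (0.6, 1.4),
    "added_mass_kg": (2.0, 8.0),
    "buoyancy_offset_n": (-1.5, 1.5),
    "current_speed_mps": (0.0, 0.5),
}


def sample_hydro_params(rng: random.Random) -> HydroParams:
    """Draw one independent uniform sample for every parameter."""
    values = {f.name: rng.uniform(*RANGES[f.name]) for f in fields(HydroParams)}
    return HydroParams(**values)


# One fresh sample per episode; a seeded generator keeps runs reproducible.
rng = random.Random(0)
episode_params = [sample_hydro_params(rng) for _ in range(3)]
```

    Uniform sampling is the simplest choice. Whether randomization alone is enough for PPO to transfer to the real robot is exactly the open question here, so this shows the mechanism only, not a claim that it closes the gap.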