
Question Multitask implementation on a real robot?

Discussion in 'ML-Agents' started by lucvt001, Dec 3, 2023.

  1. lucvt001


    Jun 28, 2023
    Hi everyone, I am doing a research project on RL. The project involves training a model for a real-world underwater robot to complete a sequence of tasks: for example, pass through gate A while avoiding obstacle B, move towards C to perform certain tasks, and then locate objects D, E, and F and hit them. In short, there are many sequential tasks.

    My approach is to first train a base model that can move to a target given as a relative location (a variable observation), with a velocity vector as input. The model's action space is continuous and controls the speed and direction of motion. After this, I will gradually add rewards for tasks A, B, ..., and train a few different models, each focused on a single task. In short, I am trying to train a base model that can bridge the sim-to-real gap in waypoint navigation, and then build on top of it to handle more complex tasks using curriculum learning (or other methods).
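
    To make the staged idea concrete, here is a minimal sketch of how the reward could grow with each stage. Everything in it is an assumption for illustration: the function names, the stage numbering, the penalty and bonus values, and the threshold rule are hypothetical and not tied to any ML-Agents API.

```python
import math


def waypoint_reward(pos, target, prev_dist):
    """Base stage: reward equals the progress made toward the waypoint."""
    dist = math.dist(pos, target)
    return prev_dist - dist, dist


def staged_reward(stage, pos, target, prev_dist,
                  hit_obstacle=False, passed_gate=False):
    """Add task-specific terms on top of the base reward as stages unlock.

    stage 0: waypoint navigation only
    stage 1: also penalize obstacle collisions (hypothetical value -1.0)
    stage 2: also reward passing the gate (hypothetical value +1.0)
    """
    reward, dist = waypoint_reward(pos, target, prev_dist)
    if stage >= 1 and hit_obstacle:
        reward -= 1.0
    if stage >= 2 and passed_gate:
        reward += 1.0
    return reward, dist


def next_stage(stage, recent_mean_reward, threshold, max_stage=2):
    """Move to the next stage once the recent mean reward clears a threshold."""
    if recent_mean_reward >= threshold and stage < max_stage:
        return stage + 1
    return stage
```

    With this layout, the base waypoint term never changes, so a policy trained at stage 0 still sees a familiar signal when later terms are switched on.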

    I hope to receive some advice related to two questions:
    1) How can I best bridge the sim-to-real gap, especially for an underwater environment? Is there a way to combine PPO with SAC (which is supposedly more suitable for robotics) and collect real-world data to train the model? Or, if I vary the hydrodynamic variables to simulate all the feasible motions in a fluid, is PPO training sufficient?
    2) What is the difference between curriculum learning (CL) and stage-by-stage (SbS) training (for lack of a better term)? For CL, I need to write a master curriculum before training starts, and I think it is a bit hard to modify things once training has already started (?). In comparison, for SbS training, I would train a model as normal, and then add tasks and modify rewards once the model excels at the original, simpler task. In a sense, I don't need to prepare an overall plan (like a curriculum) before the start of training. Is this possible?
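
    For question 1, this is roughly what I mean by varying the hydrodynamic variables: resample them at the start of every episode so the policy never trains on a single fixed fluid model. It is only a sketch under assumptions. The parameter names and ranges below are hypothetical placeholders, not measured values, and passing the sampled values into the simulator is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class HydroParams:
    drag_coefficient: float    # dimensionless, hypothetical range
    added_mass_ratio: float    # added mass / vehicle mass, hypothetical range
    buoyancy_offset_n: float   # net buoyancy error in newtons, hypothetical
    current_speed_mps: float   # ambient current speed in m/s, hypothetical


# Placeholder (low, high) ranges; real ranges would come from measurements.
RANGES = {
    "drag_coefficient": (0.6, 1.4),
    "added_mass_ratio": (0.1, 0.5),
    "buoyancy_offset_n": (-2.0, 2.0),
    "current_speed_mps": (0.0, 0.5),
}


def sample_hydro_params(rng):
    """Draw one uniform sample per parameter for the next episode."""
    return HydroParams(**{
        name: rng.uniform(low, high) for name, (low, high) in RANGES.items()
    })


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    for episode in range(3):
        print(episode, sample_hydro_params(rng))
```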