Question Asynchronous inference in Barracuda

Discussion in 'Barracuda' started by OneManEscapePlan, Dec 6, 2022.

  1. OneManEscapePlan

    Joined:
    Oct 14, 2015
    Posts:
    222
    The Barracuda documentation has badly outdated example code for Scheduled Execution:

    Code (CSharp):
    Tensor ExecuteInParts(IWorker worker, Tensor I, int syncEveryNthLayer = 5) {
        var executor = worker.ExecuteAsync(I);
        var it = 0;
        bool hasMoreWork;

        do {
            hasMoreWork = executor.MoveNext();
            if (++it % syncEveryNthLayer == 0)
                worker.WaitForCompletion();
        } while (hasMoreWork);

        return worker.CopyOutput();
    }
    This code uses multiple deprecated APIs (worker.ExecuteAsync(), worker.WaitForCompletion()).

    The documentation for IWorker gives a completely different approach for asynchronous inferencing:

    Code (CSharp):
    IEnumerator ImageRecognitionCoroutine() {
        //[...]
        using (var input = new Tensor(imageToRecognise, channels:3)) {
            // execute neural network with specific input and get results back
            var output = worker.Execute(input).PeekOutput();

            // allow main thread to run until neural network execution has finished
            yield return new WaitForCompletion(output);
            //[...]
        }
    }
    I also found a third method described in a GitHub issue:

    Code (CSharp):
    IEnumerator ImageRecognitionCoroutine() {
        //[...]
        using (var input = new Tensor(imageToRecognise, channels:3)) {
            yield return worker.StartManualSchedule(input);
            var output = worker.PeekOutput();
            //[...]
        }
    }
    First: Unity team, please update the documentation so that the "Model execution" page doesn't use deprecated code.

    Is there a preferred method of asynchronous execution? Any additional information on working with the second and third methods described above?

    For context - we are trying to use Barracuda on the Microsoft HoloLens 2 for inferencing from a low-resolution 3-channel image (currently testing with GPU inferencing). We've found that the second execution method described above (using `yield return new WaitForCompletion(output)`) takes about 1.1 seconds to run inferencing on our test network on the HL2. This would be acceptable for our use case, but the execution appears to be completely synchronous (the application freezes until inferencing is finished). The third execution method described above (using StartManualSchedule()) does work asynchronously on the HL2 (the application does not freeze), but inferencing takes 8 seconds, which is way too long.
     
  2. OneManEscapePlan

    Figured it out.

    Code (CSharp):
    IEnumerator ImageRecognitionCoroutine() {
        //[...]
        using (var input = new Tensor(imageToRecognise, channels:3)) {
            int stepsPerFrame = 5;
            var enumerator = worker.StartManualSchedule(input);
            int step = 0;
            while (enumerator.MoveNext()) {
                // yield every stepsPerFrame layers so the main thread keeps running
                if (++step % stepsPerFrame == 0) yield return null;
            }
            var output = worker.PeekOutput();
            //[...]
        }
    }
    We can increase the value of stepsPerFrame to do more work per frame and reduce overall execution time. However, if the value is too high, the UI may stutter or freeze until inferencing finishes.
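    A possible variant of the above (a sketch of my own, not from the Barracuda docs): instead of a fixed number of layers per frame, spend a time budget per frame and yield when it runs out. Layers can vary a lot in cost, so this keeps the per-frame hit bounded regardless of which layers happen to run. The `msPerFrame` value and the use of `System.Diagnostics.Stopwatch` are my assumptions; tune the budget for your target frame rate.

    Code (CSharp):
    ```csharp
    using System.Collections;
    using Unity.Barracuda;
    using UnityEngine;

    public class TimeBudgetedInference : MonoBehaviour {
        IWorker worker;               // assumed to be created elsewhere, as in the thread
        Texture2D imageToRecognise;   // assumed input image

        IEnumerator ImageRecognitionCoroutine() {
            using (var input = new Tensor(imageToRecognise, channels: 3)) {
                const double msPerFrame = 4.0; // hypothetical budget per frame
                var enumerator = worker.StartManualSchedule(input);
                var watch = System.Diagnostics.Stopwatch.StartNew();

                while (enumerator.MoveNext()) {
                    // once this frame's budget is spent, hand control back to Unity
                    if (watch.Elapsed.TotalMilliseconds >= msPerFrame) {
                        yield return null;
                        watch.Restart(); // fresh budget for the next frame
                    }
                }
                var output = worker.PeekOutput();
                // [...] use output
            }
        }
    }
    ```
    Unlike a fixed stepsPerFrame, this adapts when some layers take much longer than others, which may matter on hardware like the HoloLens 2 where per-layer cost is uneven.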