
Audio Offline voice recognition keyword detection

Discussion in 'Audio & Video' started by andrzej_, Dec 18, 2019.

  1. andrzej_


    Dec 2, 2016
    I'm looking for a solution to detect two dozen or so keywords from audio input: English at first, other languages later. This is for an app that will ultimately have to work on both Android and iOS. I have a few ideas so far:
    - using Barracuda to deploy an ONNX model (converted from TF)
    - using TFLite for the model and deploying it with TF# (from what I was able to figure out, this is doable, although I know TF# is no longer used in ML-Agents; I hope this won't cause problems in the longer term)
    - using whatever is accessible from GCP, Polly, IBM Watson, etc. that will work offline. So far I know Google offers offline voice-to-text inside Assistant on Pixel phones, with a manageable model size of ~80MB. But maybe I'm missing something and something like this is already available. Also, keyword detection can be orders of magnitude smaller and more efficient than full voice-to-text, and can be deployed even on Google Coral and other edge devices.
    - using Unity as a library in a native Android/iOS app and implementing TFLite with the C++ API

    None of those options is good in terms of workflow and/or performance, and even one solution will take significant time, not to mention testing all of them one by one. Ideally I'd like to use Barracuda, as it would be the most flexible in terms of cross-platform deployment, but the available ML functionality might be an issue.

    Am I missing something? Has anyone tried one of these solutions and can offer some advice?