Audio Lightning fast Voice commands for Android and iOS (on-device)

Discussion in 'Audio & Video' started by trungnq97, Jul 28, 2020.

  1. trungnq97

    trungnq97

    Joined:
    Oct 31, 2019
    Posts:
    4
    Dear all,
    Recently I have become fascinated by voice technology. My kids (4 and 7) are fans of Google Voice Search. It's so fun for them to ask Google about all kinds of things they are curious about :D

    So I thought it would be cool to use voice to control games! I mean using only short commands like jump, gun, fire, heal, drop, left, right, stop, etc. The commands would need to be recognized extremely fast.

    I searched the Asset Store for an asset that does this, but I couldn't find one that fits my needs. There are assets that wrap cloud-based or OS-based speech recognition services. They are great for their use cases, but they can be slow, consume bandwidth and battery, and fail to provide a smooth user experience. They can also be expensive to scale if cloud-based speech recognition services are required [1].

    So I thought of building lightning-fast, on-device voice commands embedded directly inside games. It would work everywhere (Android, iOS :D) with zero dependency on OS speech recognition or an internet connection. It would scale as big and as fast as you want when your games attract hundreds of thousands of users, because everything is done on-device, fast and efficient.

    It would be great to hear your thoughts. How should the library be designed so it is easy to use and fits most games? What voice commands would you want first for your games?

    Cheers!

    [1] https://developer.apple.com/documentation/speech/sfspeechrecognizer#2364809
     
  2. Mark_01

    Mark_01

    Joined:
    Mar 31, 2016
    Posts:
    606
    Hi, I am not entirely sure what you are looking for, but I am guessing you want this done on the phone.
    I have zero coding skills, so I cannot help (sorry). But I think it is a cool idea. After a couple of days thinking about
    it in the back of my mind, I did a search on the store for speech to text. So I am guessing what you could try is to
    use speech to text: the code would see the text "jump" and then do the matching action. For example,
    speech to text returns "jump", and "jump" means move the object along the Y axis, if I have the thought right.

    https://assetstore.unity.com/packag...d-text-chat-148188?q=speech to text&orderBy=1
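
    The "see the text, then do the action" idea above can be sketched as a small lookup table from recognized words to callbacks. This is only an illustration under stated assumptions: `VoiceCommandDispatcher`, `Register` and `Dispatch` are hypothetical names, not part of any asset or library mentioned in this thread, and the transcript is assumed to arrive as a plain string from whichever recognizer is used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from any asset in this thread): maps recognized
// words to game actions, ignoring case.
public sealed class VoiceCommandDispatcher
{
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // Associates a single-word command such as "jump" with a callback.
    public void Register(string command, Action handler)
    {
        if (string.IsNullOrWhiteSpace(command)) throw new ArgumentException("Empty command.", nameof(command));
        if (handler == null) throw new ArgumentNullException(nameof(handler));
        handlers[command.Trim()] = handler;
    }

    // Splits the transcript into words and runs the handler of every word
    // that is a registered command. Returns true if at least one matched.
    public bool Dispatch(string transcript)
    {
        if (string.IsNullOrWhiteSpace(transcript)) return false;

        bool matched = false;
        char[] separators = { ' ', '\t', '\r', '\n' };
        foreach (string word in transcript.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            Action handler;
            if (handlers.TryGetValue(word, out handler))
            {
                handler();
                matched = true;
            }
        }
        return matched;
    }
}
```

    In a Unity script, the callback registered for "jump" could be whatever moves the object along the Y axis, and `Dispatch` would be called with each transcript the recognizer produces.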
     
  3. trungnq97

    trungnq97

    Joined:
    Oct 31, 2019
    Posts:
    4
    Hi Mark_01,
    I am planning to implement a voice command recognition system for Unity. The system should be tiny, fast and accurate. It should be cross-platform (Android and iOS for now). It should not depend on the OS (Android or iOS) for speech-to-text recognition. It should work offline on the device, with no internet connection required.

    Speech to text (STT) can be a good start. But STT systems might not be the best fit for the use case I mentioned. If STT systems use deep neural networks, they tend to be quite big (around 1GB of memory required) [1]. So accurate STT systems tend to be hosted in the cloud. To make them smaller, accuracy is often sacrificed.

    Interestingly, for games, we might need just a small set of predefined commands (20-30 commands for example). We could build a tiny (less than 1MB) voice command recognition system that has characteristics/features I mentioned above.

    I would like to hear thoughts about the idea, different use-cases (for different types of games) and any set of commands that can be practically used. If the idea makes sense and can be useful, I will start implementing it.

    [1] DeepSpeech, an open-source automatic speech recognition (ASR) engine from Mozilla. Mozilla did a great job of reducing memory usage from 1.8GB to 84MB. https://hacks.mozilla.org/2019/12/deepspeech-0-6-mozillas-speech-to-text-engine/
     
  4. trungnq97

    trungnq97

    Joined:
    Oct 31, 2019
    Posts:
    4
    Mark_01 likes this.
  5. Zikos95

    Zikos95

    Joined:
    Nov 21, 2016
    Posts:
    4
    Hi trungnq97,
    Did you manage to implement DeepSpeech in Unity to run on Android? I am looking to do something similar and wanted to know if this option is viable. More specifically, I'm looking to implement speech recognition on an Oculus. Any comments or suggestions?

    Thanks
     
  6. fendercodes

    fendercodes

    Joined:
    Feb 4, 2019
    Posts:
    190
    Any luck @trungnq97?
     
  7. voxelltech

    voxelltech

    Joined:
    Oct 8, 2019
    Posts:
    44
    Hi, I know this is a really old thread, but I want to bring it up again because speech recognition and synthesis have not been explored much in Unity.

    I have been trying to implement TTS and STT in Unity. So far I have only managed to implement TTS, using the TFLite package.
    TFLite: https://github.com/asus4/tf-lite-unity-sample
    My Repo: https://github.com/voxell-tech/UnityASR

    I am trying to use DeepSpeech. I imported the .dll and .so files and I can use their namespace, but I was unable to load a model because it says libdeepspeech.so is not found. (I have already placed it under the Plugins/x86_x64 folder...) I managed to run it in a normal .NET Core project in Visual Studio, just no luck in Unity...

    Is it possible to somehow combine the .dll with the .so file? (This might be a stupid question, but I am quite new to libraries and binary stuff.) If you managed to implement DeepSpeech, let me know!
     
    Last edited: Jul 31, 2021
  8. Jayckoup

    Jayckoup

    Joined:
    May 19, 2021
    Posts:
    4
    Have you managed to figure this out? I am also trying to implement STT with Deepspeech in Unity and I'm getting the same error you were getting where it says libdeepspeech.so is not found.
     
  9. voxelltech

    voxelltech

    Joined:
    Oct 8, 2019
    Posts:
    44
    Hey yo! Glad to know that there are people interested in it! Yes, I did make a working version!
    Link here: https://github.com/voxell-tech/UnityASR
    Let me know if you get any errors from using it. I will make a video on my YouTube channel on how to set things up in the near future (https://youtube.com/voxelltech) and also improve the readme XD. For now, enjoy DeepSpeech!
     
    Last edited: Jul 31, 2021
  10. carlordvr

    carlordvr

    Joined:
    Apr 22, 2021
    Posts:
    3
    Hey Voxell, does this DeepSpeech module allow for scorers? Also, when do you think you will have speech to text working?
     
  11. voxelltech

    voxelltech

    Joined:
    Oct 8, 2019
    Posts:
    44
    Hey @carlordvr, I already have speech to text working. It currently uses DeepSpeech. I haven't put up a demo scene showing how to use it yet; I will make one ASAP and post it here to keep you guys updated! (Will update the readme also ofc hahaha)
     
    Last edited: Jul 28, 2021
  12. kbabilinski

    kbabilinski

    Joined:
    Jul 12, 2012
    Posts:
    19
    I'm also interested, let me know if I can help in any way !
     
  13. voxelltech

    voxelltech

    Joined:
    Oct 8, 2019
    Posts:
    44
    Hi guys, I have made a mini demo of real-time DeepSpeech STT, but it doesn't recognize words as intended and it also crashes Unity. I am still figuring it out, so if you guys have any ideas please do let me know!
    Btw, this is all in a separate branch called deepspeech: https://github.com/voxell-tech/UnityASR/tree/deepspeech

    Also, here's a mini snippet of code to test if your DeepSpeech setup works! Read the README first and follow the installation steps!
    Code (CSharp):
    using System;
    using UnityEngine;
    using DeepSpeechClient;
    using Voxell.Inspector;

    public class DeepSpeechTest : MonoBehaviour
    {
      public string modelPath;
      // Assumes a mono clip at the sample rate the model expects.
      public AudioClip clip;

      [Button]
      void Test()
      {
        // GetData fills interleaved samples, so size the buffer for every channel.
        float[] floatData = new float[clip.samples * clip.channels];
        clip.GetData(floatData, 0);
        short[] shortData = AudioFloatToInt16(floatData);

        DeepSpeech sttClient = new DeepSpeech(modelPath);
        try
        {
          string speechResult = sttClient.SpeechToText(shortData, (uint)shortData.Length);
          Debug.Log(speechResult);
        }
        finally
        {
          // Release the client even if decoding throws.
          sttClient.Dispose();
        }
      }

      // Converts Unity's float samples in [-1, 1] to 16-bit PCM.
      private static short[] AudioFloatToInt16(float[] data)
      {
        short[] shorts = new short[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
          // Clamp first so out-of-range samples cannot overflow Convert.ToInt16.
          shorts[i] = Convert.ToInt16(Mathf.Clamp(data[i], -1f, 1f) * Int16.MaxValue);
        }
        return shorts;
      }
    }

    Edit: the new script is called `AutomaticSpeechRecognition.cs`. It uses the default microphone to take in speech and decodes it in a separate thread. I'm not sure what I did wrong yet.
    Warning: once you exit play mode, Unity will crash, at least in my case.
     
    Last edited: Jul 31, 2021
  14. voxelltech

    voxelltech

    Joined:
    Oct 8, 2019
    Posts:
    44
    kbabilinski likes this.
  15. AmmarSalim

    AmmarSalim

    Joined:
    Sep 2, 2016
    Posts:
    24
    Please continue, I am also very, very interested in this.