
Google Cloud Speech to Text in Unity

Discussion in 'AR/VR (XR) Discussion' started by NTNU_VR, Dec 6, 2018.

  1. NTNU_VR


    Dec 6, 2018

    We are currently working on a multiplayer VR project in Unity 2018.2.16f1. One of the features we wanted to try out was voice recognition, and we quickly found that we needed Google's Cloud Speech-to-Text product.

    After looking at the Asset Store, it did not seem like anyone had made what we were specifically looking for. We need two things: voice recognition for streaming audio and support for the Norwegian language. Because of this, we decided to build a plugin ourselves. Getting the plugin into Unity was a hassle because of all the .dll files that produced errors, but we were able to solve most of them. However, we then hit a problem that left us really stuck: after resolving the plugin issues, we tried to play the application in the editor, but it simply freezes. According to the Google Cloud console a request is made, so it seems to be working at least partway. Does anyone here have experience with, or an idea of, why this is happening?
  2. madleen_unity


    Unity Technologies

    Aug 28, 2018
    Thank you for posting!

    From reading the forum post alone, I cannot yet tell what is going wrong here.

    So here are a few questions:
    Is a firewall blocking the incoming request?
    You say that Google Cloud receives the request, so I assume you just do not get a response back? Which Google service are you calling? What does the code that listens for the response look like?

    Can you also verify whether the response is even sent back to your app and just not received, or whether it fails somewhere in the Google API already?

    If you could add as much information (sample code, etc.) as possible to this post, I will hopefully be able to help!

    Many Thanks
  3. NTNU_VR


    Dec 6, 2018
    Below is our code for Google Cloud Speech-to-Text; it is a modified version of Google's sample code (the one below). I do not think the response is sent back, as the Google Cloud Platform dashboard registers the request as a failed attempt. This is most likely because Unity freezes and we are forced to shut it down.

    The firewall should not be an issue, since the code works outside of Unity. All I am trying to do in Unity is run the plugin and print out the result, which should be a boolean (whether the user said a certain word).

    Code (CSharp):
    1. using System;
    2. using Google.Cloud.Speech.V1;
    3. using System.Diagnostics;
    4. using System.Threading;
    5. using System.Threading.Tasks;
    7. namespace VR_VoiceRecognition
    8. {
    9.    public class VoiceRecognition
    10.    {
    12.         public bool StartSpeechRecognition()
    13.         {
    14.             bool test = StreamingMicRecognizeAsync(20, "fantastisk").Result;
    15.             return test;
    16.         }
    18.         static async Task<bool> StreamingMicRecognizeAsync(int inputTime, string inputWord)
    19.         {
    20.             bool speechSuccess = false;
    21.             Stopwatch timer = new Stopwatch();
    23.             Task delay = Task.Delay(TimeSpan.FromSeconds(1));
    25.             if (NAudio.Wave.WaveIn.DeviceCount < 1)
    26.             {
    27.                 Console.WriteLine("No microphone!");
    28.                 return false;
    29.             }
    30.             var speech = SpeechClient.Create();
    31.             var streamingCall = speech.StreamingRecognize();
    32.             // Write the initial request with the config.
    33.             await streamingCall.WriteAsync(
    34.                 new StreamingRecognizeRequest()
    35.                 {
    36.                     StreamingConfig = new StreamingRecognitionConfig()
    37.                     {
    38.                         Config = new RecognitionConfig()
    39.                         {
    40.                             Encoding =
    41.                             RecognitionConfig.Types.AudioEncoding.Linear16,
    42.                             SampleRateHertz = 16000,
    43.                             LanguageCode = "nb",
    44.                         },
    45.                         InterimResults = true,
    46.                     }
    47.                 });
    50.             // Compare speech with the input word, finish if they are the same and speechSuccess becomes true.
    51.             Task compareSpeech = Task.Run(async () =>
    52.             {
    53.                 while (await streamingCall.ResponseStream.MoveNext(
    54.                     default(CancellationToken)))
    55.                 {
    56.                     foreach (var result in streamingCall.ResponseStream
    57.                         .Current.Results)
    58.                     {
    59.                         foreach (var alternative in result.Alternatives)
    60.                         {
    61.                             if (alternative.Transcript.Replace(" ", String.Empty).Equals(inputWord, StringComparison.InvariantCultureIgnoreCase))
    62.                             {
    63.                                 speechSuccess = true;
    65.                                 return;
    66.                             }
    68.                         }
    69.                     }
    70.                 }
    71.             });
    73.             // Read from the microphone and stream to API.
    74.             object writeLock = new object();
    75.             bool writeMore = true;
    76.             var waveIn = new NAudio.Wave.WaveInEvent();
    77.             waveIn.DeviceNumber = 0;
    78.             waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1);
    79.             waveIn.DataAvailable +=
    80.                 (object sender, NAudio.Wave.WaveInEventArgs args) =>
    81.                 {
    82.                     lock (writeLock)
    83.                     {
    84.                         if (!writeMore) return;
    85.                         streamingCall.WriteAsync(
    86.                             new StreamingRecognizeRequest()
    87.                             {
    88.                                 AudioContent = Google.Protobuf.ByteString
    89.                                     .CopyFrom(args.Buffer, 0, args.BytesRecorded)
    90.                             }).Wait();
    91.                     }
    92.                 };
    94.             waveIn.StartRecording();
    95.             timer.Start();
    96.             //Console.WriteLine("Speak now.");
    98.             //Delay continues as long as a match has not been found between speech and inputword or time that has passed since recording is lower than inputTime.
    99.             while (!speechSuccess && timer.Elapsed.TotalSeconds <= inputTime)
    100.             {
    101.                 await delay;
    102.             }
    104.             // Stop recording and shut down.
    105.             waveIn.StopRecording();
    106.             timer.Stop();
    108.             lock (writeLock) writeMore = false;
    110.             await streamingCall.WriteCompleteAsync();
    111.             await compareSpeech;
    113.             //Console.WriteLine("Finished.");
    114.             return speechSuccess;
    115.         }
    116.     }
    117. }
  4. madleen_unity


    Unity Technologies

    Aug 28, 2018
    Sorry for the delayed response, it has been quite hectic here!

    The Unity editor runs on the main thread, and blocking on a C# task's result blocks that thread, which is why the editor freezes in that situation. This is why we encourage people to use coroutines, as they can operate across multiple frames.

    In your code:

    public bool StartSpeechRecognition()
    {
        bool test = StreamingMicRecognizeAsync(20, "fantastisk").Result;
        return test;
    }

    Because this function waits for the task's result, the editor and game become unresponsive until the Task finishes and gives you that result. To stop that happening, instead of returning the result of the task in StartSpeechRecognition(), you can return the task object itself.

    For example:

    public Task<bool> StartSpeechRecognition()
    {
        Task<bool> myTask = StreamingMicRecognizeAsync(20, "fantastisk");
        return myTask;
    }

    Then, from the Unity side of the code, in Test.cs, you can use Task.Run(() => StartSpeechRecognition()) on the Task object you received, and in a coroutine you can check every frame to see if the task is completed yet. This seems to stop the editor from freezing up while the task is running.
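    As an illustration, the polling approach could look like the sketch below. Only `VoiceRecognition.StartSpeechRecognition()` and the file name `Test.cs` come from this thread; everything else is an assumed minimal setup, not the poster's actual code.

    ```csharp
    using System.Collections;
    using System.Threading.Tasks;
    using UnityEngine;

    // Hypothetical Test.cs sketch: run the blocking recognition call on a
    // thread-pool thread and poll its completion from a coroutine, so the
    // Unity main thread is never blocked.
    public class Test : MonoBehaviour
    {
        void Start()
        {
            Task<bool> recognition = Task.Run(
                () => new VR_VoiceRecognition.VoiceRecognition().StartSpeechRecognition());
            StartCoroutine(WaitForRecognition(recognition));
        }

        private IEnumerator WaitForRecognition(Task<bool> task)
        {
            // Check once per frame instead of blocking on task.Result.
            while (!task.IsCompleted)
            {
                yield return null;
            }

            if (task.IsFaulted)
            {
                Debug.LogError(task.Exception);
            }
            else
            {
                Debug.Log("Word recognized: " + task.Result);
            }
        }
    }
    ```

    Note that `task.Result` is only read after `IsCompleted` is true, so it returns immediately instead of blocking the frame.
    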

    However, please be aware that the task also needs to be cleaned up/exited manually once it is no longer needed.
    In Unity, if you want to use threaded operations or C# tasks, it is advised to keep track of the threads and ensure they are disposed of correctly when the application shuts down.
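    One common pattern for this (an assumption on my part, not something from this thread) is to hold a `CancellationTokenSource` alongside the task and cancel it when the component is destroyed:

    ```csharp
    using System.Threading;
    using System.Threading.Tasks;
    using UnityEngine;

    // Hypothetical sketch: track a background task with a CancellationTokenSource
    // so it can be stopped when the component (or the application) shuts down.
    public class RecognitionRunner : MonoBehaviour
    {
        private CancellationTokenSource cts;

        void Start()
        {
            cts = new CancellationTokenSource();
            CancellationToken token = cts.Token;
            Task.Run(() =>
            {
                // Long-running work should periodically check the token.
                while (!token.IsCancellationRequested)
                {
                    Thread.Sleep(100);
                }
            }, token);
        }

        void OnDestroy()
        {
            // Signal the task to stop and release the token source.
            cts.Cancel();
            cts.Dispose();
        }
    }
    ```
    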

    I hope this will help!
  5. NTNU_VR


    Dec 6, 2018
    Thank you for the help! We will continue to work with this and see if we can find some sort of solution :)
  6. asotelo94


    Jan 26, 2013
    Where can I find the SDK for this? How do I get the Google.Cloud.Speech.V1 namespace?
  7. NTNU_VR


    Dec 6, 2018

    Google has a guide that you can follow to set up this service:
  8. Fran-Matsusaka


    Oct 28, 2015
    Hi, I'm trying to implement Google.Cloud.Speech.V1 in my project, but Unity says:
    Assets/Scripts/GoogleVoiceSpeech.cs(34,7): error CS0246: The type or namespace name `Google' could not be found. Are you missing an assembly reference?

    My Unity version is 2018.1.9f2

    Thanks in advance
  9. NTNU_VR


    Dec 6, 2018

    It is most likely because you are missing plugins. Here is a project (Unity version 2018.3.3f1) that should be working:
  10. madleen_unity


    Unity Technologies

    Aug 28, 2018

    We continued looking into the issue together with the OP. It turns out that something used in one of the libraries causes Google's sample code to block/freeze.
    The problem is that shutdown of the API is async and waits for everything in progress to complete before shutting down. This includes waiting on all ResponseStream.MoveNext() calls. When we return early from the compareSpeech task, it is possible to leave some of these hanging unprocessed.

    Removing the return; in line 65 above makes the code run without a freeze.
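    In other words, the consuming loop sets the flag on a match but keeps draining the response stream so the client can shut down cleanly. A sketch of the adjusted compareSpeech task, based on the code posted above:

    ```csharp
    // Adjusted compareSpeech task: no early return, so every pending
    // ResponseStream.MoveNext() is consumed before shutdown.
    Task compareSpeech = Task.Run(async () =>
    {
        while (await streamingCall.ResponseStream.MoveNext(
            default(CancellationToken)))
        {
            foreach (var result in streamingCall.ResponseStream.Current.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    if (alternative.Transcript.Replace(" ", String.Empty)
                        .Equals(inputWord, StringComparison.InvariantCultureIgnoreCase))
                    {
                        speechSuccess = true; // flag only; keep draining the stream
                    }
                }
            }
        }
    });
    ```
    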

    Hope that will help whoever is struggling.
  11. ryo0ka


    Sep 27, 2015
    I've fixed the freeze.

    In addition to @madleen_unity's answer, call this method:

    Code (CSharp):
    await SpeechClient.ShutdownDefaultChannelsAsync();
    This will shut down all the threads the speech client's gRPC channels are using.

    To prevent the editor from freezing when you exit play mode in the middle of an RPC session, do call all these "terminate" methods in a background thread from
    , using
    . This way the "terminate" process will safely run on a thread detached from Unity's main thread (unless you await on the main thread inside the terminate process).

    As soon as the terminate process is done, the next play mode will begin without the freeze -- though it may take ~10 seconds for the terminate process to complete, because it waits for a timeout if you invoke it in the middle of a streaming session.
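    For reference, one possible place to hook this in is a quit callback (a sketch under my own assumptions; the exact callback the post above refers to is not specified there):

    ```csharp
    using System.Threading.Tasks;
    using Google.Cloud.Speech.V1;
    using UnityEngine;

    // Hypothetical sketch: shut down the speech client's gRPC channels on a
    // background thread when the application (or editor play mode) exits, so
    // the potentially slow teardown never runs on Unity's main thread.
    public class SpeechShutdown : MonoBehaviour
    {
        void OnApplicationQuit()
        {
            // Fire-and-forget on the thread pool; blocking on this from the
            // main thread would reintroduce the freeze discussed above.
            Task.Run(async () =>
            {
                await SpeechClient.ShutdownDefaultChannelsAsync();
            });
        }
    }
    ```
    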
    Last edited: Mar 25, 2019
  12. Jeepbumblejeep


    May 31, 2018

    How did you manage to solve it? I am trying to put the .dll files under the Assets folder as recommended on other sites. I am doing this because if I download the packages in Visual Studio, Unity deletes them on restart. But when I put the packages under the Assets folder, errors appear saying things like "System.Int32 not recognized", or assembly problems.
    How did you manage the .dll files for Google Cloud and avoid all those problems?
    Thank you all in advance.
  13. NTNU_VR


    Dec 6, 2018
    I see, thank you for the response! At the moment the application and the voice recognition are working quite well. We will check this out and see if it brings significant improvements.

    I'm not quite certain how you have done this, but our voice recognition was implemented outside of Unity. In Visual Studio I created a class library with the voice recognition functionality and imported it into Unity as a plugin. Here are some useful links to read more about this: and an example video . Remember to put these .dll files inside a "Plugins" folder under the "Assets" folder.

    I also have an example project: with the .dll files that worked for our project. This is the older version, though, so it has the error mentioned above with the Unity editor freezing. I will try to post an updated link with the fix mentioned by @madleen_unity tomorrow.
  14. Jeepbumblejeep


    May 31, 2018
    Thank you for your response!
    The way I was doing it was creating a script in Unity, opening it in Visual Studio (with the solution created by Unity), and downloading NuGet packages for it.
    I suppose you created an independent C# project (not from Unity) and implemented the Google speech function there, which is why you need those libraries to use your custom plugin.
    I think the problem was that I was trying to use Google Speech-to-Text directly in Unity.
    Thank you for your help, you saved me!
  15. Jeepbumblejeep


    May 31, 2018
    Hello there! It's me again.
    I managed to create the .dll and import it with the libraries into the Assets folder (for Grpc.Core I had to copy what you did, renaming the x64 and x86 libraries), and then the problems stopped. But my code freezes like yours did. My code is a little bit different, so I will show it all:

    the .dll code:
    Code (CSharp):
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;
    using Google.Cloud.Speech.V1;
    using UnityEngine;

    namespace ReconocimientoDeVoz
    {
        public class Reconocedor
        {
            public static async Task<List<string>> reconocerVoz(int tiempo)
            {
                List<string> listaSoluciones = new List<string>();
                if (NAudio.Wave.WaveIn.DeviceCount < 1)
                {
                    Debug.Log("Sin microfono");
                    return listaSoluciones;
                }
                var speech = SpeechClient.Create();
                var streamingCall = speech.StreamingRecognize();

                // Initial request configuration.
                await streamingCall.WriteAsync(
                    new StreamingRecognizeRequest()
                    {
                        StreamingConfig = new StreamingRecognitionConfig()
                        {
                            Config = new RecognitionConfig()
                            {
                                Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                                SampleRateHertz = 16000,
                                LanguageCode = "es-ES",
                            },
                            InterimResults = true,
                            SingleUtterance = true // stop recognizing once the speaker is detected to have stopped talking
                        }
                    });

                // Print the responses as they arrive.
                Task pintaRespuestas = Task.Run(async () =>
                {
                    while (await streamingCall.ResponseStream.MoveNext(default(CancellationToken)))
                    {
                        foreach (var result in streamingCall.ResponseStream.Current.Results)
                        {
                            foreach (var alternative in result.Alternatives)
                            {
                                Debug.Log(alternative.Transcript);
                                listaSoluciones.Add(alternative.Transcript);
                            }
                        }
                    }
                });

                // Read from the microphone and stream to the API.
                object writeLock = new object();
                bool writeMore = true;
                var waveIn = new NAudio.Wave.WaveInEvent();
                waveIn.DeviceNumber = 0;
                waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1);
                waveIn.DataAvailable += (object sender, NAudio.Wave.WaveInEventArgs args) =>
                {
                    lock (writeLock)
                    {
                        if (!writeMore)
                        {
                            return;
                        }
                        streamingCall.WriteAsync(
                            new StreamingRecognizeRequest()
                            {
                                AudioContent = Google.Protobuf.ByteString.CopyFrom(args.Buffer, 0, args.BytesRecorded)
                            }).Wait();
                    }
                };
                waveIn.StartRecording();
                Debug.Log("Habla");
                await Task.Delay(TimeSpan.FromSeconds(tiempo));

                // Stop recording and finish.
                waveIn.StopRecording();
                lock (writeLock) writeMore = false;
                await streamingCall.WriteCompleteAsync();
                await pintaRespuestas;
                return listaSoluciones;
            }
        }
    }
    The Unity script where I use it:

    Code (CSharp):
    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using UnityEngine;
    using ReconocimientoDeVoz;
    using System.Collections.Generic;

    public class Movimiento : MonoBehaviour
    {
        public float velocidad = 0.1f;
        bool grabando = false;

        // Start is called before the first frame update
        void Start()
        {
        }

        // Update is called once per frame
        void Update()
        {
            float direccionX = Input.GetAxisRaw("Horizontal");
            float direccionZ = Input.GetAxisRaw("Vertical");
            float posicionX = transform.position.x + (direccionX * velocidad * Time.deltaTime);
            float posicionZ = transform.position.z + (direccionZ * velocidad * Time.deltaTime);
            transform.position = new Vector3(posicionX, 0.5f, posicionZ);

            if (Input.GetButtonDown("Fire1") && !grabando)
            {
                grabando = true;
                Debug.Log("Boton pulsado");
                //Task.Run(() => {
                    Debug.Log("Reconociendo");
                    List<string> listaReconocida = Reconocedor.reconocerVoz(5).Result;
                    grabando = false;
                    Debug.Log("No grabando");
                //});
            }
        }
    }
    The function was meant to be used inside a Task execution, so I wouldn't mind if the function executed synchronously; in fact, I would want it to. I suppose the object stops moving because Update() is waiting for the result, but I'm not sure, so I wanted to ask you all.
    Thank you, and sorry for bothering.

    I forgot something: when it starts, it doesn't show the Debug.Log messages.
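    Applying the earlier advice from this thread to that script would mean never calling `.Result` in `Update()`. A sketch of how the `Fire1` branch could be restructured (the coroutine name `EsperaResultado` is my own invention; `Reconocedor.reconocerVoz` and `grabando` come from the code above):

    ```csharp
    // Sketch: run reconocerVoz on the thread pool and handle the result in a
    // coroutine, so Update() never blocks on .Result.
    if (Input.GetButtonDown("Fire1") && !grabando)
    {
        grabando = true;
        Task<List<string>> tarea = Task.Run(() => Reconocedor.reconocerVoz(5));
        StartCoroutine(EsperaResultado(tarea));
    }

    // Elsewhere in the same MonoBehaviour:
    private System.Collections.IEnumerator EsperaResultado(Task<List<string>> tarea)
    {
        // Poll once per frame until the recognition task finishes.
        while (!tarea.IsCompleted)
        {
            yield return null;
        }
        grabando = false;
        if (!tarea.IsFaulted)
        {
            foreach (string s in tarea.Result)
            {
                Debug.Log(s);
            }
        }
    }
    ```
    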
    Last edited: Apr 5, 2019
  16. Jeepbumblejeep


    May 31, 2018
    I have a question: where is that line supposed to go? At the end of the function?